About

Letter frequency analysis is foundational to cryptanalysis, computational linguistics, and information theory. In monoalphabetic substitution ciphers, frequency analysis is the primary attack vector: if you cannot identify that "e" occurs at roughly 12.7% in English text, decryption becomes guesswork. This tool computes absolute counts and relative frequencies for every letter in your input, normalized against the total number of alphabetic characters. Results can be sorted by rank or alphabetically and are visualized as a proportional bar chart.

Accuracy depends on sample size. Texts shorter than 200 characters produce unstable distributions that can deviate significantly from language norms. The tool applies no encoding transformation; it processes the raw UTF-16 code units delivered by the browser. Case folding merges uppercase and lowercase into a single bucket for case-insensitive analysis. Pro tip: compare your output against known language profiles (English, French, German) in the reference table below to identify the source language of an unknown text.


Formulas

The relative frequency of each letter is computed by dividing its count by the total number of alphabetic characters in the sample.

$$f_i = \frac{c_i}{N} \times 100$$

Where $f_i$ is the percentage frequency of letter $i$, $c_i$ is the raw count of occurrences of letter $i$, and $N$ is the total count of all alphabetic characters in the input: $N = \sum_{i=1}^{26} c_i$.
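The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the function name `letter_frequencies` and the `fold_case` parameter are hypothetical, and only ASCII letters are counted as alphabetic.

```python
from collections import Counter
import string


def letter_frequencies(text: str, fold_case: bool = True) -> dict[str, float]:
    """Return the percentage frequency of each letter found in `text`.

    Assumption: only ASCII letters count toward N; digits, whitespace,
    punctuation and accented characters are ignored.
    """
    if fold_case:
        text = text.lower()  # merge "A" and "a" into one bucket
    counts = Counter(ch for ch in text if ch in string.ascii_letters)
    total = sum(counts.values())  # N in the formula
    if total == 0:
        return {}  # no letters, so no distribution to report
    # f_i = c_i / N * 100
    return {letter: 100 * count / total for letter, count in counts.items()}


freqs = letter_frequencies("Hello, World!")
for letter, pct in sorted(freqs.items(), key=lambda item: -item[1]):
    print(f"{letter}: {pct:.1f}%")
```

Because the denominator is the letter count rather than the full input length, the percentages always sum to 100 regardless of how much punctuation or whitespace the input contains.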

The Index of Coincidence (IC) measures how likely two randomly chosen letters from the text are identical. For monolingual English text, the expected IC ≈ 0.0667. Random uniform text yields IC ≈ 0.0385.

$$IC = \frac{\sum_{i=1}^{26} c_i (c_i - 1)}{N (N - 1)}$$

Where $c_i$ is the count for the $i$-th letter and $N$ is the total letter count. An IC significantly below 0.06 suggests polyalphabetic encryption or a non-natural text source.
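As a rough illustration of the IC formula, the following Python sketch computes it directly from letter counts. The function name `index_of_coincidence` is hypothetical, and the sketch assumes case folding and ASCII letters only.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn without replacement are identical."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    n = len(letters)
    if n < 2:
        raise ValueError("at least two letters are required")
    counts = Counter(letters)
    # Sum of c_i * (c_i - 1) over all letters, divided by N * (N - 1)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


print(index_of_coincidence("aabb"))  # 4 / 12
```

A text made of a single repeated letter gives an IC of 1, and a text in which no letter repeats gives 0; natural-language samples fall between those extremes.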

Reference Data

Letter  English %  French %  German %  Spanish %  Italian %  Portuguese %
a           8.167     7.636     6.516      12.53      11.74         14.63
b           1.492     0.901     1.886       1.42       0.92          1.04
c           2.782     3.260     2.732       4.68       4.50          3.88
d           4.253     3.669     5.076       5.86       3.73          4.99
e          12.702    14.715    16.396      13.68      11.79         12.57
f           2.228     1.066     1.656       0.69       0.95          1.02
g           2.015     0.866     3.009       1.01       1.64          1.30
h           6.094     0.737     4.577       0.70       1.54          1.28
i           6.966     7.529     6.550       6.25      11.28          6.18
j           0.153     0.545     0.268       0.44       0.00          0.40
k           0.772     0.049     1.417       0.01       0.00          0.02
l           4.025     5.456     3.437       4.97       6.51          2.78
m           2.406     2.968     2.534       3.15       2.51          4.74
n           6.749     7.095     9.776       6.71       6.88          5.05
o           7.507     5.378     2.594       8.68       9.83         10.73
p           1.929     3.021     0.670       2.51       3.05          2.52
q           0.095     1.362     0.018       0.88       0.51          1.20
r           5.987     6.553     7.003       6.87       6.37          6.53
s           6.327     7.948     7.270       7.98       4.98          7.81
t           9.056     7.244     6.154       4.63       5.62          4.34
u           2.758     6.311     4.166       3.93       3.01          4.63
v           0.978     1.628     0.846       0.90       2.10          1.67
w           2.360     0.114     1.921       0.02       0.00          0.01
x           0.150     0.387     0.034       0.22       0.00          0.21
y           1.974     0.308     0.039       0.90       0.00          0.01
z           0.074     0.136     1.134       0.52       0.49          0.47
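One way to compare a sample against these profiles is Pearson's chi-squared statistic, where a lower score means a closer match. The sketch below is an assumption about how such a comparison could be done, not a description of the tool's internals. It copies the English, French and German columns from the table, and the names `PROFILES`, `chi_squared` and `rank_languages` are hypothetical. Profiles containing 0.00 entries (such as Italian) would need smoothing before use, because the statistic divides by the expected count.

```python
from collections import Counter
import string

# Percentages for a..z, copied from the reference table above.
PROFILES = {
    "english": [8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094,
                6.966, 0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929,
                0.095, 5.987, 6.327, 9.056, 2.758, 0.978, 2.360, 0.150,
                1.974, 0.074],
    "french": [7.636, 0.901, 3.260, 3.669, 14.715, 1.066, 0.866, 0.737,
               7.529, 0.545, 0.049, 5.456, 2.968, 7.095, 5.378, 3.021,
               1.362, 6.553, 7.948, 7.244, 6.311, 1.628, 0.114, 0.387,
               0.308, 0.136],
    "german": [6.516, 1.886, 2.732, 5.076, 16.396, 1.656, 3.009, 4.577,
               6.550, 0.268, 1.417, 3.437, 2.534, 9.776, 2.594, 0.670,
               0.018, 7.003, 7.270, 6.154, 4.166, 0.846, 1.921, 0.034,
               0.039, 1.134],
}


def chi_squared(text: str, profile: list[float]) -> float:
    """Pearson chi-squared between observed letter counts and a profile."""
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    n = len(letters)
    if n == 0:
        raise ValueError("text contains no letters")
    counts = Counter(letters)
    scale = sum(profile)  # renormalize: a column may not sum to exactly 100
    score = 0.0
    for letter, pct in zip(string.ascii_lowercase, profile):
        expected = n * pct / scale  # expected count of this letter
        score += (counts[letter] - expected) ** 2 / expected
    return score


def rank_languages(text: str) -> list[tuple[str, float]]:
    """Return (language, score) pairs, best match first."""
    scores = {lang: chi_squared(text, prof) for lang, prof in PROFILES.items()}
    return sorted(scores.items(), key=lambda item: item[1])


for lang, score in rank_languages("The quick brown fox jumps over the lazy dog"):
    print(f"{lang}: {score:.1f}")
```

On very short inputs the ranking is unreliable, since a single rare letter such as "z" or "q" can dominate the score.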

Frequently Asked Questions

A minimum of 200 alphabetic characters produces a rough profile. For statistically stable results that converge within ±0.5% of true language frequencies, you need approximately 2,000 to 5,000 characters. Shorter texts exhibit high variance: a 50-character sample could show "z" at 4% purely by chance.
The Index of Coincidence (IC) measures the probability that two randomly selected letters from the text are identical. English text has an expected IC ≈ 0.0667. A value near 0.0385 indicates a uniform random distribution (or a strong polyalphabetic cipher). Values between these extremes suggest partial encryption or mixed-language content.
By default, this tool folds uppercase to lowercase so that "A" and "a" are counted as the same letter. If you disable case folding, each case variant is tracked independently, which is useful for analyzing text formatting patterns or programming source code where casing carries semantic meaning.
Non-alphabetic characters are excluded from the letter frequency percentage calculation (the denominator N counts only a-z). However, total character count, digit count, whitespace count, and punctuation count are reported separately in the statistics summary. This ensures the percentage distribution is directly comparable to standard language frequency tables.
The frequency profile can help identify the language of a text. Compare the output against the reference table provided. English is characterized by a dominant "e" at ~12.7% and a high "t" at ~9.1%. German shows "e" at ~16.4%. French exhibits a high "s" at ~7.9% and "e" at ~14.7%. A chi-squared test between your observed frequencies and each language profile yields a quantitative match score. The tool computes this automatically.
Published tables (like those from Robert Lewand or Peter Norvig's Google corpus analysis) are derived from millions of words. Your input is a finite sample subject to topic bias. Technical writing overrepresents "x" and "z". Dialogue-heavy fiction skews toward personal pronouns, inflating "i" and "y". The deviation itself is analytically informative - it reveals the genre and register of your text.
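The effect of sample size on these deviations can be estimated with a simple binomial model. The sketch below is a back-of-the-envelope approximation that assumes each letter is an independent draw, which real text does not strictly satisfy. The function name `sampling_error_pct` is hypothetical.

```python
import math


def sampling_error_pct(true_pct: float, n_letters: int) -> float:
    """Standard error, in percentage points, of an observed letter frequency.

    Assumption: letters are independent draws (binomial model), so the
    standard error of a proportion p over n draws is sqrt(p * (1 - p) / n).
    """
    p = true_pct / 100
    return 100 * math.sqrt(p * (1 - p) / n_letters)


# How much the observed share of "e" (12.7% in English) is expected to wobble
for n in (50, 200, 2000, 5000):
    print(f"{n:>5} letters: +/- {sampling_error_pct(12.7, n):.2f} points")
```

Under this model the standard error for "e" drops below half a percentage point only at roughly 5,000 letters, which is in line with the sample sizes recommended above.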