About

Accurate letter frequency analysis is foundational to cryptography, linguistics, and data validation. A miscount in a frequency table can derail substitution cipher analysis or skew readability metrics that depend on letter counts, such as the Coleman-Liau index. This tool scans your input in a single pass and counts every character in the Unicode letter category (\p{L}), so accented characters and non-Latin scripts are handled correctly. It reports each letter's count and its relative frequency as a percentage of total letters, not total characters.

Note: whitespace, digits, and punctuation are excluded from the letter count by design, and uppercase and lowercase forms are grouped together. Pro tip: paste a full paragraph or more to get a statistically meaningful distribution. For English text, expect E near 12.7% and Z near 0.07%.

Formulas

The relative frequency of each letter is computed as the ratio of its occurrence count to the total number of letters detected in the input string.

$$ f_i = \frac{\mathrm{count}(\mathrm{letter}_i)}{\sum_{j=1}^{n} \mathrm{count}(\mathrm{letter}_j)} \times 100\% $$

Where f_i is the percentage frequency of letter i, count(letter_i) is the number of occurrences of that letter (case-insensitive), and n is the number of distinct letters found in the input. The denominator is therefore the total number of Unicode letters in the input. Characters matching the Unicode property \p{L} are classified as letters. Digits, whitespace, and punctuation are excluded.
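The formula can be sketched in a few lines of Python. This is an illustration only, not the tool's actual implementation: the function name letter_frequencies is hypothetical, and the sketch assumes that checking for a Unicode general category starting with "L" is an acceptable stand-in for the \p{L} property.

```python
import unicodedata
from collections import Counter


def letter_frequencies(text: str) -> dict[str, tuple[int, float]]:
    """Return {letter: (count, percent_of_all_letters)} for Unicode letters."""
    counts = Counter()
    for ch in text:  # single pass over the input
        # General categories Lu, Ll, Lt, Lm and Lo together make up \p{L}.
        if unicodedata.category(ch).startswith("L"):
            upper = ch.upper()
            # A few letters (e.g. "ß") uppercase to more than one character;
            # keep those unchanged so every key is a single letter.
            counts[upper if len(upper) == 1 else ch] += 1
    total = sum(counts.values())  # denominator: letters only, not all characters
    if total == 0:
        return {}  # no letters found, so no percentages to report
    return {letter: (n, n / total * 100) for letter, n in counts.items()}


result = letter_frequencies("Hello, World! 123")
for letter, (count, pct) in sorted(result.items(), key=lambda kv: -kv[1][0]):
    print(f"{letter}: {count} ({pct:.1f}%)")
```

For the input "Hello, World! 123" there are 10 letters in total, so L (3 occurrences) comes out at 30.0% and O (2 occurrences) at 20.0%. The digits, comma, exclamation mark, and spaces do not enter the denominator.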

Reference Data

Letter	Expected English Frequency	Approx. Rank	Common Usage Note
E	12.70%	1	Most common in nearly all English corpora
T	9.06%	2	High in function words (the, that, to)
A	8.17%	3	Common vowel in open syllables
O	7.51%	4	Frequent in prepositions and conjunctions
I	6.97%	5	Pronoun and suffix carrier
N	6.75%	6	Common consonant in endings (-tion, -ing)
S	6.33%	7	Plural marker and verb conjugation
H	6.09%	8	Frequent in digraphs (th, ch, sh)
R	5.99%	9	Common in clusters (tr, pr, str)
D	4.25%	10	Past tense marker (-ed)
L	4.03%	11	Frequent in suffixes (-ly, -al, -ful)
C	2.78%	12	Hard/soft duality (cat vs. city)
U	2.76%	13	Follows Q in almost every English word containing Q
M	2.41%	14	Nasal consonant, common word-initial
W	2.36%	15	Question words and function words
F	2.23%	16	Labiodental fricative, common in prefixes
G	2.02%	17	Present participle (-ing)
Y	1.97%	18	Semi-vowel, adverb suffix (-ly)
P	1.93%	19	Plosive, common in prefixes (pre-, pro-)
B	1.49%	20	Voiced bilabial plosive
V	0.98%	21	Rarely doubled in English (savvy, revved)
K	0.77%	22	Often silent before N (knee, know)
J	0.15%	23	Rare, mostly word-initial
X	0.15%	24	Often represents /ks/ cluster
Q	0.10%	25	Almost always paired with U
Z	0.07%	26	Least common, more frequent in American spelling

Frequently Asked Questions

All letters are grouped case-insensitively. "A" and "a" both increment the count for the letter A. The results display the uppercase form as the label, but the count reflects both cases combined.

Yes, accented characters are supported, and each one is treated as a distinct letter. "É" is counted separately from "E" because they are different Unicode code points. This matters for languages like French, Spanish, and German, where accented characters carry semantic meaning.
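A small Python sketch shows the effect of counting by code point. The helper count_letters is hypothetical and only approximates the behavior described here; the escape sequences are used so the exact code points are unambiguous.

```python
import unicodedata
from collections import Counter


def count_letters(text: str) -> Counter:
    """Count Unicode letters case-insensitively, labelled by uppercase form."""
    return Counter(
        ch.upper()
        for ch in text
        if unicodedata.category(ch).startswith("L")
    )


# "\u00c9" is É and "\u00e9" is é (precomposed code points): "Été en été".
counts = count_letters("\u00c9t\u00e9 en \u00e9t\u00e9")
print(counts["\u00c9"], counts["E"], counts["T"], counts["N"])

# Decomposed input: "e" followed by U+0301 (combining acute accent).
# The combining mark has category Mn, so only the base letter is counted.
decomposed = count_letters("e\u0301")
print(dict(decomposed))
```

The first line prints 4 1 2 1: the four accented letters are grouped under É, and the single plain "e" in "en" is counted under E. The second line prints {'E': 1}, because in this sketch a decomposed "é" contributes only its base letter. Whether the tool normalizes input before counting is not stated here, so treat that second case as a property of the sketch only.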

Standard English frequency tables (like those from Robert Lewand or Herbert Zim) are derived from corpora of millions of words. Short text samples exhibit high variance. You typically need at least 1000 letters before frequency distributions begin to converge toward expected values. Technical or domain-specific text will also deviate due to specialized vocabulary.

Digits (0-9), whitespace (spaces, tabs, newlines), punctuation marks, symbols, and emoji are all excluded. Only characters matching the Unicode letter category \p{L} are counted. This includes Latin, Cyrillic, Greek, and letters from other scripts.

Frequency analysis is the foundational technique for breaking monoalphabetic substitution ciphers. By comparing the observed letter distribution of ciphertext against expected English frequencies (where E ≈ 12.7% and T ≈ 9.1%), you can map cipher letters to plaintext candidates. This tool provides the observed distribution. For polyalphabetic ciphers (Vigenère), you would need to segment the text by key length first.
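As a rough illustration of that mapping step, the Python sketch below ranks ciphertext letters by count and pairs them with English letters in the order given in the reference table. The names ENGLISH_ORDER and guess_substitution are hypothetical, the sketch handles only A-Z, and the result is a first guess to refine by hand rather than a decryption.

```python
from collections import Counter

# English letters ordered by the expected frequencies in the reference table.
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"


def guess_substitution(ciphertext: str) -> dict[str, str]:
    """Map each cipher letter to a plaintext candidate by frequency rank."""
    counts = Counter(ch for ch in ciphertext.upper() if "A" <= ch <= "Z")
    # Most frequent first; ties are broken alphabetically for determinism.
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    return dict(zip(ranked, ENGLISH_ORDER))


mapping = guess_substitution("aaabbc")
print(mapping)
```

Here the most frequent cipher letter A is paired with E, B with T, and C with A. On short ciphertexts the ranking is unreliable, so expect many wrong pairings until the sample is long enough for the distribution to settle.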

Each bar width is proportional to the count of that letter relative to the maximum count found in your input. The letter with the highest count fills 100% of the bar width. All other bars scale linearly against that maximum. This provides a visual comparison of relative frequency within your specific text, not against the English standard.
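The scaling rule amounts to one division per letter. A minimal Python sketch, assuming the counts are already available as a dictionary (the function name bar_widths is hypothetical):

```python
def bar_widths(counts: dict[str, int]) -> dict[str, float]:
    """Scale each count to a bar width in percent of the largest count."""
    if not counts:
        return {}
    peak = max(counts.values())
    if peak == 0:
        # Nothing to scale against; draw every bar empty.
        return {letter: 0.0 for letter in counts}
    return {letter: n / peak * 100 for letter, n in counts.items()}


widths = bar_widths({"L": 3, "O": 2, "H": 1})
for letter, width in widths.items():
    print(f"{letter}: {width:.1f}%")
```

With counts of 3, 2, and 1, the bars come out at 100.0%, 66.7%, and 33.3% of the full width. Note that these are not the same numbers as the relative frequencies, which are measured against the total letter count rather than the maximum.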