Supports all Unicode letters including accented characters

About

Accurate letter frequency analysis is foundational to cryptography, linguistics, and data validation. A miscount in a frequency table can break substitution cipher analysis or skew letter-count-based readability metrics such as the Coleman-Liau index. This tool performs a single-pass scan of your input. It classifies characters with the Unicode letter category (\p{L}) so that accented characters and non-Latin scripts are handled correctly, and it groups letters case-insensitively. It reports both individual letter counts and their relative frequency as a percentage of total letters, not total characters. Note: whitespace, digits, and punctuation are excluded from the letter count by design. Pro tip: paste several paragraphs (roughly 1,000 letters or more) to get a statistically meaningful distribution. For English text, expect E near 12.7% and Z near 0.07%.
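The single-pass, case-insensitive count described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own code. The function name `count_letters` is hypothetical, and lowercasing each letter as its grouping key is an assumption. The sketch relies on Python's `str.isalpha()`, which is true exactly for the Unicode letter categories Lu, Ll, Lt, Lm, and Lo, the same set that \p{L} matches.

```python
from collections import Counter

def count_letters(text: str) -> Counter:
    # One pass over the input: str.isalpha() keeps only Unicode letters
    # (categories Lu, Ll, Lt, Lm, Lo), so digits, spaces, punctuation and emoji are skipped.
    # Lowercasing is an assumed grouping key so "A" and "a" land in the same bucket.
    return Counter(ch.lower() for ch in text if ch.isalpha())

# "Caf\u00e9" uses the precomposed letter é (U+00E9).
counts = count_letters("Caf\u00e9, caf\u00e9! 42")
print(counts)
```

The digits and punctuation in the sample contribute nothing, and each of c, a, f, and é is counted twice.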


Formulas

The relative frequency of each letter is computed as the ratio of its occurrence count to the total number of letters detected in the input string.

$$f_i = \frac{\mathrm{count}(\mathrm{letter}_i)}{\sum_{j=1}^{n} \mathrm{count}(\mathrm{letter}_j)} \times 100\%$$

Here f_i is the percentage frequency of letter i, and count(letter_i) is the number of occurrences of that letter (case-insensitive). The denominator sums the counts of all n distinct letters found, so it equals the total number of Unicode letters in the input. Characters matching the Unicode property \p{L} are classified as letters. Digits, whitespace, and punctuation are excluded.
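The formula can be checked with a small worked example. The sketch below is illustrative only. The function name `letter_percentages` is hypothetical, and returning an empty result for input with no letters is an assumption, because the formula is undefined when the denominator is zero. For "Hello!" there are 5 letters, so l (2 occurrences) gets 100 × 2 / 5 = 40%, and h, e, and o get 20% each.

```python
from collections import Counter

def letter_percentages(text: str) -> dict[str, float]:
    # Numerator: case-insensitive count of each Unicode letter.
    counts = Counter(ch.lower() for ch in text if ch.isalpha())
    # Denominator: sum over all n distinct letters = total letters in the input.
    total = sum(counts.values())
    if total == 0:
        return {}  # no letters at all: avoid dividing by zero
    # f_i = count(letter_i) / total * 100%
    return {letter: 100 * c / total for letter, c in counts.items()}

freqs = letter_percentages("Hello!")
print(freqs)
```

The percentages always sum to 100 (up to floating-point rounding), because every letter contributes to both its own numerator and the shared denominator.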

Reference Data

Letter | Expected English Frequency (%) | Approx. Rank | Common Usage Note
E | 12.70 | 1 | Most common in nearly all English corpora
T | 9.06 | 2 | High in function words (the, that, to)
A | 8.17 | 3 | Common vowel in open syllables
O | 7.51 | 4 | Frequent in prepositions and conjunctions
I | 6.97 | 5 | Pronoun and suffix carrier (-ing, -tion)
N | 6.75 | 6 | Common consonant in endings (-tion, -ing)
S | 6.33 | 7 | Plural marker and verb conjugation
H | 6.09 | 8 | Frequent in digraphs (th, ch, sh)
R | 5.99 | 9 | Common in clusters (tr, pr, str)
D | 4.25 | 10 | Past tense marker (-ed)
L | 4.03 | 11 | Frequent in suffixes (-ly, -al) and doubled forms (all, will)
C | 2.78 | 12 | Hard/soft duality (cat vs. city)
U | 2.76 | 13 | Nearly always follows Q in English
M | 2.41 | 14 | Nasal consonant, common word-initially
W | 2.36 | 15 | Question words and function words
F | 2.23 | 16 | Labiodental fricative, frequent in function words (of, for, from)
G | 2.02 | 17 | Present participle suffix (-ing)
Y | 1.97 | 18 | Semi-vowel, adverb suffix (-ly)
P | 1.93 | 19 | Plosive, common in prefixes (pre-, pro-)
B | 1.49 | 20 | Voiced bilabial plosive
V | 0.98 | 21 | Rarely doubled in native English words
K | 0.77 | 22 | Often silent before N (knee, know)
J | 0.15 | 23 | Rare, mostly word-initial
X | 0.15 | 24 | Often represents the /ks/ cluster
Q | 0.10 | 25 | Almost always paired with U
Z | 0.07 | 26 | Least common, more frequent in American spelling (-ize)

Frequently Asked Questions

All letters are grouped case-insensitively. "A" and "a" both increment the count for the letter A. The results display the uppercase form as the label, but the count reflects both cases combined.
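Accented characters are supported and treated as distinct letters. Case-insensitive grouping folds case only, not diacritics. "É" and "é" are therefore combined, but "É" is counted separately from "E" because they are different letters with different Unicode code points. This matters for languages like French, Spanish, and German, where accented characters carry semantic meaning.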
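One subtlety is that an accented letter can be typed either precomposed (U+00E9) or as a base letter followed by a combining accent (e + U+0301). The combining mark has category Mn, so it is not in \p{L}. Without normalization, the decomposed form would be counted as a plain "e". The source does not say whether this tool normalizes its input. The sketch below therefore assumes an NFC normalization step using Python's standard `unicodedata.normalize`, and the function name `count_letters_nfc` is hypothetical.

```python
import unicodedata
from collections import Counter

def count_letters_nfc(text: str) -> Counter:
    # Compose first: "e" + U+0301 becomes the single letter U+00E9.
    # Without this step the combining mark (category Mn) is dropped as a non-letter.
    text = unicodedata.normalize("NFC", text)
    # Case folding merges É with é, but É and E remain separate letters.
    return Counter(ch.lower() for ch in text if ch.isalpha())

precomposed = "\u00c9\u00e9E"   # É é E
decomposed = "E\u0301e\u0301E"  # same text written with combining accents
print(count_letters_nfc(precomposed), count_letters_nfc(decomposed))
```

After normalization, both spellings produce the same counts: two for é and one for e.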
Results for short inputs often differ from standard English frequency tables (like those from Robert Lewand or Herbert Zim), which are derived from large corpora. Short text samples exhibit high variance. As a rough guide, you need at least 1,000 letters before the distribution begins to converge toward the expected values. Technical or domain-specific text will also deviate because of its specialized vocabulary.
Digits (0-9), whitespace (spaces, tabs, newlines), punctuation marks, symbols, and emoji are all excluded. Only characters in the Unicode letter category \p{L} are counted. This covers Latin, Cyrillic, Greek, and other alphabetic scripts, as well as letters from non-alphabetic scripts such as Chinese ideographs and Japanese kana.
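The \p{L} test can be expressed directly in terms of Unicode general categories. In this sketch, the helper `is_letter` is hypothetical, and the sample string is made up for illustration. It uses Python's standard `unicodedata.category`, which returns two-letter codes such as "Lu", "Nd", or "So", and treats any code starting with "L" as a letter.

```python
import unicodedata

def is_letter(ch: str) -> bool:
    # \p{L} is the union of categories Lu, Ll, Lt, Lm and Lo,
    # so any category code starting with "L" counts as a letter.
    return unicodedata.category(ch).startswith("L")

# Letters: A, é, Cyrillic Zhe, Greek lambda, CJK 中.
# Excluded: digit 7 (Nd), "!" (Po), emoji (So), spaces (Zs), tab (Cc).
sample = "A\u00e9 \u0416 \u03bb \u4e2d 7 ! \U0001F642\t"
kept = [ch for ch in sample if is_letter(ch)]
print(kept)
```

Only the five letters survive, including the CJK ideograph, while the digit, punctuation, emoji, and whitespace are dropped.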
Frequency analysis is the foundational technique for breaking monoalphabetic substitution ciphers. By comparing the observed letter distribution of ciphertext against expected English frequencies (where E ≈ 12.7% and T ≈ 9.1%), you can map cipher letters to plaintext candidates. This tool provides the observed distribution. For polyalphabetic ciphers such as Vigenère, you first need the key length k. You then split the ciphertext into k columns (every k-th letter) and analyze each column separately.
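Both steps can be sketched as follows. This is not a complete cipher solver. `rank_mapping` pairs cipher letters with English letters purely by frequency rank, which is only a first guess that usually needs manual refinement. `vigenere_columns` assumes the key length is already known and that the key advances only on letters. Both function names are hypothetical. Unlike the tool, the sketch restricts itself to the 26-letter A to Z alphabet, as is typical for classical ciphers.

```python
from collections import Counter

# English letters from most to least frequent, following the ranks in the reference table.
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"

def _ascii_letters(text: str) -> list[str]:
    # Uppercase and keep only A-Z; classical ciphers usually use the 26-letter alphabet.
    return [ch for ch in text.upper() if "A" <= ch <= "Z"]

def rank_mapping(ciphertext: str) -> dict[str, str]:
    # Naive first guess: the k-th most common cipher letter is assumed to stand
    # for the k-th most common English letter.
    counts = Counter(_ascii_letters(ciphertext))
    ranked = [letter for letter, _ in counts.most_common()]
    return dict(zip(ranked, ENGLISH_ORDER))

def vigenere_columns(ciphertext: str, key_length: int) -> list[str]:
    # Letters at positions i, i + k, i + 2k, ... were shifted by the same key letter,
    # so each column is a Caesar-shifted text that can be frequency-analyzed on its own.
    letters = _ascii_letters(ciphertext)
    return ["".join(letters[i::key_length]) for i in range(key_length)]

print(rank_mapping("xxx yy z"))
print(vigenere_columns("ABCDEF", 2))
```

For a short input like "xxx yy z", the mapping simply assigns X to E, Y to T, and Z to A. Meaningful guesses require ciphertext long enough for its distribution to stabilize.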
Each bar width is proportional to the count of that letter relative to the maximum count found in your input. The letter with the highest count fills 100% of the bar width. All other bars scale linearly against that maximum. This provides a visual comparison of relative frequency within your specific text, not against the English standard.
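The bar scaling described above differs from the percentage formula only in its denominator: the maximum count instead of the total. This minimal sketch uses a hypothetical function name `bar_widths` and expresses widths as percentages of the full bar.

```python
def bar_widths(counts: dict[str, int]) -> dict[str, float]:
    # Scale every bar linearly against the largest count; the top letter gets 100.
    if not counts:
        return {}
    peak = max(counts.values())
    return {letter: 100 * c / peak for letter, c in counts.items()}

print(bar_widths({"e": 4, "t": 2, "z": 1}))
```

With counts of 4, 2, and 1, the bars are 100%, 50%, and 25% wide. E's relative frequency, however, is 4/7 ≈ 57.1% of all letters, which is why bar length and the reported percentage should not be read as the same quantity.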