About

A monoalphabetic substitution cipher replaces each letter in the plaintext with a fixed corresponding letter from a shuffled alphabet. The keyspace is $26! \approx 4.03 \times 10^{26}$ permutations, which sounds secure until you apply frequency analysis. English text has a non-uniform letter distribution: E appears roughly 12.7% of the time, while Z appears only 0.07%. This statistical fingerprint survives encryption. Given sufficient ciphertext (typically more than 100 characters), the frequency profile of the cipher alphabet can be mapped against standard English frequencies to recover the original substitution key. This tool performs that analysis automatically and lets you refine the mapping interactively. Note: accuracy degrades on short texts or texts with unusual vocabulary. The tool assumes standard English prose and does not handle polyalphabetic ciphers (Vigenère) or homophonic variants.

Formulas

The frequency of each cipher letter is calculated as:

$$f_c = \frac{\text{count}(c)}{N} \times 100$$

where c is a cipher letter and N is the total number of alphabetic characters in the ciphertext.
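
The percentage formula above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code; the function name `letter_frequencies` is hypothetical, and the sketch assumes only the ASCII letters A-Z count toward N.

```python
from collections import Counter


def letter_frequencies(ciphertext: str) -> dict[str, float]:
    """Return the percentage frequency of each letter that occurs in the text."""
    # Keep only ASCII letters; spaces, digits and punctuation do not count toward N.
    letters = [ch.upper() for ch in ciphertext if ch.isascii() and ch.isalpha()]
    n = len(letters)  # N in the formula
    if n == 0:
        return {}
    counts = Counter(letters)
    # f_c = count(c) / N * 100 for every letter c that appears
    return {c: counts[c] / n * 100 for c in sorted(counts)}


freqs = letter_frequencies("Xlmw mw e wigvix qiwweki!")
```

Letters that never occur are simply absent from the result, so a caller that needs all 26 entries should fill in zeros itself.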

The auto-solve heuristic sorts cipher letters by descending frequency and maps them to the standard English frequency ranking: E, T, A, O, I, N, S, H, R, D, L, C, U, M, W, F, G, Y, P, B, V, K, J, X, Q, Z. This is a greedy initial approximation. The chi-squared statistic used to measure fit is:

$$\chi^2 = \sum_{i=1}^{26} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is the observed count of the i-th letter and $E_i$ is the expected count based on English frequencies. Lower $\chi^2$ indicates a better mapping.
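
As a rough sketch of the greedy ranking and the chi-squared score described above, the following Python shows one way the two pieces could fit together. The names `greedy_key`, `apply_key` and `chi_squared` are hypothetical, the alphabetical tie-breaking rule is an assumption, and the percentages are copied from the Reference Data section. The tool's own implementation may differ.

```python
from collections import Counter

# English ranking from the text, most to least frequent.
ENGLISH_RANK = "ETAOINSHRDLCUMWFGYPBVKJXQZ"

# English letter frequencies in percent, as listed under Reference Data.
ENGLISH_FREQ = {
    "E": 12.70, "T": 9.06, "A": 8.17, "O": 7.51, "I": 6.97, "N": 6.75,
    "S": 6.33, "H": 6.09, "R": 5.99, "D": 4.25, "L": 4.03, "C": 2.78,
    "U": 2.76, "M": 2.41, "W": 2.36, "F": 2.23, "G": 2.02, "Y": 1.97,
    "P": 1.93, "B": 1.29, "V": 0.98, "K": 0.77, "J": 0.15, "X": 0.15,
    "Q": 0.10, "Z": 0.07,
}


def greedy_key(ciphertext: str) -> dict[str, str]:
    """Map cipher letters, sorted by descending count, onto ENGLISH_RANK."""
    counts = Counter(ch for ch in ciphertext.upper() if "A" <= ch <= "Z")
    # Assumption: ties are broken alphabetically so the result is deterministic.
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    return dict(zip(ranked, ENGLISH_RANK))


def apply_key(ciphertext: str, key: dict[str, str]) -> str:
    """Substitute mapped letters; leave everything else unchanged."""
    return "".join(key.get(ch, ch) for ch in ciphertext.upper())


def chi_squared(text: str) -> float:
    """Sum of (O_i - E_i)^2 / E_i over the 26 letters of a candidate plaintext."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n == 0:
        raise ValueError("chi-squared is undefined for text without letters")
    counts = Counter(letters)
    total = 0.0
    for letter, pct in ENGLISH_FREQ.items():
        expected = n * pct / 100          # E_i
        observed = counts.get(letter, 0)  # O_i
        total += (observed - expected) ** 2 / expected
    return total
```

A typical use would be to compute `apply_key(ciphertext, greedy_key(ciphertext))`, score the result with `chi_squared`, and keep manual swaps only when they lower the score.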

Reference Data

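| Letter | English Frequency (%) | Rank | Typical usage |
|--------|-----------------------|------|---------------|
| E | 12.70 | 1 | Most common letter |
| T | 9.06 | 2 | Frequent in THE, THAT |
| A | 8.17 | 3 | Only single-letter word besides I |
| O | 7.51 | 4 | Common in OF, ON, OR |
| I | 6.97 | 5 | Single-letter word |
| N | 6.75 | 6 | Common ending (-TION, -ING) |
| S | 6.33 | 7 | Plural marker, common start |
| H | 6.09 | 8 | TH is most common digraph |
| R | 5.99 | 9 | Common in ER, RE, AR |
| D | 4.25 | 10 | Past tense marker -ED |
| L | 4.03 | 11 | Common double (LL) |
| C | 2.78 | 12 | Often before H, K |
| U | 2.76 | 13 | Almost always follows Q |
| M | 2.41 | 14 | Common start |
| W | 2.36 | 15 | Common start (WH-) |
| F | 2.23 | 16 | FOR, FROM, IF |
| G | 2.02 | 17 | -ING ending |
| Y | 1.97 | 18 | Common ending (-LY) |
| P | 1.93 | 19 | Often paired (PP) |
| B | 1.29 | 20 | Common start (BE, BUT) |
| V | 0.98 | 21 | Rarely doubled in English |
| K | 0.77 | 22 | Often after C |
| J | 0.15 | 23 | Rare |
| X | 0.15 | 24 | Rare, often EX- |
| Q | 0.10 | 25 | Nearly always QU |
| Z | 0.07 | 26 | Rarest letter |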

Frequently Asked Questions

Frequency analysis becomes statistically meaningful above approximately 100-200 alphabetic characters. Below that threshold, the observed frequencies deviate significantly from the expected English distribution, and the auto-solve heuristic may produce incorrect mappings. For texts under 50 characters, manual pattern analysis (single-letter words, common digraphs like TH, repeated patterns) is more effective than pure frequency matching.
The auto-solve uses a greedy frequency-rank mapping. This assumes the ciphertext follows standard English letter distribution perfectly. Specialized vocabulary (technical papers, poetry, proper nouns) can skew frequencies. For example, a text about "jazz" will have abnormally high Z frequency. The auto-solve is a starting point. You should refine it manually by looking at word patterns, common short words (THE, AND, IS, OF), and double letters.
No, this tool cannot break Vigenère or other polyalphabetic ciphers. It is designed exclusively for monoalphabetic substitution ciphers, where each plaintext letter maps to exactly one cipher letter consistently throughout the message. A Vigenère cipher uses multiple substitution alphabets cyclically, which flattens the frequency distribution. Breaking Vigenère requires first determining the key length (via Kasiski examination or index of coincidence) and then solving each sub-cipher independently.
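
The index of coincidence mentioned above also gives a quick check on whether a ciphertext is plausibly monoalphabetic at all. The sketch below is illustrative only (the function name is hypothetical and not part of the tool). English prose typically scores near 0.067, and a monoalphabetic substitution leaves that value unchanged, whereas uniformly random letters score near 1/26 ≈ 0.038. A polyalphabetic ciphertext tends to fall between the two.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn without replacement are equal."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    counts = Counter(letters)
    # sum of k*(k-1) over letter counts, divided by N*(N-1)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))
```

On short texts the value is noisy, so it is best treated as a hint rather than a verdict.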
Single-letter words are almost certainly A or I. The most common three-letter word is THE, and identifying it immediately reveals three letters. Double letters are constrained: common doubles are LL, SS, EE, OO, TT, FF, RR, NN, PP, CC. The letter Q is nearly always followed by U. The digraph TH is the most common in English (approximately 3.56% of all digraphs). The trigraph THE accounts for roughly 1.81% of all trigraphs. An apostrophe pattern like _'T usually marks the N'T of a contraction (DON'T, CAN'T).
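
These clues can be collected mechanically before any manual guessing. The following is a small sketch under the assumption that the ciphertext preserves word spacing; the function name `pattern_clues` and the shape of its result are hypothetical.

```python
import re
from collections import Counter


def pattern_clues(ciphertext: str) -> dict[str, Counter]:
    """Count single-letter words, three-letter words and doubled letters."""
    text = ciphertext.upper()
    # Words are runs of letters; apostrophes are kept so contractions stay intact.
    words = re.findall(r"[A-Z']+", text)
    return {
        # Candidates for A or I
        "single_letter_words": Counter(w for w in words if len(w) == 1 and w.isalpha()),
        # The most frequent entry is a candidate for THE
        "three_letter_words": Counter(w for w in words if len(w) == 3 and w.isalpha()),
        # Candidates for LL, SS, EE, OO, ...
        "double_letters": Counter(m.group(0) for m in re.finditer(r"([A-Z])\1", text)),
    }


clues = pattern_clues("Q XLI GEX WEEX XLI")
```

The most common entry in `three_letter_words` is only a candidate for THE; it still needs to be confirmed against the rest of the text.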
A valid substitution cipher requires a bijective (one-to-one) mapping: each cipher letter maps to exactly one plain letter, and each plain letter is the target of at most one cipher letter. If you assign plain letter E to cipher letter X, and then try to also assign E to cipher letter Y, the tool flags this as a conflict. You must resolve the conflict by removing one of the assignments before proceeding. The conflict indicator turns the affected cells red.
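
The conflict rule can be expressed as a check for plain letters that are targeted by more than one cipher letter. This is a minimal sketch, assuming the partial key is held as a dictionary from cipher letter to plain letter; `find_conflicts` is a hypothetical name, not the tool's API.

```python
def find_conflicts(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Return each plain letter assigned to more than one cipher letter."""
    targets: dict[str, list[str]] = {}
    for cipher, plain in mapping.items():
        targets.setdefault(plain, []).append(cipher)
    # A dict already guarantees one plain letter per cipher letter,
    # so only the reverse direction needs checking.
    return {plain: sorted(ciphers) for plain, ciphers in targets.items() if len(ciphers) > 1}
```

For example, `find_conflicts({"X": "E", "Y": "E", "Q": "T"})` reports that E is claimed by both X and Y, which corresponds to the cells the tool would mark red.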
The frequency analysis counts only alphabetic characters (A-Z). Spaces, digits, punctuation, and special characters are preserved in the display for readability but excluded from the frequency count. The total character count shown (N) reflects only letters. This means the tool correctly handles ciphertexts that preserve original spacing and punctuation, which is the most common format for educational substitution cipher exercises.