Break Substitution Cipher
Decode monoalphabetic substitution ciphers using frequency analysis, pattern matching, and interactive letter mapping. Cryptanalysis tool.
About
A monoalphabetic substitution cipher replaces each letter in the plaintext with a fixed corresponding letter from a shuffled alphabet. The keyspace is $26! \approx 4.03 \times 10^{26}$ permutations, which sounds secure until you apply frequency analysis. English text has a non-uniform letter distribution: E appears roughly 12.7% of the time, while Z appears only about 0.07% of the time. This statistical fingerprint survives encryption. Given sufficient ciphertext (typically > 100 characters), the frequency profile of the cipher alphabet can be mapped against standard English frequencies to recover the original substitution key. This tool performs that analysis automatically and lets you refine the mapping interactively.
Note: accuracy degrades on short texts or texts with unusual vocabulary. The tool assumes standard English prose and does not handle polyalphabetic ciphers (Vigenère) or homophonic variants.
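To make the definition concrete, here is a minimal Python sketch of a monoalphabetic substitution and its inverse. The helper names (`make_key`, `substitute`, `invert`) and the seeded shuffle are assumptions chosen for illustration, not part of the tool itself.

```python
import math
import random
import string

ALPHABET = string.ascii_uppercase


def make_key(seed=None):
    """Build a random substitution key: plaintext letter -> cipher letter."""
    rng = random.Random(seed)
    shuffled = list(ALPHABET)
    rng.shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))


def substitute(text, mapping):
    """Apply a letter mapping; non-letters (spaces, punctuation) pass through."""
    return "".join(mapping.get(ch, ch) for ch in text.upper())


def invert(mapping):
    """Swap keys and values to turn an encryption key into a decryption key."""
    return {cipher: plain for plain, cipher in mapping.items()}


key = make_key(seed=42)
ciphertext = substitute("ATTACK AT DAWN", key)
recovered = substitute(ciphertext, invert(key))

# Size of the keyspace: the number of permutations of 26 letters.
keyspace = math.factorial(26)
print(ciphertext, recovered, f"{keyspace:.2e}")
```

Because every occurrence of a plaintext letter maps to the same cipher letter, word lengths, repeated letters, and letter counts are all preserved, which is exactly what frequency analysis exploits.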
Formulas
The frequency of each cipher letter is calculated as:
$$f(c) = \frac{\mathrm{count}(c)}{N}$$
where c is a cipher letter, count(c) is the number of times c occurs, and N is the total number of alphabetic characters in the ciphertext.
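The frequency calculation can be sketched in a few lines of Python. The function name `letter_frequencies` is hypothetical, and restricting the count to the letters A to Z after upper-casing is an assumption about how the tool defines "alphabetic characters".

```python
from collections import Counter


def letter_frequencies(ciphertext):
    """Return {letter: relative frequency} for the letters A-Z in the text."""
    # Keep only A-Z so spaces, digits, and punctuation do not inflate N.
    letters = [ch for ch in ciphertext.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n == 0:
        return {}
    counts = Counter(letters)
    return {letter: count / n for letter, count in counts.items()}


freqs = letter_frequencies("Wkh txlfn eurzq ira mxpsv ryhu wkh odcb grj.")
# Print the cipher letters from most to least frequent.
for letter, freq in sorted(freqs.items(), key=lambda item: -item[1]):
    print(f"{letter}: {freq:.3f}")
```

Multiplying each value by 100 gives percentages that can be compared directly with the reference table.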
The auto-solve heuristic sorts cipher letters by descending frequency and maps them to the standard English frequency ranking: E, T, A, O, I, N, S, H, R, D, L, C, U, M, W, F, G, Y, P, B, V, K, J, X, Q, Z. This is a greedy initial approximation. The chi-squared statistic used to measure fit is:
$$\chi^2 = \sum_{i=1}^{26} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ is the observed count of the i-th letter and $E_i$ is the expected count based on English frequencies. Lower $\chi^2$ indicates a better mapping.
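The greedy mapping and the chi-squared score might be sketched as follows. This is an illustration under stated assumptions, not the tool's actual implementation: the function names are hypothetical, ties in frequency are broken alphabetically, and the expected counts are normalized by the sum of the table percentages (which total slightly under 100).

```python
import string
from collections import Counter

# English letter frequencies in percent, copied from the reference table.
ENGLISH_FREQ = {
    "E": 12.70, "T": 9.06, "A": 8.17, "O": 7.51, "I": 6.97, "N": 6.75,
    "S": 6.33, "H": 6.09, "R": 5.99, "D": 4.25, "L": 4.03, "C": 2.78,
    "U": 2.76, "M": 2.41, "W": 2.36, "F": 2.23, "G": 2.02, "Y": 1.97,
    "P": 1.93, "B": 1.29, "V": 0.98, "K": 0.77, "J": 0.15, "X": 0.15,
    "Q": 0.10, "Z": 0.07,
}
ENGLISH_RANK = "ETAOINSHRDLCUMWFGYPBVKJXQZ"


def letter_counts(text):
    """Count occurrences of A-Z, ignoring case and non-letters."""
    return Counter(ch for ch in text.upper() if "A" <= ch <= "Z")


def greedy_mapping(ciphertext):
    """Map cipher letters to plaintext guesses by frequency rank."""
    counts = letter_counts(ciphertext)
    # Most frequent first; ties (and unseen letters) in alphabetical order.
    ranked = sorted(string.ascii_uppercase, key=lambda c: (-counts[c], c))
    return dict(zip(ranked, ENGLISH_RANK))


def chi_squared(text):
    """Score how closely the letter counts of `text` match English."""
    counts = letter_counts(text)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("text contains no letters")
    total_pct = sum(ENGLISH_FREQ.values())
    score = 0.0
    for letter, pct in ENGLISH_FREQ.items():
        expected = n * pct / total_pct
        score += (counts[letter] - expected) ** 2 / expected
    return score


def decrypt(ciphertext, mapping):
    """Apply a cipher-to-plain mapping, leaving non-letters untouched."""
    return "".join(mapping.get(ch, ch) for ch in ciphertext.upper())


cipher = "GSV JFRXP YILDM ULC QFNKH LEVI GSV OZAB WLT"
guess = decrypt(cipher, greedy_mapping(cipher))
print(guess, round(chi_squared(guess), 1))
```

On short inputs like this one the greedy guess is usually wrong for many letters, so the score is better used to compare candidate mappings during refinement than as a pass/fail test.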
Reference Data
| Letter | English Frequency (%) | Rank | Notes |
|---|---|---|---|
| E | 12.70 | 1 | Most common letter |
| T | 9.06 | 2 | Frequent in THE, THAT |
| A | 8.17 | 3 | Only single-letter word besides I |
| O | 7.51 | 4 | Common in OF, ON, OR |
| I | 6.97 | 5 | Single-letter word |
| N | 6.75 | 6 | Common ending (-TION, -ING) |
| S | 6.33 | 7 | Plural marker, common start |
| H | 6.09 | 8 | TH is most common digraph |
| R | 5.99 | 9 | Common in ER, RE, AR |
| D | 4.25 | 10 | Past tense marker -ED |
| L | 4.03 | 11 | Common double (LL) |
| C | 2.78 | 12 | Often before H, K |
| U | 2.76 | 13 | Nearly always the letter after Q |
| M | 2.41 | 14 | Common start |
| W | 2.36 | 15 | Common start (WH-) |
| F | 2.23 | 16 | FOR, FROM, IF |
| G | 2.02 | 17 | -ING ending |
| Y | 1.97 | 18 | Common ending (-LY) |
| P | 1.93 | 19 | Often paired (PP) |
| B | 1.29 | 20 | Common start (BE, BUT) |
| V | 0.98 | 21 | Rarely doubled in English (e.g. SAVVY) |
| K | 0.77 | 22 | Often after C |
| J | 0.15 | 23 | Rare |
| X | 0.15 | 24 | Rare, often EX- |
| Q | 0.10 | 25 | Nearly always QU |
| Z | 0.07 | 26 | Rarest letter |