About

A monoalphabetic substitution cipher replaces each letter in the plaintext with a fixed corresponding letter from a shuffled alphabet. The keyspace is 26! ≈ 4.03 × 10²⁶ permutations, which sounds secure until you apply frequency analysis. English text has a non-uniform letter distribution: E appears roughly 12.7% of the time, while Z appears only about 0.07% of the time. This statistical fingerprint survives encryption, because every occurrence of a plaintext letter becomes the same cipher letter. Given sufficient ciphertext (typically more than 100 characters), the frequency profile of the cipher alphabet can be mapped against standard English frequencies to recover the original substitution key.

This tool performs that analysis automatically and lets you refine the mapping interactively. Note: accuracy degrades on short texts or texts with unusual vocabulary. The tool assumes standard English prose and does not handle polyalphabetic ciphers (such as Vigenère) or homophonic variants.
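To illustrate why the fingerprint survives, here is a minimal Python sketch. The names `make_key`, `encrypt`, and `count_profile` are hypothetical helpers written for this example, not part of the tool. It encrypts a message with a shuffled alphabet and shows that the sorted letter counts are unchanged.

```python
import random
import string
from collections import Counter

def make_key(seed: int) -> dict[str, str]:
    """Build a substitution key by shuffling the alphabet (seeded for repeatability)."""
    rng = random.Random(seed)
    shuffled = list(string.ascii_uppercase)
    rng.shuffle(shuffled)
    return dict(zip(string.ascii_uppercase, shuffled))

def encrypt(plaintext: str, key: dict[str, str]) -> str:
    """Replace each letter via the key; spaces and punctuation pass through unchanged."""
    return "".join(key.get(ch, ch) for ch in plaintext.upper())

def count_profile(text: str) -> list[int]:
    """Sorted letter counts, ignoring which letter each count belongs to."""
    letters = [ch.upper() for ch in text if ch.isascii() and ch.isalpha()]
    return sorted(Counter(letters).values(), reverse=True)

key = make_key(seed=7)
message = "Meet me at the old bridge at seven"
cipher = encrypt(message, key)

# The labels change, but the shape of the distribution does not.
print(count_profile(message) == count_profile(cipher))
```

Only the letter labels are permuted, so the most frequent cipher letter is very likely the image of a frequent plaintext letter.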

Formulas

The frequency of each cipher letter is calculated as:

$$f_c = \frac{\operatorname{count}(c)}{N} \times 100$$

where c is a cipher letter and N is the total number of alphabetic characters in the ciphertext.
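The frequency formula can be sketched in a few lines of Python. The function name `letter_frequencies` is an assumption made for this example. It counts only ASCII letters, so N matches the definition above.

```python
from collections import Counter

def letter_frequencies(ciphertext: str) -> dict[str, float]:
    """Return f_c (as a percentage) for every letter that occurs in the text."""
    # Keep only A-Z / a-z, folded to upper case; everything else is ignored.
    letters = [ch.upper() for ch in ciphertext if ch.isascii() and ch.isalpha()]
    n = len(letters)
    if n == 0:
        return {}
    counts = Counter(letters)
    return {c: counts[c] / n * 100 for c in sorted(counts)}

for letter, pct in letter_frequencies("XLI UYMGO FVSAR JSB").items():
    print(f"{letter}: {pct:.2f}%")
```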

The auto-solve heuristic sorts cipher letters by descending frequency and maps them to the standard English frequency ranking: E, T, A, O, I, N, S, H, R, D, L, C, U, M, W, F, G, Y, P, B, V, K, J, X, Q, Z. This is a greedy initial approximation. The chi-squared statistic used to measure fit is:

$$\chi^2 = \sum_{i=1}^{26} \frac{(O_i - E_i)^2}{E_i}$$

where O_i is the observed count of the i-th letter and E_i is the expected count based on English frequencies. Lower χ² indicates a better mapping.
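As a rough sketch, the statistic can be computed on a candidate decryption as follows. The percentages in `ENGLISH_PERCENT` are rounded values from commonly published English frequency tables and are an assumption of this example, as is the function name `chi_squared`. Substitute whatever reference values you prefer.

```python
from collections import Counter

# Assumed reference percentages, listed in descending frequency order.
LETTERS = "ETAOINSHRDLCUMWFGYPBVKJXQZ"
PERCENTS = [12.70, 9.06, 8.17, 7.51, 6.97, 6.75, 6.33, 6.09, 5.99, 4.25,
            4.03, 2.78, 2.76, 2.41, 2.36, 2.23, 2.02, 1.97, 1.93, 1.49,
            0.98, 0.77, 0.15, 0.15, 0.10, 0.07]
ENGLISH_PERCENT = dict(zip(LETTERS, PERCENTS))

def chi_squared(candidate_plaintext: str) -> float:
    """Sum (O_i - E_i)^2 / E_i over all 26 letters; lower means closer to English."""
    letters = [ch.upper() for ch in candidate_plaintext
               if ch.isascii() and ch.isalpha()]
    n = len(letters)
    if n == 0:
        raise ValueError("text contains no letters")
    observed = Counter(letters)
    total = 0.0
    for letter, pct in ENGLISH_PERCENT.items():
        expected = n * pct / 100           # E_i: expected count for this letter
        total += (observed.get(letter, 0) - expected) ** 2 / expected
    return total

print(chi_squared("It was the best of times, it was the worst of times"))
print(chi_squared("ZZZZQQQQXXXXJJJJ"))  # far larger: nothing like English
```

Comparing the score before and after a manual swap gives a quick indication of whether the change moved the mapping toward English.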

Reference Data

Letter	English Frequency (%)	Rank	Notes
E	12.70	1	Most common letter
T	9.06	2	Frequent in THE, THAT
A	8.17	3	Only single-letter word besides I
O	7.51	4	Common in OF, ON, OR
I	6.97	5	Single-letter word
N	6.75	6	Common ending (-TION, -ING)
S	6.33	7	Plural marker, common start
H	6.09	8	TH is most common digraph
R	5.99	9	Common in ER, RE, AR
D	4.25	10	Past tense marker -ED
L	4.03	11	Common double (LL)
C	2.78	12	Often before H, K
U	2.76	13	Nearly always the letter after Q
M	2.41	14	Common start
W	2.36	15	Common start (WH-)
F	2.23	16	FOR, FROM, IF
G	2.02	17	-ING ending
Y	1.97	18	Common ending (-LY)
P	1.93	19	Often paired (PP)
B	1.49	20	Common start (BE, BUT)
V	0.98	21	Never doubled in English
K	0.77	22	Often after C
J	0.15	23	Rare
X	0.15	24	Rare, often EX-
Q	0.10	25	Nearly always QU
Z	0.07	26	Rarest letter

Frequently Asked Questions

Frequency analysis becomes statistically meaningful above approximately 100-200 alphabetic characters. Below that threshold, the observed frequencies deviate significantly from the expected English distribution, and the auto-solve heuristic may produce incorrect mappings. For texts under 50 characters, manual pattern analysis (single-letter words, common digraphs like TH, repeated patterns) is more effective than pure frequency matching.

The auto-solve uses a greedy frequency-rank mapping. This assumes the underlying plaintext follows the standard English letter distribution exactly, which real texts rarely do. Specialized vocabulary (technical papers, poetry, proper nouns) can skew frequencies. For example, a text about "jazz" will have an abnormally high Z frequency. Treat the auto-solve result as a starting point and refine it manually by looking at word patterns, common short words (THE, AND, IS, OF), and double letters.
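A minimal sketch of a greedy frequency-rank mapping is shown below. The names `ENGLISH_RANK` and `greedy_mapping` are hypothetical, and breaking ties alphabetically is an assumption of this example. The tool's own tie-breaking may differ.

```python
from collections import Counter

# English letters from most to least frequent.
ENGLISH_RANK = "ETAOINSHRDLCUMWFGYPBVKJXQZ"

def greedy_mapping(ciphertext: str) -> dict[str, str]:
    """Map the k-th most frequent cipher letter to the k-th most frequent English letter."""
    letters = [ch.upper() for ch in ciphertext if ch.isascii() and ch.isalpha()]
    counts = Counter(letters)
    # Descending count; equal counts fall back to alphabetical order (assumed).
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    # Cipher letters that never occur are left unmapped.
    return dict(zip(ranked, ENGLISH_RANK))

print(greedy_mapping("XXXX YYY ZZ W"))
# {'X': 'E', 'Y': 'T', 'Z': 'A', 'W': 'O'}
```

Letters with similar frequencies (for example the run from O through R) are easily swapped by this approach, which is why manual refinement is usually needed.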

Vigenère and other polyalphabetic ciphers are not supported. This tool is designed exclusively for monoalphabetic substitution ciphers, where each plaintext letter maps to exactly one cipher letter consistently throughout the message. A Vigenère cipher uses multiple substitution alphabets cyclically, which flattens the frequency distribution. Breaking Vigenère requires first determining the key length (via Kasiski examination or index of coincidence) and then solving each sub-cipher independently.
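The index of coincidence mentioned above also gives a quick check of whether a ciphertext looks monoalphabetic at all. The sketch below is an illustration only (the function name `index_of_coincidence` is assumed, and the tool does not necessarily compute this). English text, and therefore any monoalphabetic encryption of it, typically scores around 0.067, while a flat distribution scores about 1/26 ≈ 0.038.

```python
from collections import Counter

def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn at random from the text are identical."""
    letters = [ch.upper() for ch in text if ch.isascii() and ch.isalpha()]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    counts = Counter(letters)
    # Sum of n_i * (n_i - 1) over all letters, divided by N * (N - 1).
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))

print(round(index_of_coincidence("AABB"), 4))  # 0.3333
```

A value well below the English range on a long ciphertext suggests a polyalphabetic cipher, in which case frequency-rank mapping is unlikely to help.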

Single-letter words are almost certainly A or I. The most common three-letter word is THE, and its identification immediately reveals three letters. Double letters are constrained: common doubles are LL, SS, EE, OO, TT, FF, RR, NN, PP, CC. The letter Q is nearly always followed by U. The digraph TH is the most common in English (approximately 3.56% of all digraphs). The trigraph THE accounts for roughly 1.81% of all trigraphs. Apostrophes help too: a single letter after an apostrophe is usually S or T (as in IT'S or DON'T), and the letter immediately before 'T is almost always N.
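These pattern clues can be collected mechanically before any guessing starts. The following sketch uses a hypothetical helper, `pattern_hints`, which is not part of the tool. It lists single-letter words, the most frequent three-letter words, and doubled letters in a ciphertext that preserves word boundaries.

```python
import re
from collections import Counter

def pattern_hints(ciphertext: str) -> dict[str, list]:
    """Collect simple structural clues from a ciphertext with spaces intact."""
    text = ciphertext.upper()
    # Keep apostrophes inside words so that G'H is not split into G and H.
    words = re.findall(r"[A-Z']+", text)
    singles = sorted({w for w in words if len(w) == 1 and w.isalpha()})
    threes = Counter(w for w in words if len(w) == 3 and w.isalpha())
    doubles = sorted({m.group(0) for m in re.finditer(r"([A-Z])\1", text)})
    return {
        "single_letter_words": singles,                 # candidates for A or I
        "top_three_letter_words": threes.most_common(3),  # THE is a likely match
        "double_letters": doubles,                      # candidates for LL, SS, EE, ...
    }

print(pattern_hints("ABC A DEEF ABC G'H"))
# {'single_letter_words': ['A'], 'top_three_letter_words': [('ABC', 2)], 'double_letters': ['EE']}
```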

A valid substitution cipher requires a bijective (one-to-one) mapping: each cipher letter maps to exactly one plain letter, and each plain letter is the target of at most one cipher letter. If you assign plain letter E to cipher letter X, and then try to also assign E to cipher letter Y, the tool flags this as a conflict. You must resolve the conflict by removing one of the assignments before proceeding. The conflict indicator turns the affected cells red.
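A conflict check of this kind can be sketched as below. The representation (a dict from cipher letter to plain letter) and the name `find_conflicts` are assumptions for illustration. The tool's internal data structure is not documented here.

```python
from collections import defaultdict

def find_conflicts(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Return each plain letter that is assigned to more than one cipher letter."""
    cipher_by_plain: dict[str, list[str]] = defaultdict(list)
    for cipher_letter, plain_letter in mapping.items():
        cipher_by_plain[plain_letter].append(cipher_letter)
    # Only plain letters claimed by two or more cipher letters are conflicts.
    return {plain: sorted(ciphers)
            for plain, ciphers in cipher_by_plain.items()
            if len(ciphers) > 1}

print(find_conflicts({"X": "E", "Y": "E", "Z": "T"}))
# {'E': ['X', 'Y']}
```

Because the mapping is keyed by cipher letter, a cipher letter can never hold two plain letters at once. Only the reverse direction needs an explicit check.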

The frequency analysis counts only alphabetic characters (A-Z). Spaces, digits, punctuation, and special characters are preserved in the display for readability but excluded from the frequency count. The total character count shown (N) reflects only letters. This means the tool correctly handles ciphertexts that preserve original spacing and punctuation, which is the most common format for educational substitution cipher exercises.
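The same rule applies when a partial mapping is rendered: letters are substituted and everything else passes through. A minimal sketch follows. The name `apply_mapping` and the underscore placeholder for unsolved letters are assumptions of this example, not the tool's actual display logic.

```python
def apply_mapping(ciphertext: str, mapping: dict[str, str],
                  placeholder: str = "_") -> str:
    """Decrypt with a (possibly partial) mapping, leaving non-letters untouched."""
    out = []
    for ch in ciphertext:
        if ch.isascii() and ch.isalpha():
            # Unmapped cipher letters show up as the placeholder.
            plain = mapping.get(ch.upper(), placeholder)
            out.append(plain.lower() if ch.islower() else plain)
        else:
            out.append(ch)  # spaces, digits, punctuation are preserved
    return "".join(out)

print(apply_mapping("Xyz, x!", {"X": "T", "Y": "H"}))
# Th_, t!
```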