About

Typographical errors follow predictable biomechanical patterns. A finger striking k instead of l is not random; it is a consequence of QWERTY key adjacency and motor control variance. This tool models five distinct error classes (adjacent-key substitution, character transposition, deletion, duplication, and space corruption), each weighted by empirical frequency data from keystroke dynamics research. The typo rate r is the probability that any given word is corrupted, so r = 0.05 means roughly 1 in 20 words will contain an error.

Applications include generating training data for spell-check algorithms, stress-testing OCR pipelines, creating realistic chat dialogue for fiction, and building proofreading exercises. The adjacency map covers all 47 character keys (letters, digits, and punctuation) on a standard US QWERTY layout. Note: this tool approximates human error. It does not model fatigue curves or per-user motor profiles. For languages with accented characters, mutations of non-ASCII characters may be less physically accurate.

Formulas

Each word in the input text is independently evaluated for corruption. The probability of a typo occurring in a given word is controlled by the rate parameter:

P_typo = r

where r ∈ [0, 1] is the user-defined typo frequency. For each word, a uniform random value u is drawn from [0, 1). If u < r, a typo is applied.

\[
\text{apply\_typo}(\text{word}) =
\begin{cases}
\text{mutate}(\text{word}) & \text{if } u < r \\
\text{word} & \text{otherwise}
\end{cases}
\]
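The per-word decision rule can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual source: the names apply_typo and corrupt_text, the pluggable mutate callback, and the seed parameter are all assumptions made for the example.

```python
import random

def apply_typo(word, r, mutate, rng=random):
    """Return mutate(word) with probability r, otherwise the word unchanged."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    u = rng.random()  # uniform draw from [0, 1)
    return mutate(word) if u < r else word

def corrupt_text(words, r, mutate, seed=None):
    """Evaluate every word independently (hypothetical helper)."""
    rng = random.Random(seed)  # a seed makes runs reproducible
    return [apply_typo(w, r, mutate, rng) for w in words]
```

Because u is drawn from [0, 1), r = 0 never mutates a word and r = 1 always does.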

The mutation function selects a typo type based on weighted random selection. For adjacent-key substitution, the QWERTY adjacency map A(k) returns the set of physically neighboring keys for key k. A replacement character is chosen uniformly from A(k). The expected number of typos in a text of n words is:

E[typos] = n × r

where n is the word count and r is the rate. At r = 0.10, a 200-word paragraph yields 20 typos on average.
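Since each word is evaluated independently, the number of corrupted words follows a binomial distribution, so the spread around the expected value can be estimated as well. The short sketch below assumes one typo per corrupted word; the function name typo_count_stats is illustrative only.

```python
import math

def typo_count_stats(n, r):
    """Mean and standard deviation of the corrupted-word count, Binomial(n, r)."""
    mean = n * r
    std = math.sqrt(n * r * (1.0 - r))
    return mean, std

mean, std = typo_count_stats(200, 0.10)
print(f"expected corrupted words: {mean:.1f} (std ≈ {std:.2f})")
# prints: expected corrupted words: 20.0 (std ≈ 4.24)
```

In practice, a single 200-word run at r = 0.10 will often land a few typos above or below 20.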

Reference Data

Typo Type	Technical Name	Example	Cause	Real-World Frequency
Adjacent Key	Substitution Error	"hello" → "helko"	Finger drift to neighboring key	~38% of all typos
Transposition	Swap Error	"the" → "teh"	Timing mismatch between fingers	~20% of all typos
Omission	Deletion Error	"because" → "becuse"	Incomplete keystroke / speed	~16% of all typos
Insertion	Duplication Error	"book" → "boook"	Key bounce / sticky key	~12% of all typos
Space Deletion	Run-on Error	"my dog" → "mydog"	Thumb misses spacebar	~8% of all typos
Space Insertion	Split Error	"into" → "in to"	Premature space press	~4% of all typos
Capitalization	Case Error	"John" → "john"	Shift key timing	~2% of all typos
Key ↑ Row	Vertical Drift	"was" → "qas"	Hand position shifted up	Subset of substitution
Key ↓ Row	Vertical Drift	"red" → "rwd"	Hand position shifted down	Subset of substitution
Repeated Word	Cognitive Error	"the the cat"	Attention lapse	Context-dependent
Homophone	Lexical Error	"their" → "there"	Phonetic confusion	Context-dependent
Missing Double	Simplification	"success" → "sucess"	Uncertain spelling	Common in L2 writers
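The character-level operations in the table can be expressed as small string functions. The sketch below is illustrative: the function names and the explicit position argument i are assumptions, chosen so that the table's examples can be reproduced deterministically.

```python
def substitute(word, i, ch):
    """Replace the character at index i with ch."""
    return word[:i] + ch + word[i + 1:]

def transpose(word, i):
    """Swap the characters at indices i and i + 1."""
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def delete(word, i):
    """Drop the character at index i."""
    return word[:i] + word[i + 1:]

def duplicate(word, i):
    """Repeat the character at index i."""
    return word[:i + 1] + word[i] + word[i + 1:]

def split_word(word, i):
    """Insert a premature space before index i."""
    return word[:i] + " " + word[i:]

print(substitute("hello", 3, "k"))  # helko
print(transpose("the", 1))          # teh
print(delete("because", 3))         # becuse
print(duplicate("book", 1))         # boook
print(split_word("into", 2))        # in to
```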

Frequently Asked Questions

Each key on a standard US QWERTY keyboard has between 2 and 6 physically adjacent neighbors. For example, the key f is adjacent to d, g, r, t, v, and c. When a substitution typo is triggered, one character in the word is replaced by a uniformly random neighbor from this set. Edge keys like q or p have fewer neighbors, making their substitution pool smaller and their errors more predictable.
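One way to obtain such an adjacency map is to derive it from key geometry rather than typing it out by hand. The sketch below covers the 26 letter keys only and assumes row stagger offsets of 0.25 and 0.75 key widths relative to the top letter row; the tool's real map may be built differently. Names such as ROWS, build_adjacency, and substitute_adjacent are hypothetical.

```python
import random

# (keys, horizontal offset in key widths); offsets are an assumed approximation
ROWS = [("qwertyuiop", 0.0), ("asdfghjkl", 0.25), ("zxcvbnm", 0.75)]

def build_adjacency():
    """Map each letter to the set of letters on physically touching keys."""
    pos = {}
    for row_idx, (keys, offset) in enumerate(ROWS):
        for i, k in enumerate(keys):
            pos[k] = (row_idx, i + offset)
    adj = {k: set() for k in pos}
    for a, (ra, xa) in pos.items():
        for b, (rb, xb) in pos.items():
            if a == b:
                continue
            same_row = ra == rb and abs(xa - xb) == 1.0
            next_row = abs(ra - rb) == 1 and abs(xa - xb) < 1.0
            if same_row or next_row:
                adj[a].add(b)
    return adj

ADJACENT = build_adjacency()

def substitute_adjacent(word, rng=random):
    """Replace one letter with a uniformly chosen neighbor, preserving case."""
    candidates = [i for i, c in enumerate(word) if c.lower() in ADJACENT]
    if not candidates:
        return word  # nothing mappable, leave the word alone
    i = rng.choice(candidates)
    c = word[i]
    repl = rng.choice(sorted(ADJACENT[c.lower()]))
    if c.isupper():
        repl = repl.upper()
    return word[:i] + repl + word[i + 1:]

print(sorted(ADJACENT["f"]))  # ['c', 'd', 'g', 'r', 't', 'v']
print(sorted(ADJACENT["q"]))  # ['a', 'w']
```

Under these assumptions every letter ends up with between 2 and 6 neighbors, matching the range described above.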

Research on skilled typists shows an error rate of approximately 0.5% to 2% per word in casual typing. For untrained or fatigued typists, rates of 5% to 8% are common. A setting of r = 0.03 to 0.05 produces realistic results. Settings above 0.15 create deliberately garbled text suitable for stress-testing parsers.

Certain typo operations require minimum word length. Transposition needs at least 2 characters. Deletion on a 1-character word would eliminate it entirely, which is unrealistic. The algorithm enforces minimum length guards: substitution requires length ≥ 1, transposition ≥ 2, and deletion ≥ 3. This prevents generating artifacts that no human would produce.
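The length guards amount to a simple filter applied before a typo type is chosen. The following sketch encodes only the three thresholds stated above; the dictionary MIN_LENGTH and the function eligible_types are names invented for illustration, and guards for other typo types are not specified in this text.

```python
# Minimum word length per typo type, as stated in the paragraph above
MIN_LENGTH = {"substitution": 1, "transposition": 2, "deletion": 3}

def eligible_types(word, enabled=tuple(MIN_LENGTH)):
    """Return the enabled typo types that the word is long enough for."""
    return [t for t in enabled if len(word) >= MIN_LENGTH[t]]

print(eligible_types("a"))    # ['substitution']
print(eligible_types("to"))   # ['substitution', 'transposition']
print(eligible_types("cat"))  # ['substitution', 'transposition', 'deletion']
```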

Yes. The algorithm tokenizes input by whitespace boundaries. Punctuation attached to words (commas, periods, quotes) is stripped before mutation and reattached afterward. Line breaks, tabs, and multiple spaces are preserved in their original positions. Only alphabetic characters within word tokens are candidates for mutation.
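A rough sketch of this tokenize, strip, mutate, reattach cycle is shown below. It assumes that "punctuation" means any leading or trailing non-letter characters and that the mutation is supplied as a callback; the regular expressions and the name mutate_preserving_layout are illustrative choices, not the tool's confirmed implementation.

```python
import re

# leading non-letters, core word, trailing non-letters
_TOKEN = re.compile(r"^([^A-Za-z]*)(.*?)([^A-Za-z]*)$", re.DOTALL)

def mutate_preserving_layout(text, mutate):
    """Apply mutate to each word core, keeping whitespace and punctuation."""
    pieces = re.split(r"(\s+)", text)  # capturing group keeps the separators
    out = []
    for piece in pieces:
        if not piece or piece.isspace():
            out.append(piece)  # spaces, tabs, and line breaks pass through
            continue
        lead, core, trail = _TOKEN.match(piece).groups()
        out.append(lead + (mutate(core) if core else "") + trail)
    return "".join(out)

sample = 'Hello, "world"!\n\tfoo  bar'
print(mutate_preserving_layout(sample, str.upper))
```

Here str.upper stands in for a real typo mutation so that the preserved punctuation and whitespace are easy to see in the output.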

Yes. The tool is designed for exactly this use case. Each typo type maps to a known error taxonomy used in computational linguistics: substitution, transposition, insertion, and deletion (the Damerau-Levenshtein operations). By enabling specific typo types individually, you can generate targeted training sets. For balanced datasets, enable all types and use a moderate rate of r ≈ 0.05.

Each enabled typo type has a weight derived from empirical frequency data: adjacent-key (38), transposition (20), omission (16), duplication (12), space errors (8), and case errors (6). The weights of the enabled types are summed, and a random number is drawn from [0, total_weight). The algorithm then walks through the enabled types, accumulating weight until the running total exceeds the random value, and selects the type at which that happens. This keeps the relative proportions realistic even when some types are disabled.
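The cumulative walk described here is a standard weighted-selection technique and can be sketched as follows. The weights are taken from the paragraph above; the key names, select_by_weight, and pick_typo_type are hypothetical. The selection is split from the random draw so that the boundary behavior can be checked deterministically.

```python
import random

WEIGHTS = {
    "adjacent": 38, "transposition": 20, "omission": 16,
    "duplication": 12, "space": 8, "case": 6,
}

def select_by_weight(enabled, x):
    """Walk the enabled types until the running total exceeds x."""
    cumulative = 0
    for name in enabled:
        cumulative += WEIGHTS[name]
        if x < cumulative:
            return name
    raise ValueError("x must lie in [0, total_weight)")

def pick_typo_type(enabled, rng=random):
    """Draw x uniformly from [0, total_weight) and select a type."""
    total = sum(WEIGHTS[name] for name in enabled)
    if total == 0:
        raise ValueError("no typo types enabled")
    return select_by_weight(enabled, rng.random() * total)

print(select_by_weight(["adjacent", "transposition"], 37.9))  # adjacent
print(select_by_weight(["adjacent", "transposition"], 38))    # transposition
```

With only adjacent-key and transposition enabled, the total weight is 58, so adjacent-key errors are chosen with probability 38/58 and transpositions with 20/58.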