About

Text fuzziness is the deliberate introduction of controlled noise into a string. Applications range from adversarial testing of NLP pipelines and spam-filter evasion analysis to data augmentation for training robust OCR and text-classification models. A poorly chosen fuzziness level can render test data useless or produce artifacts that do not reflect real-world corruption patterns. This tool applies four distinct distortion algorithms, each governed by a fuzziness intensity f ∈ [0, 1]: homoglyph substitution using Unicode confusables from the Latin, Cyrillic, and Greek blocks; stochastic typo injection based on a QWERTY adjacency map; Zalgo stacking of combining diacritical marks from the range U+0300–U+036F; and weighted leetspeak mapping. Note: homoglyph output may look identical to the original in some fonts, but it differs at the codepoint level. How the results look depends on the rendering engine and typeface.

Formulas

Each character in the input string is independently subjected to a distortion decision. The probability of any character being altered is governed by the fuzziness parameter:

$$P(\text{alter}) = f \times w_{\text{mode}}$$

where $f \in [0, 1]$ is the fuzziness intensity set by the slider and $w_{\text{mode}}$ is the mode-specific weight (homoglyph 0.9, typo 0.4, Zalgo 0.7, leet 0.8). For each character $c_i$, a uniform random value $r \in [0, 1)$ is drawn, and the output character $c_i'$ is:

$$c_i' = \begin{cases} \operatorname{distort}(c_i) & \text{if } r < P(\text{alter}) \\ c_i & \text{otherwise} \end{cases}$$
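The decision rule can be sketched in Python as follows. This is a minimal sketch, not the tool's source: `fuzz`, `alter_probability`, and the caller-supplied `distort` callback are hypothetical names, `random.Random` stands in for the browser's `Math.random`, and the eligibility rules described in the FAQ are not modeled.

```python
from __future__ import annotations

import random
from typing import Callable

# Mode weights w_mode, as listed under Formulas.
MODE_WEIGHTS = {"homoglyph": 0.9, "typo": 0.4, "zalgo": 0.7, "leet": 0.8}


def alter_probability(f: float, mode: str) -> float:
    """P(alter) = f * w_mode, with f clamped to [0, 1]."""
    f = min(max(f, 0.0), 1.0)
    return f * MODE_WEIGHTS[mode]


def fuzz(text: str, f: float, mode: str,
         distort: Callable[[str], str], rng: random.Random) -> str:
    """Replace each character c_i with distort(c_i) when r < P(alter)."""
    p = alter_probability(f, mode)
    out = []
    for c in text:
        r = rng.random()  # uniform draw in [0, 1)
        out.append(distort(c) if r < p else c)
    return "".join(out)


# Usage: uppercasing stands in for a real (hypothetical) distortion function.
print(fuzz("hello world", 0.5, "leet", str.upper, random.Random(7)))
```

Because every character gets its own independent draw, the number of altered characters follows a binomial distribution with success probability $P(\text{alter})$.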

In Zalgo mode, the number of combining marks $n_{\text{marks}}$ stacked on each affected character grows linearly with intensity, in whole-number steps:

$$n_{\text{marks}} = \lfloor 15f \rfloor + 1$$

so intensities below 1/15 stack a single mark, while the maximum f = 1 stacks 16 combining characters on each affected base glyph. In typo mode, the error type is chosen uniformly from {swap, duplicate, omit, neighbor}; neighbor-key substitution uses a precomputed QWERTY adjacency table covering the 26 letter keys, with about 4.5 neighbors per key on average.
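A sketch of typo injection along these lines follows. It assumes a hypothetical staggered-row derivation of the QWERTY adjacency table (the tool's own table is precomputed and may list slightly different neighbors), treats "swap" as transposition with the next character, and keeps the character unchanged when a swap is drawn at the end of the text or a neighbor substitution is drawn for a non-letter. All names are illustrative.

```python
from __future__ import annotations

import random

# Hypothetical staggered QWERTY layout used to derive letter neighbors.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
TYPO_WEIGHT = 0.4  # w_typo from the formula above
ERROR_TYPES = ["swap", "duplicate", "omit", "neighbor"]


def build_adjacency() -> dict[str, list[str]]:
    """Map each of the 26 letters to its physically adjacent letter keys."""
    pos = {ch: (r, i) for r, row in enumerate(ROWS) for i, ch in enumerate(row)}
    adj: dict[str, list[str]] = {}
    for ch, (r, i) in pos.items():
        candidates = [(r, i - 1), (r, i + 1),      # same row
                      (r - 1, i), (r - 1, i + 1),  # row above
                      (r + 1, i - 1), (r + 1, i)]  # row below
        adj[ch] = [ROWS[rr][ii] for rr, ii in candidates
                   if 0 <= rr < len(ROWS) and 0 <= ii < len(ROWS[rr])]
    return adj


ADJ = build_adjacency()


def inject_typos(text: str, f: float, rng: random.Random) -> str:
    """With probability f * w_typo per character, apply one uniformly chosen error."""
    p = f * TYPO_WEIGHT
    out: list[str] = []
    i = 0
    while i < len(text):
        c = text[i]
        if rng.random() >= p:
            out.append(c)
        else:
            kind = rng.choice(ERROR_TYPES)
            if kind == "swap" and i + 1 < len(text):
                out.extend([text[i + 1], c])  # transpose with the next character
                i += 2
                continue
            if kind == "duplicate":
                out.extend([c, c])
            elif kind == "neighbor" and c.lower() in ADJ:
                n = rng.choice(ADJ[c.lower()])
                out.append(n.upper() if c.isupper() else n)
            elif kind != "omit":
                out.append(c)  # swap at end of text, or neighbor on a non-letter
        i += 1
    return "".join(out)


print(ADJ["q"], ADJ["s"])
print(inject_typos("the quick brown fox", 1.0, random.Random(11)))
```

Deriving neighbors from row offsets keeps the table symmetric (if `w` neighbors `a`, then `a` neighbors `w`), which a hand-typed table does not guarantee.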

Reference Data

| Mode | Technique | Unicode range / mechanism | Detectability | Use case |
|---|---|---|---|---|
| Homoglyph | Visual lookalike substitution | Cyrillic (U+0400–U+04FF), Greek (U+0370–U+03FF) | Low (visually identical) | Phishing research, filter-bypass testing |
| Typo | Adjacent-key substitution, transposition, duplication, omission | QWERTY keyboard adjacency map | Medium (human-readable errors) | NLP robustness testing, data augmentation |
| Zalgo | Stacking of combining diacritical marks | U+0300–U+036F (Combining Diacritical Marks) | High (visually chaotic) | Artistic text, renderer stress testing |
| Leetspeak | Letter → digit/symbol replacement | ASCII substitution dictionary | Medium (recognizable pattern) | Gaming culture, basic obfuscation |
| Mixed | Weighted blend of all four modes | All of the above | Variable | Comprehensive fuzzing |
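The table describes leetspeak only as an ASCII substitution dictionary, and the introduction calls the mapping "weighted". A minimal sketch, assuming a hypothetical dictionary in which each letter maps to one or more replacements with relative weights; the tool's real dictionary and weights are not documented here, and `leetify` is an illustrative name.

```python
import random

# Hypothetical weighted leet dictionary: letter -> (replacements, relative weights).
LEET = {
    "a": (["4", "@"], [0.7, 0.3]),
    "e": (["3"], [1.0]),
    "i": (["1", "!"], [0.6, 0.4]),
    "o": (["0"], [1.0]),
    "s": (["5", "$"], [0.6, 0.4]),
    "t": (["7"], [1.0]),
}
LEET_WEIGHT = 0.8  # w_leet from the Formulas section


def leetify(text: str, f: float, rng: random.Random) -> str:
    """Swap mapped letters for a weighted-random replacement with probability f * w_leet."""
    p = f * LEET_WEIGHT
    out = []
    for c in text:
        entry = LEET.get(c.lower())
        if entry is not None and rng.random() < p:
            options, weights = entry
            out.append(rng.choices(options, weights=weights, k=1)[0])
        else:
            out.append(c)  # unmapped letters, digits, and symbols pass through
    return "".join(out)


print(leetify("Leetspeak is a basic obfuscation", 0.8, random.Random(1)))
```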
Common Homoglyph Pairs (Latin → Cyrillic/Greek)

| Latin | Lookalike | Codepoint | Appearance | Substitution type |
|---|---|---|---|---|
| a | а (Cyrillic Small A) | U+0430 | Visually identical | Confusable substitution |
| e | е (Cyrillic Small Ie) | U+0435 | Visually identical | Confusable substitution |
| o | ο (Greek Small Omicron) | U+03BF | Visually identical | Confusable substitution |
| p | р (Cyrillic Small Er) | U+0440 | Visually identical | Confusable substitution |
| c | с (Cyrillic Small Es) | U+0441 | Visually identical | Confusable substitution |
| x | х (Cyrillic Small Kha) | U+0445 | Visually identical | Confusable substitution |
| y | у (Cyrillic Small U) | U+0443 | Visually identical | Confusable substitution |
| s | ѕ (Cyrillic Small Dze) | U+0455 | Visually identical | Confusable substitution |
| i | і (Cyrillic Small Byelorussian-Ukrainian I) | U+0456 | Visually identical | Confusable substitution |
| H | Н (Cyrillic Capital En) | U+041D | Visually identical | Confusable substitution |
| T | Т (Cyrillic Capital Te) | U+0422 | Visually identical | Confusable substitution |
| B | В (Cyrillic Capital Ve) | U+0412 | Visually identical | Confusable substitution |
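Applying the pairs above with the homoglyph weight gives a compact substitution routine. This is a sketch under the Formulas section's model, not the tool's code: the mapping is exactly the table above (the tool's confusables dictionary may contain more entries), and `homoglyphify` is a hypothetical name.

```python
import random

# Latin -> lookalike pairs, taken from the Common Homoglyph Pairs table.
HOMOGLYPHS = {
    "a": "\u0430", "e": "\u0435", "o": "\u03bf", "p": "\u0440",
    "c": "\u0441", "x": "\u0445", "y": "\u0443", "s": "\u0455",
    "i": "\u0456", "H": "\u041d", "T": "\u0422", "B": "\u0412",
}
HOMOGLYPH_WEIGHT = 0.9  # w_homoglyph from the Formulas section


def homoglyphify(text: str, f: float, rng: random.Random) -> str:
    """Replace each character that has a lookalike with probability f * w_homoglyph."""
    p = f * HOMOGLYPH_WEIGHT
    # Only characters in the table draw a random number; all others pass through.
    return "".join(HOMOGLYPHS[c] if c in HOMOGLYPHS and rng.random() < p else c
                   for c in text)


spoofed = homoglyphify("Hello Bob, please pay", 1.0, random.Random(0))
print(spoofed)
# List the non-ASCII codepoints that were introduced.
print([f"U+{ord(c):04X}" for c in spoofed if ord(c) > 0x7F])
```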
Combining Diacritical Marks (Zalgo)

| Position | Range | Examples | Count | Effect |
|---|---|---|---|---|
| Above | U+0300–U+0315 | Grave, Acute, Circumflex, Tilde, etc. | 22 marks | Stack above glyphs |
| Below | U+0316–U+0333 | Grave below, Cedilla, etc. | 30 marks | Stack below glyphs |
| Overlay | U+0334–U+0338 | Tilde overlay, Stroke, etc. | 5 marks | Strike-through effect |
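Combining these ranges with the mark-count formula n_marks = floor(15f) + 1 gives a short Zalgo generator. This is a sketch under stated assumptions: marks are drawn uniformly from all 57 codepoints in the table (the tool may balance above, below, and overlay marks differently), whitespace is skipped as described in the FAQ, and `zalgoify` and `n_marks` are hypothetical names.

```python
import math
import random

# Combining-mark ranges from the table above (range() end is exclusive).
ABOVE = [chr(cp) for cp in range(0x0300, 0x0316)]    # 22 marks
BELOW = [chr(cp) for cp in range(0x0316, 0x0334)]    # 30 marks
OVERLAY = [chr(cp) for cp in range(0x0334, 0x0339)]  # 5 marks
MARKS = ABOVE + BELOW + OVERLAY
ZALGO_WEIGHT = 0.7  # w_zalgo from the Formulas section


def n_marks(f: float) -> int:
    """Marks stacked per affected character: floor(15 f) + 1."""
    return math.floor(f * 15) + 1


def zalgoify(text: str, f: float, rng: random.Random) -> str:
    """Append n_marks(f) random combining marks to each affected non-space character."""
    p = f * ZALGO_WEIGHT
    n = n_marks(f)
    out = []
    for c in text:
        out.append(c)
        if not c.isspace() and rng.random() < p:
            out.extend(rng.choice(MARKS) for _ in range(n))
    return "".join(out)


print(n_marks(0.5), n_marks(1.0))  # 8 16
print(zalgoify("hello", 0.3, random.Random(2)))
```

Since combining marks follow their base character, removing every codepoint in U+0300–U+0338 from the output recovers the original text.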

Frequently Asked Questions

Homoglyph substitution replaces ASCII characters (1 byte each in UTF-8) with visually identical codepoints from the Cyrillic or Greek blocks, which take 2 bytes each in UTF-8. The string looks unchanged to a human reader, but exact string comparison, hashing, and regex matching against the original all stop matching. This is the core mechanism behind the internationalized domain name (IDN) homograph attacks studied in security research.
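The effect is easy to check with the standard library alone. The example below swaps both `a` characters in "paypal" for U+0430 from the homoglyph table; the word itself is only an illustration.

```python
import hashlib
import re

latin = "paypal"
spoofed = "p\u0430yp\u0430l"  # both 'a's replaced with U+0430 CYRILLIC SMALL LETTER A

print(latin == spoofed)                  # False
print(len(latin.encode("utf-8")))        # 6
print(len(spoofed.encode("utf-8")))      # 8: each U+0430 takes 2 bytes
print(len("\u03bf".encode("utf-8")))     # 2: Greek omicron is also 2 bytes
print(hashlib.sha256(latin.encode("utf-8")).hexdigest()
      == hashlib.sha256(spoofed.encode("utf-8")).hexdigest())  # False
print(re.fullmatch("paypal", spoofed))   # None
```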
At intensity f = 0.5, each eligible character is altered with probability f × w: 0.5 × 0.9 = 45% in homoglyph mode and 0.5 × 0.4 = 20% in typo mode. For a 1000-character corpus, expect roughly 350-450 altered characters in homoglyph mode (at most about 450, and fewer when the text contains characters without confusables) versus about 200 in typo mode. For statistically meaningful augmentation, generate multiple variants at different intensity levels and compare the resulting model-accuracy degradation curves.
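Because each character is decided independently, the altered count is binomial with p = f × w, so its mean is n·p and its standard deviation is sqrt(n·p·(1 − p)). The helper below (a hypothetical name) works through the two cases above, assuming all 1000 characters are eligible.

```python
import math


def expected_alterations(n_eligible: int, f: float, w: float) -> tuple[float, float]:
    """Mean and standard deviation of the altered-character count (binomial, p = f * w)."""
    p = f * w
    return n_eligible * p, math.sqrt(n_eligible * p * (1 - p))


for mode, w in [("homoglyph", 0.9), ("typo", 0.4)]:
    mean, sd = expected_alterations(1000, 0.5, w)
    print(f"{mode}: {mean:.0f} ± {sd:.1f}")
# homoglyph: 450 ± 15.7
# typo: 200 ± 12.6
```

In homoglyph mode the eligible count is below 1000 whenever some characters have no confusable, which pulls the expected value below 450.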
Yes. At maximum intensity, each affected character can carry up to 16 combining diacritical marks, which expands its vertical glyph bounds dramatically. Some older browsers and terminal emulators may clip, overlap, or crash when rendering deeply stacked combining marks. Modern browsers render it, but line-height calculations become unreliable. Avoid pasting high-intensity Zalgo into fixed-height UI elements in production.
The typo algorithm uses a QWERTY physical adjacency map, so substituted characters are spatially near the original key. This produces errors consistent with motor-control slip patterns observed in human typing studies. However, it does not model phonetic confusion (e.g., 'their/there') or autocorrect artifacts. For phonetic error simulation, a phoneme-mapping layer would be required, which this tool does not implement.
By default, the tool uses Math.random, which cannot be seeded, so each "Fuzzify" click produces a different output even with identical input and settings. The tool therefore cannot regenerate a previous result on demand. If a test suite needs a fixed corpus, save the variant that best represents your target error distribution and reuse that saved output, or generate variants with a seeded pseudo-random generator in your own code. The current state (input text, mode, intensity) is persisted to localStorage for session continuity.
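For reproducible fuzzing outside the tool, a private seeded generator is enough. A minimal sketch, assuming a hypothetical four-letter leet table and the leet weight 0.8 from the Formulas section; `fuzz_leet` is an illustrative name, not part of the tool.

```python
import random


def fuzz_leet(text: str, f: float, seed: int) -> str:
    """Toy leet fuzzer driven by a seeded PRNG: same text, f, and seed give the same output."""
    rng = random.Random(seed)  # private generator; leaves the global random state alone
    table = {"a": "4", "e": "3", "i": "1", "o": "0"}  # hypothetical mapping
    p = f * 0.8  # w_leet from the Formulas section
    return "".join(table[c] if c in table and rng.random() < p else c for c in text)


first = fuzz_leet("reproducible fuzzing", 0.7, seed=1234)
second = fuzz_leet("reproducible fuzzing", 0.7, seed=1234)
print(first == second)  # True
```

The sketch draws only from `random()`, whose sequence for a given seed the Python documentation promises to keep stable across versions, so stored seeds remain usable as test fixtures.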
Whitespace characters (space, tab, newline), digits in non-leet modes, and punctuation marks without known homoglyphs are preserved. In homoglyph mode, only characters with entries in the confusables dictionary are eligible for substitution. Characters outside the Basic Latin block (U+0000–U+007F) pass through unaltered in all modes to avoid double-encoding.
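These rules can be condensed into one eligibility predicate that runs before the random draw. This is a sketch of the rules as stated, with a hypothetical `is_eligible` name and the confusables dictionary passed in as a set of keys; the tool's own check may be organized differently.

```python
from __future__ import annotations

import string


def is_eligible(c: str, mode: str, homoglyph_keys: set[str]) -> bool:
    """Return True if character c may be distorted in the given mode."""
    if ord(c) > 0x7F:          # outside Basic Latin: always passes through
        return False
    if c.isspace():            # space, tab, newline are preserved
        return False
    if c.isdigit() and mode != "leet":
        return False           # digits are only touched in leet mode
    if c in string.punctuation and c not in homoglyph_keys:
        return False           # punctuation without a known homoglyph
    if mode == "homoglyph":
        return c in homoglyph_keys
    return True


keys = {"a", "e", "o", "p"}
print([c for c in "pay 7 bob!" if is_eligible(c, "homoglyph", keys)])  # ['p', 'a']
```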