About

Zero-width characters are Unicode code points that occupy no visible space in rendered text. They exist in the Unicode standard for legitimate typographic purposes: U+200B (Zero-Width Space) marks line-break opportunities, U+200C (Zero-Width Non-Joiner) prevents ligature formation in scripts like Arabic and Devanagari, and U+200D (Zero-Width Joiner) forces ligatures or combines emoji sequences. This tool exploits these characters for text steganography: encoding arbitrary plaintext into a sequence of invisible code points that can be embedded within normal visible text. The encoding writes each character's code in binary and maps every bit to one of two zero-width characters (U+200D → 1, U+200C → 0), with U+200B as the delimiter between characters. The result is a string that appears completely empty to the human eye but carries a full payload when decoded.

Failing to detect zero-width characters in user-submitted content creates real security risks. Invisible strings can bypass naive input validation, smuggle data through chat filters, fingerprint leaked documents by embedding unique invisible watermarks, or break string comparison logic in code. Copy-pasting text from untrusted sources without stripping zero-width characters has caused bugs in production systems. This tool provides three operations: encoding visible text into invisible characters, decoding invisible sequences back to readable text, and detecting the presence and quantity of hidden zero-width characters in any pasted content. The encoding is lossless for all BMP characters (code points ≤ 65535). Note: some platforms strip zero-width characters on paste, so test your target platform before relying on this for message delivery.
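The decoding operation can be sketched as the inverse of the bit mapping described above. This is a minimal illustration, not the tool's actual source: the function name `decodeZeroWidth` is hypothetical, and it assumes the payload uses only U+200D (1), U+200C (0), and U+200B (delimiter), ignoring every other character so that a payload embedded in visible text can still be read.

```javascript
// Hypothetical decoder sketch: recovers text hidden with the
// U+200D = 1, U+200C = 0, U+200B = delimiter scheme.
function decodeZeroWidth(text) {
  // Drop everything except the three code points the scheme uses,
  // so visible cover text does not interfere.
  const hidden = text.replace(/[^\u200B\u200C\u200D]/g, "");

  return hidden
    .split("\u200B")                      // one group per encoded character
    .filter((group) => group.length > 0)  // tolerate stray delimiters
    .map((group) => {
      let bits = "";
      for (const ch of group) {
        bits += ch === "\u200D" ? "1" : "0";
      }
      // Interpret the bit string as a UTF-16 code unit.
      return String.fromCharCode(parseInt(bits, 2));
    })
    .join("");
}
```

Filtering before splitting is a design choice for this sketch: it lets the same function handle both a bare payload and one pasted together with ordinary text.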

Formulas

The encoding algorithm converts each character to its binary representation, then maps each bit to a zero-width character:

encode(c) = map(bin(charCodeAt(c)), bit → bit = 1 ? U+200D : U+200C)

For a full message of n characters, the output is the concatenation of each encoded character separated by the delimiter U+200B:

output = encode(c₁) + U+200B + encode(c₂) + U+200B + ... + encode(c_n)
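As a rough sketch, the two formulas above translate to a few lines of JavaScript. The function and constant names here are illustrative assumptions, not the tool's actual code; the bit mapping and delimiter follow the definitions given in this section.

```javascript
// Illustrative names; the mapping itself comes from the formulas above.
const ZW_ONE = "\u200D";   // Zero-Width Joiner      -> bit 1
const ZW_ZERO = "\u200C";  // Zero-Width Non-Joiner  -> bit 0
const ZW_DELIM = "\u200B"; // Zero-Width Space       -> character delimiter

function encodeZeroWidth(text) {
  const groups = [];
  for (let i = 0; i < text.length; i++) {
    // Binary representation of the UTF-16 code unit, without leading zeros.
    const bits = text.charCodeAt(i).toString(2);
    let group = "";
    for (const bit of bits) {
      group += bit === "1" ? ZW_ONE : ZW_ZERO;
    }
    groups.push(group);
  }
  // Delimiters go between groups only, never at the start or end.
  return groups.join(ZW_DELIM);
}

// "H" is 1001000 and "i" is 1101001 (7 bits each), plus one delimiter.
console.log(encodeZeroWidth("Hi").length); // 15
```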

The output length in code points for a single character with code point value v ≥ 1 is:

L(v) = floor(log₂(v)) + 1

Total invisible string length for n characters:

T = ∑_{i=1}^{n} L(v_i) + (n − 1)

The + (n − 1) term accounts for the delimiter characters between encoded character groups.

c = input character, v = Unicode code point value, n = number of characters, L = bit-length of a code point, T = total invisible code points generated.
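To make the length formulas concrete, the sketch below computes L and T for a given string. The helper names `bitLength` and `totalLength` are hypothetical. The bit length is taken from the binary string rather than from a floating-point logarithm, which gives the same value as floor(log₂(v)) + 1 for v ≥ 1, and the empty string is treated as producing no output.

```javascript
// L(v): number of bits in the binary representation of v.
function bitLength(v) {
  return v.toString(2).length;
}

// T: total invisible code points for a message, including delimiters.
function totalLength(text) {
  const n = text.length;
  if (n === 0) return 0; // no characters, no delimiters

  let sum = 0;
  for (let i = 0; i < n; i++) {
    sum += bitLength(text.charCodeAt(i));
  }
  return sum + (n - 1);
}

console.log(bitLength(65));       // 7  ("A" is 1000001)
console.log(totalLength("Hi"));   // 15 (7 + 7 bits, plus 1 delimiter)
console.log(totalLength("A b"));  // 22 (7 + 6 + 7 bits, plus 2 delimiters)
```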

Reference Data

Character	Unicode	Name	Width	Purpose	Used In This Tool
	U+200B	Zero-Width Space	0px	Line-break opportunity	Character delimiter
‌	U+200C	Zero-Width Non-Joiner	0px	Prevent ligatures	Binary 0
‍	U+200D	Zero-Width Joiner	0px	Force ligatures / emoji glue	Binary 1
⁠	U+2060	Word Joiner	0px	Prevent line-break	Detection only
‎	U+200E	Left-to-Right Mark	0px	BiDi control	Detection only
‏	U+200F	Right-to-Left Mark	0px	BiDi control	Detection only
‪	U+202A	LR Embedding	0px	BiDi embedding	Detection only
‫	U+202B	RL Embedding	0px	BiDi embedding	Detection only
‬	U+202C	Pop Directional Formatting	0px	BiDi terminator	Detection only
	U+FEFF	BOM / Zero-Width No-Break Space	0px	Byte order mark	Detection only
⁣	U+2063	Invisible Separator	0px	Math separator	Detection only
⁢	U+2062	Invisible Times	0px	Implied multiplication	Detection only
⁡	U+2061	Function Application	0px	Math notation	Detection only
᠎	U+180E	Mongolian Vowel Separator	0px	Mongolian script	Detection only
ㅤ	U+3164	Hangul Filler	Variable	Korean filler	Detection only
ᅟ	U+115F	Hangul Choseong Filler	Variable	Korean initial filler	Detection only

Frequently Asked Questions

Most platforms preserve zero-width characters, including Gmail, Outlook, Slack, Discord, WhatsApp, Telegram, and standard text editors. Twitter/X strips some zero-width characters from tweets. Facebook may strip them in certain contexts. Google Docs preserves them. Always test by encoding a short message, pasting it into your target platform, copying it back out, and running it through the Decode tab to verify integrity.

The tool encodes characters in the Basic Multilingual Plane (code points U+0000 to U+FFFF), which covers virtually all modern languages, punctuation, and symbols. Each character produces up to 16 invisible code points depending on its value: 7 for an ASCII letter, 6 for a digit or a space, and 16 for code points at or above U+8000. Counting delimiters, a 1,000-character message produces roughly 8,000 zero-width characters for plain English text and up to about 17,000 for text made of high code points. The tool limits input to 10,000 characters to prevent browser performance issues, yielding up to ~170,000 invisible code points. Most text fields have no issue storing this volume.

Yes. Any system that inspects Unicode code points or general categories can detect zero-width characters. The regex pattern [\u200B-\u200F\u2028-\u202F\u2060\uFEFF] catches the most common invisible characters, although it does not cover every entry in the Reference Data table: U+2061-U+2063, U+180E, U+3164, and U+115F fall outside its ranges. Enterprise DLP (Data Loss Prevention) tools increasingly flag zero-width character sequences as potential steganographic payloads. This tool's Detect mode performs exactly this analysis. Do not rely on zero-width encoding for security-critical secrecy: it is obfuscation, not encryption.
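The coverage of the pattern quoted above can be checked against the code points listed in the Reference Data table. This is a small sketch with illustrative variable names; it only reports which listed code points the pattern does not match.

```javascript
// The pattern quoted in the answer above (no global flag, so .test() is stateless).
const COMMON_INVISIBLE = /[\u200B-\u200F\u2028-\u202F\u2060\uFEFF]/;

// Code points from the Reference Data table, in table order.
const TABLE_CODE_POINTS = [
  0x200b, 0x200c, 0x200d, 0x2060, 0x200e, 0x200f, 0x202a, 0x202b,
  0x202c, 0xfeff, 0x2063, 0x2062, 0x2061, 0x180e, 0x3164, 0x115f,
];

// Format a code point as "U+XXXX".
const label = (cp) => "U+" + cp.toString(16).toUpperCase().padStart(4, "0");

const missedLabels = TABLE_CODE_POINTS
  .filter((cp) => !COMMON_INVISIBLE.test(String.fromCodePoint(cp)))
  .map(label);

console.log(missedLabels.join(", "));
// U+2063, U+2062, U+2061, U+180E, U+3164, U+115F
```

A detector meant to cover the whole table would need a wider character class or an explicit lookup of code points.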

Base64 converts binary data to visible ASCII characters (A-Z, a-z, 0-9, +, /). The output is clearly visible and recognizable as encoded text. Zero-width character encoding converts data to Unicode characters that render with zero pixel width, so the output is literally invisible when embedded in normal text. The trade-off is efficiency: Base64 expands data by ~33%, while zero-width encoding turns each character into up to 16 invisible code points (7 for an ASCII letter), each of which takes 3 bytes in UTF-8. Zero-width encoding is for steganography (hiding the existence of a message), not efficient data transport.

Three common causes: (1) The target application strips or normalizes zero-width characters on paste. (2) Unicode normalization (NFC/NFD) is applied somewhere along the copy-paste path; this typically preserves zero-width characters but can change the surrounding text. (3) Rich text editors may interpret U+200D as an emoji joiner and combine adjacent characters unexpectedly. For reliable transfer, use plain-text paste (Ctrl+Shift+V) and avoid rich text editors. The Detect tab will show you exactly which zero-width characters survived the transfer.
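A rough sketch of the kind of inventory a detection pass produces: count each invisible code point found in a pasted string. The function name `detectInvisible` and the shape of its result are illustrative assumptions, not the tool's actual code; the code points and names are taken from the Reference Data table.

```javascript
// Code points and names copied from the Reference Data table.
const INVISIBLE_NAMES = new Map([
  [0x200b, "Zero-Width Space"],
  [0x200c, "Zero-Width Non-Joiner"],
  [0x200d, "Zero-Width Joiner"],
  [0x2060, "Word Joiner"],
  [0x200e, "Left-to-Right Mark"],
  [0x200f, "Right-to-Left Mark"],
  [0x202a, "LR Embedding"],
  [0x202b, "RL Embedding"],
  [0x202c, "Pop Directional Formatting"],
  [0xfeff, "BOM / Zero-Width No-Break Space"],
  [0x2063, "Invisible Separator"],
  [0x2062, "Invisible Times"],
  [0x2061, "Function Application"],
  [0x180e, "Mongolian Vowel Separator"],
  [0x3164, "Hangul Filler"],
  [0x115f, "Hangul Choseong Filler"],
]);

// Returns one entry per invisible code point found, in order of first appearance.
function detectInvisible(text) {
  const counts = new Map();
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (INVISIBLE_NAMES.has(cp)) {
      counts.set(cp, (counts.get(cp) || 0) + 1);
    }
  }
  return [...counts].map(([cp, count]) => ({
    codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
    name: INVISIBLE_NAMES.get(cp),
    count,
  }));
}
```

Running this on the text before and after a transfer and comparing the counts shows which characters a platform removed.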

Absolutely. Two strings that appear visually identical can fail strict equality checks if one contains zero-width characters. The strings "hello" and "h\u200Bello" render identically, but "hello" === "h\u200Bello" returns false. This has caused production bugs in authentication systems, database lookups, and URL routing. Sanitize user input before comparison or storage by stripping zero-width characters with: str.replace(/[\u200B-\u200F\u2028-\u202F\u2060\uFEFF]/g, ''). Be aware that this pattern also removes U+200C and U+200D, which some scripts and emoji sequences legitimately need, so it is best suited to identifiers and lookup keys rather than free-form display text.
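The comparison problem and the sanitizing step can be seen together in a short sketch. The helper names `stripInvisible` and `equalIgnoringInvisible` are hypothetical; the pattern is the one quoted above.

```javascript
// Pattern quoted in the answer above.
const INVISIBLE = /[\u200B-\u200F\u2028-\u202F\u2060\uFEFF]/g;

// Remove the invisible characters matched by the pattern.
function stripInvisible(str) {
  return str.replace(INVISIBLE, "");
}

// Compare two strings after sanitizing both sides.
function equalIgnoringInvisible(a, b) {
  return stripInvisible(a) === stripInvisible(b);
}

const clean = "hello";
const tainted = "h\u200Bello"; // looks the same, but has 6 code units

console.log(clean === tainted);                      // false
console.log(equalIgnoringInvisible(clean, tainted)); // true
```

Sanitizing once at the point of input, rather than at every comparison, keeps stored values and lookups consistent.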