About

Generating purely random bytes often results in invalid UTF-8 sequences, leading to replacement characters (U+FFFD, �) or decoding errors. This tool bypasses byte-level manipulation by mathematically generating valid Unicode code points and relying on the browser's native engine to encode them correctly into UTF-8.
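The contrast can be sketched with the standard `TextEncoder` and `TextDecoder` APIs. This is an illustration only, not the tool's actual source; the helper names `roundTrips` and `isValidUtf8` and the sample byte pair are assumptions made for the example.

```javascript
// Strict decoder: throws a TypeError on malformed UTF-8 instead of
// silently substituting U+FFFD. ignoreBOM keeps U+FEFF in the output.
const strictDecoder = new TextDecoder("utf-8", { fatal: true, ignoreBOM: true });
const encoder = new TextEncoder();

// Hypothetical helper: does a code point survive encode -> decode unchanged?
function roundTrips(codePoint) {
  const text = String.fromCodePoint(codePoint);
  const bytes = encoder.encode(text);
  return strictDecoder.decode(bytes) === text;
}

// Hypothetical helper: is this byte sequence well-formed UTF-8?
function isValidUtf8(bytes) {
  try {
    strictDecoder.decode(bytes);
    return true;
  } catch {
    return false;
  }
}

// 0xC3 opens a two-byte sequence, but 0x28 is not a continuation byte,
// so this pair is malformed UTF-8.
const invalidBytes = Uint8Array.of(0xc3, 0x28);
```

Starting from a code point such as 0x4E2D (中) and letting the platform encode it avoids the malformed case entirely, whereas a byte pair picked at random can fail as `invalidBytes` does.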

By defining explicit boundaries, we exclude surrogate halves (U+D800 to U+DFFF) and non-characters, ensuring that every generated string is strictly valid for database insertion, UI testing, or cryptographic seeding. The selection algorithm sums the sizes of all selected blocks and draws from that combined pool, so every code point in the active blocks is equally likely to be chosen.

Formulas

To achieve a mathematically uniform distribution when selecting a random character from multiple non-contiguous ranges, the probability P of selecting any specific character c must be equal across the entire active pool. The probability is defined as:

$$P(c) = \frac{1}{\sum_{i=1}^{n} |R_i|}$$

Where n is the number of selected Unicode blocks, and |R_i| represents the cardinality (number of valid code points) of the i-th block. We generate a random scalar r such that 0 ≤ r < ∑|R_i|. We then map r to a specific code point by sequentially subtracting block sizes until the target sub-domain is isolated.
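The subtraction-based mapping described above can be sketched as follows. This is a minimal illustration, not the tool's actual source: the function names (`rangeSize`, `totalSize`, `codePointAt`) and the `{ start, end }` range shape are assumptions, and the two sample ranges are taken from the reference table.

```javascript
// Inclusive [start, end] ranges (hypothetical shape for this sketch).
const exampleRanges = [
  { start: 0x0020, end: 0x007e }, // Basic Latin (printable), 95 code points
  { start: 0x00a0, end: 0x00ff }, // Latin-1 Supplement, 96 code points
];

// |R_i| for one inclusive range.
function rangeSize(range) {
  return range.end - range.start + 1;
}

// Sum of |R_i| over all selected ranges.
function totalSize(ranges) {
  return ranges.reduce((sum, range) => sum + rangeSize(range), 0);
}

// Map a scalar r (0 <= r < totalSize) to a code point by subtracting
// block sizes until r falls inside a block.
function codePointAt(ranges, r) {
  if (!Number.isInteger(r) || r < 0 || r >= totalSize(ranges)) {
    throw new RangeError("r must be an integer in [0, totalSize)");
  }
  let offset = r;
  for (const range of ranges) {
    const size = rangeSize(range);
    if (offset < size) return range.start + offset;
    offset -= size;
  }
  throw new Error("unreachable: r was validated against totalSize");
}
```

With these two ranges the pool holds 95 + 96 = 191 code points, so r = 94 maps to the last Basic Latin character (0x7E) and r = 95 maps to the first Latin-1 Supplement character (0xA0).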

Reference Data

Unicode Block	Hexadecimal Range	Code Point Count	Examples
Basic Latin (Printable)	0x0020 - 0x007E	95	A, b, 1, @
Latin-1 Supplement	0x00A0 - 0x00FF	96	é, ñ, ß, ©
Cyrillic	0x0400 - 0x04FF	256	А, б, Д, з
Greek and Coptic	0x0370 - 0x03FF	144	α, β, Ω, Σ
Mathematical Operators	0x2200 - 0x22FF	256	∫, ∂, ∞, √
Braille Patterns	0x2800 - 0x28FF	256	⠁, ⠂, ⠃, ⠄
Miscellaneous Symbols	0x2600 - 0x26FF	256	☃, ☂, ☀, ☎
Emoticons (Emojis)	0x1F600 - 0x1F64F	80	😀, 😂, 😎
CJK Unified Ideographs	0x4E00 - 0x9FFF	20,992	中, 文, 字

Frequently Asked Questions

Why do some characters appear as empty boxes?

This effect, known as "tofu", occurs when your operating system or browser lacks a font that can render a specific Unicode character. The tool generates valid UTF-8 data, but displaying it requires local font support, especially for rare CJK ideographs or newly defined emojis.

Is the output cryptographically secure?

Yes. The tool uses the Web Crypto API (`crypto.getRandomValues`) rather than `Math.random()`. It also applies rejection sampling to eliminate modulo bias, making the output suitable for generating high-entropy passwords or cryptographic salts, provided the host environment is secure.
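Rejection sampling on top of `crypto.getRandomValues` can be sketched as below. This is one common way to do it, shown under assumptions rather than as the tool's actual code: the function name `randomBelow` and the injectable `fillRandom` parameter (added so the logic can be exercised with a fixed sequence) are hypothetical.

```javascript
// Returns a uniformly distributed integer in [0, n), for 1 <= n <= 2^32.
// fillRandom is a hypothetical injection point; by default it uses the
// Web Crypto API available as globalThis.crypto in browsers.
function randomBelow(n, fillRandom = (buf) => globalThis.crypto.getRandomValues(buf)) {
  const RANGE = 0x100000000; // 2^32 possible Uint32 values
  if (!Number.isInteger(n) || n <= 0 || n > RANGE) {
    throw new RangeError("n must be an integer in [1, 2^32]");
  }
  // Largest multiple of n that fits in the 32-bit range. Values at or
  // above it would make some remainders more likely, so they are redrawn.
  const limit = RANGE - (RANGE % n);
  const buf = new Uint32Array(1);
  do {
    fillRandom(buf);
  } while (buf[0] >= limit);
  return buf[0] % n;
}
```

A plain `value % n` would slightly favor small remainders whenever 2^32 is not a multiple of n; discarding the draws at or above `limit` removes that skew at the cost of an occasional extra draw. The returned integer could serve as the scalar r described in the Formulas section.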

Why are most generated characters CJK ideographs?

The generation algorithm gives every individual character the same probability. Because the CJK Unified Ideographs block contains nearly 21,000 characters while the Basic Latin block contains only 95, a randomly selected character is overwhelmingly likely to fall within the CJK range whenever both blocks are selected, purely because of the block's size.
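The arithmetic behind this can be checked with a short sketch. It assumes that only the Basic Latin and CJK Unified Ideographs blocks are selected, using the counts from the reference table; `blockShare` is a hypothetical helper name.

```javascript
// Fraction of draws expected to land in a block of size blockSize when
// every code point across allSizes is equally likely.
function blockShare(blockSize, allSizes) {
  const total = allSizes.reduce((sum, size) => sum + size, 0);
  return blockSize / total;
}

const selectedSizes = [95, 20992]; // Basic Latin (printable), CJK Unified Ideographs
const latinShare = blockShare(95, selectedSizes);  // 95 / 21087
const cjkShare = blockShare(20992, selectedSizes); // 20992 / 21087
```

Under that assumption roughly 99.5% of generated characters come from the CJK block and fewer than 0.5% from Basic Latin.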

Can the tool produce lone surrogates?

No. The ranges are explicitly configured to bypass the surrogate block (U+D800 to U+DFFF). A lone surrogate cannot be represented in a valid UTF-8 stream; an encoder must either reject it or replace it with U+FFFD. Our generator only produces scalar values that the browser can safely encode as UTF-8.
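A scalar-value check of this kind might look like the sketch below. The function name `isScalarValue` is an assumption for illustration, not the tool's API; the second part shows what the standard `TextEncoder` does when handed a lone surrogate.

```javascript
// A Unicode scalar value is any code point in 0..0x10FFFF except the
// surrogate range 0xD800..0xDFFF.
function isScalarValue(codePoint) {
  return (
    Number.isInteger(codePoint) &&
    codePoint >= 0 &&
    codePoint <= 0x10ffff &&
    !(codePoint >= 0xd800 && codePoint <= 0xdfff)
  );
}

// TextEncoder does not throw on a lone surrogate; it substitutes U+FFFD,
// whose UTF-8 encoding is the three bytes EF BF BD.
const loneSurrogateBytes = Array.from(new TextEncoder().encode("\uD800"));
```

Filtering with a check like this before calling `String.fromCodePoint` keeps the substitution from ever happening, so the encoded bytes always correspond to the characters that were actually generated.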
