Random UTF-16 String Generator
Generate high-entropy, cryptographically secure random UTF-16 strings across multiple Unicode planes. Ideal for fuzz testing, cryptography, and development.
About
Generating valid UTF-16 strings is essential for rigorous fuzz testing, database constraint validation, and rendering engine stress tests. Mishandled character encoding can cause serious failures such as buffer overflows, data corruption, UI rendering crashes, or garbled text (mojibake). This tool generates valid Unicode scalar values across the planes you specify.
Unlike rudimentary ASCII generators, this engine maps random values only to valid Unicode scalar values: code points from U+0000 to U+10FFFF, excluding the surrogate range (0xD800 to 0xDFFF). Surrogate code units are never generated on their own, so every generated character is encoded as a legal UTF-16 sequence. This keeps your fuzz tests free of isolated high or low surrogates, which strict parsers reject as ill-formed input.
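The exclusion described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's actual implementation: the function names `random_scalar_value` and `random_string` are hypothetical, and the standard-library `secrets` module stands in for whatever secure random source the tool uses.

```python
import secrets

SURROGATE_START = 0xD800
SURROGATE_END = 0xDFFF
MAX_CODE_POINT = 0x10FFFF
SURROGATE_COUNT = SURROGATE_END - SURROGATE_START + 1  # 2048 code points


def random_scalar_value() -> int:
    """Return a uniformly random Unicode scalar value (hypothetical helper)."""
    # Draw from a range that is smaller by the size of the surrogate gap.
    n = secrets.randbelow(MAX_CODE_POINT + 1 - SURROGATE_COUNT)
    # Values at or above the gap are shifted past it, so 0xD800..0xDFFF
    # can never be produced and every other code point stays equally likely.
    return n + SURROGATE_COUNT if n >= SURROGATE_START else n


def random_string(length: int) -> str:
    """Build a string of `length` random scalar values (hypothetical helper)."""
    return "".join(chr(random_scalar_value()) for _ in range(length))
```

Shifting past the gap, rather than redrawing whenever a surrogate comes up, keeps the distribution uniform without a retry loop. Note that the result may still contain unassigned code points and noncharacters, which are valid scalar values.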
Formulas
UTF-16 encodes every character in the Basic Multilingual Plane (BMP) as a single 16-bit code unit. Characters in the supplementary planes (U+10000 to U+10FFFF) must instead be split into a "Surrogate Pair" consisting of a High Surrogate (W1) and a Low Surrogate (W2). The generator applies the following transformation so that each supplementary character is encoded legally:
Let U be the Unicode scalar value, where U ≥ 0x10000.
U′ = U − 0x10000
W1 = 0xD800 + (U′ >> 10)
W2 = 0xDC00 + (U′ & 0x3FF)
This deterministic transformation guarantees that W1 falls within 0xD800 to 0xDBFF and W2 within 0xDC00 to 0xDFFF. Because these ranges are reserved for surrogates, any random generator must exclude 0xD800 to 0xDFFF when drawing raw scalar values; otherwise it can emit unpaired surrogates, which are ill-formed UTF-16.
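As a worked illustration of the transformation, here is a minimal Python sketch of the standard UTF-16 split: the high 10 bits of U′ are added to 0xD800 and the low 10 bits to 0xDC00. The function names `to_surrogate_pair` and `from_surrogate_pair` are hypothetical and chosen only for this example.

```python
from typing import Tuple


def to_surrogate_pair(u: int) -> Tuple[int, int]:
    """Split a supplementary-plane scalar value into (W1, W2)."""
    if not 0x10000 <= u <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    u_prime = u - 0x10000              # 20-bit value
    w1 = 0xD800 + (u_prime >> 10)      # high 10 bits -> High Surrogate
    w2 = 0xDC00 + (u_prime & 0x3FF)    # low 10 bits  -> Low Surrogate
    return w1, w2


def from_surrogate_pair(w1: int, w2: int) -> int:
    """Inverse mapping: recombine (W1, W2) into the scalar value."""
    if not (0xD800 <= w1 <= 0xDBFF and 0xDC00 <= w2 <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((w1 - 0xD800) << 10) + (w2 - 0xDC00)


# Worked example: U+1F600 gives U' = 0xF600, so W1 = 0xD83D and W2 = 0xDE00.
w1, w2 = to_surrogate_pair(0x1F600)
print(f"{w1:04X} {w2:04X}")  # prints "D83D DE00"
```

The boundary cases show why the output ranges hold: U+10000 maps to (0xD800, 0xDC00) and U+10FFFF maps to (0xDBFF, 0xDFFF).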
Reference Data
| Unicode Block | Start Code (Hex) | End Code (Hex) | Plane | Notes |
|---|---|---|---|---|
| Basic Latin | 0020 | 007F | BMP (0) | Standard ASCII characters. C0 control characters (0000 to 001F) excluded for safety; note that 007F (DEL) is also a control character. |
| Latin-1 Supplement | 00A0 | 00FF | BMP (0) | European accented characters and symbols. |
| Cyrillic | 0400 | 04FF | BMP (0) | Russian, Ukrainian, and other Cyrillic-script characters. |
| Arabic | 0600 | 06FF | BMP (0) | Right-to-left rendering test targets. |
| Devanagari | 0900 | 097F | BMP (0) | Complex script rendering and ligature tests. |
| CJK Unified Ideographs | 4E00 | 9FFF | BMP (0) | Extensive memory/storage stress testing (high density). |
| Emoticons | 1F600 | 1F64F | SMP (1) | Requires surrogate pairs. Excellent for testing 4-byte boundaries. |
| Miscellaneous Symbols | 2600 | 26FF | BMP (0) | Weather, astrological, and generic symbols. |
| Mathematical Alphanumeric Symbols | 1D400 | 1D7FF | SMP (1) | Astral plane characters. Requires surrogate pairs. |
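To show how the ranges in the table might drive generation, the sketch below picks a random code point from a named block and counts the UTF-16 code units it occupies. The `BLOCKS` dictionary copies the start and end values from the table; the helper names are hypothetical, and `secrets` is only an assumed stand-in for the tool's random source. Some code points inside a block may be unassigned, so a rendered sample can show placeholder glyphs.

```python
import secrets

# (start, end) ranges copied from the reference table above.
BLOCKS = {
    "Basic Latin": (0x0020, 0x007F),
    "Latin-1 Supplement": (0x00A0, 0x00FF),
    "Cyrillic": (0x0400, 0x04FF),
    "Arabic": (0x0600, 0x06FF),
    "Devanagari": (0x0900, 0x097F),
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
    "Emoticons": (0x1F600, 0x1F64F),
    "Miscellaneous Symbols": (0x2600, 0x26FF),
    "Mathematical Alphanumeric": (0x1D400, 0x1D7FF),
}


def random_char_from_block(name: str) -> str:
    """Pick one random code point from the named block (hypothetical helper)."""
    start, end = BLOCKS[name]
    return chr(start + secrets.randbelow(end - start + 1))


def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units `text` occupies when encoded as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    for block in BLOCKS:
        ch = random_char_from_block(block)
        print(f"{block}: U+{ord(ch):04X} uses {utf16_code_units(ch)} code unit(s)")
```

BMP blocks always report one code unit per character, while the two SMP blocks report two, which makes them useful for exercising the 4-byte boundary mentioned in the table.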