Random UTF-32 Code Point Generator
Generate random, valid Unicode code points encoded as UTF-32. Filter by plane, exclude surrogates, and output as raw text, hex, or byte sequences.
About
Testing text rendering engines, database storage constraints, or parsing algorithms requires inputs that cover the whole Unicode range. Plain ASCII or text from a single locale rarely triggers the edge cases tied to multi-byte encodings or to the supplementary and unassigned planes. This generator produces random, valid Unicode code points represented in UTF-32, a fixed-length 4-byte encoding.
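Unlike variable-length encodings such as UTF-8 or UTF-16, UTF-32 represents every code point with exactly 32 bits (4 bytes), storing the value directly. By construction, the generator never emits values from the UTF-16 surrogate range (0xD800 to 0xDFFF); those code points are reserved for UTF-16 surrogate pairs and are ill-formed in UTF-32. Every generated value is therefore a valid Unicode scalar value somewhere in the range 0x000000 to 0x10FFFF, although many such values are unassigned, reserved, or private-use rather than assigned characters.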
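Because UTF-32 stores the value itself, the raw-text, hex, and byte-sequence outputs can all be derived from a single integer. The following is a minimal Python sketch, not the tool's actual code: the helper name `utf32_views` is hypothetical, and it assumes big-endian byte order (UTF-32BE), since the page does not say which byte order its byte-sequence output uses.

```python
def utf32_views(cp: int) -> dict:
    """Return raw-text, hex, and UTF-32BE byte views of one Unicode scalar value."""
    # Reject values outside the codespace and the UTF-16 surrogate block.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    # UTF-32 stores the scalar value itself in 4 bytes; big-endian order assumed.
    encoded = cp.to_bytes(4, "big")
    return {
        "text": chr(cp),
        "hex": f"U+{cp:04X}",
        "bytes": " ".join(f"{b:02X}" for b in encoded),
    }


views = utf32_views(0x1F600)
print(views["hex"], views["bytes"])  # prints: U+1F600 00 01 F6 00
```

Here U+1F600, an emoji from plane 1, becomes the four bytes `00 01 F6 00`; a little-endian (UTF-32LE) view of the same value would be `00 F6 01 00`.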
Formulas
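When the full Unicode range is selected, the algorithm must skip the surrogate gap. Drawing a raw random integer between 0 and 1114111 (0x10FFFF) risks landing on one of the invalid surrogate code points. Instead, we first compute the size of the valid space:

$$V = \text{Max} - S_{\text{count}} = 1114112 - 2048 = 1112064$$

where $\text{Max} = 1114112$ (0x110000, the total number of code points) and $S_{\text{count}} = 2048$ (0x800, the surrogates 0xD800 through 0xDFFF). We then draw a random integer R such that 0 ≤ R < V and map it to a valid code point C:

$$C = \begin{cases} R, & R < \text{0xD800} \\ R + S_{\text{count}}, & R \geq \text{0xD800} \end{cases}$$

Because this mapping pairs each of the V possible values of R with exactly one valid code point, every valid code point is equally likely, and no rejection loop (redrawing whenever a surrogate is hit) is needed.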
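A minimal Python sketch of this index mapping, assuming `secrets.randbelow` as the random source (the page does not name its generator; `random.randrange` would serve equally well for non-security test data). The names `index_to_scalar` and `random_scalar` are hypothetical.

```python
import secrets

UNICODE_SIZE = 0x110000                        # Max: 1,114,112 code points (0 to 0x10FFFF)
SURROGATE_START = 0xD800                       # first UTF-16 surrogate code point
SURROGATE_COUNT = 0x800                        # S_count: 2,048 surrogates (0xD800 to 0xDFFF)
VALID_COUNT = UNICODE_SIZE - SURROGATE_COUNT   # V: 1,112,064 scalar values


def index_to_scalar(r: int) -> int:
    """Map an index 0 <= r < V onto the r-th Unicode scalar value."""
    if not 0 <= r < VALID_COUNT:
        raise ValueError(f"index out of range: {r}")
    # Indices below the gap map to themselves; indices at or above it skip 2,048 slots.
    return r if r < SURROGATE_START else r + SURROGATE_COUNT


def random_scalar() -> int:
    """Draw one Unicode scalar value uniformly at random, with no retry loop."""
    return index_to_scalar(secrets.randbelow(VALID_COUNT))


cp = random_scalar()
print(f"U+{cp:04X}")
```

Checking the boundaries by hand: R = 0xD7FF maps to itself, R = 0xD800 jumps to 0xE000, and the largest index, 1112063, maps to 0x10FFFF.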
Reference Data
| Plane | Name | Range (Hex) | Primary Content |
|---|---|---|---|
| 0 | Basic Multilingual Plane (BMP) | 0000-FFFF | Common modern scripts and symbols; also contains the surrogate block D800-DFFF |
| 1 | Supplementary Multilingual Plane (SMP) | 10000-1FFFF | Historic scripts, musical symbols, emoji |
| 2 | Supplementary Ideographic Plane (SIP) | 20000-2FFFF | Rare and historic CJK ideographs |
| 3 | Tertiary Ideographic Plane (TIP) | 30000-3FFFF | Additional rare and historic CJK ideographs |
| 4-13 | Unassigned | 40000-DFFFF | Reserved for future use |
| 14 | Supplementary Special-purpose Plane (SSP) | E0000-EFFFF | Tag characters and supplementary variation selectors |
| 15 | Supplementary Private Use Area-A (SPUA-A) | F0000-FFFFF | Private-use characters defined by users or vendors |
| 16 | Supplementary Private Use Area-B (SPUA-B) | 100000-10FFFF | Private-use characters defined by users or vendors |
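Filtering by plane can reuse the index-mapping idea from the Formulas section, with several ranges instead of one gap. This hypothetical Python sketch (the names `plane_ranges` and `random_in_planes` are illustrative, not the tool's API) turns the selected plane numbers into inclusive ranges, splits plane 0 around the surrogate block (the only plane that contains surrogates), and locates one uniform random index with a binary search over cumulative range sizes.

```python
import bisect
import secrets

SURROGATE_FIRST, SURROGATE_LAST = 0xD800, 0xDFFF


def plane_ranges(planes):
    """Inclusive (start, end) ranges covering the selected planes, minus surrogates."""
    ranges = []
    for p in sorted(set(planes)):
        if not 0 <= p <= 16:
            raise ValueError(f"no such plane: {p}")
        start, end = p * 0x10000, p * 0x10000 + 0xFFFF
        if p == 0:
            # Plane 0 is the only plane containing surrogates: split it around them.
            ranges.append((start, SURROGATE_FIRST - 1))
            ranges.append((SURROGATE_LAST + 1, end))
        else:
            ranges.append((start, end))
    return ranges


def random_in_planes(planes):
    """Return a scalar value drawn uniformly from the selected planes."""
    ranges = plane_ranges(planes)
    if not ranges:
        raise ValueError("select at least one plane")
    # Running totals of range sizes: cumulative[i] counts the values in ranges[0..i].
    cumulative, total = [], 0
    for start, end in ranges:
        total += end - start + 1
        cumulative.append(total)
    r = secrets.randbelow(total)             # uniform index in [0, total)
    i = bisect.bisect_right(cumulative, r)   # first range whose running total exceeds r
    offset = r - (cumulative[i - 1] if i else 0)
    return ranges[i][0] + offset


print(f"U+{random_in_planes([0, 1]):04X}")
```

For example, `plane_ranges([0])` yields the two ranges 0x0000 to 0xD7FF and 0xE000 to 0xFFFF (63,488 values in total), and `random_in_planes([1])` always returns a value between 0x10000 and 0x1FFFF. Selecting all 17 planes reproduces the V = 1,112,064 valid values.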