About

Legacy systems, older databases, and email headers often rely on character encodings that predate UTF-8 universality. Windows-1251 remains a critical code page for maintaining and debugging software originally designed for Cyrillic scripts (Russian, Bulgarian, Serbian). When these bytes are misinterpreted as ISO-8859-1 or UTF-8, text renders as unreadable garbage (mojibake).
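
As a rough illustration of that failure mode, the Python sketch below encodes a sample word as Windows-1251 and then decodes the same bytes with the wrong codec. The sample word is an assumption chosen for illustration; `cp1251` and `latin-1` are Python's codec names for Windows-1251 and ISO-8859-1.

```python
# Sample text is an assumption chosen for illustration.
original = "Привет"
raw = original.encode("cp1251")  # one byte per Cyrillic letter

# Wrong guess 1: ISO-8859-1 assigns a character to every byte,
# so decoding never fails, it just yields accented Latin letters.
as_latin1 = raw.decode("latin-1")  # "Ïðèâåò"

# Wrong guess 2: these bytes are not valid UTF-8. errors="replace"
# substitutes U+FFFD instead of raising UnicodeDecodeError.
as_utf8 = raw.decode("utf-8", errors="replace")

print(raw.hex(" "))  # cf f0 e8 e2 e5 f2
print(len(original), len(raw))  # 6 6
```

The Latin-1 result is still reversible (re-encode as `latin-1`, decode as `cp1251`), whereas the UTF-8 result with replacement characters has lost the original bytes.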

This reference tool maps the full 8-bit byte range (0-255) specifically for the Windows-1251 layout. It covers the standard ASCII control and printable characters (0-127) and the upper extensions (128-255) containing Cyrillic glyphs and special punctuation. Precision in these values is mandatory for binary file analysis, data recovery, and fixing encoding declaration errors in HTML headers.
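
A mapping like this can be reproduced with a few lines of Python. The sketch below is one possible approach, not the tool's own implementation; the function name `build_cp1251_table` is hypothetical. It relies on Python's built-in `cp1251` codec, in which byte 0x98 has no assigned character.

```python
import unicodedata

def build_cp1251_table():
    """Return 256 rows of (decimal, hex, character or None, Unicode name)."""
    rows = []
    for value in range(256):
        try:
            char = bytes([value]).decode("cp1251")
        except UnicodeDecodeError:
            # Byte 0x98 is unassigned in Windows-1251.
            char = None
        if char is None:
            name = "<unassigned>"
        else:
            # Control characters have no Unicode name, so supply a default.
            name = unicodedata.name(char, "<control>")
        rows.append((value, f"{value:02X}", char, name))
    return rows

table = build_cp1251_table()
print(table[0xDF][:2], table[0xDF][3])  # (223, 'DF') CYRILLIC CAPITAL LETTER YA
```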

Formulas

Character encoding translates a numerical value (code point) into a graphical symbol. In single-byte encodings like ASCII and Windows-1251, one character equals exactly one byte (8 bits). The conversion from a Decimal byte value to Hexadecimal aids in debugging binary dumps.

For a byte value n, the hexadecimal representation is found by dividing by the base 16: the quotient is the high digit d₁ and the remainder is the low digit d₀:

n = d₁ × 16¹ + d₀ × 16⁰

Here each digit d₁ and d₀ takes a value from 0 to 15, written with the symbols {0, 1, ..., 9, A, B, C, D, E, F}. For the Cyrillic letter "Я" (Ya):

Decimal: 223 = 13 × 16 + 15

Hex: 13 → D, 15 → F ⇒ 0xDF
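
The same arithmetic can be checked in Python. This is a minimal sketch; the helper name `byte_to_hex` is hypothetical, and in practice `format(n, "02X")` gives the same two digits.

```python
HEX_DIGITS = "0123456789ABCDEF"

def byte_to_hex(n: int) -> str:
    """Convert a byte value (0..255) to a two-digit hex string."""
    if not 0 <= n <= 255:
        raise ValueError("expected a byte value in 0..255")
    d1, d0 = divmod(n, 16)  # quotient = high digit, remainder = low digit
    return "0x" + HEX_DIGITS[d1] + HEX_DIGITS[d0]

print(divmod(223, 16))   # (13, 15)
print(byte_to_hex(223))  # 0xDF

# Cross-check against the codec: "Я" is the single byte 0xDF in Windows-1251.
ya_byte = "Я".encode("cp1251")
print(ya_byte[0])        # 223
```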

Reference Data

Range	Description	Byte (Dec)	Byte (Hex)	Usage Context
Control Characters	Non-printable instructions	0-31	00-1F	Terminals, Printers, Data Stream Control (NUL, LF, CR)
Standard ASCII	Latin Alphabet, Numbers, Symbols	32-127	20-7F	Universal compatibility (English text, Code syntax); 127 (7F) is the DEL control character
Windows-1251 Upper	Extended Punctuation, Symbols, Additional Cyrillic Letters	128-191	80-BF	Includes Ђ, Љ, Ё, currency symbols, and typographic quotes; the quotes are missing in ISO-8859-5
Windows-1251 Cyrillic	Russian/Cyrillic Alphabet	192-255	C0-FF	Primary range for upper and lowercase Cyrillic letters
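
When inspecting a binary dump, it can help to bucket each byte into one of the four ranges above. The following sketch assumes the range boundaries exactly as listed in the table; the names `RANGES` and `classify` and the sample string are hypothetical.

```python
from collections import Counter

# (first byte, last byte, label), copied from the table above.
RANGES = [
    (0, 31, "Control Characters"),
    (32, 127, "Standard ASCII"),
    (128, 191, "Windows-1251 Upper"),
    (192, 255, "Windows-1251 Cyrillic"),
]

def classify(value: int) -> str:
    """Return the table label for a single byte value."""
    for low, high, label in RANGES:
        if low <= value <= high:
            return label
    raise ValueError(f"not a byte value: {value}")

# Hypothetical sample: "Ёж №5" plus a line feed, encoded as Windows-1251.
sample = "Ёж №5\n".encode("cp1251")
histogram = Counter(classify(b) for b in sample)
for label, count in histogram.items():
    print(f"{label}: {count}")
```

A file dominated by bytes in 192-255 is a hint (not proof) that it holds single-byte Cyrillic text rather than UTF-8.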

Frequently Asked Questions

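Unreadable text of this kind is known as mojibake. A common case occurs when text encoded in UTF-8 (2 bytes per Cyrillic character) is decoded as Windows-1251 (1 byte per character). The system interprets the two UTF-8 bytes as two separate Windows-1251 characters: the lead byte (0xD0 or 0xD1 for Russian letters) appears as "Р" or "С", and the continuation byte maps to a character from the upper range (128-191). For example, "Привет" turns into "РџСЂРёРІРµС‚".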
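
This kind of damage is often reversible as long as no bytes were dropped. The sketch below reproduces the effect and undoes it; the helper name `repair_mojibake` and the sample word are assumptions for illustration.

```python
def repair_mojibake(garbled: str) -> str:
    """Undo 'UTF-8 bytes decoded as Windows-1251'.

    Raises UnicodeEncodeError or UnicodeDecodeError if the text was not
    produced that way, so callers should be ready to catch UnicodeError.
    """
    return garbled.encode("cp1251").decode("utf-8")

original = "Привет"
utf8_bytes = original.encode("utf-8")   # 12 bytes, two per letter
garbled = utf8_bytes.decode("cp1251")   # 12 characters of mojibake
restored = repair_mojibake(garbled)

print(len(original), len(utf8_bytes), len(garbled))  # 6 12 12
print(restored == original)                          # True

# Caveat: byte 0x98 is unassigned in Windows-1251, so a strict decode fails
# for letters whose UTF-8 form contains it (for example capital "И", D0 98).
```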

Windows-1251 and ISO-8859-5 are not interchangeable. While both cover Cyrillic, the mapping of specific characters to byte values differs. Windows-1251 is more popular on the web and in legacy Windows applications, whereas ISO-8859-5 was an early standard that saw less adoption. Decoding one as the other produces incorrect symbols.
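
The difference is easy to see by encoding the same letter with both code pages. This sketch uses Python's built-in `cp1251` and `iso8859_5` codecs; the choice of "Я" as the sample letter is arbitrary.

```python
letter = "Я"

cp1251_byte = letter.encode("cp1251")   # b'\xdf'
iso_byte = letter.encode("iso8859_5")   # b'\xcf'

# Reading each byte with the other code page yields a different,
# perfectly valid Cyrillic letter, which makes the error easy to miss.
cp1251_read_as_iso = cp1251_byte.decode("iso8859_5")  # "п"
iso_read_as_cp1251 = iso_byte.decode("cp1251")        # "П"

print(cp1251_byte.hex(), iso_byte.hex())  # df cf
```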

Code 10 is Line Feed (LF, \n) and Code 13 is Carriage Return (CR, \r). Unix/Linux systems use LF for line breaks, whereas legacy Windows systems typically use the pair CR+LF. Mismatches cause text to appear as one long line or with extra blank lines.
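
Line endings can be normalized at the byte level before decoding, since bytes 10 and 13 mean the same thing in ASCII and Windows-1251. The following is a minimal sketch; the function names `to_lf` and `to_crlf` are hypothetical.

```python
LF = bytes([10])    # b"\n"
CR = bytes([13])    # b"\r"

def to_lf(data: bytes) -> bytes:
    """Convert CR+LF pairs to a single LF (Unix style)."""
    return data.replace(CR + LF, LF)

def to_crlf(data: bytes) -> bytes:
    """Convert line breaks to CR+LF (Windows style) without doubling CRs."""
    return to_lf(data).replace(LF, CR + LF)

windows_text = b"line one\r\nline two\r\n"
unix_text = to_lf(windows_text)

print(unix_text)           # b'line one\nline two\n'
print(to_crlf(unix_text))  # b'line one\r\nline two\r\n'
```

Running `to_lf` first inside `to_crlf` keeps the conversion idempotent: input that already uses CR+LF is returned unchanged.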

For characters reserved in HTML (like < or >) or those outside the document's encoding, use a character reference. For example, to display the Cyrillic "Ж" safely in an ASCII-only file, you can use the numeric reference &#1046; (decimal) or &#x416; (hex).
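
Numeric references are derived from the character's Unicode code point, not from its Windows-1251 byte value. The sketch below shows one way to generate them in Python; the helper name `numeric_refs` is hypothetical, while `html.escape`, `html.unescape`, and the `xmlcharrefreplace` error handler are part of the standard library.

```python
import html

def numeric_refs(char: str) -> tuple:
    """Return the decimal and hexadecimal numeric references for one character."""
    code_point = ord(char)
    return f"&#{code_point};", f"&#x{code_point:X};"

decimal_ref, hex_ref = numeric_refs("Ж")
print(decimal_ref, hex_ref)  # &#1046; &#x416;

# Encoding to ASCII with xmlcharrefreplace converts every non-ASCII
# character to a decimal reference automatically.
ascii_safe = "Ж".encode("ascii", "xmlcharrefreplace")
print(ascii_safe)            # b'&#1046;'

# Reserved markup characters use named references instead.
escaped = html.escape("a < b")
print(escaped)               # a &lt; b
```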