About

Windows-1251 (CP1251) is a single-byte character encoding for Cyrillic scripts, standardized by Microsoft in 1990 for use in Windows operating systems across Russian, Ukrainian, Bulgarian, and Serbian locales. Bytes 0x00 - 0x7F are identical to ASCII. Each byte in the range 0x80 - 0xFF maps to a specific Unicode code point, adding up to 128 characters beyond ASCII (127 in practice, since 0x98 is unassigned). Decoding these bytes with the wrong charset produces the classic "mojibake". Read as Windows-1252, the CP1251 bytes for "Россия" come out as "Ðîññèÿ". Read as UTF-8, they turn into U+FFFD replacement characters, because CP1251 letters rarely form valid UTF-8 sequences. The equally familiar "Ð Ð¾ÑÑÐ¸Ñ" is the reverse mistake: UTF-8 bytes read as a Western single-byte charset. This tool performs real byte-level decoding with the browser's native TextDecoder API rather than a hand-written lookup simulation. It accepts raw hex byte sequences, URL-encoded strings (common in legacy HTTP forms), or binary file uploads.
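A minimal sketch of the decoding step for hex input, assuming a runtime whose TextDecoder accepts the "windows-1251" label (browsers do; Node.js does when built with full ICU, which is the default). hexToBytes is a hypothetical helper, and the input formats it tolerates (whitespace, commas, optional 0x prefixes) are assumptions rather than this tool's actual parser.

```typescript
// Hypothetical helper: turn "D0 EE F1" or "0xD0,0xEE" style input into bytes.
function hexToBytes(input: string): Uint8Array {
  // Remove optional 0x prefixes, then keep only hex digits
  // (this drops spaces, commas, and other separators).
  const digits = input.replace(/0x/gi, "").replace(/[^0-9a-f]/gi, "");
  if (digits.length % 2 !== 0) {
    throw new Error("hex input must contain an even number of digits");
  }
  const bytes = new Uint8Array(digits.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(digits.slice(2 * i, 2 * i + 2), 16);
  }
  return bytes;
}

// Decode CP1251 bytes with the platform decoder.
function decodeCp1251(bytes: Uint8Array): string {
  return new TextDecoder("windows-1251").decode(bytes);
}

console.log(decodeCp1251(hexToBytes("D0 EE F1 F1 E8 FF"))); // Россия
```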

Legacy databases, email archives (MIME headers often declare charset=windows-1251), and CSV exports from Russian-locale software remain common sources of CP1251-encoded data. Incorrect conversion corrupts more than the display: sorting, search indexing, and regex matching all fail on mojibake. This converter maps every assigned byte to its Unicode code point. Note: byte 0x98 is the only unassigned position in CP1251 and will produce a replacement character (U+FFFD); byte 0x88 is the euro sign (U+20AC), not an undefined byte.
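The mojibake and search failures described above can be reproduced by decoding one set of bytes three ways. This sketch assumes a TextDecoder that supports the "windows-1251" and "windows-1252" labels (browsers, or Node.js with full ICU); the byte values are the CP1251 encoding of "Россия".

```typescript
// The CP1251 bytes for "Россия".
const cp1251Bytes = Uint8Array.of(0xd0, 0xee, 0xf1, 0xf1, 0xe8, 0xff);

const asCp1251 = new TextDecoder("windows-1251").decode(cp1251Bytes);
const asCp1252 = new TextDecoder("windows-1252").decode(cp1251Bytes);
// The default (non-fatal) UTF-8 decoder replaces invalid sequences with U+FFFD.
const asUtf8 = new TextDecoder("utf-8").decode(cp1251Bytes);

console.log(asCp1251); // Россия
console.log(asCp1252); // Ðîññèÿ
console.log(asUtf8.includes("\uFFFD")); // true

// The bytes are identical, but a search only matches the correct decoding.
console.log(/Россия/.test(asCp1251), /Россия/.test(asCp1252)); // true false
```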

Formulas

Windows-1251 is a single-byte encoding. Each input byte b maps to exactly one Unicode code point U via a fixed lookup table. The conversion function is:

U = CP1251_TABLE(b), where b ∈ [0x00, 0xFF]

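Split by byte range, the mapping is the identity on ASCII and a lookup into the 128-entry upper-half table TABLE:

$$
U = \begin{cases}
b & \text{if } \mathtt{0x00} \le b \le \mathtt{0x7F} \\
\mathrm{TABLE}[b - \mathtt{0x80}] & \text{if } \mathtt{0x80} \le b \le \mathtt{0xFF}
\end{cases}
$$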
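The piecewise lookup can be sketched without TextDecoder. This version stores only the irregular 0x80 - 0xBF part of the table (values transcribed from the standard Windows-1251 code page) and computes 0xC0 - 0xFF with the contiguous Cyrillic formula given under the Formulas section. Mapping 0x98 to U+FFFD follows the note in the About section; the names HIGH_0X80_0XBF and cp1251ToCodePoint are illustrative.

```typescript
// Code points for CP1251 bytes 0x80-0xBF, indexed by b - 0x80.
// 0x98 is unassigned; this sketch maps it to U+FFFD.
const HIGH_0X80_0XBF: readonly number[] = [
  0x0402, 0x0403, 0x201a, 0x0453, 0x201e, 0x2026, 0x2020, 0x2021, // 0x80-0x87
  0x20ac, 0x2030, 0x0409, 0x2039, 0x040a, 0x040c, 0x040b, 0x040f, // 0x88-0x8F
  0x0452, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014, // 0x90-0x97
  0xfffd, 0x2122, 0x0459, 0x203a, 0x045a, 0x045c, 0x045b, 0x045f, // 0x98-0x9F
  0x00a0, 0x040e, 0x045e, 0x0408, 0x00a4, 0x0490, 0x00a6, 0x00a7, // 0xA0-0xA7
  0x0401, 0x00a9, 0x0404, 0x00ab, 0x00ac, 0x00ad, 0x00ae, 0x0407, // 0xA8-0xAF
  0x00b0, 0x00b1, 0x0406, 0x0456, 0x0491, 0x00b5, 0x00b6, 0x00b7, // 0xB0-0xB7
  0x0451, 0x2116, 0x0454, 0x00bb, 0x0458, 0x0405, 0x0455, 0x0457, // 0xB8-0xBF
];

// U = CP1251_TABLE(b), split by byte range as in the piecewise formula.
function cp1251ToCodePoint(b: number): number {
  if (!Number.isInteger(b) || b < 0x00 || b > 0xff) {
    throw new RangeError(`not a byte: ${b}`);
  }
  if (b <= 0x7f) return b; // ASCII: U = b
  if (b >= 0xc0) return b - 0xc0 + 0x0410; // А-я: contiguous, no table needed
  return HIGH_0X80_0XBF[b - 0x80]; // irregular range 0x80-0xBF
}

function decodeCp1251(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCodePoint(cp1251ToCodePoint(bytes[i]));
  }
  return out;
}

console.log(decodeCp1251(Uint8Array.of(0xcc, 0xe8, 0xf0, 0x20, 0x88))); // Мир €
```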

The resulting Unicode code point U is then encoded as UTF-8 using the standard multi-byte scheme:

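$$
\text{UTF-8}(U) = \begin{cases}
\mathtt{0xxxxxxx} & \text{(1 byte) if } U \le \mathtt{0x7F} \\
\mathtt{110xxxxx\ 10xxxxxx} & \text{(2 bytes) if } \mathtt{0x80} \le U \le \mathtt{0x7FF} \\
\mathtt{1110xxxx\ 10xxxxxx\ 10xxxxxx} & \text{(3 bytes) if } \mathtt{0x800} \le U \le \mathtt{0xFFFF}
\end{cases}
$$

The x bits carry the binary digits of U, most significant first. Every CP1251 character lies at or below U+2122 (™, byte 0x99), and the replacement character U+FFFD also fits in three bytes, so the 4-byte form is never needed.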
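A hand-written encoder for the three cases shows where each bit goes; in practice TextEncoder does this. utf8Encode is a hypothetical name, and it deliberately rejects code points above U+FFFF and lone surrogates, since neither can come out of CP1251.

```typescript
// Encode one code point as UTF-8 using the 1-, 2-, and 3-byte patterns.
function utf8Encode(u: number): number[] {
  if (!Number.isInteger(u) || u < 0 || u > 0xffff || (u >= 0xd800 && u <= 0xdfff)) {
    throw new RangeError(`unsupported code point: ${u}`);
  }
  if (u <= 0x7f) return [u]; // 0xxxxxxx
  if (u <= 0x7ff) {
    return [0xc0 | (u >> 6), 0x80 | (u & 0x3f)]; // 110xxxxx 10xxxxxx
  }
  return [
    0xe0 | (u >> 12), // 1110xxxx: top 4 bits
    0x80 | ((u >> 6) & 0x3f), // 10xxxxxx: middle 6 bits
    0x80 | (u & 0x3f), // 10xxxxxx: low 6 bits
  ];
}

const toHex = (bytes: number[]) => bytes.map((b) => b.toString(16)).join(" ");
console.log(toHex(utf8Encode(0x0410))); // d0 90  (А)
console.log(toHex(utf8Encode(0x20ac))); // e2 82 ac  (€)
```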

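Where b = input byte in Windows-1251 encoding, U = resulting Unicode code point, CP1251_TABLE = the full byte-to-code-point mapping, and TABLE = its 128-entry upper half for bytes 0x80 - 0xFF, indexed by b − 0x80. The Cyrillic letters А - я occupy a contiguous range and need no table: U = b − 0xC0 + 0x0410 for capital letters (0xC0 - 0xDF) and U = b − 0xE0 + 0x0430 for lowercase (0xE0 - 0xFF). Both reduce to U = b + 0x350. Ё and ё sit outside this range, at 0xA8 and 0xB8.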
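A worked check of both steps for the lowercase byte b = 0xEF (п): the contiguous-range formula gives the code point, and since 0x043F lies between 0x80 and 0x7FF, the 2-byte UTF-8 pattern applies.

$$
U = \mathtt{0xEF} - \mathtt{0xE0} + \mathtt{0x0430} = \mathtt{0x043F}\ (\text{п}), \qquad \text{equivalently } \mathtt{0xEF} + \mathtt{0x350} = \mathtt{0x043F}
$$

$$
\mathtt{0x043F} = \underbrace{10000}_{\text{high 5 bits}}\,\underbrace{111111}_{\text{low 6 bits}}\ (\text{binary}) \;\Rightarrow\; \mathtt{110}\,\mathtt{10000} = \mathtt{0xD0}, \quad \mathtt{10}\,\mathtt{111111} = \mathtt{0xBF}
$$

So the single CP1251 byte 0xEF becomes the UTF-8 pair D0 BF.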

Reference Data

Byte (Hex)	Character	Unicode	Description
0x00 - 0x7F	Standard ASCII	U+0000 - U+007F	Identical to ASCII / UTF-8 single-byte
0x80	Ђ	U+0402	Cyrillic capital letter DJE (Serbian)
0x81	Ѓ	U+0403	Cyrillic capital letter GJE (Macedonian)
0x82	‚	U+201A	Single low-9 quotation mark
0x83	ѓ	U+0453	Cyrillic small letter GJE
0x84	„	U+201E	Double low-9 quotation mark
0x85	…	U+2026	Horizontal ellipsis
0x86	†	U+2020	Dagger
0x88	€	U+20AC	Euro sign
0x8A	Љ	U+0409	Cyrillic capital letter LJE
0x8D	Ќ	U+040C	Cyrillic capital letter KJE
0x8E	Ћ	U+040B	Cyrillic capital letter TSHE
0x8F	Џ	U+040F	Cyrillic capital letter DZHE
0x90	ђ	U+0452	Cyrillic small letter DJE
0x96	–	U+2013	En dash
0x97	—	U+2014	Em dash
0xA0	(NBSP)	U+00A0	Non-breaking space
0xA5	Ґ	U+0490	Cyrillic capital letter GHE with upturn (Ukrainian)
0xA8	Ё	U+0401	Cyrillic capital letter IO
0xAA	Є	U+0404	Cyrillic capital letter Ukrainian IE
0xAB	«	U+00AB	Left double angle quotation mark
0xAF	Ї	U+0407	Cyrillic capital letter YI (Ukrainian)
0xB0	°	U+00B0	Degree sign
0xB2	І	U+0406	Cyrillic capital letter Byelorussian-Ukrainian I
0xB3	і	U+0456	Cyrillic small letter Byelorussian-Ukrainian I
0xB8	ё	U+0451	Cyrillic small letter IO
0xBA	є	U+0454	Cyrillic small letter Ukrainian IE
0xBB	»	U+00BB	Right double angle quotation mark
0xBF	ї	U+0457	Cyrillic small letter YI (Ukrainian)
0xC0 - 0xDF	А - Я	U+0410 - U+042F	Cyrillic capital letters A through YA
0xE0 - 0xFF	а - я	U+0430 - U+044F	Cyrillic small letters a through ya

Frequently Asked Questions

Byte 0x98 is unassigned in the Windows-1251 code page. The browser's native TextDecoder maps it to U+FFFD (the Unicode replacement character). This tool keeps that replacement character instead of silently dropping the byte, so you can audit the output for data corruption.
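One way to audit for this case is to scan the raw input bytes as well as the decoded text, so the check does not depend on what a particular decoder emits for 0x98. A minimal sketch; findUnassignedOffsets and countReplacementChars are hypothetical helper names.

```typescript
// The only unassigned CP1251 byte.
const UNASSIGNED_CP1251_BYTE = 0x98;

// Offsets of 0x98 in the raw input, independent of any decoder.
function findUnassignedOffsets(bytes: Uint8Array): number[] {
  const offsets: number[] = [];
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === UNASSIGNED_CP1251_BYTE) offsets.push(i);
  }
  return offsets;
}

// Number of U+FFFD characters in decoded output.
function countReplacementChars(text: string): number {
  let count = 0;
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) === 0xfffd) count++;
  }
  return count;
}

const sample = Uint8Array.of(0xd0, 0x98, 0xee, 0x98);
console.log(findUnassignedOffsets(sample).join(", ")); // 1, 3
```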

Windows-1251, ISO-8859-5, and KOI8-R all encode Cyrillic in a single byte, but with completely different byte-to-character mappings. For example, byte 0xC0 is "А" (A) in CP1251, "Р" (R) in ISO-8859-5, and "ю" (yu) in KOI8-R, so applying the wrong decoding table produces mojibake. This tool handles only Windows-1251; feeding it KOI8-R or ISO-8859-5 data will produce incorrect output.
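The byte-0xC0 example can be checked directly, assuming a TextDecoder that supports the WHATWG labels "windows-1251", "iso-8859-5", and "koi8-r" (browsers, or Node.js with full ICU). decodeWith is a hypothetical wrapper. The last line shows CP1251 "Россия" misread as KOI8-R: the case comes out inverted and the letters scrambled.

```typescript
// Hypothetical wrapper: decode bytes with a WHATWG encoding label.
function decodeWith(label: string, bytes: Uint8Array): string {
  return new TextDecoder(label).decode(bytes);
}

const labels = ["windows-1251", "iso-8859-5", "koi8-r"];

// The same byte, three different letters.
for (const label of labels) {
  console.log(label, decodeWith(label, Uint8Array.of(0xc0)));
}
// windows-1251 А
// iso-8859-5 Р
// koi8-r ю

// CP1251 "Россия" misread as KOI8-R.
console.log(decodeWith("koi8-r", Uint8Array.of(0xd0, 0xee, 0xf1, 0xf1, 0xe8, 0xff))); // пНЯЯХЪ
```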

The tool accepts files up to 50 MB. Because Windows-1251 is a single-byte encoding, conversion is a linear O(n) pass with no lookahead, and the browser's TextDecoder processes the entire ArrayBuffer in a single call, so even files near the limit convert quickly. For files beyond 50 MB, consider a command-line tool such as iconv (for example, iconv -f CP1251 -t UTF-8 input.txt > output.txt).
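If a file is too large to decode in one call, TextDecoder can also decode chunk by chunk with { stream: true }. For a single-byte encoding every chunk boundary is safe anyway; the flag matters for multi-byte encodings, but using it keeps the pattern correct if the label changes. A sketch, assuming the chunks come from something like Blob.prototype.stream() or file slices; decodeChunks is a hypothetical name.

```typescript
// Decode CP1251 data chunk by chunk, yielding text as it is produced.
function* decodeChunks(chunks: Iterable<Uint8Array>): Generator<string> {
  const decoder = new TextDecoder("windows-1251");
  for (const chunk of chunks) {
    // stream: true tells the decoder that more input may follow.
    yield decoder.decode(chunk, { stream: true });
  }
  // A final call without arguments flushes any buffered state.
  const tail = decoder.decode();
  if (tail !== "") yield tail;
}

// "Россия" split across two chunks.
const parts = [Uint8Array.of(0xd0, 0xee), Uint8Array.of(0xf1, 0xf1, 0xe8, 0xff)];
for (const piece of decodeChunks(parts)) console.log(piece); // Ро, then ссия
```

Yielding per chunk lets a caller write output incrementally instead of building one large string.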

Percent-encoding records raw bytes, not characters: %C0%E5 is two bytes that spell "Ае" only if the source system encoded the text as CP1251 before escaping it. If the source used UTF-8, "А" arrives as %D0%90, and decoding those bytes as CP1251 yields "Рђ" instead. This tool assumes all percent-encoded bytes are CP1251, so verify the source encoding before converting.
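Because percent-encoding carries bytes, the bytes have to be recovered before choosing a decoder; decodeURIComponent always interprets them as UTF-8 and throws a URIError on "%C0%E5". A minimal sketch with a hypothetical percentDecodeToBytes helper, assuming that + means space (as in HTML form submissions) and that unescaped characters are ASCII:

```typescript
// Hypothetical helper: percent-decode to raw bytes without assuming UTF-8.
function percentDecodeToBytes(s: string, plusAsSpace = true): Uint8Array {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    const c = s.charAt(i);
    const hex = s.slice(i + 1, i + 3);
    if (c === "%" && /^[0-9a-fA-F]{2}$/.test(hex)) {
      out.push(parseInt(hex, 16));
      i += 2; // skip the two hex digits
    } else if (c === "+" && plusAsSpace) {
      out.push(0x20); // form encoding uses + for space
    } else {
      const code = c.charCodeAt(0);
      if (code > 0x7f) throw new Error(`non-ASCII character at index ${i}`);
      out.push(code); // unescaped ASCII, including a stray "%"
    }
  }
  return Uint8Array.from(out);
}

const cp1251 = new TextDecoder("windows-1251");
console.log(cp1251.decode(percentDecodeToBytes("%C0%E5"))); // Ае
// The UTF-8 escape for "А", wrongly decoded as CP1251:
console.log(cp1251.decode(percentDecodeToBytes("%D0%90"))); // Рђ
```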

Yes. ASCII bytes (0x00 - 0x7F) are identical in Windows-1251 and UTF-8. They pass through unchanged. Only bytes in the 0x80 - 0xFF range undergo remapping. Mixed-language documents (e.g., English text with Russian names) convert correctly.
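The full CP1251-to-UTF-8 pipeline makes the pass-through visible: ASCII bytes come out unchanged, and each Cyrillic byte expands to two UTF-8 bytes. cp1251ToUtf8 is a hypothetical name, and the sketch assumes TextDecoder supports "windows-1251".

```typescript
// CP1251 bytes -> JavaScript string -> UTF-8 bytes.
function cp1251ToUtf8(bytes: Uint8Array): Uint8Array {
  const text = new TextDecoder("windows-1251").decode(bytes);
  return new TextEncoder().encode(text);
}

// ASCII "ID 42: " followed by CP1251 "Мир" (CC E8 F0).
const input = Uint8Array.of(0x49, 0x44, 0x20, 0x34, 0x32, 0x3a, 0x20, 0xcc, 0xe8, 0xf0);
const output = cp1251ToUtf8(input);
const hex = Array.from(output, (b) => b.toString(16).padStart(2, "0")).join(" ");
console.log(hex); // 49 44 20 34 32 3a 20 d0 9c d0 b8 d1 80
```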

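Look for byte patterns. In CP1251, Cyrillic capital letters cluster in 0xC0 - 0xDF and lowercase letters in 0xE0 - 0xFF. UTF-8 encodes Russian letters as two-byte sequences whose lead byte is 0xD0 or 0xD1 (0xD0 - 0xD3 across the whole Cyrillic block U+0400 - U+04FF), always followed by a continuation byte in 0x80 - 0xBF. If a hex dump shows high bytes that do not follow this lead-plus-continuation pattern, the data is likely in a single-byte encoding. Metadata can also help: HTTP Content-Type headers and MIME charset parameters may declare the charset, and a UTF-8 byte order mark (EF BB BF) rules CP1251 out, although CP1251 itself has no BOM.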
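These checks can be combined into a rough heuristic. This is a sketch under stated assumptions, not a reliable detector: it treats well-formed UTF-8 as UTF-8, and otherwise assumes running text is mostly lowercase, which CP1251 stores at 0xE0 - 0xFF and KOI8-R at 0xC0 - 0xDF. guessCyrillicEncoding and its return labels are hypothetical.

```typescript
type EncodingGuess = "ascii" | "utf-8" | "windows-1251?" | "koi8-r?";

// Rough heuristic, not a reliable detector.
function guessCyrillicEncoding(bytes: Uint8Array): EncodingGuess {
  if (bytes.every((b) => b < 0x80)) return "ascii";
  // A UTF-8 byte order mark rules out CP1251.
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "utf-8";
  }
  try {
    // fatal: true makes the decoder throw on malformed UTF-8
    // instead of inserting U+FFFD.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return "utf-8";
  } catch {
    // Not well-formed UTF-8; fall through to single-byte heuristics.
  }
  // Assumption: running text is mostly lowercase. CP1251 stores lowercase
  // Cyrillic at 0xE0-0xFF; KOI8-R stores it at 0xC0-0xDF.
  let inE0toFF = 0;
  let inC0toDF = 0;
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] >= 0xe0) inE0toFF++;
    else if (bytes[i] >= 0xc0) inC0toDF++;
  }
  return inE0toFF >= inC0toDF ? "windows-1251?" : "koi8-r?";
}

// CP1251 bytes for "Россия".
console.log(guessCyrillicEncoding(Uint8Array.of(0xd0, 0xee, 0xf1, 0xf1, 0xe8, 0xff))); // windows-1251?
```

Short or all-caps samples can fool the case heuristic, so a declared charset should take precedence when one is available.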