Windows-1251 to UTF-8 Converter
Convert Windows-1251 (CP1251) encoded text and files to UTF-8 Unicode online. Supports hex bytes, URL-encoded strings, and file uploads.
About
Windows-1251 (CP1251) is a single-byte character encoding for Cyrillic scripts, standardized by Microsoft in 1990 for use in Windows operating systems across Russian, Ukrainian, Bulgarian, and Serbian locales. Each byte in the range 0x80 - 0xFF maps to a specific Unicode code point, covering the 128 byte positions beyond ASCII. Decoding these bytes with the wrong charset produces the classic "mojibake": the CP1251 bytes for "Привет" read as Windows-1252 appear as "Ïðèâåò", and read as UTF-8 they form invalid sequences that decoders replace with U+FFFD, instead of readable Cyrillic. This tool performs real byte-level re-encoding using the browser's native TextDecoder API, not a lookup simulation. It accepts raw hex byte sequences, URL-encoded strings (common in legacy HTTP forms), or binary file uploads.
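As a rough illustration of the re-encoding step outside the browser, the following Python sketch decodes a hex byte string as CP1251 and re-encodes it as UTF-8. It relies on Python's built-in `cp1251` codec rather than the `TextDecoder` API this tool uses, and the function name `cp1251_hex_to_utf8` and the sample bytes are assumptions made for the example.

```python
def cp1251_hex_to_utf8(hex_input: str) -> bytes:
    """Decode a hex string of CP1251 bytes and return the UTF-8 encoding."""
    raw = bytes.fromhex(hex_input)   # whitespace between byte pairs is skipped
    text = raw.decode("cp1251")      # byte -> Unicode code point
    return text.encode("utf-8")      # code point -> UTF-8 bytes


# Hypothetical sample: the CP1251 bytes of the Russian word "Привет".
sample = "CF F0 E8 E2 E5 F2"
utf8_bytes = cp1251_hex_to_utf8(sample)
print(utf8_bytes.decode("utf-8"))    # Привет

# The same bytes read with the wrong charset give mojibake instead.
raw = bytes.fromhex(sample)
print(raw.decode("cp1252"))          # Ïðèâåò
print(raw.decode("utf-8", errors="replace"))
```

Each Cyrillic letter occupies one byte in CP1251 and two bytes in UTF-8, so the six input bytes become twelve output bytes.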
Legacy databases, email archives (MIME headers often declare charset=windows-1251), and CSV exports from Russian-locale software remain common sources of CP1251-encoded data. Incorrect conversion corrupts not just display but data integrity: sorting, search indexing, and regex matching all fail on mojibake. This converter preserves every code point. Note: byte 0x98 is the only undefined position in CP1251; depending on the decoder it yields a replacement character (U+FFFD) or a control code rather than a printable character. Byte 0x88 is defined and maps to the euro sign (U+20AC).
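To see which byte positions a particular decoder treats as undefined, one can simply try every byte value. The sketch below does this with Python's built-in `cp1251` codec; other decoders, including browser implementations, may handle unassigned bytes differently, and the helper name `undefined_cp1251_bytes` is made up for this example.

```python
def undefined_cp1251_bytes() -> list[int]:
    """Return the byte values Python's cp1251 codec refuses to decode."""
    missing = []
    for b in range(0x100):
        try:
            bytes([b]).decode("cp1251")
        except UnicodeDecodeError:
            missing.append(b)
    return missing


print([hex(b) for b in undefined_cp1251_bytes()])

# With errors="replace", an undefined byte becomes U+FFFD instead of
# raising, while neighbouring bytes are decoded normally.
lenient = bytes([0xC0, 0x98, 0x88]).decode("cp1251", errors="replace")
print([f"U+{ord(ch):04X}" for ch in lenient])
```

Strict decoding (the default) is usually the safer choice for data migration, because it surfaces bad bytes as errors rather than silently substituting characters.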
Formulas
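Windows-1251 is a single-byte encoding. Each input byte b maps to exactly one Unicode code point U via a fixed lookup table. The conversion function is:
$$U = \mathrm{CP1251\_TABLE}[\,b - \mathrm{0x80}\,], \quad \mathrm{0x80} \le b \le \mathrm{0xFF}$$
For bytes in the ASCII range:
$$U = b, \quad \mathrm{0x00} \le b \le \mathrm{0x7F}$$
The resulting Unicode code point U is then encoded as UTF-8 using the standard multi-byte scheme (CP1251 never produces a code point above U+FFFF, so the 4-byte form is not needed):
$$
\mathrm{UTF8}(U) =
\begin{cases}
\texttt{0xxxxxxx} & U \le \mathrm{0x7F} \\
\texttt{110xxxxx 10xxxxxx} & \mathrm{0x80} \le U \le \mathrm{0x7FF} \\
\texttt{1110xxxx 10xxxxxx 10xxxxxx} & \mathrm{0x800} \le U \le \mathrm{0xFFFF}
\end{cases}
$$
Where b = input byte in Windows-1251 encoding, U = resulting Unicode code point, CP1251_TABLE = the 128-entry mapping for bytes 0x80 - 0xFF (indexed here from 0). The Cyrillic block А - я occupies a contiguous range: U = b - 0xC0 + 0x0410 for capital letters (0xC0 - 0xDF) and U = b - 0xE0 + 0x0430 for lowercase (0xE0 - 0xFF).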
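The arithmetic for the contiguous Cyrillic range and the UTF-8 byte layout can be sketched directly in Python. This is only an illustration of the formulas, not the converter's actual code; the function names `cyrillic_code_point` and `utf8_encode` are hypothetical, and the first function deliberately covers only bytes 0xC0 - 0xFF, since the rest of the upper half needs the full lookup table.

```python
def cyrillic_code_point(b: int) -> int:
    """Map a CP1251 byte in 0xC0-0xFF to its Unicode code point."""
    if 0xC0 <= b <= 0xDF:              # capital letters
        return b - 0xC0 + 0x0410
    if 0xE0 <= b <= 0xFF:              # lowercase letters
        return b - 0xE0 + 0x0430
    raise ValueError(f"byte 0x{b:02X} is outside the contiguous range")


def utf8_encode(u: int) -> bytes:
    """Encode one code point up to U+FFFF as UTF-8 (no surrogate check)."""
    if u <= 0x7F:                      # 0xxxxxxx
        return bytes([u])
    if u <= 0x7FF:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (u >> 6), 0x80 | (u & 0x3F)])
    if u <= 0xFFFF:                    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (u >> 12),
                      0x80 | ((u >> 6) & 0x3F),
                      0x80 | (u & 0x3F)])
    raise ValueError("code point above U+FFFF is never produced by CP1251")


# Worked example: byte 0xC6 -> U+0416 -> UTF-8 bytes D0 96.
u = cyrillic_code_point(0xC6)
print(f"U+{u:04X}", utf8_encode(u).hex(" "))
```

Both branches of `cyrillic_code_point` reduce to the same offset, b + 0x350, which is why the two letter ranges can be handled with a single addition.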
Reference Data
| Byte Range (Hex) | CP1251 Characters | Unicode Range | Description |
|---|---|---|---|
| 0x00 - 0x7F | Standard ASCII | U+0000 - U+007F | Identical to ASCII / UTF-8 single-byte |
| 0x80 | Ђ | U+0402 | Cyrillic capital letter DJE (Serbian) |
| 0x81 | Ѓ | U+0403 | Cyrillic capital letter GJE (Macedonian) |
| 0x82 | ‚ | U+201A | Single low-9 quotation mark |
| 0x83 | ѓ | U+0453 | Cyrillic small letter GJE |
| 0x84 | „ | U+201E | Double low-9 quotation mark |
| 0x85 | … | U+2026 | Horizontal ellipsis |
| 0x86 | † | U+2020 | Dagger |
| 0x88 | € | U+20AC | Euro sign |
| 0x8A | Љ | U+0409 | Cyrillic capital letter LJE |
| 0x8D | Ќ | U+040C | Cyrillic capital letter KJE |
| 0x8E | Ћ | U+040B | Cyrillic capital letter TSHE |
| 0x8F | Џ | U+040F | Cyrillic capital letter DZHE |
| 0x90 | ђ | U+0452 | Cyrillic small letter DJE |
| 0xA0 | (NBSP) | U+00A0 | Non-breaking space |
| 0xA8 | Ё | U+0401 | Cyrillic capital letter IO |
| 0xB8 | ё | U+0451 | Cyrillic small letter IO |
| 0xC0 - 0xDF | А - Я | U+0410 - U+042F | Cyrillic capital letters A through YA |
| 0xE0 - 0xFF | а - я | U+0430 - U+044F | Cyrillic small letters a through ya |
| 0xB0 | ° | U+00B0 | Degree sign |
| 0xAB | « | U+00AB | Left double angle quotation mark |
| 0xBB | » | U+00BB | Right double angle quotation mark |
| 0x96 | – | U+2013 | En dash |
| 0x97 | — | U+2014 | Em dash |
| 0xAA | Є | U+0404 | Cyrillic capital letter Ukrainian IE |
| 0xBA | є | U+0454 | Cyrillic small letter Ukrainian IE |
| 0xAF | Ї | U+0407 | Cyrillic capital letter YI (Ukrainian) |
| 0xBF | ї | U+0457 | Cyrillic small letter YI (Ukrainian) |
| 0xB2 | І | U+0406 | Cyrillic capital letter Byelorussian-Ukrainian I |
| 0xB3 | і | U+0456 | Cyrillic small letter Byelorussian-Ukrainian I |
| 0xA5 | Ґ | U+0490 | Cyrillic capital letter GHE with upturn (Ukrainian) |
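A table like this is easy to cross-check against an independent codec. The sketch below compares the single-byte rows above with Python's built-in `cp1251` codec and reports any disagreement; the names `REFERENCE_ROWS` and `mismatches` are assumptions for the example, and the two range rows are left out because they are covered by the offset formulas.

```python
# Byte -> expected code point, copied from the single-byte rows of the table.
REFERENCE_ROWS = {
    0x80: 0x0402, 0x81: 0x0403, 0x82: 0x201A, 0x83: 0x0453, 0x84: 0x201E,
    0x85: 0x2026, 0x86: 0x2020, 0x88: 0x20AC, 0x8A: 0x0409, 0x8D: 0x040C,
    0x8E: 0x040B, 0x8F: 0x040F, 0x90: 0x0452, 0x96: 0x2013, 0x97: 0x2014,
    0xA0: 0x00A0, 0xA5: 0x0490, 0xA8: 0x0401, 0xAA: 0x0404, 0xAB: 0x00AB,
    0xAF: 0x0407, 0xB0: 0x00B0, 0xB2: 0x0406, 0xB3: 0x0456, 0xB8: 0x0451,
    0xBA: 0x0454, 0xBB: 0x00BB, 0xBF: 0x0457,
}


def mismatches(rows: dict[int, int]) -> dict[int, int]:
    """Return rows whose code point differs from Python's cp1251 codec."""
    bad = {}
    for byte, expected in rows.items():
        actual = ord(bytes([byte]).decode("cp1251"))
        if actual != expected:
            bad[byte] = actual
    return bad


print(mismatches(REFERENCE_ROWS) or "all rows agree with the codec")
```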
Frequently Asked Questions
Percent-encoded input such as %C0%E5 is decoded byte by byte as CP1251 (here the bytes 0xC0 0xE5, which give "Ае"). If your source system used a different encoding (e.g., UTF-8) for the URL encoding, the bytes represent UTF-8 sequences, not CP1251. Feeding UTF-8 bytes through a CP1251 decoder will produce incorrect results. Verify the source encoding before converting. This tool assumes all percent-encoded bytes are CP1251.
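The difference is easy to reproduce with Python's standard `urllib.parse` module. The sketch below decodes the same letters from a CP1251 URL and from a UTF-8 URL, then shows one common heuristic for unknown input: try strict UTF-8 first and fall back to CP1251. That heuristic is an assumption of this example, not something the tool does, and the helper name `decode_percent_encoded` is hypothetical.

```python
from urllib.parse import unquote, unquote_to_bytes

# "Ае" percent-encoded from CP1251 bytes and from UTF-8 bytes.
cp1251_url = "%C0%E5"
utf8_url = "%D0%90%D0%B5"

print(unquote(cp1251_url, encoding="cp1251"))   # correct: Ае
print(unquote(utf8_url, encoding="cp1251"))     # wrong charset: mojibake


def decode_percent_encoded(value: str) -> str:
    """Heuristic: accept valid UTF-8, otherwise treat the bytes as CP1251."""
    raw = unquote_to_bytes(value)
    try:
        return raw.decode("utf-8")               # strict: raises on bad bytes
    except UnicodeDecodeError:
        return raw.decode("cp1251", errors="replace")


print(decode_percent_encoded(cp1251_url))
print(decode_percent_encoded(utf8_url))
```

The fallback is not foolproof: some short CP1251 byte sequences also happen to be valid UTF-8, so knowing the source encoding remains the reliable approach.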