About

Windows-1251 (CP1251) is a single-byte character encoding for Cyrillic scripts, standardized by Microsoft in 1990 for use in Windows operating systems across Russian, Ukrainian, Bulgarian, and Serbian locales. Each byte in the range 0x80-0xFF maps to a specific Unicode code point, covering up to 128 additional characters beyond ASCII (byte 0x98 is unassigned). Decoding these bytes with the wrong charset produces the classic "mojibake": for example, the CP1251 bytes for "Россия" read as Windows-1252 appear as "Ðîññèÿ" instead of readable Cyrillic. This tool performs real byte-level re-encoding using the browser's native TextDecoder API, not a lookup simulation. It accepts raw hex byte sequences, URL-encoded strings (common in legacy HTTP forms), or binary file uploads.
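
As a rough illustration of the byte-level re-encoding described above, here is a minimal Python sketch. It uses Python's built-in `cp1251` codec as a stand-in for the browser's TextDecoder, and the helper name `cp1251_hex_to_utf8` is hypothetical, not part of this tool.

```python
def cp1251_hex_to_utf8(hex_string: str) -> bytes:
    """Decode a hex dump of CP1251 bytes and re-encode the text as UTF-8."""
    raw = bytes.fromhex(hex_string)   # whitespace between byte pairs is ignored
    text = raw.decode("cp1251")       # one byte -> one Unicode code point
    return text.encode("utf-8")       # Cyrillic letters become two bytes each


sample_hex = "CF F0 E8 E2 E5 F2"      # "Привет" encoded in CP1251
utf8_bytes = cp1251_hex_to_utf8(sample_hex)
print(utf8_bytes.decode("utf-8"))     # Привет
print(utf8_bytes.hex(" "))            # d0 9f d1 80 d0 b8 d0 b2 d0 b5 d1 82

# Mojibake demo: the same CP1251 bytes read with the wrong single-byte charset
# (Windows-1252 is chosen here purely for illustration).
mojibake = "Россия".encode("cp1251").decode("cp1252")
print(mojibake)                       # Ðîññèÿ
```

The six input bytes become twelve output bytes, because each Cyrillic letter needs two bytes in UTF-8.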

Legacy databases, email archives (MIME headers often declare charset=windows-1251), and CSV exports from Russian-locale software remain common sources of CP1251-encoded data. Incorrect conversion corrupts not just display but data integrity: sorting, search indexing, and regex matching all fail on mojibake. This converter preserves every code point. Note: byte 0x98 is undefined in CP1251 and will produce a replacement character (U+FFFD). Byte 0x88, by contrast, is defined and maps to the Euro sign (U+20AC).


Formulas

Windows-1251 is a single-byte encoding. Each input byte b maps to exactly one Unicode code point U via a fixed lookup table. The conversion function is:

$$U = \mathrm{CP1251\_TABLE}(b), \quad b \in [\mathtt{0x00}, \mathtt{0xFF}]$$

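The mapping splits into two cases, with bytes in the ASCII range passing through unchanged:

$$U = \begin{cases} b & \text{if } \mathtt{0x00} \le b \le \mathtt{0x7F} \\ \mathrm{TABLE}[b - \mathtt{0x80}] & \text{if } \mathtt{0x80} \le b \le \mathtt{0xFF} \end{cases}$$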

The resulting Unicode code point U is then encoded as UTF-8 using the standard multi-byte scheme:

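$$\begin{cases} \text{1 byte: } \mathtt{0xxxxxxx} & \text{if } U \le \mathtt{0x7F} \\ \text{2 bytes: } \mathtt{110xxxxx\ 10xxxxxx} & \text{if } U \le \mathtt{0x7FF} \\ \text{3 bytes: } \mathtt{1110xxxx\ 10xxxxxx\ 10xxxxxx} & \text{if } U \le \mathtt{0xFFFF} \end{cases}$$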

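Where b = input byte in Windows-1251 encoding, U = resulting Unicode code point, CP1251_TABLE = the full byte-to-code-point mapping, and TABLE = its 128-entry upper half for bytes 0x80-0xFF. Three UTF-8 cases suffice because every CP1251 character lies in the Basic Multilingual Plane. The Cyrillic letters А-я occupy a contiguous range: U = b − 0xC0 + 0x0410 for capital letters (bytes 0xC0-0xDF) and U = b − 0xE0 + 0x0430 for lowercase (bytes 0xE0-0xFF).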
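
The two formulas above can be checked with a short Python sketch. The function names `letter_code_point` and `utf8_encode` are hypothetical. The first covers only ASCII and the contiguous А-я block, not the full 128-entry table. Both are compared against Python's built-in codecs.

```python
def letter_code_point(b: int) -> int:
    """Map an ASCII byte or a CP1251 byte in the А-я block to its code point."""
    if b <= 0x7F:
        return b                      # ASCII passes through unchanged
    if 0xC0 <= b <= 0xDF:
        return b - 0xC0 + 0x0410      # capital letters А..Я
    if 0xE0 <= b <= 0xFF:
        return b - 0xE0 + 0x0430      # small letters а..я
    raise ValueError(f"byte 0x{b:02X} needs the full lookup table")


def utf8_encode(u: int) -> bytes:
    """Encode a code point up to U+FFFF with the standard UTF-8 bit layout."""
    if u <= 0x7F:
        return bytes([u])
    if u <= 0x7FF:
        return bytes([0xC0 | (u >> 6), 0x80 | (u & 0x3F)])
    if u <= 0xFFFF:
        return bytes([0xE0 | (u >> 12),
                      0x80 | ((u >> 6) & 0x3F),
                      0x80 | (u & 0x3F)])
    raise ValueError("code point outside the range covered here")


# Cross-check the arithmetic shortcut against the built-in cp1251 codec
formula_ok = all(
    chr(letter_code_point(b)) == bytes([b]).decode("cp1251")
    for b in list(range(0x00, 0x80)) + list(range(0xC0, 0x100))
)
print(formula_ok)                               # True
print(utf8_encode(letter_code_point(0xC0)))     # b'\xd0\x90' (the letter А)
```

The three-byte branch matters for CP1251 punctuation such as the Euro sign (U+20AC), which the arithmetic shortcut does not cover but `utf8_encode` does.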

Reference Data

| Byte Range (Hex) | CP1251 Characters | Unicode Range | Description |
|---|---|---|---|
| 0x00-0x7F | Standard ASCII | U+0000-U+007F | Identical to ASCII / UTF-8 single-byte |
| 0x80 | Ђ | U+0402 | Cyrillic capital letter DJE (Serbian) |
| 0x81 | Ѓ | U+0403 | Cyrillic capital letter GJE (Macedonian) |
| 0x82 | ‚ | U+201A | Single low-9 quotation mark |
| 0x83 | ѓ | U+0453 | Cyrillic small letter GJE |
| 0x84 | „ | U+201E | Double low-9 quotation mark |
| 0x85 | … | U+2026 | Horizontal ellipsis |
| 0x86 | † | U+2020 | Dagger |
| 0x88 | € | U+20AC | Euro sign |
| 0x8A | Љ | U+0409 | Cyrillic capital letter LJE |
| 0x8D | Ќ | U+040C | Cyrillic capital letter KJE |
| 0x8E | Ћ | U+040B | Cyrillic capital letter TSHE |
| 0x8F | Џ | U+040F | Cyrillic capital letter DZHE |
| 0x90 | ђ | U+0452 | Cyrillic small letter DJE |
| 0x96 | (en dash) | U+2013 | En dash |
| 0x97 | (em dash) | U+2014 | Em dash |
| 0xA0 | (non-breaking space) | U+00A0 | Non-breaking space |
| 0xA5 | Ґ | U+0490 | Cyrillic capital letter GHE with upturn (Ukrainian) |
| 0xA8 | Ё | U+0401 | Cyrillic capital letter IO |
| 0xAA | Є | U+0404 | Cyrillic capital letter Ukrainian IE |
| 0xAB | « | U+00AB | Left double angle quotation mark |
| 0xAF | Ї | U+0407 | Cyrillic capital letter YI (Ukrainian) |
| 0xB0 | ° | U+00B0 | Degree sign |
| 0xB2 | І | U+0406 | Cyrillic capital letter Byelorussian-Ukrainian I |
| 0xB3 | і | U+0456 | Cyrillic small letter Byelorussian-Ukrainian I |
| 0xB8 | ё | U+0451 | Cyrillic small letter IO |
| 0xBA | є | U+0454 | Cyrillic small letter Ukrainian IE |
| 0xBB | » | U+00BB | Right double angle quotation mark |
| 0xBF | ї | U+0457 | Cyrillic small letter YI (Ukrainian) |
| 0xC0-0xDF | А-Я | U+0410-U+042F | Cyrillic capital letters A through YA |
| 0xE0-0xFF | а-я | U+0430-U+044F | Cyrillic small letters a through ya |

Frequently Asked Questions

Byte 0x98 is undefined in the original Windows-1251 specification. The tool outputs U+FFFD (Unicode Replacement Character) for it rather than silently dropping the byte, so you can audit the output for data corruption. Decoders differ on this point: some map 0x98 to the control character U+0098 or raise an error instead.
Windows-1251, ISO-8859-5, and KOI8-R all encode Cyrillic scripts in a single byte, but they use completely different byte-to-character mappings. For example, the byte 0xC0 is "А" (A) in CP1251, "Р" (R) in ISO-8859-5, and "ю" (yu) in KOI8-R. Applying the wrong decoding table produces mojibake. This tool only handles Windows-1251. Using it on KOI8-R data will produce incorrect output.
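
The difference between the three mappings is easy to reproduce. The sketch below decodes the same byte with three of Python's built-in codecs. The codec names are Python's, and the tool itself only implements the first.

```python
byte = b"\xc0"

# Same byte, three different Cyrillic code pages
decoded_by_codec = {
    codec: byte.decode(codec)
    for codec in ("cp1251", "iso8859_5", "koi8_r")
}

for codec, char in decoded_by_codec.items():
    print(f"{codec:10} 0xC0 -> {char} (U+{ord(char):04X})")
# cp1251     0xC0 -> А (U+0410)
# iso8859_5  0xC0 -> Р (U+0420)
# koi8_r     0xC0 -> ю (U+044E)
```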
The tool accepts files up to 50 MB. Since Windows-1251 is a single-byte encoding, the conversion is a linear O(n) operation and completes in milliseconds even for large files. The browser's TextDecoder processes the entire ArrayBuffer in a single call. For files beyond 50 MB, consider command-line tools like iconv.
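
For files too large for the browser, a chunked conversion is one alternative to iconv. The following is a minimal Python sketch, not the tool's implementation. The function name `convert_file`, the chunk size, and the `errors="replace"` policy are all assumptions.

```python
import os
import tempfile


def convert_file(src_path: str, dst_path: str, chunk_chars: int = 1 << 20) -> None:
    """Stream a CP1251 text file into a UTF-8 file without loading it whole."""
    # newline="" disables newline translation so line endings are preserved
    with open(src_path, "r", encoding="cp1251", errors="replace", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            dst.write(chunk)


# Small demonstration with a temporary file
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input_cp1251.txt")
    dst = os.path.join(tmp, "output_utf8.txt")
    with open(src, "wb") as f:
        f.write("Привет\r\n".encode("cp1251"))
    convert_file(src, dst)
    with open(dst, "rb") as f:
        converted = f.read()

print(converted == "Привет\r\n".encode("utf-8"))   # True
```

Because CP1251 is single-byte, a chunk boundary can never split a character on the input side, which keeps the streaming loop simple.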
URL encoding uses percent-encoded bytes like %C0%E5. If your source system used a different encoding (e.g., UTF-8) for the URL encoding, the bytes represent UTF-8 sequences, not CP1251. Feeding UTF-8 bytes through a CP1251 decoder will produce incorrect results. Verify the source encoding before converting. This tool assumes all percent-encoded bytes are CP1251.
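
The effect of assuming the wrong source encoding for percent-encoded data can be sketched with Python's standard `urllib.parse.unquote`. The sample strings are illustrative only.

```python
from urllib.parse import unquote

cp1251_encoded = "%CF%F0%E8%E2%E5%F2"     # "Привет" percent-encoded from CP1251 bytes

# Correct: tell unquote the bytes are CP1251
right = unquote(cp1251_encoded, encoding="cp1251")
print(right)                               # Привет

# Wrong: unquote defaults to UTF-8 and replaces the invalid sequences
wrong = unquote(cp1251_encoded)
print("\ufffd" in wrong)                   # True

# The reverse mistake: UTF-8 percent-encoding pushed through a CP1251 decoder
utf8_encoded = "%D0%9F"                    # "П" percent-encoded from UTF-8 bytes
garbled = unquote(utf8_encoded, encoding="cp1251")
print(garbled)                             # Рџ
```

Both failure modes decode without raising an error, which is why the source encoding should be verified before converting.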
ASCII bytes (0x00-0x7F) are identical in Windows-1251 and UTF-8, so they pass through unchanged. Only bytes in the 0x80-0xFF range undergo remapping. Mixed-language documents (e.g., English text with Russian names) therefore convert correctly.
To check whether data is Windows-1251, look for byte patterns: Cyrillic capital letters cluster in 0xC0-0xDF, lowercase in 0xE0-0xFF. UTF-8 Cyrillic uses two-byte sequences starting with 0xD0 or 0xD1. If a hex dump shows single-byte values above 0x7F without the 0xD0/0xD1 prefix, it is likely a single-byte encoding. File metadata (BOM, HTTP headers, MIME type declarations) may also declare the charset.
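
The byte-pattern check can be turned into a rough heuristic. The Python sketch below is an illustration only: the function name `guess_encoding` and the 0.9 threshold are arbitrary assumptions, and it cannot tell CP1251 apart from other single-byte Cyrillic code pages such as KOI8-R.

```python
def guess_encoding(data: bytes) -> str:
    """Very rough guess: 'ascii', 'utf-8', 'single-byte-cyrillic', or 'unknown'."""
    high = [b for b in data if b >= 0x80]
    if not high:
        return "ascii"
    try:
        data.decode("utf-8")               # valid UTF-8 is a strong signal
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # In CP1251 the letters А-я all sit in 0xC0-0xFF
    letter_share = sum(1 for b in high if b >= 0xC0) / len(high)
    return "single-byte-cyrillic" if letter_share > 0.9 else "unknown"


print(guess_encoding(b"hello"))                      # ascii
print(guess_encoding("Привет".encode("utf-8")))      # utf-8
print(guess_encoding("Привет".encode("cp1251")))     # single-byte-cyrillic
```

A declared charset in metadata, when present, should take priority over a guess like this.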