About

ASCII defines 128 characters (code points 0 - 127). "ANSI", in the Windows sense, extends this to 256 byte values through code pages, where bytes 128 - 255 map to locale-specific characters. Decoding a file with the wrong code page garbles the text, and the damage can become permanent if the garbled text is then re-saved. A file saved as CP1252 (Western European) and opened as CP1251 (Cyrillic) produces garbled output known as mojibake. This tool performs real byte-level encoding using complete lookup tables for each Windows code page. It does not guess: it maps every character deterministically and flags anything outside the target code page's repertoire.
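To make the CP1252-versus-CP1251 mix-up concrete, the sketch below decodes the same four bytes under both code pages. The helper names `decodeCp1252Partial` and `decodeCp1251Partial` are hypothetical. Both cover only the ranges where the mapping is a simple offset (0xA0 - 0xFF for CP1252, 0xC0 - 0xFF for CP1251), so they are not a substitute for full lookup tables.

```javascript
// Partial decoders for illustration only; bytes outside the covered ranges throw.
function decodeCp1252Partial(bytes) {
  return Array.from(bytes, (b) => {
    // ASCII, and 0xA0-0xFF where CP1252 matches the Unicode code point
    if (b <= 0x7f || b >= 0xa0) return String.fromCharCode(b);
    throw new RangeError("byte 0x" + b.toString(16) + " not covered by this sketch");
  }).join("");
}

function decodeCp1251Partial(bytes) {
  return Array.from(bytes, (b) => {
    if (b <= 0x7f) return String.fromCharCode(b);
    // 0xC0-0xFF hold the contiguous Cyrillic block U+0410-U+044F
    if (b >= 0xc0) return String.fromCharCode(0x0410 + (b - 0xc0));
    throw new RangeError("byte 0x" + b.toString(16) + " not covered by this sketch");
  }).join("");
}

// "café" as a CP1252 application would save it
const saved = Uint8Array.from([0x63, 0x61, 0x66, 0xe9]);

console.log(decodeCp1252Partial(saved)); // café
console.log(decodeCp1251Partial(saved)); // cafй
```

The bytes never change; only the interpretation of 0xE9 does, which is why the ASCII letters survive and the accented one does not.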

The converter accepts plain text or uploaded files and produces the exact byte sequence a Windows application would generate for the selected code page. Output is available as hexadecimal, decimal, or binary representation. You can download the raw encoded binary file for integration testing, protocol debugging, or legacy system interoperability. Note: ASCII characters (0 - 127) are identical across all ANSI code pages. Divergence occurs only in the upper half (128 - 255).
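The three output representations can be sketched as a small formatter. The function name `formatBytes`, the mode labels, and the space separator are assumptions made for this example; the tool's actual output layout may differ.

```javascript
// Render a byte sequence as hex, decimal, or binary text.
function formatBytes(bytes, mode) {
  const renderers = {
    hex: (b) => b.toString(16).padStart(2, "0"),
    dec: (b) => b.toString(10),
    bin: (b) => b.toString(2).padStart(8, "0"),
  };
  const render = renderers[mode];
  if (!render) throw new Error("unknown mode: " + mode);
  return Array.from(bytes, (b) => render(b)).join(" ");
}

// "Hi€" encoded as CP1252: H = 0x48, i = 0x69, € = 0x80
const sample = Uint8Array.from([0x48, 0x69, 0x80]);

console.log(formatBytes(sample, "hex")); // 48 69 80
console.log(formatBytes(sample, "dec")); // 72 105 128
console.log(formatBytes(sample, "bin")); // 01001000 01101001 10000000
```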

Formulas

The encoding process maps each Unicode code point to a single byte in the target ANSI code page.

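$$
\operatorname{encode}(c) =
\begin{cases}
c & \text{if } 0 \le c \le 127 \quad \text{(ASCII range)} \\
T_{cp}[c] & \text{if } c \in T_{cp} \quad \text{(code page lookup)} \\
\texttt{0x3F} & \text{otherwise} \quad (\text{unmappable} \to \texttt{'?'})
\end{cases}
$$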

Where c is the Unicode code point of the input character, T_cp is the lookup table for code page cp, and 0x3F is the byte for the replacement character "?". The hex representation converts each byte b to a two-digit hexadecimal string via b.toString(16).padStart(2, "0"). Binary output uses b.toString(2).padStart(8, "0").
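The three cases of encode(c) translate almost line for line into code. This is a minimal sketch, not the tool's implementation: `CP1252_EXCERPT` is a hand-picked excerpt of the CP1252 table (a real table covers every assigned byte), and the function names are hypothetical.

```javascript
// Excerpt of T_cp for CP1252: Unicode code point -> byte value.
const CP1252_EXCERPT = new Map([
  [0x20ac, 0x80], // euro sign
  [0x201c, 0x93], // left double quotation mark
  [0x201d, 0x94], // right double quotation mark
  [0x2014, 0x97], // em dash
  [0x00e9, 0xe9], // e with acute
  [0x00f1, 0xf1], // n with tilde
]);

function encodeCodePoint(c, table) {
  if (c >= 0 && c <= 127) return c;      // ASCII range
  if (table.has(c)) return table.get(c); // code page lookup
  return 0x3f;                           // unmappable -> '?'
}

function encodeString(text, table) {
  // Array.from on a string iterates by code point, so a surrogate pair
  // is treated as one character rather than two UTF-16 units.
  return Uint8Array.from(
    Array.from(text, (ch) => encodeCodePoint(ch.codePointAt(0), table))
  );
}

const encoded = encodeString("\u00e9\u20ac\u6f22", CP1252_EXCERPT); // é, €, 漢
console.log(Array.from(encoded, (b) => b.toString(16).padStart(2, "0")).join(" "));
// e9 80 3f
```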

Reference Data

Code Page	Name	Region / Language	Byte Range (decimal)	Notable Characters
CP1250	Windows-1250	Central European	128 - 255	Š, š, Ž, ž, ł, ą
CP1251	Windows-1251	Cyrillic	128 - 255	А - Я, а - я, Ё, ё
CP1252	Windows-1252	Western European	128 - 255	€, ß, ö, ñ, ç
CP1253	Windows-1253	Greek	128 - 255	Α - Ω, α - ω
CP1254	Windows-1254	Turkish	128 - 255	ş, ğ, İ, ı, ç
CP1255	Windows-1255	Hebrew	128 - 255	א - ת, niqqud marks
CP1256	Windows-1256	Arabic	128 - 255	Arabic letters, LRM and RLM directional marks
CP1257	Windows-1257	Baltic	128 - 255	ā, č, ē, ģ, ķ, ļ, ņ
CP1258	Windows-1258	Vietnamese	128 - 255	ơ, ư, combining tones
CP874	Windows-874	Thai	128 - 255	Thai consonants, vowels, tones
ASCII	US-ASCII	Universal	0 - 127	Control chars, printable Latin
Bytes 0x00 - 0x7F are shared across all code pages. Bytes 0x80 - 0x9F in CP1252 contain printable characters (e.g., € at 0x80) where ISO-8859-1 has control codes.

Frequently Asked Questions

Characters without a mapping in the target code page are replaced with byte 0x3F (the "?" character) and flagged in the output with a visual warning indicator. The converter counts unmappable characters and displays the total so you can assess data loss before downloading the encoded file.
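One way to track replacements while encoding is sketched below. The function name `encodeWithReport`, the shape of the returned object, and the one-entry table are all assumptions made for illustration.

```javascript
// Encode text and record every character that had to be replaced with '?'.
function encodeWithReport(text, table) {
  const bytes = [];
  const unmappable = []; // { index, char } per replaced character
  let index = 0;         // position counted in code points, not UTF-16 units
  for (const ch of text) {
    const c = ch.codePointAt(0);
    if (c <= 127) {
      bytes.push(c);
    } else if (table.has(c)) {
      bytes.push(table.get(c));
    } else {
      bytes.push(0x3f);
      unmappable.push({ index, char: ch });
    }
    index += 1;
  }
  return { bytes: Uint8Array.from(bytes), unmappable };
}

// Hypothetical stand-in table containing only é (U+00E9 -> 0xE9)
const tinyTable = new Map([[0x00e9, 0xe9]]);

const report = encodeWithReport("\u00e9?\u65e5", tinyTable); // é, literal ?, 日
console.log(report.unmappable.length); // 1
```

Recording replacements separately matters because a literal "?" in the input also encodes to 0x3F; the byte stream alone cannot tell the two apart.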

ISO-8859-1 assigns C1 control codes (non-printable) to bytes 0x80 - 0x9F. Microsoft's CP1252 repurposed these bytes for printable characters like the Euro sign (€ at 0x80), curly quotes, and em-dashes. This is why HTML pages declared as ISO-8859-1 often actually use CP1252. Browsers compensate silently, but binary tools do not.
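The difference is easy to see by decoding one byte both ways. In this sketch the CP1252 table is an excerpt limited to a few well-known assignments in 0x80 - 0x9F, and the function names are hypothetical.

```javascript
// Excerpt of CP1252 assignments in 0x80-0x9F: byte -> Unicode code point.
const CP1252_HIGH_EXCERPT = new Map([
  [0x80, 0x20ac],                 // euro sign
  [0x91, 0x2018], [0x92, 0x2019], // single curly quotes
  [0x93, 0x201c], [0x94, 0x201d], // double curly quotes
  [0x96, 0x2013], [0x97, 0x2014], // en dash, em dash
]);

// ISO-8859-1: every byte equals its Unicode code point.
function decodeLatin1Byte(b) {
  return String.fromCodePoint(b);
}

// CP1252: identical to ISO-8859-1 except in 0x80-0x9F.
function decodeCp1252Byte(b) {
  if (b >= 0x80 && b <= 0x9f) {
    const cp = CP1252_HIGH_EXCERPT.get(b);
    if (cp === undefined) throw new RangeError("byte not in this excerpt");
    return String.fromCodePoint(cp);
  }
  return String.fromCodePoint(b);
}

console.log(decodeLatin1Byte(0x80).codePointAt(0).toString(16)); // 80 (a C1 control)
console.log(decodeCp1252Byte(0x80).codePointAt(0).toString(16)); // 20ac (the euro sign)
```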

The converter preserves line endings exactly as entered. Windows ANSI files typically use CR+LF (0x0D 0x0A), Unix uses LF (0x0A), and classic Mac uses CR (0x0D). Since these are all ASCII-range bytes, they pass through unchanged regardless of code page. The hex view shows exactly which line ending bytes are present.
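Checking which line endings a byte sequence contains takes a single pass. This is a sketch with an assumed function name, `countLineEndings`; it treats CR immediately followed by LF as one CR+LF pair.

```javascript
// Count CR+LF, bare LF, and bare CR sequences in a byte array.
function countLineEndings(bytes) {
  const counts = { crlf: 0, lf: 0, cr: 0 };
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === 0x0d) {
      if (bytes[i + 1] === 0x0a) {
        counts.crlf += 1;
        i += 1; // skip the LF that belongs to this pair
      } else {
        counts.cr += 1;
      }
    } else if (bytes[i] === 0x0a) {
      counts.lf += 1;
    }
  }
  return counts;
}

// "a" CR LF "b" LF "c" CR
const mixed = Uint8Array.from([0x61, 0x0d, 0x0a, 0x62, 0x0a, 0x63, 0x0d]);
console.log(countLineEndings(mixed)); // one of each kind
```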

Unicode input, including text that originated as UTF-8, is supported. The input field accepts any text your browser can render, and JavaScript holds it internally as UTF-16. The converter reads each Unicode code point and maps it to the corresponding single byte in the selected ANSI code page. The multi-byte UTF-8 sequence 0xC3 0xA9 (é), for example, becomes the single byte 0xE9 in CP1252. Characters beyond the code page's repertoire (e.g., CJK ideographs in CP1252) become 0x3F.
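The é example can be reproduced with the standard `TextEncoder` API, which always produces UTF-8. The helper `toCp1252LatinRange` is hypothetical and deliberately narrow: it handles only ASCII and the 0xA0 - 0xFF range, where the CP1252 byte equals the code point, and returns `null` for everything else.

```javascript
const eAcute = "\u00e9"; // é

// UTF-8 needs two bytes for this character
const utf8 = new TextEncoder().encode(eAcute);
console.log(Array.from(utf8, (b) => b.toString(16)).join(" ")); // c3 a9

// Narrow sketch: valid only where CP1252 byte value == Unicode code point.
function toCp1252LatinRange(codePoint) {
  if (codePoint <= 0x7f) return codePoint;
  if (codePoint >= 0xa0 && codePoint <= 0xff) return codePoint;
  return null; // needs a real lookup table
}

console.log(toCp1252LatinRange(eAcute.codePointAt(0)).toString(16)); // e9

// UTF-16 detail: characters above U+FFFF occupy two string units,
// so iterate by code point rather than by index.
const clef = "\u{1D11E}";
console.log(clef.length);             // 2
console.log(Array.from(clef).length); // 1
```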

CP1252 (Windows-1252) is the safest default for Western languages. It is the most widely deployed ANSI code page and is the de facto encoding assumed by most legacy Windows applications, older HTTP servers, and email clients in the Americas and Western Europe. For Cyrillic text, use CP1251. For Central European languages with diacritics (Polish, Czech, Hungarian), use CP1250.

The choice of code page does not matter for pure ASCII text. Code points 0 - 127 produce identical bytes in every ANSI code page, so the conversion is a no-op for ASCII. Differences appear only when your text contains characters above code point 127, such as accented letters, currency symbols, or typographic punctuation. The tool highlights which characters fall in the extended range.
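A quick check for whether the code page choice matters for a given input might look like the following. The names `extendedCharacters` and `isPureAscii` are assumptions for this sketch.

```javascript
// List every character above code point 127, with its position in code points.
function extendedCharacters(text) {
  const found = [];
  let index = 0;
  for (const ch of text) {
    if (ch.codePointAt(0) > 127) found.push({ index, char: ch });
    index += 1;
  }
  return found;
}

// Pure ASCII means no character falls in the extended range.
function isPureAscii(text) {
  return extendedCharacters(text).length === 0;
}

console.log(isPureAscii("Hello, world!\r\n"));         // true
console.log(isPureAscii("na\u00efve caf\u00e9"));      // false
console.log(extendedCharacters("na\u00efve caf\u00e9").map((e) => e.index)); // [ 2, 9 ]
```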