ANSI to UTF-8 Converter
Convert ANSI-encoded text to UTF-8 online. Supports Windows-1250 through Windows-1258 and selected ISO-8859 code pages, with auto-detection and batch file processing.
About
ANSI is not a single encoding. It is a family of code pages (Windows-1252, ISO-8859-1, Windows-1251, etc.) in which bytes 0x80 - 0xFF map to different Unicode code points depending on the locale. Feeding ANSI-encoded data to a UTF-8 system without conversion produces mojibake: the Windows-1252 byte 0xE4 (“ä”) is not a valid UTF-8 sequence on its own and typically renders as “�”, while the reverse mistake, reading UTF-8 as Windows-1252, turns “ä” into “Ã¤”. Database imports, legacy CSV migrations, and subtitle files are common failure points. This tool performs byte-level remapping using complete lookup tables for 15 code pages. Once a code page is selected, the conversion itself involves no guessing: each byte in the 0x80 - 0xFF range is resolved to its exact Unicode code point per that code page, then re-encoded as valid UTF-8. Auto-detection scores your input against all supported code pages and selects the most probable match.
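The effect described above can be reproduced with Python's built-in codecs. This is only an illustrative sketch, not the tool's own implementation; the helper name `ansi_to_utf8` and the choice of `cp1252` as the source code page are assumptions made for the example.

```python
def ansi_to_utf8(data: bytes, codec: str = "cp1252") -> bytes:
    """Decode bytes with the given code page, then re-encode as UTF-8."""
    return data.decode(codec).encode("utf-8")


# "ä" is the single byte 0xE4 in Windows-1252.
raw = "ä".encode("cp1252")

# Proper conversion: 0xE4 -> U+00E4 -> the two UTF-8 bytes C3 A4.
converted = ansi_to_utf8(raw)

# Misreading the ANSI byte as UTF-8 yields the replacement character U+FFFD.
misread_as_utf8 = raw.decode("utf-8", errors="replace")

# The reverse mistake (UTF-8 bytes read as Windows-1252) yields "Ã¤".
misread_as_ansi = "ä".encode("utf-8").decode("cp1252")

print(converted, repr(misread_as_utf8), misread_as_ansi)
```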
Limitations: auto-detection works best on natural-language text longer than 50 characters. Short strings or binary data may produce ambiguous results. Mixed-encoding files (partially UTF-8, partially ANSI) must be split and converted segment by segment. Pro tip: if your source is a database dump, check the declared character set and COLLATION settings before converting, because the declared encoding may differ from the actual byte content.
Formulas
ANSI-to-UTF-8 conversion is a two-stage remapping process. Stage 1 resolves each ANSI byte to a Unicode code point. Stage 2 encodes that code point as a UTF-8 sequence of one to four bytes.
Stage 1: Code Page Lookup
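For input byte b:
$$U = \begin{cases} b & \text{if } b < \mathtt{0x80} \\ \mathrm{TABLE}_{cp}[b] & \text{if } \mathtt{0x80} \le b \le \mathtt{0xFF} \end{cases}$$
where U = Unicode code point, cp = selected code page, TABLE_cp = the 128-entry lookup array for bytes 0x80 - 0xFF. Bytes below 0x80 are plain ASCII and map to themselves in every supported code page.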
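A minimal Python sketch of the Stage 1 lookup follows. It assumes the 128-entry table can be derived from Python's bundled codecs rather than from the tool's own tables; `build_table` and `to_code_point` are hypothetical names, and `None` is used here as an assumed marker for bytes the code page leaves undefined.

```python
from typing import List, Optional


def build_table(codec: str) -> List[Optional[int]]:
    """Build the 128-entry table for bytes 0x80-0xFF from a Python codec."""
    table: List[Optional[int]] = []
    for b in range(0x80, 0x100):
        try:
            table.append(ord(bytes([b]).decode(codec)))
        except UnicodeDecodeError:
            table.append(None)  # byte is undefined in this code page
    return table


def to_code_point(b: int, table: List[Optional[int]]) -> Optional[int]:
    """Stage 1: ASCII maps to itself; high bytes go through the table."""
    if b < 0x80:
        return b
    return table[b - 0x80]  # the list is 0-based, so shift the byte value


cp1252 = build_table("cp1252")
print(hex(to_code_point(0x80, cp1252)))  # the euro sign in Windows-1252
```

Storing only the upper 128 entries keeps the table small, at the cost of the `b - 0x80` offset when indexing.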
Stage 2: UTF-8 Encoding
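Stage 2 follows the standard UTF-8 bit layout: code points below U+0080 take one byte, below U+0800 two bytes, below U+10000 three bytes, and the rest four bytes. The sketch below spells this out in Python for illustration; the function name `utf8_encode` is an assumption, and in practice `chr(cp).encode("utf-8")` gives the same result.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as a UTF-8 byte sequence."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


print(utf8_encode(0xE4).hex())    # U+00E4, two bytes
print(utf8_encode(0x20AC).hex())  # U+20AC, three bytes
```

Every code point reachable from a single-byte code page lies below U+10000, so the four-byte branch is included only for completeness.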
Auto-Detection Scoring
For each code page cp, a score S_cp is computed:
$$S_{cp} = \frac{\sum_{i=0}^{n} w\left(\mathrm{TABLE}_{cp}[b_i]\right)}{n}$$
where w(c) returns a weight based on whether code point c is a printable letter (2), punctuation (1), or undefined (−5). The code page with the highest S is selected.
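The scoring rule can be sketched in Python as below. Several details are assumptions for illustration, not the tool's confirmed behavior: tables are derived from Python's codecs, only bytes at or above 0x80 are scored, n is taken as the count of those bytes, characters that are neither letters nor punctuation get weight 0, and the names `build_table`, `weight`, `score`, and `detect` are hypothetical.

```python
import unicodedata
from typing import List, Optional, Sequence


def build_table(codec: str) -> List[Optional[int]]:
    """128-entry table for bytes 0x80-0xFF; None marks undefined bytes."""
    table: List[Optional[int]] = []
    for b in range(0x80, 0x100):
        try:
            table.append(ord(bytes([b]).decode(codec)))
        except UnicodeDecodeError:
            table.append(None)
    return table


def weight(code_point: Optional[int]) -> int:
    """Letter = 2, punctuation = 1, undefined = -5, anything else = 0 (assumed)."""
    if code_point is None:
        return -5
    category = unicodedata.category(chr(code_point))
    if category.startswith("L"):
        return 2
    if category.startswith("P"):
        return 1
    return 0


def score(data: bytes, codec: str) -> float:
    """Average weight over the high bytes of data under one code page."""
    table = build_table(codec)
    high = [b for b in data if b >= 0x80]
    if not high:
        return 0.0  # pure ASCII carries no evidence either way
    return sum(weight(table[b - 0x80]) for b in high) / len(high)


def detect(data: bytes, candidates: Sequence[str]) -> str:
    """Pick the candidate code page with the highest score."""
    return max(candidates, key=lambda codec: score(data, codec))


sample = b"\x81\xc0"  # 0x81 is undefined in cp1252 but a letter in cp1251
print(detect(sample, ["cp1252", "cp1251"]))
```

With only these three weights, code pages that map the same bytes to letters score identically (for instance, Cyrillic text in Windows-1251 also decodes to Latin letters in Windows-1252), which is consistent with the ambiguity noted under Limitations.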
Reference Data
| Code Page | Name | Primary Languages | Unique Range | Notable Characters |
|---|---|---|---|---|
| 1250 | Windows-1250 | Polish, Czech, Hungarian, Romanian | 0x80 - 0xFF | Š, š, Ž, ž, Ł, ł |
| 1251 | Windows-1251 | Russian, Ukrainian, Bulgarian, Serbian | 0x80 - 0xFF | А - я, Ё, ё |
| 1252 | Windows-1252 | English, French, German, Spanish, Portuguese | 0x80 - 0x9F | €, „, “, ”, en dash (U+2013), em dash (U+2014) |
| 1253 | Windows-1253 | Greek | 0x80 - 0xFF | Α - Ω, α - ω |
| 1254 | Windows-1254 | Turkish | 0x80 - 0xFF | Ğ, ğ, İ, ı, Ş, ş |
| 1255 | Windows-1255 | Hebrew | 0xC0 - 0xFA | א - ת (Alef - Tav) |
| 1256 | Windows-1256 | Arabic, Persian, Urdu | 0x80 - 0xFF | ء - ي, ی |
| 1257 | Windows-1257 | Estonian, Latvian, Lithuanian | 0x80 - 0xFF | Ā, ā, Č, č, Ē |
| 1258 | Windows-1258 | Vietnamese | 0x80 - 0xFF | Ơ, ơ, Ư, ư |
| 28591 | ISO-8859-1 | Western European (Latin-1) | 0xA0 - 0xFF | ¿, Ñ, ñ, ß, þ |
| 28592 | ISO-8859-2 | Central European (Latin-2) | 0xA0 - 0xFF | Ą, ą, Ď, ď |
| 28595 | ISO-8859-5 | Cyrillic | 0xA0 - 0xFF | А - я (sequential block) |
| 28597 | ISO-8859-7 | Greek | 0xA0 - 0xFF | Α - Ω (ISO standard) |
| 28599 | ISO-8859-9 | Turkish (Latin-5) | 0xD0, 0xDD, 0xF0, 0xFD | Ğ, İ, ğ, ı replace Ð, Ý, ð, ý |
| 28605 | ISO-8859-15 | Western European (Latin-9) | 0xA4, 0xA6, 0xA8 | €, Š, š replace ¤, ¦, ¨ |