ASCII Validator
Validate text for ASCII compliance. Detect non-ASCII, control, and extended characters with position mapping and detailed character analysis.
About
Text encoding errors cost hours of debugging. A single non-ASCII character buried in a configuration file, a CSV import, or a source code string can cause silent failures in parsers that expect strict 7-bit ASCII (code points 0 - 127). This tool performs character-level inspection against the ASCII standard (ANSI X3.4-1986). It classifies every code point into control (0 - 31, 127), printable (32 - 126), extended (128 - 255), or non-ASCII (> 255) categories and reports exact line and column positions. Note: this tool validates encoding compliance only. It does not assess semantic correctness of the text content itself.
Formulas
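ASCII validation reduces to a code point range check per character. For a string S of length n, each character S_i is classified by its Unicode code point c:
$$
\text{class}(c) =
\begin{cases}
\text{control} & 0 \le c \le 31 \ \text{or} \ c = 127 \\
\text{printable} & 32 \le c \le 126 \\
\text{extended} & 128 \le c \le 255 \\
\text{non-ASCII} & c > 255
\end{cases}
$$
The compliance ratio R is computed as:
$$
R = \frac{V}{n}
$$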
Where V = count of valid characters within the selected range, and n = total character count. Control characters (code points 0 - 31 and 127) are technically valid ASCII but flagged separately since they cause rendering issues. The position of each invalid character is reported as line L and column C, computed by tracking newline occurrences (LF 0x0A or CRLF 0x0D0A) during iteration.
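The range check, ratio, and position tracking described above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the tool's actual source: the names `classify`, `validateAscii`, and `validCategories` are hypothetical, columns are counted in code points, and an empty string is assumed to have a ratio of 1.

```javascript
// Classify one Unicode code point using the ranges given in the text.
function classify(cp) {
  if (cp <= 31 || cp === 127) return "control";
  if (cp <= 126) return "printable";
  if (cp <= 255) return "extended";
  return "non-ascii";
}

// Hypothetical helper: returns the compliance ratio R = V / n and the
// 1-based line/column of every character outside `validCategories`.
function validateAscii(text, validCategories = ["printable", "control"]) {
  const issues = [];
  let line = 1;
  let column = 1;
  let valid = 0;
  let total = 0;

  // for...of walks the string by code point, not by UTF-16 code unit.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    const category = classify(cp);
    total++;
    if (validCategories.includes(category)) {
      valid++;
    } else {
      issues.push({ char: ch, codePoint: cp, category, line, column });
    }
    // LF ends a line; in a CRLF pair the CR just occupies one column first.
    if (cp === 0x0a) {
      line++;
      column = 1;
    } else {
      column++;
    }
  }

  // Assumption: an empty input is treated as fully compliant.
  return { ratio: total === 0 ? 1 : valid / total, issues };
}
```

For the input `"ab\ncé"`, this sketch reports `é` (code point 233) as `extended` at line 2, column 2, with a ratio of 0.8. Passing `["printable"]` as the second argument would also flag the line feed, which is how control characters can be reported separately.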
Reference Data
| Range | Hex | Category | Description | Example | Common Issues |
|---|---|---|---|---|---|
| 0 - 8 | 00 - 08 | Control | Null, Bell, Backspace | NUL, BEL, BS | Binary file corruption, null injection |
| 9 | 09 | Control (Whitespace) | Horizontal Tab | HT (\t) | Tab vs space conflicts in YAML/Python |
| 10 | 0A | Control (Whitespace) | Line Feed | LF (\n) | Unix vs Windows line endings |
| 11 - 12 | 0B - 0C | Control | Vertical Tab, Form Feed | VT, FF | Legacy printer commands in modern text |
| 13 | 0D | Control (Whitespace) | Carriage Return | CR (\r) | Windows CRLF in Unix systems |
| 14 - 31 | 0E - 1F | Control | Shift Out through Unit Separator | SO, SI, DLE, ESC | Terminal escape sequences, data corruption |
| 32 | 20 | Printable | Space | SP | Non-breaking space (U+00A0) confusion |
| 33 - 47 | 21 - 2F | Printable (Punctuation) | Punctuation & Symbols | ! " # $ % & | Smart quotes replacing straight quotes |
| 48 - 57 | 30 - 39 | Printable (Digits) | Arabic Numerals | 0-9 | Full-width digits (U+FF10) substitution |
| 58 - 64 | 3A - 40 | Printable (Punctuation) | Symbols | : ; < = > ? @ | URL encoding conflicts |
| 65 - 90 | 41 - 5A | Printable (Uppercase) | Latin Uppercase Letters | A - Z | Cyrillic lookalikes (А U+0410 vs A U+0041) |
| 91 - 96 | 5B - 60 | Printable (Punctuation) | Brackets & Symbols | [ \ ] ^ _ ` | Backtick vs acute accent (U+00B4) |
| 97 - 122 | 61 - 7A | Printable (Lowercase) | Latin Lowercase Letters | a - z | Homoglyph attacks in domain names |
| 123 - 126 | 7B - 7E | Printable (Punctuation) | Braces & Tilde | { \| } ~ | En-dash (U+2013) replacing tilde |
| 127 | 7F | Control | Delete | DEL | Invisible character in pasted text |
| 128 - 159 | 80 - 9F | Extended (C1 Control) | C1 control codes (printable symbols in Windows-1252) | € “ … | Mojibake from encoding mismatch |
| 160 - 255 | A0 - FF | Extended (Latin-1) | Latin-1 Supplement | © ° ö ü ñ | UTF-8 multi-byte interpreted as Latin-1 |
| > 255 | > FF | Non-ASCII (Unicode) | Full Unicode range | λ Ω 中 😀 | BOM markers, zero-width joiners, emoji |
| 65279 | FEFF | BOM / Zero-width | Byte Order Mark | BOM | Invisible BOM at file start breaks parsers |
| 8203 - 8207 | 200B - 200F | Zero-width | Zero-width space, joiners, direction marks | ZWSP, ZWJ, ZWNJ | Invisible chars in copy-pasted text |
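
The invisible entries in the table (BOM, the zero-width range, and the non-breaking space mentioned in the Space row) are the hardest to spot by eye. The following sketch shows one way to locate them; `HARD_TO_SEE` and `findHardToSee` are hypothetical names, and the reported index counts code points from the start of the string.

```javascript
// Code points from the reference table that render as nothing or as a blank.
const HARD_TO_SEE = new Map([
  [0x00a0, "NBSP"],
  [0x200b, "ZWSP"],
  [0x200c, "ZWNJ"],
  [0x200d, "ZWJ"],
  [0x200e, "LRM"],
  [0x200f, "RLM"],
  [0xfeff, "BOM"],
]);

// Hypothetical helper: lists every hard-to-see character with its
// code point index and a U+XXXX label.
function findHardToSee(text) {
  const hits = [];
  let index = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    const name = HARD_TO_SEE.get(cp);
    if (name !== undefined) {
      const hex = "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index, name, hex });
    }
    index++;
  }
  return hits;
}
```

Calling `findHardToSee("\uFEFFkey\u200B=1")` returns two hits: a BOM at index 0 and a ZWSP at index 4.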
Frequently Asked Questions
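The tool uses codePointAt() instead of charCodeAt() to correctly identify characters outside the Basic Multilingual Plane (such as emoji, which JavaScript strings store as surrogate pairs) as single characters with their true code point, rather than reporting two separate invalid surrogates.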
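The difference between the two methods is easy to see on a single emoji. This is a small illustrative sketch; the variable names are arbitrary.

```javascript
const emoji = "\u{1F600}"; // one character outside the Basic Multilingual Plane

// UTF-16 view: charCodeAt() sees two code units, each a lone surrogate.
const units = [];
for (let i = 0; i < emoji.length; i++) {
  units.push(emoji.charCodeAt(i));
}

// Code point view: string iteration plus codePointAt() sees one character.
const points = Array.from(emoji, (ch) => ch.codePointAt(0));

console.log(units);  // [ 55357, 56832 ]  (0xD83D, 0xDE00)
console.log(points); // [ 128512 ]        (0x1F600)
```

A validator built on `charCodeAt()` would therefore report two invalid characters at adjacent positions, while the code point view reports one.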