About

Text encoding errors can cost hours of debugging. A single non-ASCII character buried in a configuration file, a CSV import, or a source code string can cause silent failures in parsers that expect strict 7-bit ASCII (code points 0 - 127). This tool inspects text character by character against the ASCII standard (ANSI X3.4-1986). It classifies every code point as control (0 - 31, 127), printable (32 - 126), extended (128 - 255), or non-ASCII (> 255), and reports the exact line and column of each flagged character. Note: this tool validates encoding compliance only. It does not assess the semantic correctness of the text itself.
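The four categories reduce to a few range comparisons on each code point. The following is a minimal JavaScript sketch. The function names categorize and categorizeString and the category strings are hypothetical; they are not the tool's actual API.

```javascript
// Hypothetical helper: map one Unicode code point to the four categories above.
function categorize(cp) {
  if (cp < 0) throw new RangeError("code point must be non-negative");
  if (cp <= 31 || cp === 127) return "control"; // C0 controls and DEL
  if (cp <= 126) return "printable";            // 32-126: space through tilde
  if (cp <= 255) return "extended";             // 128-255
  return "non-ascii";                           // anything above U+00FF
}

// Categorize every character; for...of iterates by code point,
// so an emoji (a surrogate pair) counts as one character.
function categorizeString(text) {
  const result = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    result.push({ char: ch, codePoint: cp, category: categorize(cp) });
  }
  return result;
}

console.log(categorizeString("A\té€").map((r) => r.category));
// → [ 'printable', 'control', 'extended', 'non-ascii' ]
```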

Formulas

ASCII validation reduces to a code point range check per character. For a string S of length n, each character S_i is classified by its Unicode code point c:

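$$
\mathrm{class}(S_i) =
\begin{cases}
\text{VALID} & \text{if } 0 \le c \le 127 \quad \text{(Standard ASCII)} \\
\text{EXTENDED} & \text{if } 128 \le c \le 255 \quad \text{(Latin-1 Supplement)} \\
\text{NON-ASCII} & \text{if } c > 255 \quad \text{(Unicode)}
\end{cases}
$$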

The compliance ratio R is computed as:

$$R = \frac{V}{n} \times 100\%$$

Here V is the number of characters whose code point falls within the selected range (0 - 127 in Standard ASCII mode, 32 - 126 in Printable Only mode), and n is the total number of characters, counted as code points. Control characters (code points 0 - 31 and 127) are valid ASCII, but they are flagged separately because they can cause rendering issues. The position of each invalid character is reported as line L and column C. These values are computed by tracking line breaks (LF, 0x0A, or the CRLF pair 0x0D 0x0A) during iteration.
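The following sketch shows the ratio and the position tracking. It assumes the selected mode can be expressed as an inclusive code point range lo..hi. These are hypothetical parameters: Standard ASCII would be 0..127 and Printable Only 32..126, and validate is an illustrative name. As a worked example, "naïve" has n = 5 and V = 4, because ï is U+00EF (239). That gives R = 4/5 × 100% = 80%.

```javascript
// Hypothetical validator: compliance ratio plus line/column of each invalid character.
function validate(text, lo = 0, hi = 127) {
  let n = 0;      // total characters, counted as code points
  let valid = 0;  // V: characters with lo <= c <= hi
  let line = 1;   // 1-based line number
  let column = 0; // becomes 1-based once incremented for the current character
  const invalid = [];

  // for...of yields whole code points, so a surrogate pair counts once
  for (const ch of text) {
    const c = ch.codePointAt(0);
    n += 1;
    column += 1;
    if (c >= lo && c <= hi) {
      valid += 1;
    } else {
      invalid.push({ line, column, codePoint: c });
    }
    // LF ends a line; in CRLF the CR is simply the last column before the LF
    if (c === 0x0a) {
      line += 1;
      column = 0;
    }
  }

  // R is undefined for n = 0; this sketch reports 100% for empty input (assumption)
  const ratio = n === 0 ? 100 : (valid * 100) / n;
  return { n, valid, ratio, invalid };
}

const r = validate("naïve");
console.log(r.ratio, r.invalid);
// → 80 [ { line: 1, column: 3, codePoint: 239 } ]
```

Counting a line break only at LF also handles CRLF, because the CR occupies the last column of its line. A lone CR is not treated as a line break, which matches the two cases listed above. In this sketch newlines are range-checked like any other character, so a 32..126 range flags them too.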

Reference Data

Range (Dec)	Hex	Category	Description	Example	Common Issues
0 - 8	00 - 08	Control	Null, Bell, Backspace	NUL, BEL, BS	Binary file corruption, null injection
9	09	Control (Whitespace)	Horizontal Tab	HT (\t)	Tab vs space conflicts in YAML/Python
10	0A	Control (Whitespace)	Line Feed	LF (\n)	Unix vs Windows line endings
11 - 12	0B - 0C	Control	Vertical Tab, Form Feed	VT, FF	Legacy printer commands in modern text
13	0D	Control (Whitespace)	Carriage Return	CR (\r)	Windows CRLF in Unix systems
14 - 31	0E - 1F	Control	Shift Out through Unit Separator	SO, SI, DLE, ESC	Terminal escape sequences, data corruption
32	20	Printable	Space	SP	Non-breaking space (U+00A0) confusion
33 - 47	21 - 2F	Printable (Punctuation)	Punctuation & Symbols	! " # $ % &	Smart quotes replacing straight quotes
48 - 57	30 - 39	Printable (Digits)	Arabic Numerals	0-9	Full-width digits (U+FF10) substitution
58 - 64	3A - 40	Printable (Punctuation)	Symbols	: ; < = > ? @	URL encoding conflicts
65 - 90	41 - 5A	Printable (Uppercase)	Latin Uppercase Letters	A - Z	Cyrillic lookalikes (А vs A)
91 - 96	5B - 60	Printable (Punctuation)	Brackets & Symbols	[ \ ] ^ _ `	Backtick vs acute accent (U+00B4)
97 - 122	61 - 7A	Printable (Lowercase)	Latin Lowercase Letters	a - z	Homoglyph attacks in domain names
123 - 126	7B - 7E	Printable (Punctuation)	Braces, Pipe & Tilde	{ | } ~	Full-width tilde (U+FF5E) or small tilde (U+02DC) replacing tilde
127	7F	Control	Delete	DEL	Invisible character in pasted text
128 - 159	80 - 9F	Extended (C1 Control)	C1 control codes (Windows-1252 reuses these bytes for printable characters such as € „ …)	NEL, CSI	Mojibake from encoding mismatch
160 - 255	A0 - FF	Extended (Latin-1)	Latin-1 Supplement	© ° ö ü ñ	UTF-8 multi-byte interpreted as Latin-1
> 255	> FF	Non-ASCII (Unicode)	Full Unicode range	λ Ω 世 😀	BOM markers, zero-width joiners, emoji
65279	FEFF	BOM / Zero-width	Byte Order Mark	BOM	Invisible BOM at file start breaks parsers
8203 - 8207	200B - 200F	Zero-width	Zero-width space, joiners, direction marks	ZWSP, ZWJ, ZWNJ	Invisible chars in copy-pasted text
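The "UTF-8 multi-byte interpreted as Latin-1" issue from the table is easy to reproduce. Encode a string as UTF-8, then read each byte back as a code point 0 - 255, which is what an ISO 8859-1 decoder does. This is a minimal sketch, and the function name utf8BytesAsLatin1 is hypothetical.

```javascript
// Hypothetical demo: UTF-8 bytes misread as Latin-1 (ISO 8859-1).
function utf8BytesAsLatin1(text) {
  const bytes = new TextEncoder().encode(text); // TextEncoder always emits UTF-8
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b); // byte value -> U+0000..U+00FF, as Latin-1 does
  }
  return out;
}

console.log(utf8BytesAsLatin1("café")); // → cafÃ©
```

"é" (U+00E9) is encoded as the two bytes C3 A9, which Latin-1 renders as "Ã" and "©". Every character of the garbled output has a code point of 255 or less, so it falls in the EXTENDED class rather than NON-ASCII.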

Frequently Asked Questions

Standard ASCII (ANSI X3.4-1986) defines 128 code points (0 - 127) covering English letters, digits, punctuation, and control characters. Extended ASCII is not a single standard. It refers to various 8-bit encodings (ISO 8859-1, Windows-1252, etc.) that use code points 128 - 255 for additional characters like accented letters (ö, ñ) and symbols (©, °). This validator flags extended characters because they are encoding-dependent and may render differently across systems.

Common culprits are invisible or look-alike characters:

- Byte Order Marks (BOM, U+FEFF) at the start of a file.
- Zero-width spaces (U+200B) from web copy-paste.
- Non-breaking spaces (U+00A0) in place of regular spaces (U+0020).
- Smart/curly quotes (U+201C, U+201D) inserted by word processors.

This validator highlights the exact position of each hidden character so you can locate and replace it.
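Once positions are known, the fix is usually a mechanical substitution. This sketch does two things: it records where each culprit occurs (as a code point index) and rewrites it to an ASCII counterpart. The replaceHidden function and its replacement table are assumptions for illustration. The validator itself is only described as highlighting positions, not rewriting text.

```javascript
// Assumed replacement table for the characters named above.
const REPLACEMENTS = new Map([
  [0xfeff, ""],  // BYTE ORDER MARK: remove
  [0x200b, ""],  // ZERO WIDTH SPACE: remove
  [0x00a0, " "], // NO-BREAK SPACE: regular space U+0020
  [0x201c, '"'], // LEFT DOUBLE QUOTATION MARK: straight quote
  [0x201d, '"'], // RIGHT DOUBLE QUOTATION MARK: straight quote
]);

// Hypothetical helper: report and replace hidden characters.
function replaceHidden(text) {
  const found = [];
  let out = "";
  let index = 0; // code point index, not UTF-16 index
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (REPLACEMENTS.has(cp)) {
      found.push({ index, codePoint: cp });
      out += REPLACEMENTS.get(cp);
    } else {
      out += ch;
    }
    index += 1;
  }
  return { text: out, found };
}

const result = replaceHidden("\uFEFFsay\u00A0\u201Chi\u201D\u200B");
console.log(JSON.stringify(result.text));       // → "say \"hi\""
console.log(result.found.map((f) => f.index)); // → [ 0, 4, 5, 8, 9 ]
```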

It depends on context. Tab (9), Line Feed (10), and Carriage Return (13) are expected in almost all text files. Other control characters (NUL, BEL, ESC, etc.) are rarely legitimate in modern text and often indicate binary data or file corruption. Use the "Printable Only" mode (32 - 126) to enforce strict visible-character-only validation; note that this range also excludes Tab, Line Feed, and Carriage Return. Use the "Standard ASCII" mode to allow all 128 code points, including control characters.
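The following check separates the expected whitespace controls from the rest. It could run alongside either mode. auditControls is a hypothetical helper, not a described feature of the tool.

```javascript
// TAB, LF, CR: the control characters normally found in text files.
const EXPECTED_WHITESPACE = new Set([0x09, 0x0a, 0x0d]);

// Hypothetical helper: sort control characters into expected vs. suspicious.
function auditControls(text) {
  const expected = [];
  const suspicious = [];
  let index = 0; // code point index
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp <= 31 || cp === 127) {
      if (EXPECTED_WHITESPACE.has(cp)) {
        expected.push({ index, codePoint: cp });
      } else {
        suspicious.push({ index, codePoint: cp }); // NUL, BEL, ESC, DEL, ...
      }
    }
    index += 1;
  }
  return { expected, suspicious };
}

const report = auditControls("id\tname\r\nx\u0000\u001B[0m");
console.log(report.suspicious.map((c) => c.codePoint)); // → [ 0, 27 ]
```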

JavaScript strings use UTF-16 encoding internally. Characters outside the Basic Multilingual Plane (code points above U+FFFF), such as emoji (😀) and rare CJK characters, are represented as surrogate pairs (two 16-bit code units). This validator uses codePointAt() instead of charCodeAt() to correctly identify these as single characters with their true code point, rather than reporting two separate invalid surrogates.
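The difference between the two methods is visible on a single emoji. codePointAt() returns the full code point only when called at the high surrogate's index. A scanner that indexes by code unit must therefore also skip the low surrogate; for...of iteration does this automatically. The codePoints helper below is illustrative.

```javascript
const s = "a😀"; // "😀" is U+1F600, stored as \uD83D\uDE00

// .length and charCodeAt() work in UTF-16 code units
console.log(s.length);                     // → 3
console.log(s.charCodeAt(1).toString(16)); // → d83d (high surrogate)
console.log(s.charCodeAt(2).toString(16)); // → de00 (low surrogate)

// codePointAt() at the high surrogate returns the full code point
console.log(s.codePointAt(1).toString(16)); // → 1f600

// Illustrative helper: collect code points, stepping over low surrogates
function codePoints(str) {
  const out = [];
  for (let i = 0; i < str.length; ) {
    const cp = str.codePointAt(i);
    out.push(cp);
    i += cp > 0xffff ? 2 : 1; // astral code points occupy two code units
  }
  return out;
}

console.log(codePoints(s).map((cp) => cp.toString(16))); // → [ '61', '1f600' ]
```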

Homoglyphs are visually identical or near-identical characters from different Unicode blocks. Cyrillic "А" (U+0410) looks identical to Latin "A" (U+0041) but has a completely different code point. Attackers exploit this in phishing URLs and source code. ASCII validation catches these regardless of visual appearance: any character outside the 0 - 127 range is flagged, even if it looks like a standard English letter.
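The following sketch shows how the range check exposes a Cyrillic homoglyph. nonAsciiReport is a hypothetical helper. It also flags accented Latin letters such as é, since they fall outside 0 - 127 as well.

```javascript
// Hypothetical helper: list every non-ASCII character with its U+ code point.
function nonAsciiReport(str) {
  const hits = [];
  let index = 0; // code point index
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (cp > 127) {
      const hex = cp.toString(16).toUpperCase().padStart(4, "0");
      hits.push(`${ch} at ${index}: U+${hex}`);
    }
    index += 1;
  }
  return hits;
}

const latin = "PAYPAL";
const spoof = "P\u0410YPAL"; // second letter is CYRILLIC CAPITAL LETTER A
console.log(latin === spoof);       // → false
console.log(nonAsciiReport(latin)); // → []
console.log(nonAsciiReport(spoof)); // → [ 'А at 1: U+0410' ]
```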

The validator processes text in the browser without server upload. For texts exceeding approximately 100 KB, processing is chunked to prevent the browser from freezing. Texts up to several megabytes can be validated, but rendering the character-by-character highlight map for very large files may be slow. For files above 1 MB, consider using the CSV export to analyze results externally rather than relying on the visual highlight view.
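The page does not document how its chunking works. One plausible approach, sketched here under stated assumptions, has two parts. First, slice the text into fixed-size pieces without splitting a surrogate pair. Second, yield to the event loop between pieces with setTimeout. The names chunks and countNonAscii are hypothetical. The chunk size is measured in UTF-16 code units, so 100 * 1024 only approximates the 100 KB threshold.

```javascript
// Hypothetical chunk size: roughly the ~100 KB threshold (code units, not bytes).
const DEFAULT_CHUNK = 100 * 1024;

// Split text into pieces of at most `size` code units, never ending a piece
// on a high surrogate (that would separate it from its low surrogate).
function* chunks(text, size) {
  if (size < 2) throw new RangeError("size must be at least 2");
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + size, text.length);
    const last = text.charCodeAt(end - 1);
    if (end < text.length && last >= 0xd800 && last <= 0xdbff) {
      end -= 1; // keep the surrogate pair together in the next chunk
    }
    yield text.slice(start, end);
    start = end;
  }
}

// Count non-ASCII code points chunk by chunk, yielding to the event loop
// between chunks so rendering and user input are not blocked.
async function countNonAscii(text, size = DEFAULT_CHUNK) {
  let count = 0;
  for (const chunk of chunks(text, size)) {
    for (const ch of chunk) {
      if (ch.codePointAt(0) > 127) count += 1;
    }
    await new Promise((resolve) => setTimeout(resolve, 0));
  }
  return count;
}

countNonAscii("a😀b".repeat(3), 3).then((n) => console.log(n)); // → 3
```

A full validator would also need to carry its line and column counters from one chunk to the next; that bookkeeping is omitted here.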