About

Raw text data from legacy systems, sensor logs, or scraped sources frequently contains characters outside the expected ASCII range. Non-printable control codes (ASCII 0 - 31), the DEL character (127), extended characters (128 and above), or stray null bytes cause parsing failures in downstream pipelines. A single out-of-range byte can corrupt a CSV import, break a fixed-width record parser, or produce invisible rendering artifacts in terminal output. This tool applies a strict numeric clamp on each character's code point: given bounds lo and hi, every character with a code outside [lo, hi] is either clamped to the nearest bound, replaced with a user-defined substitute, or removed entirely.

The standard printable ASCII range is 32 - 126. Restricting to 48 - 57 isolates digits only. Clamping to 65 - 90 enforces uppercase-only alphabetic data. This tool is a character-level filter that assumes single-byte text. It does not decode multi-byte sequences: each character is evaluated by the code unit value that charCodeAt returns, so any character above 127 is simply out of range for an ASCII bound, and a character outside the Basic Multilingual Plane is seen as two separate code units rather than one Unicode scalar. For strict Unicode normalization, a dedicated Unicode tool is required.

Formulas

For each character c in the input string with code point v = charCodeAt(c), and user-defined bounds [lo, hi]:

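$$
c \mapsto
\begin{cases}
c & \text{if } lo \le v \le hi \\
\mathrm{chr}(lo) & \text{if } v < lo \text{ (clamp mode)} \\
\mathrm{chr}(hi) & \text{if } v > hi \text{ (clamp mode)} \\
r & \text{if } v \notin [lo, hi] \text{ (replace mode)} \\
\varnothing & \text{if } v \notin [lo, hi] \text{ (remove mode)}
\end{cases}
$$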

Where v = integer code point of the character, lo = minimum allowed ASCII value, hi = maximum allowed ASCII value, r = user-defined replacement character, chr(n) = String.fromCharCode(n). The modification count M = number of characters where v < lo ∨ v > hi. The modification ratio is (M / N) × 100%, where N = total character count.
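The piecewise rule and the two statistics can be sketched in a few lines of JavaScript. This is a minimal illustration rather than the tool's actual source: the function name clampText, its parameter order, the default replacement "?", and the returned object shape are all assumptions made for the example.

```javascript
// Hypothetical helper: name, signature, and return shape are assumptions.
function clampText(input, lo, hi, mode = "clamp", replacement = "?") {
  let output = "";
  let modified = 0; // M: characters whose code falls outside [lo, hi]

  for (let i = 0; i < input.length; i++) {
    const v = input.charCodeAt(i); // UTF-16 code unit of the character
    if (v >= lo && v <= hi) {
      output += input[i]; // in range: keep unchanged
      continue;
    }
    modified++;
    if (mode === "clamp") {
      output += String.fromCharCode(v < lo ? lo : hi); // nearest bound
    } else if (mode === "replace") {
      output += replacement;
    }
    // Any other mode value is treated as "remove" in this sketch.
  }

  const total = input.length; // N
  const ratio = total === 0 ? 0 : (modified / total) * 100;
  return { output, modified, ratio };
}

const sample = "Hi\tyou\u00e9"; // contains a tab (9) and an e-acute (233)
console.log(clampText(sample, 32, 126, "clamp").output);   // Hi you~
console.log(clampText(sample, 32, 126, "replace").output); // Hi?you?
console.log(clampText(sample, 32, 126, "remove").output);  // Hiyou
```

For this sample, M = 2 and N = 7, so the modification ratio is about 28.6% in every mode.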

Reference Data

Range	Dec	Description	Common Use
NUL - US	0 - 31	Control characters	Terminal control, line endings (LF=10, CR=13, TAB=9)
SP	32	Space	Word separator
! - /	33 - 47	Punctuation & symbols	Exclamation, quotes, hash, dollar, percent
0-9	48 - 57	Digits	Numeric data
: - @	58 - 64	Symbols	Colon, semicolon, angle brackets, equals, at-sign
A - Z	65 - 90	Uppercase letters	Identifiers, constants
[ - `	91 - 96	Brackets & symbols	Array notation, backslash, caret, underscore, backtick
a - z	97 - 122	Lowercase letters	Text, variable names
{ - ~	123 - 126	Braces & symbols	Code blocks, pipe, tilde
DEL	127	Delete control	Legacy terminal delete
Extended	128 - 255	Extended ASCII / Latin-1	Accented chars, currency symbols, box drawing
Printable	32 - 126	All printable ASCII	Standard safe text range
Alphanumeric	48 - 57, 65 - 90, 97 - 122	Letters and digits only	Identifiers, filenames
Whitespace	9, 10, 13, 32	Tab, LF, CR, Space	Text formatting
Base64 safe	43, 47 - 57, 61, 65 - 90, 97 - 122	Base64 character set	Encoded binary data
URL safe	45 - 46, 48 - 57, 65 - 90, 95, 97 - 122, 126	Unreserved URI chars (RFC 3986)	URL paths, query parameters
Filename safe	32 - 126 excl. \ / : * ? " < > |	OS-safe filename chars	Cross-platform file naming
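Several rows in the table (Alphanumeric, Base64 safe, URL safe) are unions of ranges and cannot be expressed with a single [lo, hi] pair. Outside this tool, such sets could be filtered with a list of ranges, as in the sketch below. The PRESETS object and the keepOnly function are hypothetical names introduced only for illustration; the numeric ranges are copied from the table.

```javascript
// Hypothetical preset table; ranges copied from the reference rows above.
const PRESETS = {
  alphanumeric: [[48, 57], [65, 90], [97, 122]],
  base64: [[43, 43], [47, 57], [61, 61], [65, 90], [97, 122]],
  urlUnreserved: [[45, 46], [48, 57], [65, 90], [95, 95], [97, 122], [126, 126]],
};

// Remove-mode filter over a union of inclusive [lo, hi] ranges.
function keepOnly(input, ranges) {
  let output = "";
  for (let i = 0; i < input.length; i++) {
    const v = input.charCodeAt(i);
    if (ranges.some(([lo, hi]) => v >= lo && v <= hi)) {
      output += input[i];
    }
  }
  return output;
}

console.log(keepOnly("user name_01.txt", PRESETS.alphanumeric));  // username01txt
console.log(keepOnly("user name_01.txt", PRESETS.urlUnreserved)); // username_01.txt
```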

Frequently Asked Questions

Line feed (ASCII 10) and carriage return (ASCII 13) both fall below the minimum bound of 32. In clamp mode, they become space characters (ASCII 32). In remove mode, they are stripped entirely, collapsing all lines into one. In replace mode, they become your chosen replacement character. If you need to preserve line structure, set your minimum to 10 instead (this also admits the other control codes from 11 to 31 and still excludes tab at 9), or pre-process the text to convert line endings to a placeholder before clamping.
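A variation on the placeholder idea that avoids choosing a placeholder at all is to split the text on line endings, filter each line separately, and rejoin. The sketch below does this in remove mode; the function names are hypothetical, and note that it normalizes CRLF and bare CR to LF as a side effect.

```javascript
// Hypothetical remove-mode filter for a single line.
function removeOutsideRange(line, lo, hi) {
  let out = "";
  for (let i = 0; i < line.length; i++) {
    const v = line.charCodeAt(i);
    if (v >= lo && v <= hi) out += line[i];
  }
  return out;
}

// Filters each line on its own so line breaks never reach the filter.
// Assumption: normalizing every line ending to "\n" is acceptable.
function cleanPreservingLines(input, lo = 32, hi = 126) {
  return input
    .split(/\r\n|\r|\n/) // CRLF first, then bare CR or LF
    .map((line) => removeOutsideRange(line, lo, hi))
    .join("\n");
}

console.log(JSON.stringify(cleanPreservingLines("a\x00b\r\nc\x07d\ne")));
// "ab\ncd\ne"
```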

JavaScript's charCodeAt returns UTF-16 code units, not full Unicode code points. Characters outside the Basic Multilingual Plane (code points above 65535, such as emoji) are represented as surrogate pairs: two 16-bit values, each in the range 55296 - 57343. These individual surrogates exceed any typical ASCII clamp range and are clamped, replaced, or removed independently, so a single emoji yields two output characters in clamp or replace mode. An unpaired surrogate can only survive if the chosen range covers part of the surrogate block. For pure ASCII sanitization this is the correct behavior: non-ASCII data is eliminated. For Unicode-aware filtering, a code-point-level tool using codePointAt would be needed.
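The difference between code units and code points is easy to see on a short string. The example below compares the two views and shows how a replace-mode filter behaves under each; replaceByCodeUnit and replaceByCodePoint are hypothetical helpers written for this comparison, with the first mirroring the charCodeAt-based behavior described above.

```javascript
const text = "ok\u{1F600}"; // "ok" followed by one emoji (U+1F600)

// charCodeAt walks UTF-16 code units: the emoji appears as two surrogates.
const units = [];
for (let i = 0; i < text.length; i++) units.push(text.charCodeAt(i));
console.log(units); // [ 111, 107, 55357, 56832 ]

// Iterating the string yields whole code points instead.
const points = Array.from(text, (ch) => ch.codePointAt(0));
console.log(points); // [ 111, 107, 128512 ]

// Replace mode per code unit (what a charCodeAt-based filter does).
function replaceByCodeUnit(input, lo, hi, r) {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const v = input.charCodeAt(i);
    out += v >= lo && v <= hi ? input[i] : r;
  }
  return out;
}

// Replace mode per code point (a Unicode-aware alternative).
function replaceByCodePoint(input, lo, hi, r) {
  let out = "";
  for (const ch of input) { // for...of iterates by code point
    const v = ch.codePointAt(0);
    out += v >= lo && v <= hi ? ch : r;
  }
  return out;
}

console.log(replaceByCodeUnit(text, 32, 126, "?"));  // ok??
console.log(replaceByCodePoint(text, 32, 126, "?")); // ok?
```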

To strip control characters from log files, set the range to 32 - 126 with remove mode. This eliminates all control characters (NUL, BEL, ESC, etc.) while preserving standard printable text. Tab (ASCII 9), LF (10), and CR (13) are control characters too, so they are removed as well. If your logs use tab-separated values, lower the minimum to 9 to preserve horizontal tabs; because the range is contiguous, this also lets codes 10 - 31 through, including LF and CR. Adjust the bounds accordingly for your format.

Clamp mode maps out-of-range characters to the nearest boundary: a character below lo becomes chr(lo), and one above hi becomes chr(hi). This preserves the character count but may alias many distinct characters to the same boundary character. Replace mode substitutes all out-of-range characters with a single fixed character you choose (e.g., "?" or "."). This also preserves the character count and makes modifications visually obvious in the output. Remove mode deletes out-of-range characters entirely, which changes the string length.

Every non-digit character (spaces, letters, punctuation) falls outside the range [48, 57]. In clamp mode, all characters below ASCII 48 (including space at 32) become "0" (ASCII 48), and all characters above 57 (including all letters) become "9" (ASCII 57). The output is a string of digits in which the original digits are mixed with "0" and "9" characters produced by clamping, and the two cannot be told apart. This is rarely useful for extraction. For digit extraction, use remove mode instead, which strips non-digits cleanly.
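A short worked example makes the contrast concrete. The two helpers below are hypothetical, minimal versions of clamp and remove mode written for this illustration.

```javascript
// Hypothetical clamp-mode helper: out-of-range codes snap to lo or hi.
function clampToRange(input, lo, hi) {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const v = input.charCodeAt(i);
    out += String.fromCharCode(Math.min(hi, Math.max(lo, v)));
  }
  return out;
}

// Hypothetical remove-mode helper: out-of-range codes are dropped.
function removeOutsideRange(input, lo, hi) {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const v = input.charCodeAt(i);
    if (v >= lo && v <= hi) out += input[i];
  }
  return out;
}

const phone = "Tel: 555-0199";
console.log(clampToRange(phone, 48, 57));       // 9999055500199
console.log(removeOutsideRange(phone, 48, 57)); // 5550199
```

In the clamped result, "T", "e", "l", and ":" each became "9", while the space and the hyphen became "0", so the real digits are no longer distinguishable.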

The algorithm is O(n), where n is the character count. For inputs exceeding 100 KB, the tool processes the text in chunks, yielding asynchronously between chunks to keep the UI responsive. The practical limit is the browser memory available for the string allocation, typically several hundred megabytes. For multi-gigabyte files, a server-side stream processor would be more appropriate.
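Chunked processing with asynchronous yielding could look roughly like the following. This is a sketch under assumptions, not the tool's implementation: the function name, the use of setTimeout to yield, and a chunk size measured in characters rather than bytes are all choices made for the example.

```javascript
// Hypothetical chunked remove-mode filter.
// Assumption: chunkSize is counted in characters, not bytes.
async function removeOutsideRangeChunked(input, lo, hi, chunkSize = 100000) {
  const parts = [];
  for (let start = 0; start < input.length; start += chunkSize) {
    const end = Math.min(start + chunkSize, input.length);
    let piece = "";
    for (let i = start; i < end; i++) {
      const v = input.charCodeAt(i);
      if (v >= lo && v <= hi) piece += input[i];
    }
    parts.push(piece);
    // Yield to the event loop so rendering and input handling can run.
    await new Promise((resolve) => setTimeout(resolve, 0));
  }
  return parts.join("");
}

removeOutsideRangeChunked("log\x00 line\x07 one", 32, 126, 4).then((clean) => {
  console.log(clean); // log line one
});
```

Because each character is judged on its own, chunk boundaries do not affect the result; the total work stays linear in the input length.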