About

CSV files encoded per RFC 4180 wrap fields in double-quote characters (") whenever the field contains a delimiter, a newline, or the quote character itself. Many downstream systems - flat-file importers, legacy databases, fixed-width loaders - choke on these quotes or interpret them as literal data. Manually stripping them in a text editor risks destroying fields that legitimately contain the delimiter, producing column-shift errors that cascade silently through an entire dataset. This tool implements a character-level state-machine parser that distinguishes structural quotes from literal content, letting you remove only the wrapping quotes while preserving escaped interior quotes and field integrity.

Three removal modes are provided. "Smart" mode removes quotes only from fields that do not require them (no embedded delimiters, quote characters, or newlines). "All Surrounding" mode strips every field's outer quotes regardless of content. "Global" mode deletes every " character, which is useful only when you are certain no field contains intentional quote characters. The tool auto-detects the delimiter by frequency analysis of the first 5 lines, supporting comma, semicolon, tab, and pipe. Limitations: this tool assumes well-formed CSV. Malformed files with unbalanced quotes will trigger a diagnostic warning with the offending line number.

Formulas

The parser operates as a deterministic finite automaton (DFA) with three states per field:

S₀ = FIELD_START → if char = Q, transition to S₁
S₁ = INSIDE_QUOTED → if char = Q, transition to S₂; any other char is field content
S₂ = QUOTE_END_OR_ESCAPE → if next char = Q, emit literal quote, return to S₁; otherwise the quoted field has ended

Where Q is the configured quote character (default "). A field is classified as "quote-necessary" when its content satisfies:

needsQuote(field) = field contains D ∨ field contains Q ∨ field contains \n

Where D is the detected delimiter character. In "Smart" mode, quotes are preserved when needsQuote returns TRUE. In "All Surrounding" mode, outer quotes are always stripped and interior escaped quotes ("") are reduced to single quotes ("). In "Global" mode, every instance of Q is deleted without field-boundary awareness.
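The states and modes above can be sketched in Python. This is a minimal illustration rather than the tool's actual source: the names parse_record, needs_quote, and strip_quotes are hypothetical, the extra UNQUOTED state is an assumption added to handle fields that do not start with Q, and the input is assumed to be a single well-formed record without its row terminator.

```python
from enum import Enum, auto


class State(Enum):
    FIELD_START = auto()          # S0
    INSIDE_QUOTED = auto()        # S1
    QUOTE_END_OR_ESCAPE = auto()  # S2
    UNQUOTED = auto()             # assumption: helper state for unquoted fields


def parse_record(record, delim=",", quote='"'):
    """Split one record into (content, was_quoted) pairs."""
    fields, buf = [], []
    state, was_quoted = State.FIELD_START, False

    def finish():
        nonlocal buf, state, was_quoted
        fields.append(("".join(buf), was_quoted))
        buf, state, was_quoted = [], State.FIELD_START, False

    for ch in record:
        if state is State.INSIDE_QUOTED:
            if ch == quote:
                state = State.QUOTE_END_OR_ESCAPE
            else:
                buf.append(ch)  # delimiters and newlines are literal here
        elif state is State.QUOTE_END_OR_ESCAPE and ch == quote:
            buf.append(quote)   # a doubled quote is one literal quote
            state = State.INSIDE_QUOTED
        elif ch == delim:
            finish()            # field boundary
        elif state is State.FIELD_START and ch == quote:
            state, was_quoted = State.INSIDE_QUOTED, True
        else:
            buf.append(ch)
            state = State.UNQUOTED
    finish()
    return fields


def needs_quote(content, delim=",", quote='"'):
    return delim in content or quote in content or "\n" in content


def strip_quotes(record, mode, delim=",", quote='"'):
    """mode is one of 'smart', 'all', 'global' (hypothetical identifiers)."""
    if mode == "global":
        # No field-boundary awareness: every quote character is deleted.
        return record.replace(quote, "")
    out = []
    for content, was_quoted in parse_record(record, delim, quote):
        if mode == "smart" and was_quoted and needs_quote(content, delim, quote):
            # Keep the wrapping quotes and re-escape interior quotes.
            out.append(quote + content.replace(quote, quote * 2) + quote)
        else:
            out.append(content)
    return delim.join(out)
```

For the record "A",B,"C,D", strip_quotes returns A,B,"C,D" in smart mode and A,B,C,D in the other two modes, matching the "Mixed" row of the reference table.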

Delimiter auto-detection scores each candidate by counting its occurrences on each of the first 5 lines, favoring a candidate whose per-line count has a high mean and little variation between lines:

score(D) = mean(count) / (σ(count) + 1)

Where mean(count) is the mean number of occurrences of D per line and σ(count) is the standard deviation of that per-line count. The delimiter with the highest score wins.
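The scoring rule can be sketched as follows. The function names are hypothetical, the use of the population standard deviation is an assumption (the text does not say which variant is used), and this version counts characters naively, including any that sit inside quoted fields.

```python
from statistics import mean, pstdev

CANDIDATES = [",", ";", "\t", "|"]  # comma, semicolon, tab, pipe


def score(lines, delim):
    """Mean per-line count divided by (standard deviation + 1)."""
    counts = [line.count(delim) for line in lines]
    return mean(counts) / (pstdev(counts) + 1)


def detect_delimiter(text, sample_size=5):
    lines = text.splitlines()[:sample_size]
    if not lines:
        return ","  # assumption: fall back to comma on empty input
    # max() keeps the first candidate on ties, so comma wins a tie.
    return max(CANDIDATES, key=lambda d: score(lines, d))
```

A candidate that never appears scores 0, and one that appears the same number of times on every line scores exactly its mean count, since the standard deviation is 0 and the denominator is 1.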

Reference Data

Scenario	Original Field	Smart Mode Output	All Surrounding Output	Global Output
Simple text, no special chars	"Hello"	Hello	Hello	Hello
Field contains comma	"New York, NY"	"New York, NY" (kept)	New York, NY	New York, NY
Field contains escaped quote	"She said ""hi"""	"She said ""hi""" (kept)	She said "hi"	She said hi
Numeric field quoted	"12345"	12345	12345	12345
Empty quoted field	""	(empty)	(empty)	(empty)
Field with newline	"Line1\nLine2"	"Line1\nLine2" (kept)	Line1\nLine2	Line1\nLine2
Field with delimiter & quote	"Price is $5, ""final"""	"Price is $5, ""final""" (kept)	Price is $5, "final"	Price is $5, final
Unquoted field (no change)	Hello	Hello	Hello	Hello
Tab-delimited quoted	"Data" (tab sep)	Data	Data	Data
Single-quote (not affected)	'Value'	'Value'	'Value'	'Value'
Pipe-delimited with quotes	"A|B" (pipe sep)	"A|B" (kept)	A|B	A|B
Semicolon-delimited	"München;Berlin"	"München;Berlin" (kept)	München;Berlin	München;Berlin
Mixed: some fields quoted	"A",B,"C,D"	A,B,"C,D"	A,B,C,D	A,B,C,D
Whitespace around quotes	"Data"	Data	Data	Data
Custom quote char (')	'Hello'	Hello (if configured)	Hello (if configured)	Hello (if configured)

Frequently Asked Questions

Per RFC 4180, a literal double-quote inside a quoted field is represented as two consecutive double-quotes (""). When the tool strips a field's surrounding quotes in "All Surrounding" mode, it also un-escapes these pairs, converting "" back to a single ". In "Smart" mode, a field that contains an escaped quote counts as quote-necessary, so both its wrapping quotes and its "" pairs are left untouched. In "Global" mode, every quote character is simply deleted, which means interior quotes vanish entirely. Choose your mode based on whether downstream systems expect escaped or literal quotes.

The tool samples the first 5 lines and counts occurrences of four candidate delimiters: comma, semicolon, tab, and pipe. It calculates a consistency score for each - the mean count divided by (standard deviation + 1). The candidate with the highest score (most consistent across lines) is selected. You can also override the auto-detected delimiter manually in the settings panel.

In "Smart" mode, fields that need their quotes keep them. The parser checks each field's content: if it contains the active delimiter, a newline, or the quote character, the surrounding quotes are preserved. In "All Surrounding" mode, those quotes are stripped regardless, which means re-importing the output into a CSV parser would cause column misalignment. Use "All Surrounding" only when you are exporting to a non-CSV target (plain text, fixed-width, or display).
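The column misalignment is easy to reproduce with Python's standard csv module. The sample data and the column_counts helper below are made up for illustration. The second string is what the first would look like after its outer quotes were stripped.

```python
import csv
import io


def column_counts(text):
    """Number of columns a standard CSV reader sees in each row."""
    return [len(row) for row in csv.reader(io.StringIO(text))]


original = 'id,city\n1,"New York, NY"\n'
stripped = 'id,city\n1,New York, NY\n'  # outer quotes removed

print(column_counts(original))  # [2, 2]
print(column_counts(stripped))  # [2, 3]
```

The second row of the stripped data gains a column because the comma inside the city name is no longer protected by quotes.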

A custom quote character is supported. The settings panel includes a "Quote Character" option. You can set it to a single quote ('), backtick (`), or any single character. The parser will then treat that character as the field enclosure. This is useful for non-standard CSV exports from systems like MySQL's SELECT ... INTO OUTFILE, which can use arbitrary enclosure characters.

The tool processes data entirely in the browser. For files under 5 MB (roughly 50,000-100,000 rows), processing is near-instantaneous. Files up to 50 MB are supported with a progress indicator and chunked processing via requestAnimationFrame to prevent browser freezing. Beyond 50 MB, browser memory limits may apply depending on your device. The tool will warn you if the input exceeds the recommended threshold.

The state-machine parser correctly distinguishes between newlines that are part of a field's content (inside quotes) and newlines that terminate a row (outside quotes). When surrounding quotes are removed in "All Surrounding" mode, embedded newlines remain in the field content. This means the output can show more lines than it has rows when viewed in a plain text editor, but the field content itself is unchanged.
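The difference between the two kinds of newline can be sketched with a small record splitter. This is a simplified illustration, not the tool's code: split_records is a hypothetical name, rows are assumed to end in "\n", and the input is assumed to be well-formed, with quotes appearing only as field wrappers or escaped pairs.

```python
def split_records(text, quote='"'):
    """Split text into rows, ignoring newlines inside quoted fields."""
    records, buf, in_quotes = [], [], False
    for ch in text:
        if ch == quote:
            # An escaped pair toggles twice, so the state is unchanged.
            in_quotes = not in_quotes
        if ch == "\n" and not in_quotes:
            records.append("".join(buf))  # row terminator
            buf = []
        else:
            buf.append(ch)                # includes embedded newlines
    if buf:
        records.append("".join(buf))
    return records


text = 'id,note\n1,"Line1\nLine2"\n2,ok\n'
rows = split_records(text)
```

Here rows holds 3 records, while a plain text editor would show the same text on 4 lines.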