About

CSV files lack a universal standard for field quoting. Some systems export with double quotes ("), others use single quotes ('), and some omit quoting entirely, which causes parse failures when fields contain the delimiter character or line breaks. Importing a file quoted with ' into a parser expecting " produces corrupted columns, lost rows, or silent data truncation. This tool performs deterministic quote character replacement on CSV data using a finite-state tokenizer compliant with RFC 4180. It correctly handles escaped quotes (doubled characters), fields containing embedded delimiters, and multiline field values.

The tool does not naively find-and-replace characters. It parses the full CSV structure, then re-serializes with the target quote character and chosen quoting strategy. This matters because a raw replacement of " with ' will break any field that already contains a literal apostrophe. The tokenizer resolves this by applying proper escape sequences during re-serialization. Limitations: the tool assumes consistent quoting within the source file. Mixed-quoting files (some fields single-quoted, others double-quoted) require manual inspection first.
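The difference between raw replacement and parse-then-re-serialize can be illustrated with Python's standard csv module. This is a minimal sketch, not the tool's own code: the sample data and variable names are made up for illustration.

```python
import csv
import io

# Hypothetical sample: double-quoted source with an apostrophe and a comma.
source = '"id","note"\n"1","it\'s here"\n"2","New York, NY"\n'

# Naive approach: swap every double quote for a single quote.
naive = source.replace('"', "'")

# Structured approach: parse with the source quote character, then
# re-serialize with the target one. doublequote=True escapes embedded
# quote characters by doubling them.
rows = list(csv.reader(io.StringIO(source), quotechar='"'))
out = io.StringIO()
writer = csv.writer(out, quotechar="'", quoting=csv.QUOTE_ALL,
                    doublequote=True, lineterminator="\n")
writer.writerows(rows)
converted = out.getvalue()

# Read both results back with a parser that expects single quotes.
naive_rows = list(csv.reader(io.StringIO(naive), quotechar="'"))
good_rows = list(csv.reader(io.StringIO(converted), quotechar="'"))

print(converted)
print("naive round-trip intact:", naive_rows == rows)
print("structured round-trip intact:", good_rows == rows)
```

The structured output writes the apostrophe field as 'it''s here', so it survives a round trip, while the naive output does not.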

Formulas

The CSV tokenizer operates as a finite-state machine with four states. Each input character triggers a transition that determines whether it belongs to the current field, ends the field, or modifies the quoting context.

S ∈ { FIELD_START, UNQUOTED, QUOTED, QUOTE_ESCAPE }

Transition rules govern parsing behavior:

FIELD_START + Q → QUOTED

QUOTED + Q → QUOTE_ESCAPE

QUOTE_ESCAPE + Q → QUOTED (literal quote appended)

QUOTE_ESCAPE + D → FIELD_START (field ends)

Where Q = current quote character, D = delimiter character. During re-serialization, the quoting strategy determines which fields receive the target quote character Q_target:

needsQuote(field) = field contains D ∨ field contains Q_target ∨ field contains newline

Escaping within re-serialized fields uses doubling: every occurrence of Q_target inside a field value is replaced with two consecutive Q_target characters.
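The needsQuote predicate and the doubling rule can be sketched in a few lines of Python. The function names, the quote_all flag, and the treatment of a bare carriage return as a newline are assumptions made for this illustration.

```python
def needs_quote(field: str, delimiter: str, q_target: str) -> bool:
    # Mirrors needsQuote(field): delimiter, target quote, or newline present.
    # Assumption: a carriage return counts as a newline character.
    return (delimiter in field or q_target in field
            or "\n" in field or "\r" in field)


def serialize_field(field: str, delimiter: str = ",", q_target: str = "'",
                    quote_all: bool = False) -> str:
    # quote_all=False approximates a "quote only when necessary" strategy.
    if not (quote_all or needs_quote(field, delimiter, q_target)):
        return field
    # Escape by doubling every embedded occurrence of the target quote.
    return q_target + field.replace(q_target, q_target * 2) + q_target


def serialize_row(fields, delimiter: str = ",", q_target: str = "'",
                  quote_all: bool = False) -> str:
    return delimiter.join(
        serialize_field(f, delimiter, q_target, quote_all) for f in fields
    )


print(serialize_row(["plain", "New York, NY", "it's here"]))
```

With the defaults above, only the second and third fields are quoted, and the apostrophe in the third is doubled.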

Reference Data

Quote Style	Character	Unicode	Common Usage	Escape Method	RFC 4180
Double Quote	"	U+0022	RFC 4180 standard, Excel, Google Sheets	Doubled: ""	Yes
Single Quote	'	U+0027	MySQL exports, some Unix tools	Doubled: ''	No
Backtick	`	U+0060	MySQL identifiers, Markdown	Doubled: ``	No
No Quotes	-	-	Simple numeric CSVs, TSV files	N/A (fields cannot contain delimiter)	Partial
Left Double Curly	“	U+201C	Word processors, copy-paste errors	Doubled	No
Right Double Curly	”	U+201D	Word processors, copy-paste errors	Doubled	No
Left Single Curly	‘	U+2018	macOS auto-correct, rich text	Doubled	No
Right Single Curly	’	U+2019	macOS auto-correct, rich text	Doubled	No
Guillemet Double	«»	U+00AB / U+00BB	European locales, French text	Rare	No
Comma Delimiter	,	U+002C	Default CSV separator	-	Yes
Semicolon Delimiter	;	U+003B	European locales (decimal comma conflict)	-	No
Tab Delimiter	\t	U+0009	TSV files, database exports	-	No
Pipe Delimiter	|	U+007C	Legacy systems, HL7 medical data	-	No
Caret Delimiter	^	U+005E	Mainframe exports	-	No
Tilde Delimiter	~	U+007E	EDI/X12 transactions	-	No

Frequently Asked Questions

The tool escapes quote characters embedded in a field using the RFC 4180 doubling convention. If you convert to single quotes and a field contains the text it's here, the output becomes 'it''s here'. The embedded single quote is doubled so parsers correctly interpret it as a literal character rather than a field terminator.

The finite-state tokenizer tracks whether the parser is inside a quoted field. While in the QUOTED state, newline characters (both \n and \r\n) are treated as literal content within the field, not as row terminators. This preserves multiline address fields, notes, and descriptions without splitting them into separate rows.
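A minimal Python sketch of such a tokenizer, using the four states listed under Formulas, is given here. It is an illustration under stated assumptions rather than the tool's actual implementation: the function name is hypothetical, a carriage return outside quotes is assumed to be part of a \r\n pair and is skipped, and malformed input is handled leniently.

```python
from enum import Enum, auto


class State(Enum):
    FIELD_START = auto()
    UNQUOTED = auto()
    QUOTED = auto()
    QUOTE_ESCAPE = auto()


def tokenize(text: str, quote: str = '"', delimiter: str = ",") -> list[list[str]]:
    rows, row, field = [], [], []
    state = State.FIELD_START

    def end_field():
        row.append("".join(field))
        field.clear()

    def end_row():
        nonlocal row
        end_field()
        rows.append(row)
        row = []

    for ch in text:
        if state is State.QUOTED:
            if ch == quote:
                state = State.QUOTE_ESCAPE
            else:
                field.append(ch)  # delimiters and newlines are literal here
        elif state is State.QUOTE_ESCAPE and ch == quote:
            field.append(quote)   # doubled quote becomes one literal quote
            state = State.QUOTED
        elif ch == "\r":
            continue              # assumption: only occurs as part of \r\n
        elif ch == delimiter:
            end_field()
            state = State.FIELD_START
        elif ch == "\n":
            end_row()
            state = State.FIELD_START
        elif state is State.FIELD_START and ch == quote:
            state = State.QUOTED
        else:
            field.append(ch)
            state = State.UNQUOTED

    # Flush a final row that has no trailing newline.
    if state is not State.FIELD_START or field or row:
        end_row()
    return rows


sample = '"name","address"\r\n"Ann","12 Main St\r\nSpringfield, IL"\r\n'
for parsed_row in tokenize(sample):
    print(parsed_row)
```

The address value stays a single field containing both the line break and the comma, because both characters arrive while the machine is in the QUOTED state.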

If any field contains the delimiter character (e.g., a comma inside "New York, NY"), removing quotes makes the parser interpret that comma as a field separator, splitting one field into two and misaligning all subsequent columns. The tool warns you when stripping quotes would cause this. Use the Quote When Necessary strategy instead to quote only fields that require it.
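One way such a warning could be computed is to scan the parsed rows for fields that cannot survive without quotes. The sketch below is an assumption about how that check might look; the function name and return shape are hypothetical.

```python
def unsafe_to_unquote(rows, delimiter: str = ","):
    """Return (row_index, column_index) for every field that would
    break the structure if written without quotes."""
    return [
        (r, c)
        for r, row in enumerate(rows)
        for c, field in enumerate(row)
        if delimiter in field or "\n" in field or "\r" in field
    ]


rows = [["city", "pop"], ["New York, NY", "8"], ["Boise", "0.2"]]
problems = unsafe_to_unquote(rows)
if problems:
    print(f"{len(problems)} field(s) need quoting, first at {problems[0]}")
```

An empty result means quotes can be stripped without shifting any columns.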

Yes. The auto-detect algorithm scans the first 5000 characters and checks for fields beginning with common quote characters (", ', `). It counts occurrences of each candidate appearing at field boundaries (after a delimiter or at line start) and selects the character with the highest boundary-adjacent frequency. If no clear winner is found, it defaults to double quote per RFC 4180.
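The boundary-counting heuristic described above might look roughly like this in Python. The candidate set and the 5000-character sample come from the description; the delimiter set, the tie-breaking rule, and all names are assumptions for the sake of the sketch.

```python
CANDIDATES = ('"', "'", "`")


def detect_quote(text: str, delimiters: str = ",;\t|",
                 sample_size: int = 5000, default: str = '"') -> str:
    sample = text[:sample_size]
    counts = dict.fromkeys(CANDIDATES, 0)
    prev = "\n"  # treat the start of the sample as a line start
    for ch in sample:
        # Count a candidate only when it sits at a field boundary.
        if ch in counts and (prev == "\n" or prev in delimiters):
            counts[ch] += 1
        prev = ch
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_n), (_, second_n) = ranked[0], ranked[1]
    # Assumption: a tie (including all zeros) means no clear winner.
    return best if best_n > second_n else default


print(detect_quote("'a','b'\n'c','d'\n"))
```

Only opening positions are counted here, so apostrophes inside words such as it's do not inflate the single-quote score.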

Yes. You can change the field delimiter independently of the quote character. For example, you can convert a comma-delimited file with double quotes to a semicolon-delimited file with single quotes. The tool re-parses with the source delimiter and re-serializes with the target delimiter, applying proper quoting to any field that contains the new delimiter character.
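For comparison, the same re-parse and re-serialize flow can be reproduced with Python's standard csv module. This is a sketch of the idea, not the tool's code; the convert function and its parameter names are hypothetical.

```python
import csv
import io


def convert(text: str, src_delim: str = ",", src_quote: str = '"',
            dst_delim: str = ";", dst_quote: str = "'") -> str:
    # Parse with the source dialect.
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=src_delim, quotechar=src_quote)
    # Re-serialize with the target dialect; QUOTE_MINIMAL quotes only
    # fields that contain the new delimiter, the new quote, or a newline.
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=dst_delim, quotechar=dst_quote,
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


print(convert('"a;b","c,d","e"\n'))
```

The field a;b gains quotes because it contains the new delimiter, while c,d no longer needs them once the comma stops being a separator.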

The tool processes files up to 50 MB in the browser. Files under 5 MB are parsed synchronously for instant feedback. Larger files are processed in chunks to prevent the browser from becoming unresponsive. For files exceeding 50 MB, consider splitting them with a command-line tool first.

Word processors often replace straight quotes with typographic curly quotes (“ ” or ‘ ’). The auto-detect algorithm recognizes these as quote characters. When converting away from them, the tool treats the opening and closing variants as equivalent, stripping or replacing both with the selected target character.