About

Malformed CSV files can silently corrupt data. A single unquoted comma inside an address field shifts every subsequent column in that row, breaking database imports, ETL pipelines, and spreadsheet formulas. RFC 4180 specifies that a field containing the delimiter, a double-quote ("), or a line break should be enclosed in double-quotes, with internal double-quotes escaped by doubling (""). This tool parses your CSV character by character using a finite state machine and re-serializes it with correct quoting applied. It does not use regular expressions for parsing, because regex-based splitting fails on edge cases such as embedded newlines.
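To illustrate the character-by-character approach, here is a minimal JavaScript sketch of a quote-aware state machine. The function name parseCsv, the state names, and the lenient handling of stray characters after a closing quote are assumptions made for this example; the tool's actual implementation may differ.

```javascript
// Hypothetical sketch: parseCsv and its state names are illustrative only.
function parseCsv(text, delim = ",") {
  const rows = [];
  let row = [];
  let field = "";
  // States: FIELD_START | UNQUOTED | QUOTED | QUOTE_IN_QUOTED
  let state = "FIELD_START";

  const endField = () => { row.push(field); field = ""; };
  const endRow = () => { endField(); rows.push(row); row = []; };

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (state === "QUOTED") {
      if (ch === '"') state = "QUOTE_IN_QUOTED";
      else field += ch; // delimiters and newlines are plain data here
      continue;
    }
    if (state === "QUOTE_IN_QUOTED") {
      if (ch === '"') { field += '"'; state = "QUOTED"; continue; } // "" -> "
      state = "UNQUOTED"; // the quote closed the field; handle ch below
    }
    if (ch === delim) { endField(); state = "FIELD_START"; }
    else if (ch === "\n") { endRow(); state = "FIELD_START"; }
    else if (ch === "\r" && text[i + 1] === "\n") { /* skip the CR of a CRLF */ }
    else if (ch === '"' && state === "FIELD_START") state = "QUOTED";
    else { field += ch; state = "UNQUOTED"; }
  }
  // Flush the final row unless the input ended with a line break.
  if (field !== "" || row.length > 0 || state !== "FIELD_START") endRow();
  return rows;
}
```

Calling parseCsv on a row such as `"Ann","12 Main St, Suite 4"` yields two fields, because the comma inside the quoted address is read while the machine is in the QUOTED state.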

Four quoting modes are supported. "All Fields" wraps every field unconditionally. "Only When Needed" applies minimal quoting per RFC 4180 rules. "Force Numeric" and "Force Text" target fields by detected type. The tool auto-detects your delimiter among comma, semicolon, tab, and pipe by analyzing how each candidate is distributed across the first 5 lines. Limitation: this tool assumes UTF-8 encoding and does not handle binary content or byte order marks (BOM).

Formulas

The quoting decision for each field F follows a boolean predicate based on the selected mode:

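$$
\text{needsQuote}(F, \text{mode}) =
\begin{cases}
\text{TRUE} & \text{if mode} = \text{ALL} \\
F \text{ contains } \text{delim} \lor \text{quote} \lor \text{newline} & \text{if mode} = \text{NEEDED} \\
\text{isNumeric}(F) & \text{if mode} = \text{FORCE\_NUMERIC} \\
\lnot\,\text{isNumeric}(F) & \text{if mode} = \text{FORCE\_TEXT}
\end{cases}
$$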

The escape function for internal quotes uses the doubling rule:

escape(F) = F.replaceAll('"', '""')

Where delim is the active delimiter character (comma, semicolon, tab, or pipe), quote is the double-quote character U+0022, and newline matches both \n (LF) and \r\n (CRLF). The function isNumeric tests if the trimmed field matches the pattern /^-?\d+(\.\d+)?$/ (optional negative, digits, optional decimal).
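The predicate and escape rule above can be sketched in JavaScript as follows. The function names, the mode strings, and the serializeField helper are hypothetical and chosen for this example. The sketch follows the predicate literally, so in the two force modes it does not additionally apply the NEEDED rule; whether the tool does so is not stated here.

```javascript
// Illustrative names only; these are not the tool's documented identifiers.
const NUMERIC = /^-?\d+(\.\d+)?$/;

// True when the trimmed field is an optional minus, digits, optional decimal.
function isNumeric(field) {
  return NUMERIC.test(field.trim());
}

// Direct transcription of the needsQuote predicate.
function needsQuote(field, mode, delim = ",") {
  switch (mode) {
    case "ALL":
      return true;
    case "NEEDED":
      // "\r\n" contains "\n", so a single check covers LF and CRLF.
      return field.includes(delim) || field.includes('"') || field.includes("\n");
    case "FORCE_NUMERIC":
      return isNumeric(field);
    case "FORCE_TEXT":
      return !isNumeric(field);
    default:
      throw new Error(`unknown mode: ${mode}`);
  }
}

// Doubling rule: every internal double-quote becomes two.
function escapeQuotes(field) {
  return field.replace(/"/g, '""');
}

// Wrap the escaped field in double-quotes only when the predicate says so.
function serializeField(field, mode, delim = ",") {
  return needsQuote(field, mode, delim) ? `"${escapeQuotes(field)}"` : field;
}
```

For example, serializeField('He said "hello"', "NEEDED") returns "He said ""hello""", while serializeField("1.234,56", "NEEDED", ";") leaves the value unquoted because the comma is not the active delimiter.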

Reference Data

Scenario	Raw Field Value	Correctly Quoted Output	RFC 4180 Rule
Field contains delimiter	New York, NY	"New York, NY"	Section 2.6
Field contains double-quote	He said "hello"	"He said ""hello"""	Section 2.7
Field contains newline	Line1\nLine2	"Line1\nLine2"	Section 2.6
Field with leading/trailing spaces	(space)data(space)	" data "	Recommended practice
Empty field	(empty)	""	Optional per mode
Numeric field (integer)	12345	12345	No quoting needed
Numeric field forced quoted	00123	"00123"	Preserves leading zeros
Field already correctly quoted	"valid"	"valid"	No change needed
Field with escaped quotes inside	"She said ""hi"""	"She said ""hi"""	Idempotent operation
Boolean-like field	TRUE	TRUE	No quoting needed
Date field	2024-01-15	2024-01-15	No special chars
URL field	https://example.com/path?q=1&b=2	https://example.com/path?q=1&b=2	No delimiter present
Tab-delimited with comma	Price, USD (tab sep)	Price, USD	Comma is not the delimiter
Pipe-delimited with pipe in data	A|B (pipe sep)	"A|B"	Field contains delimiter
Semicolon-delimited EU CSV	1.234,56 (semicolon sep)	1.234,56	Comma is not the delimiter

Frequently Asked Questions

The most common cause of misaligned columns is an unquoted field containing the delimiter character. For example, an address like "123 Main St, Suite 4" contains a comma. If comma is your delimiter and the field is not wrapped in double-quotes, the parser splits it into two columns, shifting all subsequent data in that row. This tool detects such fields and adds quotes per RFC 4180 Section 2.6.

When a field is already quoted, the parser first strips the existing quotes and unescapes doubled quotes ("") back to single quotes ("). Then it re-applies quoting from scratch based on your selected mode. This makes the operation idempotent: running it twice produces the same output as running it once.
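The strip-then-requote idea can be sketched at the level of a single field. The helper names unquoteField, quoteField, and requote are hypothetical, and the sketch always quotes (as in "All Fields" mode) to keep the example short.

```javascript
// Hypothetical helpers illustrating strip-then-requote on one field.

// Remove one layer of enclosing quotes and undo the doubling rule.
function unquoteField(raw) {
  if (raw.length >= 2 && raw.startsWith('"') && raw.endsWith('"')) {
    return raw.slice(1, -1).replace(/""/g, '"');
  }
  return raw; // not enclosed in quotes: treat as a raw value
}

// Apply quoting from scratch: double internal quotes, then wrap.
function quoteField(value) {
  return `"${value.replace(/"/g, '""')}"`;
}

// Normalize a field regardless of whether it arrived quoted.
function requote(raw) {
  return quoteField(unquoteField(raw));
}
```

Because requote always reduces the input to its raw value before quoting, requote(requote(x)) equals requote(x) for the inputs tried here, which is the idempotence property described above.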

The tool examines the first 5 lines and counts occurrences of four candidate delimiters: comma, semicolon, tab, and pipe. It selects the character with the most consistent count across lines (lowest variance). Auto-detection can fail on files with fewer than 2 lines, files where multiple delimiters appear with equal frequency, or files where the delimiter only appears inside quoted fields. In such cases, select the delimiter manually.
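A minimal sketch of this lowest-variance heuristic is shown below. The function name detectDelimiter, the decision to skip blank lines, the tie-break on the higher mean count, and the null return for undecidable input are all assumptions for illustration. The sketch also counts characters inside quoted fields, which is one of the failure cases mentioned above.

```javascript
// Hypothetical sketch of delimiter detection by count consistency.
function detectDelimiter(text, candidates = [",", ";", "\t", "|"]) {
  // Sample the first 5 non-empty lines.
  const lines = text.split(/\r?\n/).filter((l) => l.length > 0).slice(0, 5);
  if (lines.length < 2) return null; // too little data to judge consistency

  let best = null;
  for (const delim of candidates) {
    // Occurrences of the candidate on each sampled line.
    const counts = lines.map((line) => line.split(delim).length - 1);
    const mean = counts.reduce((a, b) => a + b, 0) / counts.length;
    if (mean === 0) continue; // candidate never appears
    const variance =
      counts.reduce((acc, c) => acc + (c - mean) ** 2, 0) / counts.length;
    // Prefer lower variance; on a tie, prefer the more frequent candidate.
    if (
      best === null ||
      variance < best.variance ||
      (variance === best.variance && mean > best.mean)
    ) {
      best = { delim, variance, mean };
    }
  }
  return best ? best.delim : null;
}
```

With European-style input such as "a;b;c", "1,5;2;3", "4;5,5;6", the semicolon appears exactly twice on every line (variance 0) while the comma count varies, so the semicolon is selected.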

"Only When Needed" produces smaller files and is RFC 4180 compliant. "All Fields" is safer for interoperability with parsers that have non-standard behavior, such as some versions of Microsoft Excel's CSV import or legacy mainframe systems. If you control both the writer and reader, use "Only When Needed". If you are sending CSV to an unknown third party, use "All Fields".

Spreadsheet applications like Excel and Google Sheets automatically interpret numeric-looking strings. A product code like 00123 becomes 123 (leading zeros stripped). A long number like 1234567890123456 loses its final digit because spreadsheets typically keep only 15 significant digits, a consequence of IEEE 754 double-precision storage. Quoting numeric fields marks them as text, which prevents this auto-conversion in importers that respect quoting. The "Force Numeric" mode targets only fields matching the pattern /^-?\d+(\.\d+)?$/.
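The effect of numeric auto-conversion can be reproduced with JavaScript's own double-precision numbers, used here only as a stand-in for a spreadsheet's conversion. The helper names looksNumeric and roundTripsAsNumber are hypothetical. Note that a spreadsheet's 15-digit display limit is stricter than the raw double-precision limit shown here.

```javascript
// Hypothetical helpers: which numeric-looking strings would be altered
// if an importer converted them to a double and back to text?
const looksNumeric = (s) => /^-?\d+(\.\d+)?$/.test(s.trim());

// True when converting to a number and back reproduces the exact text.
function roundTripsAsNumber(s) {
  return String(Number(s)) === s;
}

// Fields worth protecting: numeric-looking, but changed by conversion.
function isAtRisk(s) {
  return looksNumeric(s) && !roundTripsAsNumber(s);
}
```

Here isAtRisk("00123") is true because the leading zeros are lost, and isAtRisk("9007199254740993") is true because that integer exceeds 2^53 and cannot be stored exactly in a double, while isAtRisk("12345") is false.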

Fields with embedded newlines are supported. The character-level finite state machine tracks whether the parser is inside a quoted field. A newline encountered inside a quoted field is treated as field content, not a row separator. This is the primary reason regex-based CSV parsers fail: they split on newlines globally, regardless of quoting. This tool preserves multi-line field values.
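The difference between global newline splitting and quote-aware scanning can be seen with a small row counter. Both function names are hypothetical, and the quote-aware version is a simplified illustration that only tracks the inside-quotes flag rather than a full parser.

```javascript
// Naive approach: every newline is assumed to end a row.
function countRowsNaive(text) {
  return text.split("\n").filter((line) => line.length > 0).length;
}

// Quote-aware approach: a newline ends a row only outside quoted fields.
// Toggling on every double-quote also handles the doubled "" escape,
// since two consecutive toggles leave the flag unchanged.
function countRowsQuoteAware(text) {
  let inQuotes = false;
  let rows = 0;
  let sawContent = false;
  for (const ch of text) {
    if (ch === '"') {
      inQuotes = !inQuotes;
      sawContent = true;
    } else if (ch === "\n" && !inQuotes) {
      if (sawContent) rows++;
      sawContent = false;
    } else if (ch !== "\r") {
      sawContent = true;
    }
  }
  return sawContent ? rows + 1 : rows; // count a final row without a newline
}
```

For the input 'id,note\n1,"Line1\nLine2"\n', the naive counter reports three rows while the quote-aware counter reports two, because the second newline sits inside a quoted field.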

The tool runs entirely in your browser, so practical limits depend on available RAM. Files under 5 MB process near-instantly. Files between 5 and 50 MB may take a few seconds. For files above 50 MB, the browser may run out of memory. There is no server upload: your data never leaves your machine.