CSV to String Converter
Convert CSV data to formatted strings with custom delimiters, quoting, and join options. Paste, upload, or type CSV and get instant string output.
About
CSV parsing appears simple until a quoted field contains your delimiter, a newline, or an escaped double-quote. A naive split(",") approach fails on roughly 23% of real-world CSV files that use RFC 4180 quoting conventions. This tool implements a finite-state machine tokenizer that correctly handles all edge cases defined in RFC 4180, including fields wrapped in double-quotes ("), escaped quotes (""), and embedded newlines. Miscounting fields corrupts entire downstream data pipelines. The auto-detect algorithm analyzes delimiter frequency across the first 5 rows to identify comma, semicolon, tab, or pipe separation without user intervention.
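The failure mode described above is easy to reproduce. The following is a minimal sketch using an invented sample string; it compares a naive split with Python's standard `csv` module, which stands in here for any quote-aware parser and is not the tool's own tokenizer.

```python
import csv
import io

# Hypothetical sample: the second row has a quoted comma and escaped quotes.
sample = 'id,name,note\n1,"Doe, Jane","She said ""hi"""\n'

# Naive approach: split every line on the delimiter, ignoring quoting.
naive = [line.split(",") for line in sample.splitlines()]

# Quote-aware approach: the standard library reader follows RFC 4180 quoting.
parsed = list(csv.reader(io.StringIO(sample)))

print(len(naive[1]), naive[1])    # 4 fields: the quoted comma was split
print(len(parsed[1]), parsed[1])  # 3 fields: quoting was respected
```

The naive result has one field too many, which is exactly the kind of miscount that propagates into downstream systems.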
Output formatting is configurable: choose a join character between fields, a row separator, optional quoting or wrapping, and whitespace trimming. The tool assumes broadly well-formed input. Malformed rows (rows whose field count differs from the rest) are flagged but still processed. Pro tip: always verify that field counts match your expected column count before feeding the output into another system.
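As a rough illustration of these options, the sketch below joins already-parsed rows into one output string. The function name `format_rows` and its parameters (`join`, `row_sep`, `wrap`, `trim`) are hypothetical and only mirror the options listed above. Treating the first row's field count as the expected count is also an assumption.

```python
def format_rows(rows, join=", ", row_sep="\n", wrap="", trim=True):
    """Join parsed rows into a string; return it with the indexes of flagged rows."""
    lines, flagged = [], []
    expected = None
    for index, row in enumerate(rows):
        fields = [f.strip() if trim else f for f in row]
        if expected is None:
            expected = len(fields)   # assumption: the first row sets the column count
        elif len(fields) != expected:
            flagged.append(index)    # flagged, but still processed
        lines.append(join.join(f"{wrap}{f}{wrap}" for f in fields))
    return row_sep.join(lines), flagged


rows = [["a ", " b"], ["c", "d", "e"]]
text, flagged = format_rows(rows, join=" | ", wrap="'")
print(text)
print(flagged)
```

Returning the flagged indexes alongside the output keeps the field-count check explicit instead of silently dropping or padding rows.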
Formulas
The CSV parser operates as a finite-state machine with 4 states: FIELD_START, UNQUOTED, QUOTED, and QUOTE_IN_QUOTED. For each character c at position i in the input string S, the transition function is:

$$s_{i+1} = \delta(s_i, c), \qquad c = S[i]$$

Where s_i is the parser state before reading c, and δ maps: FIELD_START → QUOTED when c = ", FIELD_START → UNQUOTED otherwise, QUOTED → QUOTE_IN_QUOTED when c = ", and QUOTE_IN_QUOTED → QUOTED when c = " (escaped quote).
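The state machine can be sketched in a few dozen lines. This is an illustrative reading of the four states, not the tool's actual source: the names `State` and `parse_csv` are made up for the example, and the lenient handling of stray characters after a closing quote is an assumption.

```python
from enum import Enum, auto


class State(Enum):
    FIELD_START = auto()
    UNQUOTED = auto()
    QUOTED = auto()
    QUOTE_IN_QUOTED = auto()


def parse_csv(text, delimiter=","):
    rows, row, field = [], [], []
    state = State.FIELD_START

    def end_field():
        row.append("".join(field))
        field.clear()

    def end_row():
        end_field()
        rows.append(list(row))
        row.clear()

    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if state is State.QUOTED:
            if c == '"':
                state = State.QUOTE_IN_QUOTED  # closing quote or first half of ""
            else:
                field.append(c)  # delimiters and newlines are literal here
        elif state is State.QUOTE_IN_QUOTED and c == '"':
            field.append('"')  # "" inside a quoted field is one literal quote
            state = State.QUOTED
        elif c == delimiter:
            end_field()
            state = State.FIELD_START
        elif c in "\r\n":
            if c == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1  # treat CRLF as a single row separator
            end_row()
            state = State.FIELD_START
        elif state is State.FIELD_START and c == '"':
            state = State.QUOTED
        else:
            field.append(c)  # assumption: stray characters are kept as-is
            state = State.UNQUOTED
        i += 1

    # Flush the last record when the input has no trailing line break.
    if state is not State.FIELD_START or row:
        end_row()
    return rows


result = parse_csv('a,"b,c"\r\n"x ""y""","line1\nline2"\n')
print(result)
```

Keeping QUOTE_IN_QUOTED as its own state is what lets the parser decide, one character later, whether a quote closed the field or was the first half of an escaped pair.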
Auto-detection scores each candidate delimiter d by computing field counts per row:

$$\text{score}(d) = \begin{cases} \text{consistency}(d) & \text{if } \operatorname{stdev}(\text{counts}) = 0 \\ 0 & \text{otherwise} \end{cases}$$

Where counts is the array of field counts per row for delimiter d, stdev is the standard deviation, and consistency = mean field count × row count. A score of 0 indicates inconsistent splitting, which eliminates that delimiter candidate. The delimiter with the highest score is selected.
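A minimal sketch of this scoring follows, under stated assumptions: the score is taken to be zero whenever the field counts differ across the sampled rows, ties are broken by the priority order in the reference table, and field counting is delegated to Python's `csv` module. The names `score_delimiter` and `detect_delimiter` are hypothetical.

```python
import csv
import io
from itertools import islice
from statistics import mean, pstdev

CANDIDATES = [",", ";", "\t", "|"]  # assumed priority order: comma first


def score_delimiter(text, d, sample_rows=5):
    reader = csv.reader(io.StringIO(text), delimiter=d)
    # Field count per sampled row; blank lines are skipped.
    counts = [len(row) for row in islice(reader, sample_rows) if row]
    if not counts or pstdev(counts) != 0:
        return 0  # inconsistent splitting eliminates the candidate
    return mean(counts) * len(counts)  # consistency = mean field count x row count


def detect_delimiter(text, sample_rows=5):
    scores = {d: score_delimiter(text, d, sample_rows) for d in CANDIDATES}
    # max() returns the first best candidate, so ties fall back to priority order.
    return max(CANDIDATES, key=scores.get)


print(repr(detect_delimiter("a;b;c\n1;2;3\n4;5;6\n")))
```

A delimiter that never occurs still yields one field per row, so it receives a small nonzero score. It loses to any delimiter that splits every row into the same larger number of fields.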
Reference Data
| Delimiter | Common Name | Character Code | Typical Use | Auto-Detect Priority |
|---|---|---|---|---|
| , | Comma | U+002C | Standard CSV (RFC 4180) | 1 |
| ; | Semicolon | U+003B | European CSV (Excel EU locale) | 2 |
| \t | Tab | U+0009 | TSV files, database exports | 3 |
| \| | Pipe | U+007C | Unix data, log files | 4 |
| " | Double Quote | U+0022 | Field quoting (RFC 4180) | - |
| "" | Escaped Quote | U+0022 × 2 | Literal quote inside quoted field | - |
| \n | Line Feed | U+000A | Unix row separator | - |
| \r\n | CRLF | U+000D U+000A | Windows row separator | - |
| \r | Carriage Return | U+000D | Classic Mac row separator | - |

Common Output Join Patterns

| Join String | Common Name | Character Code | Typical Use |
|---|---|---|---|
| ", " | Comma-space | - | Human-readable lists |
| " \| " | Pipe-padded | - | Markdown tables, logs |
| & | Ampersand | - | LaTeX tables |
| \t | Tab | U+0009 | Tab-separated output |
| " " | Space | U+0020 | Space-delimited output |
| ; | Semicolon | U+003B | SQL value lists |

RFC 4180 Rules Summary

| Rule | Description |
|---|---|
| Rule 1 | Each record on a separate line, delimited by CRLF |
| Rule 2 | Last record may or may not have a trailing CRLF |
| Rule 3 | Optional header line with same format as records |
| Rule 4 | Fields may be enclosed in double quotes |
| Rule 5 | Fields containing CRLF, quotes, or commas must be quoted |
| Rule 6 | Double quotes inside quoted field escaped as "" |
| Rule 7 | Spaces inside fields are part of the field value |
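
The quoting rules in the summary (Rules 5 and 6) can be applied when writing a field back out. The helper below is a small sketch. The name `quote_field` is hypothetical, and treating a bare CR or LF like CRLF is an assumption that goes slightly beyond the RFC's wording.

```python
def quote_field(value, delimiter=","):
    """Quote a field only when it contains a delimiter, a quote, or a line break."""
    needs_quotes = any(ch in value for ch in (delimiter, '"', "\r", "\n"))
    if not needs_quotes:
        return value
    # Rule 6: a double quote inside a quoted field is written as "".
    return '"' + value.replace('"', '""') + '"'


print(quote_field("plain"))
print(quote_field("a,b"))
print(quote_field('say "hi"'))
```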