About

CSV (Comma-Separated Values) parsing appears trivial until a quoted field contains an embedded comma, a newline, or a literal double-quote escaped as "". RFC 4180 defines the grammar, but real-world exports from Excel, Google Sheets, and legacy ERP systems routinely deviate with BOM markers, mixed line endings (CRLF vs LF), and semicolon delimiters dictated by locale. Incorrect parsing silently shifts columns, corrupting downstream analysis. This converter implements a character-level state-machine parser that handles all RFC 4180 edge cases, auto-detects the input delimiter, and outputs clean TXT in your choice of tab-delimited, fixed-width, pipe-separated, or space-separated format. Processing runs entirely in your browser. No data leaves your machine.

The tool enforces strict quoting rules: a field that begins with a double-quote must end with one, and internal quotes must be doubled (""). Malformed rows are flagged, not silently dropped. Fixed-width output pads each column to its maximum observed width plus 2 characters, aligning data for monospaced display or legacy mainframe ingest. Files up to 50 MB are supported. For files exceeding 1 MB, parsing is offloaded to a Web Worker to keep the UI responsive. Note: this tool assumes UTF-8 encoding. Non-UTF-8 files may produce garbled characters wherever the input contains non-ASCII bytes.

Formulas

The CSV parser operates as a finite state machine with three states: FIELD_START, IN_QUOTED, and IN_UNQUOTED. Transitions are determined character-by-character:

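$$
\begin{cases}
\text{FIELD\_START} \to \text{IN\_QUOTED} & \text{if } \mathit{char} = \texttt{"} \\
\text{FIELD\_START} \to \text{IN\_UNQUOTED} & \text{if } \mathit{char} \neq \texttt{"} \\
\text{IN\_QUOTED} \to \text{FIELD\_START} & \text{if } \mathit{char} = \texttt{"} \land \mathit{next} \neq \texttt{"} \\
\text{IN\_UNQUOTED} \to \text{FIELD\_START} & \text{if } \mathit{char} = \text{delimiter}
\end{cases}
$$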
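
The transitions above can be sketched in Python. This is an illustrative sketch, not the tool's own source (the tool runs as browser JavaScript, which is not shown here); the function name `parse_csv` and its signature are assumptions. One deliberate difference from the table: after a closing quote the sketch moves to IN_UNQUOTED rather than FIELD_START, so that the delimiter that follows ends the current field instead of producing an extra empty one. Flagging of malformed rows is left out.

```python
from typing import List

FIELD_START, IN_QUOTED, IN_UNQUOTED = "FIELD_START", "IN_QUOTED", "IN_UNQUOTED"


def parse_csv(text: str, delimiter: str = ",") -> List[List[str]]:
    """Parse CSV text into rows of fields, one character at a time."""
    rows: List[List[str]] = []
    row: List[str] = []
    field: List[str] = []
    state = FIELD_START
    i, n = 0, len(text)

    while i < n:
        ch = text[i]
        nxt = text[i + 1] if i + 1 < n else ""

        if state == IN_QUOTED:
            if ch != '"':
                field.append(ch)      # delimiters and newlines are plain data here
            elif nxt == '"':
                field.append('"')     # doubled quote -> one literal quote
                i += 1                # skip the second quote of the pair
            else:
                state = IN_UNQUOTED   # closing quote (assumption: see lead-in)
        elif ch == delimiter:
            row.append("".join(field))
            field = []
            state = FIELD_START
        elif ch in "\r\n":
            if ch == "\r" and nxt == "\n":
                i += 1                # treat CRLF as a single row terminator
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
            state = FIELD_START
        elif state == FIELD_START and ch == '"':
            state = IN_QUOTED         # opening quote is not part of the value
        else:
            field.append(ch)
            state = IN_UNQUOTED
        i += 1

    # Flush the last row when the input does not end with a newline.
    if state != FIELD_START or row:
        row.append("".join(field))
        rows.append(row)
    return rows
```

For example, `parse_csv('a,"hello, world"\n')` returns `[['a', 'hello, world']]`: the comma inside the quotes is kept as data because the machine is in IN_QUOTED when it is read.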

Delimiter auto-detection counts occurrences of each candidate delimiter (, ; \t |) across the first 5 lines. The delimiter with the lowest coefficient of variation in per-line counts is selected:

$$\text{score} = \frac{\sigma}{\mu}$$

where σ is the standard deviation of per-line counts and μ is the mean. The candidate with the lowest score (most consistent count per line) wins. Ties are broken by priority order: comma > semicolon > tab > pipe.
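
A minimal sketch of this scoring, under stated assumptions: the name `detect_delimiter` is hypothetical, the population standard deviation is used (the text does not say whether population or sample is meant), candidates that never appear are skipped because their score is undefined, and comma is returned when no candidate appears at all. Counting is naive, so delimiters inside quoted fields are counted too.

```python
from statistics import mean, pstdev

# Listed in tie-break priority order: comma > semicolon > tab > pipe.
CANDIDATES = [",", ";", "\t", "|"]


def detect_delimiter(text: str, sample_lines: int = 5) -> str:
    """Pick the candidate whose per-line count is most consistent."""
    lines = [ln for ln in text.splitlines() if ln][:sample_lines]
    best, best_score = ",", float("inf")   # assumption: default to comma
    for cand in CANDIDATES:
        counts = [ln.count(cand) for ln in lines]
        mu = mean(counts) if counts else 0
        if mu == 0:
            continue                       # never appears: score is undefined
        score = pstdev(counts) / mu        # coefficient of variation
        # Strict '<' keeps the earlier, higher-priority candidate on ties.
        if score < best_score:
            best, best_score = cand, score
    return best
```

With a European-style sample such as `"a;b;c\n1,5;2;3\n4;5,5;6\n"`, the semicolon count is 2 on every line (score 0), while the comma count varies between 0 and 1, so the semicolon is selected.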

Fixed-width output computes column width as:

$$W_j = \max_{i} \operatorname{len}(\text{cell}_{i,j}) + 2 \quad \text{for all rows } i$$

where $W_j$ is the padded width for column $j$, and each cell is right-padded with spaces to $W_j$ characters.
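
The width formula can be sketched as follows. The function name `to_fixed_width` and the `gutter` parameter are assumptions made for illustration. Short rows are padded with empty cells first, and `len` counts code points, which may differ from the display width for wide or combining characters.

```python
from typing import List


def to_fixed_width(rows: List[List[str]], gutter: int = 2) -> str:
    """Render rows as fixed-width text: max cell length per column plus a gutter."""
    n_cols = max((len(r) for r in rows), default=0)
    # Pad short rows with empty cells so every row has n_cols columns.
    padded = [r + [""] * (n_cols - len(r)) for r in rows]
    # Column width = longest cell in that column + gutter.
    widths = [max(len(r[j]) for r in padded) + gutter for j in range(n_cols)]
    # Right-pad every cell with spaces to its column width.
    return "\n".join(
        "".join(cell.ljust(w) for cell, w in zip(r, widths)) for r in padded
    )
```

For the rows `[["id", "name"], ["1", "Alice"], ["22"]]` the widths come out as 4 and 7, so every output line is 11 characters long and the first line is `id` followed by two spaces, then `name` followed by three.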

Reference Data

| Output Format | Separator Character | Best For | Column Alignment | Readability | Import Compatibility |
| --- | --- | --- | --- | --- | --- |
| Tab-Delimited | \t (U+0009) | Spreadsheets, databases | Variable | Medium | Excel, SQL loaders, R, Python pandas |
| Fixed-Width | Space padding | Mainframes, COBOL, reports | Exact column alignment | High | FORTRAN, SAS, legacy ETL |
| Pipe-Delimited | \| (U+007C) | Data pipelines, logs | Variable | Medium | Unix tools, awk, sed |
| Space-Delimited | Single space | Simple text, CLI tools | Variable | Low (if data has spaces) | cut, tr, shell scripts |
| Custom Delimiter | User-defined character | Proprietary formats | Variable | Varies | Application-specific |

Common CSV Input Delimiters (Auto-Detected)

| Delimiter | Character | Typical Use | Import Compatibility |
| --- | --- | --- | --- |
| Comma | , (U+002C) | Default RFC 4180 | Universal |
| Semicolon | ; (U+003B) | European locale Excel exports | German, French, Italian Excel |
| Tab (TSV) | \t (U+0009) | Tab-separated values | Widely supported |
| Pipe | \| (U+007C) | Medical (HL7), financial feeds | Domain-specific |

RFC 4180 Quoting Rules

| Case | Rule | Example |
| --- | --- | --- |
| Plain field | No quoting required | hello |
| Field with comma | Must be quoted | "hello, world" |
| Field with newline | Must be quoted | "line1\nline2" |
| Field with quote | Quote doubled inside quotes | "say ""hello""" |
| Empty field | Two consecutive delimiters | a,,c |
| Quoted empty | Explicit empty | a,"",c |

File Size & Performance

| File Size | Processing |
| --- | --- |
| < 100 KB | Instant parsing on main thread (< 50 ms) |
| 100 KB - 1 MB | Main thread, 50 - 500 ms |
| 1 MB - 50 MB | Web Worker parsing, progress indicator shown |
| > 50 MB | Rejected with error (browser memory limits) |

Frequently Asked Questions

How does delimiter auto-detection work?
The parser samples the first 5 lines and counts occurrences of each candidate delimiter (comma, semicolon, tab, pipe) per line. It then calculates the coefficient of variation (standard deviation divided by mean) for each candidate. A consistent delimiter produces nearly equal counts per line, yielding a low coefficient of variation. The candidate with the lowest score is selected. If two candidates tie, priority order is: comma, semicolon, tab, pipe. This handles European-locale Excel exports that use semicolons because commas serve as decimal separators.
How are newlines inside quoted fields handled?
Per RFC 4180, a field enclosed in double quotes may contain line breaks (CRLF or LF). The parser's IN_QUOTED state does not treat newline characters as row terminators. The field continues until a closing double-quote is found that is not followed by another double-quote, so a single logical CSV row can span multiple physical lines. The converter preserves or strips these embedded newlines based on your output format choice. In tab-delimited mode, embedded newlines are replaced with a space to prevent row misalignment in the output TXT.
Can the converter handle rows with different numbers of columns?
Yes. The parser does not enforce a fixed column count. If row 1 has 5 fields and row 7 has 3 fields, both are parsed as-is. In fixed-width mode, short rows are padded with empty columns to match the maximum column count observed. A warning toast appears noting the row discrepancy. This is common in real-world exports where trailing empty fields are omitted.
Why does fixed-width output add 2 characters of padding?
The padding formula adds 2 characters beyond the maximum observed cell width for each column. This ensures visual separation between adjacent columns when displayed in a monospaced font. Without padding, columns would run together wherever a cell reaches maximum width. The value of 2 is a standard convention in COBOL copybooks and mainframe fixed-format files, providing readable gutters without excessive whitespace.
How is a byte order mark (BOM) handled?
The parser detects a UTF-8 BOM (U+FEFF, encoded as EF BB BF) at the start of the file and strips it before parsing. This prevents the BOM from appearing as a phantom character in the first field of the first row, which is a common issue when opening Excel-exported CSVs in Unix tools. The output TXT file is written without a BOM.
What is the maximum file size?
The limit is 50 MB. Browser-based JavaScript holds the entire file content as a string in memory. A 50 MB CSV file can expand to approximately 100 MB in memory due to UTF-16 internal string representation. Beyond this, browsers may hit memory limits or become unresponsive. Files between 1 MB and 50 MB are processed in a Web Worker to keep the UI thread free. For files exceeding 50 MB, consider a server-side tool or command-line utility like awk or csvtool.
How are escaped double-quotes handled?
In CSV, a literal double-quote inside a quoted field is escaped by doubling it: "say ""hello""" represents the value say "hello". The parser un-escapes these during parsing, so each doubled pair becomes one double-quote character. In the output TXT, the un-escaped value is written directly, since tab-delimited, pipe, and fixed-width formats do not use quote escaping. If you later need to re-import the TXT as CSV, you would need to re-apply quoting rules.
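
Two of the behaviours described in the answers above, BOM stripping and the replacement of embedded newlines in tab-delimited output, can be sketched in Python. The names `strip_bom` and `to_tab_delimited` are hypothetical, and the sketch takes already-parsed rows as input. How the tool treats a tab character inside a cell is not stated, so the sketch leaves such characters untouched.

```python
from typing import List


def strip_bom(text: str) -> str:
    """Drop a leading U+FEFF so it does not end up in the first field."""
    return text[1:] if text.startswith("\ufeff") else text


def to_tab_delimited(rows: List[List[str]]) -> str:
    """Join cells with tabs, replacing embedded line breaks with a space."""

    def clean(cell: str) -> str:
        # CRLF is handled first so it becomes one space, not two.
        return cell.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")

    # Values are written raw: no quoting or quote doubling is applied.
    return "\n".join("\t".join(clean(c) for c in r) for r in rows)
```

Decoding the bytes `EF BB BF 61 2C 62` as UTF-8 yields a string that starts with U+FEFF; `strip_bom` reduces it to `a,b`. A row holding the cells `say "hello"` and a two-line value `line1` / `line2` is written as one physical line, with a tab between the cells and a space where the line break was.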