About

CSV parsing appears trivial until edge cases surface: quoted fields containing delimiters, escaped double quotes (""), newlines embedded inside cell values, or mixed encodings. A naive split on commas cuts quoted fields apart and destroys data integrity. This converter implements a full RFC 4180-compliant state-machine parser that processes the input character by character through four distinct states. It auto-detects the delimiter by frequency analysis and handles UTF-8 byte order marks (BOMs). Malformed rows are preserved rather than silently dropped.
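The short Python comparison below shows why the naive approach fails. It uses the standard library's csv module as a stand-in for a quote-aware parser. It is not the converter's own code.

```python
import csv
import io

# One record whose second field contains a comma and an escaped quote ("").
raw = 'id,comment\n1,"Hello, ""world"""\n'

# Naive approach: split every line on commas.
naive = [line.split(",") for line in raw.splitlines()]

# Quote-aware approach: the csv module applies RFC 4180-style quoting rules.
parsed = list(csv.reader(io.StringIO(raw)))

print(naive[1])   # ['1', '"Hello', ' ""world"""']
print(parsed[1])  # ['1', 'Hello, "world"']
```

The naive split yields three fragments with stray quotes. The quote-aware parser recovers the two intended fields.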

The generated HTML output is production-ready: the characters &, <, and > are escaped to HTML entities by default to prevent XSS in downstream applications. You control whether the first row becomes a <thead>, whether CSS classes are injected, and whether header cells include scope attributes for accessibility. Limitation: this tool operates entirely client-side, so files exceeding ~50 MB may cause browser memory pressure on low-end devices.
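Here is a minimal Python sketch of the table-building step. The name `rows_to_html` and its keyword arguments are hypothetical, modeled on the options described here; this is not the tool's API. It uses the standard library's `html.escape`. Unlike the four-character map in the Formulas section, `html.escape` also escapes single quotes.

```python
import html


def rows_to_html(rows, header=True, scope=True, table_class=None, escape=True):
    """Render rows (lists of strings) as an HTML table string (illustrative sketch)."""
    def cell(value):
        # Escape cell text unless the caller trusts it to contain intentional markup.
        return html.escape(value, quote=True) if escape else value

    cls = f' class="{html.escape(table_class)}"' if table_class else ""
    parts = [f"<table{cls}>"]
    body = rows
    if header and rows:
        # First row becomes <thead>; scope="col" ties each header to its column.
        scope_attr = ' scope="col"' if scope else ""
        ths = "".join(f"<th{scope_attr}>{cell(v)}</th>" for v in rows[0])
        parts.append(f"<thead><tr>{ths}</tr></thead>")
        body = rows[1:]
    parts.append("<tbody>")
    for row in body:
        # Rows are emitted as-is: no padding or truncation of ragged rows.
        parts.append("<tr>" + "".join(f"<td>{cell(v)}</td>" for v in row) + "</tr>")
    parts.append("</tbody></table>")
    return "\n".join(parts)


print(rows_to_html([["name", "note"], ["Ann", "<b>x</b> & y"]], table_class="data"))
```

With escaping on, the markup in the second cell is emitted as `&lt;b&gt;x&lt;/b&gt; &amp; y` and is displayed as text rather than interpreted.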

Formulas

The CSV parser operates as a finite state machine with four states. For each character c at position i in the input string of length n, the parser transitions between states:

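$$
\begin{cases}
S_0 = \text{FIELD\_START} \rightarrow \text{if } c = \texttt{"} \text{ goto } S_2 \\
S_1 = \text{UNQUOTED} \rightarrow \text{if } c = \text{delim} \text{ emit field, goto } S_0 \\
S_2 = \text{QUOTED} \rightarrow \text{if } c = \texttt{"} \text{ goto } S_3 \\
S_3 = \text{QUOTE\_END} \rightarrow \text{if } c = \texttt{"} \text{ append quote, goto } S_2
\end{cases}
$$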
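Below is a minimal Python sketch of the four-state machine above. The state names follow the formula, and the function name `parse_csv` is hypothetical. The formula leaves some cases open: a character after a closing quote, or an unterminated quote at the end of input. This sketch handles them leniently and keeps the data rather than dropping it. That behavior is an assumption, not the converter's documented rule.

```python
from typing import List

# The four parser states from the transition table.
FIELD_START, UNQUOTED, QUOTED, QUOTE_END = range(4)


def parse_csv(text: str, delim: str = ",") -> List[List[str]]:
    """Parse CSV text into rows of string fields with a four-state machine."""
    rows: List[List[str]] = []
    row: List[str] = []
    field: List[str] = []
    state = FIELD_START
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if state == QUOTED:
            if c == '"':
                state = QUOTE_END        # closing quote, or first half of an escaped ""
            else:
                field.append(c)          # delimiters and newlines are literal inside quotes
        elif state == QUOTE_END and c == '"':
            field.append('"')            # "" collapses to one literal quote
            state = QUOTED
        elif state == FIELD_START and c == '"':
            state = QUOTED               # opening quote of a quoted field
        elif c == delim:
            row.append("".join(field))   # emit the field
            field = []
            state = FIELD_START
        elif c in "\r\n":
            row.append("".join(field))   # emit the last field, then the record
            rows.append(row)
            field, row = [], []
            state = FIELD_START
            if c == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1                   # consume CRLF as a single line break
        else:
            field.append(c)              # ordinary character (lenient after a closing quote)
            state = UNQUOTED
        i += 1
    # Flush the final record unless the input ended with a line break.
    if field or row or state != FIELD_START:
        row.append("".join(field))
        rows.append(row)
    return rows


print(parse_csv('a,"b,c","say ""hi"""\r\n1,"line1\nline2",3\n'))
# [['a', 'b,c', 'say "hi"'], ['1', 'line1\nline2', '3']]
```

The sample input covers a quoted delimiter, an escaped quote, a CRLF record break, and a newline inside a quoted field.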

Delimiter auto-detection scores each candidate delimiter d across the first k = 5 lines. For each line L_j, count occurrences f(d, L_j). The consistency score is computed as:

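$$\mathrm{score}(d) = \frac{1}{\sigma(f) + 1} \times \bar{f}(d)$$

where σ(f) is the standard deviation of the per-line counts f(d, L_1), …, f(d, L_k) and f̄(d) is their mean. The delimiter with the highest score is selected. A perfect delimiter has σ = 0 (the same count on every line) and a high mean count. The +1 in the denominator keeps the score finite in that case, so the score reduces to f̄(d).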
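The scoring rule can be sketched in Python as follows. Several details are assumptions of this sketch, not statements about the tool:

- The function name `detect_delimiter` is hypothetical.
- Blank lines are skipped when sampling.
- σ is the population standard deviation (`statistics.pstdev`).
- Ties go to the earlier candidate in the list.

```python
import statistics
from typing import List

# Candidate delimiters from the reference table, in the order they are tried.
CANDIDATES: List[str] = [",", ";", "\t", "|"]


def detect_delimiter(text: str, k: int = 5) -> str:
    """Pick the candidate d maximizing mean(f) / (stdev(f) + 1) over the first k lines."""
    sample = [line for line in text.splitlines() if line][:k]  # assumption: skip blank lines
    best, best_score = CANDIDATES[0], -1.0
    for d in CANDIDATES:
        counts = [line.count(d) for line in sample]  # f(d, L_j) for each sampled line
        if not counts:
            break                                    # nothing to score
        mean = statistics.mean(counts)               # f-bar(d)
        sigma = statistics.pstdev(counts)            # population std dev (assumption)
        score = mean / (sigma + 1)
        if score > best_score:                       # strict '>': ties keep the earlier candidate
            best, best_score = d, score
    return best


print(repr(detect_delimiter("id\tnote\n1\ta, b, c\n2\td\n")))  # '\t'
```

Worked example for that input:

- The tab counts are (1, 1, 1), so its score is 1/(0 + 1) × 1 = 1.
- The comma counts are (0, 2, 0), with mean ≈ 0.667 and σ ≈ 0.943, so its score is ≈ 0.34.

The consistent tab wins even though commas appear more often on one line.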

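HTML entity escaping applies the replacement map E: for each cell value v, the output is escape(v) = v.replace(& → &amp;).replace(< → &lt;).replace(> → &gt;).replace(" → &quot;). The ampersand replacement must run first. Otherwise the & introduced by &lt;, &gt;, and &quot; would itself be escaped again; for example, < would end up as &amp;lt;.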
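The replacement chain translates directly into a short Python function. The name `escape_cell` is hypothetical. The list order encodes the "& first" rule.

```python
from typing import List, Tuple

# Replacement map E, in application order: '&' must come first.
ESCAPES: List[Tuple[str, str]] = [("&", "&amp;"), ("<", "&lt;"), (">", "&gt;"), ('"', "&quot;")]


def escape_cell(v: str) -> str:
    """Apply the replacement map E to one cell value."""
    for ch, entity in ESCAPES:
        v = v.replace(ch, entity)
    return v


print(escape_cell('<a href="x">Tom & Jerry</a>'))
# &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```

A cell that literally contains the text `&lt;` becomes `&amp;lt;`. That is correct here, because the browser then displays the original text `&lt;` unchanged.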

Reference Data

Delimiter	Character	Common Sources	Auto-Detect Priority
Comma	,	Excel, Google Sheets, most databases	1
Semicolon	;	European Excel (locale-dependent), SAP exports	2
Tab	\t	TSV files, clipboard paste from spreadsheets	3
Pipe	|	Unix logs, mainframe exports, HL7 messages	4

RFC 4180 Rule	Description	Handling	Status
Rule 1	Each record on separate line	CRLF and LF both supported	✓ Implemented
Rule 3	Optional header line	User toggle for <thead>	✓ Implemented
Rule 5	Fields may be quoted	State-machine quoted field parsing	✓ Implemented
Rule 6	Fields with delimiters must be quoted	Delimiter inside quotes preserved	✓ Implemented
Rule 6	Fields with newlines must be quoted	Multiline cell support	✓ Implemented
Rule 7	Quotes escaped as ""	Double-quote collapse in parser	✓ Implemented

Character	Name	Escaped Output	Purpose
&	Ampersand	&amp;	Prevent entity injection
<	Less-than sign	&lt;	Prevent tag injection
>	Greater-than sign	&gt;	Prevent tag closure
"	Double quote	&quot;	Prevent attribute breakout

Encoding	BOM Signature	Bytes	Detection
UTF-8	EF BB BF	3 bytes	Auto-stripped
UTF-16 LE	FF FE	2 bytes	Auto-decoded
UTF-16 BE	FE FF	2 bytes	Auto-decoded
ASCII/Latin-1	None	0 bytes	Fallback UTF-8

Frequently Asked Questions

When the parser is in state S₂ (QUOTED), newline characters (both \n and \r\n) are treated as ordinary characters and appended to the current field value. The field terminates only when a closing double quote is followed by a delimiter or end of line. This complies with RFC 4180 Rule 6. The resulting HTML table cell contains the preserved newline. Browsers collapse it to a single space when rendering, unless you apply the CSS rule white-space: pre-wrap to the table cells.

The converter does not discard or pad rows. If row 3 has 5 fields while the header has 7, the generated HTML will contain a <tr> with only 5 <td> elements. This may cause visual misalignment in the rendered table. The converter reports the row count and column range in the status output so you can identify and fix the source data. For strict validation, check that all rows match the header column count before using the output in production.
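For the strict validation suggested above, a small check like the following can flag ragged rows before the output is used. `ragged_rows` is a hypothetical helper. Counting the header as row 1 is an assumption of this sketch.

```python
from typing import List, Tuple


def ragged_rows(rows: List[List[str]]) -> List[Tuple[int, int]]:
    """Return (row number, field count) for every row whose width differs from the header's.

    Row numbers are 1-based and count the header as row 1 (an assumption).
    """
    if not rows:
        return []
    expected = len(rows[0])
    return [(num, len(row)) for num, row in enumerate(rows, start=1) if len(row) != expected]


rows = [["a", "b", "c"], ["1", "2", "3"], ["4", "5"], ["6", "7", "8", "9"]]
print(ragged_rows(rows))  # [(3, 2), (4, 4)]
```

An empty result means every row matches the header width.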

Auto-detection analyzes the first 5 lines and selects the single delimiter with the highest consistency score. If your file genuinely uses mixed delimiters (e.g., commas in some rows, tabs in others), the detection will pick whichever is most consistent across those sample lines. In such cases, manually select the correct delimiter. Files with mixed delimiters are technically malformed CSV and should be cleaned at the source.

Entity escaping prevents cross-site scripting (XSS) when the generated HTML is embedded in a web page. If a CSV cell contains <script>, outputting it unescaped could run arbitrary JavaScript in the reader's browser. The converter escapes &, <, >, and " by default. You can disable escaping in the options if you know your data contains intentional HTML markup (e.g., inline formatting tags) and you trust the data source.

The tool runs entirely in the browser using JavaScript string operations. Practical limits depend on available RAM. Files under 10 MB process instantly on modern devices. Files between 10-50 MB may take 1-3 seconds. Files exceeding 50 MB risk triggering browser memory limits, especially on mobile devices with constrained RAM. For very large datasets, consider server-side processing or splitting the CSV into chunks.
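When splitting a file into chunks, the split has to respect quoted newlines, or a multiline cell could be cut across two chunks. The Python sketch below is meant for use outside the browser tool. It reads records with the standard csv module and repeats the header in every chunk. The names `csv_chunks` and `_to_csv` are hypothetical.

```python
import csv
import io
from typing import Iterator, List, TextIO


def _to_csv(header: List[str], records: List[List[str]]) -> str:
    """Serialize a header plus records back to CSV text (quotes added where needed)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(records)
    return buf.getvalue()


def csv_chunks(f: TextIO, rows_per_chunk: int) -> Iterator[str]:
    """Yield CSV text chunks of at most rows_per_chunk records, each repeating the header."""
    reader = csv.reader(f)  # record-by-record parsing, so quoted newlines never split a chunk
    header = next(reader, None)
    if header is None:
        return  # empty input: no chunks
    chunk: List[List[str]] = []
    for record in reader:
        chunk.append(record)
        if len(chunk) == rows_per_chunk:
            yield _to_csv(header, chunk)
            chunk = []
    if chunk:
        yield _to_csv(header, chunk)


for part in csv_chunks(io.StringIO('h1,h2\n1,"a\nb"\n2,x\n3,y\n'), 2):
    print(repr(part))
```

To use it on a real file, open the file with `newline=""`, as the csv module documentation recommends. For example: `open("big.csv", newline="", encoding="utf-8")`, where the file name is hypothetical. Then write each yielded chunk to its own file. Chunk size is a tuning choice.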

The parser checks the first 3 bytes of uploaded files for the UTF-8 BOM signature (EF BB BF). If detected, the BOM is stripped before parsing. UTF-16 LE (FF FE) and UTF-16 BE (FE FF) BOMs are also detected, and the file is decoded using the appropriate TextDecoder encoding. For pasted text, a BOM character (U+FEFF) at position 0 is stripped automatically. This prevents phantom empty columns or corrupted first-cell values that commonly occur with Excel-exported CSV files.
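The browser tool performs this step with TextDecoder. The Python sketch below mirrors the same logic for offline use, based on the signatures in the encoding table. The names `decode_with_bom` and `strip_pasted_bom` are hypothetical. The sketch does not attempt UTF-32 detection, whose little-endian BOM also begins with FF FE, because the table lists only UTF-8 and UTF-16.

```python
from typing import List, Tuple

# (signature, codec) pairs from the encoding table.
BOMS: List[Tuple[bytes, str]] = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
]


def decode_with_bom(data: bytes) -> str:
    """Decode uploaded bytes, using and removing a leading BOM if one is present."""
    for bom, codec in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(codec)
    return data.decode("utf-8")  # no BOM: fall back to UTF-8


def strip_pasted_bom(text: str) -> str:
    """Pasted text is already decoded, so a BOM appears as the character U+FEFF."""
    return text[1:] if text.startswith("\ufeff") else text


print(decode_with_bom(b"\xef\xbb\xbfid,name"))  # id,name
```

Without the BOM check, the first header would be read as `\ufeffid` instead of `id`. This is the corrupted first-cell value described above.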