Paste CSV text or upload a .csv/.tsv/.txt file

About

CSV parsing appears trivial until edge cases surface: quoted fields containing delimiters, escaped double-quotes (""), newlines embedded inside cell values, or mixed encodings. A naive split on commas destroys data integrity. This converter implements a full RFC 4180-compliant state-machine parser that processes input character-by-character across four distinct states. It auto-detects delimiters by frequency analysis and handles UTF-8 BOM markers. Malformed rows are preserved rather than silently dropped.

The generated HTML output is safe to embed: the special characters &, <, >, and " are escaped to entities by default to prevent XSS in downstream applications. You control whether the first row becomes a <thead>, whether CSS classes are injected, and whether the table includes scope attributes for accessibility. Limitation: this tool operates entirely client-side, so files exceeding ~50 MB may cause browser memory pressure on low-end devices.
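
The output options described above can be illustrated with a short sketch. It is written in Python purely for illustration (the tool itself runs as JavaScript in the browser), and the function name rows_to_html and its parameters are hypothetical, not the tool's actual API. It uses the standard library's html.escape, which also escapes the single quote in addition to the four characters named above.

```python
from html import escape

def rows_to_html(rows, header=True, css_class=None, scope=True, escape_cells=True):
    """Render parsed CSV rows (a list of lists of strings) as an HTML table."""
    # Escaping is on by default; disabling it passes cell values through as-is.
    cell = (lambda v: escape(v, quote=True)) if escape_cells else (lambda v: v)
    table_attr = f' class="{escape(css_class, quote=True)}"' if css_class else ""
    lines = [f"<table{table_attr}>"]
    body = rows
    if header and rows:
        # The first row becomes <thead>; scope="col" is the optional accessibility hint.
        th_attr = ' scope="col"' if scope else ""
        head = "".join(f"<th{th_attr}>{cell(v)}</th>" for v in rows[0])
        lines.append(f"<thead><tr>{head}</tr></thead>")
        body = rows[1:]
    lines.append("<tbody>")
    for row in body:
        lines.append("<tr>" + "".join(f"<td>{cell(v)}</td>" for v in row) + "</tr>")
    lines.append("</tbody>")
    lines.append("</table>")
    return "\n".join(lines)

html_out = rows_to_html([["name", "note"], ["Ann", "a<b & c"]], css_class="data")
print(html_out)
```

Rows are emitted with however many cells they contain, so a short row produces a short <tr> rather than being padded.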


Formulas

The CSV parser operates as a finite state machine with four states. For each character c at position i in the input string of length n, the parser transitions between states:

S₀ = FIELD_START → if c = " goto S₂
S₁ = UNQUOTED → if c = delim, emit field, goto S₀
S₂ = QUOTED → if c = " goto S₃
S₃ = QUOTE_END → if c = ", append quote, goto S₂
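
The transitions listed above show only one rule per state. A minimal Python sketch of a complete four-state parser might look like the following. It is an illustration, not the tool's own JavaScript: the function name parse_csv, the handling of a stray character after a closing quote, and the up-front normalization of CRLF to LF (which also rewrites CRLF inside quoted fields) are all assumptions.

```python
FIELD_START, UNQUOTED, QUOTED, QUOTE_END = range(4)

def parse_csv(text, delim=","):
    """Parse CSV text into a list of rows, one character at a time."""
    rows, row, field = [], [], []
    state = FIELD_START

    def end_field():
        nonlocal field, state
        row.append("".join(field))
        field = []
        state = FIELD_START

    def end_row():
        nonlocal row
        end_field()
        rows.append(row)
        row = []

    # Assumption: treat CRLF and LF alike by normalizing before parsing.
    for c in text.replace("\r\n", "\n"):
        if state == FIELD_START:
            if c == '"':
                state = QUOTED
            elif c == delim:
                end_field()            # empty field
            elif c == "\n":
                end_row()
            else:
                field.append(c)
                state = UNQUOTED
        elif state == UNQUOTED:
            if c == delim:
                end_field()
            elif c == "\n":
                end_row()
            else:
                field.append(c)
        elif state == QUOTED:
            if c == '"':
                state = QUOTE_END      # closing quote or first half of ""
            else:
                field.append(c)        # delimiters and newlines are literal here
        else:  # QUOTE_END
            if c == '"':
                field.append('"')      # "" collapses to one literal quote
                state = QUOTED
            elif c == delim:
                end_field()
            elif c == "\n":
                end_row()
            else:
                field.append(c)        # assumption: keep stray text leniently
                state = UNQUOTED

    # Flush the last record when the input has no trailing newline.
    if state != FIELD_START or field or row:
        end_row()
    return rows

sample = 'name,quote\r\n"Smith, J.","He said ""hi"""\n"multi\nline",x'
parsed = parse_csv(sample)
```

Here the second record keeps its embedded comma and collapsed quotes, and the third record keeps the newline inside its first field.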

Delimiter auto-detection scores each candidate delimiter $d$ across the first $k = 5$ lines. For each line $L_j$, count the occurrences $f(d, L_j)$. The consistency score is computed as:

$$\text{score}(d) = \frac{1}{\sigma(f) + 1} \times \bar{f}(d)$$

where $\sigma(f)$ is the standard deviation of the occurrence counts across lines and $\bar{f}(d)$ is the mean count. The delimiter with the highest score is selected. A perfect delimiter has $\sigma = 0$ (identical count per line) and a high $\bar{f}$.
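
As a rough illustration of this score, the Python sketch below computes the mean count divided by the standard deviation plus one for each candidate. Several details are assumptions rather than confirmed behavior of the tool: the names score and detect_delimiter, the use of the population standard deviation, skipping blank lines, counting delimiters without regard to quoting, and breaking ties by candidate order.

```python
from statistics import mean, pstdev

# Candidates in priority order; max() keeps the earliest one on a tie.
CANDIDATES = [",", ";", "\t", "|"]

def score(delim, lines):
    """Mean occurrences per line, damped by how much the count varies."""
    counts = [line.count(delim) for line in lines]
    return mean(counts) / (pstdev(counts) + 1)

def detect_delimiter(text, k=5):
    """Pick the candidate with the highest consistency score over the first k lines."""
    lines = [ln for ln in text.splitlines() if ln][:k]
    if not lines:
        return ","  # assumption: default to comma for empty input
    return max(CANDIDATES, key=lambda d: score(d, lines))

print(repr(detect_delimiter("a;b;c\n1;2;3\n4;5;6")))
```

In the sample, the semicolon appears exactly twice on every line, so its standard deviation is 0 and its score equals its mean count of 2, while every other candidate scores 0.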

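HTML entity escaping applies the replacement map E: for each cell value v, the output is escape(v) = v.replace(& → &amp;).replace(< → &lt;).replace(> → &gt;).replace(" → &quot;). The ampersand is replaced first so that the entities produced by the later replacements are not escaped a second time.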
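
A minimal Python sketch of this replacement chain follows. The names ESCAPES and escape_cell are chosen for illustration; the order of the pairs matters because the ampersand has to be handled before any entity is introduced.

```python
# Replacement pairs applied in order; "&" must come first.
ESCAPES = (("&", "&amp;"), ("<", "&lt;"), (">", "&gt;"), ('"', "&quot;"))

def escape_cell(value):
    """Escape the four characters that can break out of HTML text or attributes."""
    for char, entity in ESCAPES:
        value = value.replace(char, entity)
    return value

print(escape_cell('<a href="x">T&C</a>'))
```

Note that text which already contains an entity is escaped again, so the input `&lt;` becomes `&amp;lt;` and is displayed literally rather than as a less-than sign.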

Reference Data

| Delimiter | Character | Common Sources | Auto-Detect Priority |
|---|---|---|---|
| Comma | , | Excel, Google Sheets, most databases | 1 |
| Semicolon | ; | European Excel (locale-dependent), SAP exports | 2 |
| Tab | \t | TSV files, clipboard paste from spreadsheets | 3 |
| Pipe | \| | Unix logs, mainframe exports, HL7 messages | 4 |

| RFC 4180 Rule | Description | Handling | Status |
|---|---|---|---|
| Rule 1 | Each record on separate line | CRLF and LF both supported | ✓ Implemented |
| Rule 2 | Optional header line | User toggle for <thead> | ✓ Implemented |
| Rule 3 | Fields may be quoted | State-machine quoted field parsing | ✓ Implemented |
| Rule 4 | Quotes escaped as "" | Double-quote collapse in parser | ✓ Implemented |
| Rule 5 | Fields with delimiters must be quoted | Delimiter inside quotes preserved | ✓ Implemented |
| Rule 6 | Fields with newlines must be quoted | Multiline cell support | ✓ Implemented |

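| HTML Entity | Character | Escaped Output | Purpose |
|---|---|---|---|
| &amp; | & | &amp; | Prevent entity injection |
| &lt; | < | &lt; | Prevent tag injection |
| &gt; | > | &gt; | Prevent tag closure |
| &quot; | " | &quot; | Prevent attribute breakout |
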
| Encoding | BOM Signature | Bytes | Detection |
|---|---|---|---|
| UTF-8 | EF BB BF | 3 bytes | Auto-stripped |
| UTF-16 LE | FF FE | 2 bytes | Auto-decoded |
| UTF-16 BE | FE FF | 2 bytes | Auto-decoded |
| ASCII/Latin-1 | None | 0 bytes | Fallback UTF-8 |

Frequently Asked Questions

When the parser is in state S₂ (QUOTED), newline characters (both \n and \r\n) are treated as ordinary characters and appended to the current field value. The field only terminates when a closing double-quote is encountered followed by a delimiter or end-of-line. This is compliant with RFC 4180 Rule 6. The resulting HTML table cell will contain the preserved newline, rendered as whitespace in HTML unless you apply CSS white-space: pre-wrap to the table cells.
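
The same behavior can be cross-checked with Python's standard csv module, which also keeps newlines inside quoted fields. This is only a comparison against an independent parser, not the converter's own code, and the sample data is made up for the example.

```python
import csv
import io

# The quoted field in the second record spans two physical lines.
text = 'id,note\n1,"line one\nline two"\n2,plain\n'

# newline="" leaves line endings untouched so the csv module can interpret them.
records = list(csv.reader(io.StringIO(text, newline="")))

for record in records:
    print(record)
```

Four physical lines of input yield three records, and the embedded newline survives inside the second record's note field.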
The converter does not discard or pad rows. If row 3 has 5 fields while the header has 7, the generated HTML will contain a <tr> with only 5 <td> elements. This may cause visual misalignment in the rendered table. The converter reports the row count and column range in the status output so you can identify and fix the source data. For strict validation, check that all rows match the header column count before using the output in production.
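
A strict pre-check of this kind could be sketched in Python as follows. The function name column_report and the shape of its result are hypothetical; the sketch only assumes that rows have already been parsed into lists of strings and that the first row is the header.

```python
def column_report(rows):
    """Summarize column counts and list rows whose width differs from the header."""
    if not rows:
        return {"rows": 0, "min_cols": 0, "max_cols": 0, "mismatched": []}
    expected = len(rows[0])
    widths = [len(r) for r in rows]
    # Row numbers are 1-based to match how people refer to lines in a file.
    mismatched = [(i + 1, w) for i, w in enumerate(widths) if w != expected]
    return {
        "rows": len(rows),
        "min_cols": min(widths),
        "max_cols": max(widths),
        "mismatched": mismatched,
    }

rows = [["h"] * 7, ["a"] * 7, ["b"] * 5]
report = column_report(rows)
print(report)
```

For the example in the answer above (a 7-column header and a third row with 5 fields), the report flags row 3 with a width of 5.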
Auto-detection analyzes the first 5 lines and selects the single delimiter with the highest consistency score. If your file genuinely uses mixed delimiters (e.g., commas in some rows, tabs in others), the detection will pick whichever is most consistent across those sample lines. In such cases, manually select the correct delimiter. Files with mixed delimiters are technically malformed CSV and should be cleaned at the source.
Entity escaping prevents cross-site scripting (XSS) when the generated HTML is embedded in a web page. If a CSV cell contains <script>, outputting it unescaped would execute arbitrary JavaScript. The converter escapes &, <, >, and " by default. You can disable escaping in the options if you know your data contains intentional HTML markup (e.g., inline formatting tags) and you trust the data source.
The tool runs entirely in the browser using JavaScript string operations. Practical limits depend on available RAM. Files under 10 MB process instantly on modern devices. Files between 10-50 MB may take 1-3 seconds. Files exceeding 50 MB risk triggering browser memory limits, especially on mobile devices with constrained RAM. For very large datasets, consider server-side processing or splitting the CSV into chunks.
The parser checks the first 3 bytes of uploaded files for the UTF-8 BOM signature (EF BB BF). If detected, the BOM is stripped before parsing. UTF-16 LE (FF FE) and UTF-16 BE (FE FF) BOMs are also detected, and the file is decoded using the appropriate TextDecoder encoding. For pasted text, BOM characters at position 0 are stripped automatically. This prevents phantom empty columns or corrupted first-cell values that commonly occur with Excel-exported CSV files.
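
The BOM checks described above can be sketched in Python. The tool itself is described as using the browser's TextDecoder, so this is an analogous illustration rather than its implementation; the function names are hypothetical, and replacing undecodable bytes in the UTF-8 fallback is an assumption.

```python
import codecs

# BOM signatures paired with the codec used for the remaining bytes.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
)

def decode_csv_bytes(data):
    """Strip a recognized BOM and decode; fall back to UTF-8 when none is found."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    # Assumption: invalid bytes are replaced instead of raising an error.
    return data.decode("utf-8", errors="replace")

def strip_text_bom(text):
    """Remove a BOM character at position 0 of already-decoded (pasted) text."""
    return text[1:] if text.startswith("\ufeff") else text

print(decode_csv_bytes(b"\xef\xbb\xbfid,name"))
```

Without this step, the BOM would become part of the first header cell, which is the corrupted first-cell value mentioned above.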