CSV to HTML Converter
Convert CSV files to clean HTML tables instantly. Supports custom delimiters, header rows, styling options, and RFC 4180 parsing. Copy or download results.
About
CSV parsing appears trivial until edge cases surface: quoted fields containing delimiters, escaped double-quotes (""), newlines embedded inside cell values, or mixed encodings. A naive split on commas destroys data integrity. This converter implements a full RFC 4180-compliant state-machine parser that processes input character-by-character across four distinct states. It auto-detects delimiters by frequency analysis and handles UTF-8 BOM markers. Malformed rows are preserved rather than silently dropped.
The generated HTML is safe to embed in other pages: the characters &, <, >, and " are escaped by default to prevent XSS in downstream applications. You control whether the first row becomes a <thead>, whether CSS classes are injected, and whether header cells carry scope attributes for accessibility. Limitation: this tool runs entirely client-side, so files larger than about 50 MB may cause memory pressure in the browser on low-end devices.
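The output options described above can be illustrated with a short Python sketch. The function name `render_table` and its parameters (`header`, `css_class`, `scope`) are hypothetical, chosen only for this example; the converter's real option names are not given here. It uses the standard library's `html.escape`, which also escapes the single quote.

```python
from html import escape


def render_table(rows, header=True, css_class=None, scope=True):
    """Render a list of rows (lists of strings) as an HTML table string."""
    # Optional class attribute on the <table> element (hypothetical option).
    attrs = f' class="{escape(css_class)}"' if css_class else ""
    out = [f"<table{attrs}>"]
    body = rows
    if header and rows:
        # Promote the first row to <thead>, optionally with scope="col".
        th_attr = ' scope="col"' if scope else ""
        cells = "".join(f"<th{th_attr}>{escape(c)}</th>" for c in rows[0])
        out.append(f"<thead><tr>{cells}</tr></thead>")
        body = rows[1:]
    out.append("<tbody>")
    for row in body:
        # Every cell value is escaped before it is placed in the markup.
        cells = "".join(f"<td>{escape(c)}</td>" for c in row)
        out.append(f"<tr>{cells}</tr>")
    out.append("</tbody>")
    out.append("</table>")
    return "\n".join(out)


table_html = render_table([["Name", "Note"], ["A&B", "<i>x</i>"]], css_class="data")
```

Because cell values are escaped before insertion, the `<i>x</i>` in the input appears as literal text in the rendered table rather than as markup.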
Formulas
The CSV parser operates as a finite state machine with four states. For each character c at position i in the input string of length n, the parser transitions between states:
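As a rough illustration of such a machine, the Python sketch below uses four assumed states (`FIELD_START`, `UNQUOTED`, `QUOTED`, `QUOTE_IN_QUOTED`). These names and the exact transitions are assumptions made for the example, not the converter's actual state set.

```python
from enum import Enum, auto


class State(Enum):
    FIELD_START = auto()      # at the beginning of a field
    UNQUOTED = auto()         # inside an unquoted field
    QUOTED = auto()           # inside a quoted field
    QUOTE_IN_QUOTED = auto()  # just saw a quote while inside a quoted field


def parse_csv(text: str, delimiter: str = ",") -> list[list[str]]:
    rows: list[list[str]] = []
    row: list[str] = []
    field: list[str] = []
    state = State.FIELD_START

    def end_field() -> None:
        nonlocal field
        row.append("".join(field))
        field = []

    def end_row() -> None:
        nonlocal row
        end_field()
        rows.append(row)
        row = []

    for c in text:
        if state is State.QUOTED:
            if c == '"':
                state = State.QUOTE_IN_QUOTED
            else:
                field.append(c)  # delimiters and newlines are literal here
        elif state is State.QUOTE_IN_QUOTED:
            if c == '"':         # "" collapses to one literal quote
                field.append('"')
                state = State.QUOTED
            elif c == delimiter:
                end_field()
                state = State.FIELD_START
            elif c == "\n":
                end_row()
                state = State.FIELD_START
            elif c != "\r":      # lenient: keep stray text after a closing quote
                field.append(c)
                state = State.UNQUOTED
        else:  # FIELD_START or UNQUOTED
            if c == '"' and state is State.FIELD_START:
                state = State.QUOTED
            elif c == delimiter:
                end_field()
                state = State.FIELD_START
            elif c == "\n":
                end_row()
                state = State.FIELD_START
            elif c != "\r":      # the CR of a CRLF pair is skipped outside quotes
                field.append(c)
                state = State.UNQUOTED

    # Flush the last record when the input does not end with a line break.
    # An unterminated quoted field is kept rather than dropped.
    if field or row or state is not State.FIELD_START:
        end_row()
    return rows


sample = 'name,quote\r\n"Doe, Jane","She said ""hi"""\r\n"multi\nline",x'
parsed = parse_csv(sample)
```

In this sketch the sample input yields three records, with the comma in `Doe, Jane`, the doubled quotes, and the embedded newline all preserved inside their cells.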
Delimiter auto-detection scores each candidate delimiter d across the first k = 5 lines. For each line Lj, count occurrences f(d, Lj). The consistency score is computed as:
where σ(f) is the standard deviation of occurrence counts across lines and μ(d) is the mean count. The delimiter with the highest score is selected. A perfect delimiter has σ = 0 (identical count per line) and high μ.
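The scoring idea can be sketched in Python as follows. The formula used here, score = μ / (1 + σ), is an assumption: it is one simple expression that rewards a high mean count and penalises variation across lines, and it may differ from the formula the converter actually uses. The candidate order follows the auto-detect priority in the reference table, and the function name `detect_delimiter` is hypothetical.

```python
from statistics import mean, pstdev

# Candidates in priority order; earlier entries win ties.
CANDIDATES = [",", ";", "\t", "|"]


def detect_delimiter(text: str, k: int = 5) -> str:
    """Pick the most consistent delimiter across the first k non-empty lines."""
    lines = [ln for ln in text.splitlines() if ln][:k]
    if not lines:
        return ","  # nothing to analyse, fall back to comma
    best, best_score = ",", 0.0
    for d in CANDIDATES:
        counts = [ln.count(d) for ln in lines]  # f(d, L_j) for each line
        mu = mean(counts)
        sigma = pstdev(counts)
        score = mu / (1.0 + sigma)  # assumed scoring formula, see lead-in
        if score > best_score:      # strict comparison keeps priority order on ties
            best, best_score = d, score
    return best


detected = detect_delimiter("id;name;city\n1;Ana;Porto\n2;Bo;Lyon")
```

Note that this sketch counts raw occurrences, so a delimiter character inside a quoted field is counted as well; a more careful version would skip quoted spans before counting.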
HTML entity escaping applies the replacement map E: for each cell value v, the output is escape(v) = v.replace(& → &amp;).replace(< → &lt;).replace(> → &gt;).replace(" → &quot;).
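A minimal Python sketch of this replacement chain is shown below; the function name `escape_cell` is hypothetical. The order of the calls matters: the ampersand has to be replaced first, otherwise the ampersands introduced by the later replacements would themselves be escaped a second time.

```python
def escape_cell(value: str) -> str:
    """Escape the four characters from the replacement map, ampersand first."""
    return (
        value.replace("&", "&amp;")   # must run before the other replacements
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
    )


escaped = escape_cell('<a href="x">Tom & Jerry</a>')
```

Input that already contains an entity is escaped again rather than passed through, so the text `&lt;` in a cell is shown literally in the rendered table.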
Reference Data
| Delimiter | Character | Common Sources | Auto-Detect Priority |
|---|---|---|---|
| Comma | , | Excel, Google Sheets, most databases | 1 |
| Semicolon | ; | European Excel (locale-dependent), SAP exports | 2 |
| Tab | \t | TSV files, clipboard paste from spreadsheets | 3 |
| Pipe | \| | Unix logs, mainframe exports, HL7 messages | 4 |

| RFC 4180 Rule | Description | Handling | Status |
|---|---|---|---|
| Rule 1 | Each record on separate line | CRLF and LF both supported | ✓ Implemented |
| Rule 2 | Optional header line | User toggle for <thead> | ✓ Implemented |
| Rule 3 | Fields may be quoted | State-machine quoted field parsing | ✓ Implemented |
| Rule 4 | Quotes escaped as "" | Double-quote collapse in parser | ✓ Implemented |
| Rule 5 | Fields with delimiters must be quoted | Delimiter inside quotes preserved | ✓ Implemented |
| Rule 6 | Fields with newlines must be quoted | Multiline cell support | ✓ Implemented |

| HTML Entity | Character | Escaped Output | Purpose |
|---|---|---|---|
| &amp; | & | &amp; | Prevent entity injection |
| &lt; | < | &lt; | Prevent tag injection |
| &gt; | > | &gt; | Prevent tag closure |
| &quot; | " | &quot; | Prevent attribute breakout |

| Encoding | BOM Signature (hex) | BOM Length | Detection |
|---|---|---|---|
| UTF-8 | EF BB BF | 3 bytes | Auto-stripped |
| UTF-16 LE | FF FE | 2 bytes | Auto-decoded |
| UTF-16 BE | FE FF | 2 bytes | Auto-decoded |
| ASCII/Latin-1 | None | 0 bytes | Decoded as UTF-8 (fallback) |
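
The BOM handling in the encoding table can be sketched in Python as follows. The function name `decode_csv_bytes` is hypothetical, and replacing undecodable bytes in the fallback path is an assumption; how the converter treats invalid UTF-8 is not stated.

```python
import codecs

# BOM signatures from the table above, paired with the codec to use.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def decode_csv_bytes(data: bytes) -> str:
    """Strip a known BOM and decode; fall back to UTF-8 when no BOM is found."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    # No BOM: treat the input as UTF-8. Invalid bytes become U+FFFD (assumption).
    return data.decode("utf-8", errors="replace")


decoded = decode_csv_bytes(codecs.BOM_UTF8 + "name,city".encode("utf-8"))
```

Plain ASCII input passes through the fallback unchanged, since ASCII is a subset of UTF-8. Latin-1 bytes above 0x7F are not valid UTF-8 on their own, so in this sketch they would come out as replacement characters.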