CSV to XML/JSON Converter
Convert CSV files to XML or JSON format online. Supports custom delimiters, RFC 4180 quoting, drag-and-drop upload, delimiter auto-detection, and instant download.
About
Naive CSV parsing causes silent data corruption. If a quoted field containing a comma is mishandled, it is split into two columns, every value after it shifts out of place, and the misalignment can cascade across downstream rows. This converter implements a full RFC 4180-compliant finite state machine parser that correctly resolves quoted fields, escaped double quotes ("" → "), and embedded newlines. It auto-detects the delimiter (comma, semicolon, tab, or pipe) by frequency analysis of the first 5 lines. Output formats are well-formed XML, with special characters (&, <, >, ", ') entity-escaped, and JSON with configurable indentation.
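To see the failure mode concretely, compare a naive `str.split` with a quote-aware reader. This sketch uses Python's standard `csv` module as a stand-in for a correct RFC 4180-style parser; it is not the converter's own code.

```python
import csv
import io

data = 'id,comment,score\n1,"Hello, world",42\n'

# Naive approach: split on every comma, ignoring quotes entirely.
naive = [line.split(",") for line in data.strip().split("\n")]
print(naive[1])   # ['1', '"Hello', ' world"', '42']  -> 4 columns instead of 3

# Quote-aware approach: the comma inside the quotes stays part of the field.
proper = list(csv.reader(io.StringIO(data)))
print(proper[1])  # ['1', 'Hello, world', '42']
```

The naive split produces an extra column, so the `score` value lands under the wrong header.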
Limitations: the tool assumes UTF-8 encoding, and a leading byte order mark (BOM) is stripped automatically. Nested or hierarchical CSV structures (parent-child relationships) are flattened into single-level objects. For XML output, column headers are sanitized into valid XML element names: spaces become underscores, and a leading digit is prefixed with an underscore. The maximum recommended file size is 50 MB; larger files may put the browser under memory pressure.
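The BOM stripping and header-sanitization rules can be sketched as follows. The function names are hypothetical. Replacing every character outside `[A-Za-z0-9_.-]` with an underscore, and also prefixing names that start with `-` or `.` (both invalid first characters in XML names), are assumptions beyond the two rules stated above.

```python
import re

BOM = "\ufeff"


def strip_bom(text: str) -> str:
    """Remove a UTF-8 byte order mark if it appears at position 0."""
    return text[1:] if text.startswith(BOM) else text


def sanitize_xml_name(header: str) -> str:
    """Turn a CSV header into a usable XML element name (hypothetical helper)."""
    # Stated rule: spaces become underscores.
    name = header.replace(" ", "_")
    # Assumption: any other character outside [A-Za-z0-9_.-] also becomes "_".
    name = re.sub(r"[^A-Za-z0-9_.\-]", "_", name)
    # Stated rule: a leading digit gets an underscore prefix.
    # Assumption: an empty name or a leading "-" or "." is treated the same way.
    if not name or not (name[0].isalpha() or name[0] == "_"):
        name = "_" + name
    return name


print(sanitize_xml_name("Order Date"))  # Order_Date
print(sanitize_xml_name("2024 Sales"))  # _2024_Sales
print(strip_bom("\ufeffid,name"))       # id,name
```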
Formulas
The CSV parser operates as a finite state machine with 4 states. For each character cᵢ at position i, the transition function δ maps the current state sᵢ to the next state:

$$s_{i+1} = \delta(s_i, c_i), \qquad \delta : S \times \Sigma \to S$$
where S = {FIELD_START, UNQUOTED, QUOTED, QUOTE_IN_QUOTED} and Σ is the input alphabet (all UTF-8 characters).
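A minimal Python sketch of the four-state machine, using the state names from S above. Two behaviors here are assumptions of this sketch rather than documented converter behavior: CRLF is normalized to LF in a pre-pass, and a stray character right after a closing quote (not RFC 4180) is kept as literal text instead of raising an error.

```python
from enum import Enum, auto
from typing import List


class State(Enum):
    FIELD_START = auto()      # at the beginning of a field
    UNQUOTED = auto()         # inside a field that did not start with a quote
    QUOTED = auto()           # inside a quoted field
    QUOTE_IN_QUOTED = auto()  # just saw '"' while inside a quoted field


def parse_csv(text: str, delimiter: str = ",") -> List[List[str]]:
    text = text.replace("\r\n", "\n")  # CRLF normalized to LF
    rows: List[List[str]] = []
    row: List[str] = []
    field: List[str] = []
    state = State.FIELD_START

    def end_field() -> None:
        row.append("".join(field))
        field.clear()

    def end_row() -> None:
        end_field()
        rows.append(row.copy())
        row.clear()

    for c in text:
        if state is State.FIELD_START:
            if c == '"':
                state = State.QUOTED
            elif c == delimiter:
                end_field()           # empty field
            elif c == "\n":
                end_row()
            else:
                field.append(c)
                state = State.UNQUOTED
        elif state is State.UNQUOTED:
            if c == delimiter:
                end_field()
                state = State.FIELD_START
            elif c == "\n":
                end_row()
                state = State.FIELD_START
            else:
                field.append(c)
        elif state is State.QUOTED:
            if c == '"':
                state = State.QUOTE_IN_QUOTED
            else:
                field.append(c)       # delimiters and newlines are literal here
        else:  # State.QUOTE_IN_QUOTED
            if c == '"':
                field.append('"')     # escaped quote: "" -> "
                state = State.QUOTED
            elif c == delimiter:
                end_field()
                state = State.FIELD_START
            elif c == "\n":
                end_row()
                state = State.FIELD_START
            else:
                field.append(c)       # lenient: keep stray char (assumption)
                state = State.UNQUOTED

    # Flush the last row when the input does not end with a newline.
    if state is not State.FIELD_START or row:
        end_row()
    return rows


rows = parse_csv('id,note\r\n1,"He said ""hi"", then left"\n2,"line1\nline2"\n')
for r in rows:
    print(r)
```

Each branch corresponds to one state in S, and every input character triggers exactly one transition, so the parser runs in a single linear pass.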
Delimiter auto-detection computes a consistency score C for each candidate delimiter d over the first n lines, based on the mean μ and standard deviation σ of d's count per line. A lower σ (the delimiter appears the same number of times on every line) yields a higher C. The delimiter with the highest C and a mean count μ ≥ 1 is selected.

For XML output, every text node value v undergoes entity replacement, v → escape(v), where escape maps & → `&amp;`, < → `&lt;`, > → `&gt;`, " → `&quot;`, and ' → `&apos;`.
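A sketch of the detection step over the four candidate delimiters. The exact scoring formula is not reproduced here, so this sketch assumes C = μ / (1 + σ) purely for illustration; the converter's real formula may differ. It also counts raw characters without regard to quoting, which is a simplification.

```python
from statistics import mean, pstdev
from typing import Optional

CANDIDATES = [",", ";", "\t", "|"]


def detect_delimiter(text: str, n: int = 5) -> Optional[str]:
    """Pick the candidate with the highest consistency score over the first n lines."""
    lines = [ln for ln in text.splitlines()[:n] if ln]
    if not lines:
        return None
    best, best_score = None, float("-inf")
    for d in CANDIDATES:
        counts = [ln.count(d) for ln in lines]  # ignores quoting (simplification)
        mu = mean(counts)
        if mu < 1:                # must appear at least once per line on average
            continue
        sigma = pstdev(counts)
        score = mu / (1 + sigma)  # ASSUMED form of C, for illustration only
        if score > best_score:    # ties go to the earlier candidate
            best, best_score = d, score
    return best


# European-style file: decimal commas, semicolon-separated columns.
sample = "name;price;qty\napple;1,50;3\npear;2,25;7\n"
print(repr(detect_delimiter(sample)))  # ';'
```

In the sample, the comma appears 0, 1, and 1 times per line (μ ≈ 0.67), so it is rejected by the μ ≥ 1 rule, while the semicolon appears exactly twice on every line (σ = 0).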
Reference Data
| Item | Character / Value | Common Use | Handling |
|---|---|---|---|
| Comma | , | Standard CSV (RFC 4180) | Highest consistent count per line |
| Semicolon | ; | European locales (decimal comma conflict) | Fallback when comma count is inconsistent |
| Tab | \t | TSV files, database exports | Detected if tab count ≥ 1 per line |
| Pipe | \| | Legacy systems, mainframe exports | Detected if pipe count is consistent |
| Double Quote | " | Field enclosure (RFC 4180) | N/A (enclosure, not delimiter) |
| Escaped Quote | "" | Literal quote inside quoted field | N/A (escape sequence) |
| CRLF | \r\n | Windows line ending | Normalized to \n internally |
| LF | \n | Unix/macOS line ending | Primary line break |
| BOM | \uFEFF | UTF-8 Byte Order Mark | Stripped if found at position 0 |
| XML Entity: `&amp;` | & | Escaped in XML output | All 5 XML entities handled |
| XML Entity: `&lt;` | < | Escaped in XML output | Prevents tag injection |
| XML Entity: `&gt;` | > | Escaped in XML output | Prevents tag injection |
| XML Entity: `&quot;` | " | Escaped in XML attributes | Attribute-safe encoding |
| XML Entity: `&apos;` | ' | Escaped in XML attributes | Attribute-safe encoding |
| JSON Indent: 2 | Spaces | Standard readable JSON | Default setting |
| JSON Indent: 4 | Spaces | Verbose readable JSON | Optional setting |
| JSON Indent: Tab | \t | Tab-indented JSON | Optional setting |
| JSON Compact | None | Minified JSON (no whitespace) | Smallest file size |
| Max Safe Rows | 500,000 | Browser memory limit (~50 MB) | Warning shown above limit |
| RFC 4180 | Standard | Formal CSV specification | Full compliance implemented |
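The XML entity and JSON indentation rows above can be sketched together as two small serializers. The root and row element names (`rows`, `row`) are hypothetical, and the sketch assumes headers are already valid XML element names. Passing `indent=None` is this sketch's convention for compact output.

```python
import json
from typing import List, Union

# The five predefined XML entities.
XML_ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;"}


def xml_escape(value: str) -> str:
    # Mapping character by character means "&" is never escaped twice.
    return "".join(XML_ESCAPES.get(ch, ch) for ch in value)


def to_xml(headers: List[str], rows: List[List[str]],
           root: str = "rows", item: str = "row") -> str:
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', f"<{root}>"]
    for row in rows:
        lines.append(f"  <{item}>")
        for h, v in zip(headers, row):  # assumes headers are valid element names
            lines.append(f"    <{h}>{xml_escape(v)}</{h}>")
        lines.append(f"  </{item}>")
    lines.append(f"</{root}>")
    return "\n".join(lines)


def to_json(headers: List[str], rows: List[List[str]],
            indent: Union[int, str, None] = 2) -> str:
    """indent: 2 or 4 spaces, "\\t" for tabs, or None for compact output."""
    records = [dict(zip(headers, row)) for row in rows]
    if indent is None:
        # Compact: no whitespace between tokens.
        return json.dumps(records, ensure_ascii=False, separators=(",", ":"))
    return json.dumps(records, ensure_ascii=False, indent=indent)


headers = ["name", "note"]
rows = [["Tom & Jerry", 'say "<hi>"']]
print(to_xml(headers, rows))
print(to_json(headers, rows, indent=None))
```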