About

Malformed CSV parsing causes silent data corruption. A mishandled quoted field containing a comma splits one column into two, cascading errors across every downstream row. This converter implements a full RFC 4180-compliant finite state machine parser that correctly resolves quoted fields, escaped double quotes (""), and embedded newlines. It auto-detects delimiters (comma, semicolon, tab, pipe) by frequency analysis across the first 5 lines. Output formats include well-formed XML with entity-escaped special characters (&, <, >) and JSON with configurable indentation.

Limitations: the tool assumes UTF-8 encoding. Files with a BOM marker have it stripped automatically. Nested or hierarchical CSV structures (parent-child relationships) are flattened to single-depth objects. For XML output, column headers are sanitized to valid XML element names: spaces become underscores, and leading digits are prefixed with an underscore. The maximum recommended file size is 50 MB; larger files may cause browser memory pressure.


Formulas

The CSV parser operates as a finite state machine with 4 states. For each character c at position i, the transition function δ determines the next state:

δ: S × Σ → S

where S = {FIELD_START, UNQUOTED, QUOTED, QUOTE_IN_QUOTED} and Σ is the input alphabet (all UTF-8 characters).

δ(state, c) =
    QUOTED             if state = FIELD_START and c = "
    UNQUOTED           if state = FIELD_START and c ≠ "
    QUOTE_IN_QUOTED    if state = QUOTED and c = "
    QUOTED             if state = QUOTE_IN_QUOTED and c = "   (escaped quote)
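These transitions can be sketched as a compact JavaScript parser. This is an illustrative sketch, not the tool's actual source; it assumes line endings have already been normalized to \n, and adds the field/row termination logic the transition table leaves implicit:

```javascript
// States mirror the transition function above.
const FIELD_START = 0, UNQUOTED = 1, QUOTED = 2, QUOTE_IN_QUOTED = 3;

function parseCSV(text, delim = ",") {
  const rows = [];
  let row = [], field = "", state = FIELD_START;
  const endField = () => { row.push(field); field = ""; state = FIELD_START; };
  const endRow = () => { endField(); rows.push(row); row = []; };

  for (const c of text) {
    if (state === QUOTED) {
      if (c === '"') state = QUOTE_IN_QUOTED;
      else field += c;                 // delimiters and newlines are literal here
    } else if (state === QUOTE_IN_QUOTED) {
      if (c === '"') { field += '"'; state = QUOTED; }  // escaped quote ""
      else if (c === delim) endField();
      else if (c === "\n") endRow();
      else state = UNQUOTED;           // lenient: quote closed mid-field
    } else {                           // FIELD_START or UNQUOTED
      if (c === '"' && state === FIELD_START) state = QUOTED;
      else if (c === delim) endField();
      else if (c === "\n") endRow();
      else { field += c; state = UNQUOTED; }
    }
  }
  if (field !== "" || row.length > 0) endRow();  // flush a final unterminated row
  return rows;
}
```

With this sketch, `parseCSV('"Smith, John",42\n')` yields a single row of two fields, the embedded comma preserved.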

Delimiter auto-detection calculates a consistency score C for each candidate delimiter d across the first n lines:

C_d = mean(counts_d) / (1 + σ(counts_d))

where counts_d is the per-line count of delimiter d across the first n lines, σ is its standard deviation, and mean(counts_d) is its mean. The delimiter with the highest C_d and a mean count ≥ 1 is selected. For XML output, every text node value v undergoes entity replacement v → escape(v), where escape maps & → &amp;, < → &lt;, > → &gt;, " → &quot;, and ' → &apos;.
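The scoring rule can be sketched as a small helper. The function name is an assumption, and the 1 + σ smoothing in the denominator avoids dividing by zero when a delimiter appears a perfectly consistent number of times per line:

```javascript
// Score each candidate delimiter over the first few lines and pick the
// most consistent one; comma is the RFC 4180 fallback.
function detectDelimiter(text, maxLines = 5) {
  const lines = text.split("\n").filter(l => l.length > 0).slice(0, maxLines);
  const candidates = [",", ";", "\t", "|"];
  let best = ",", bestScore = -Infinity;
  for (const d of candidates) {
    const counts = lines.map(l => l.split(d).length - 1);
    const mean = counts.reduce((a, b) => a + b, 0) / counts.length;
    const variance = counts.reduce((a, c) => a + (c - mean) ** 2, 0) / counts.length;
    const score = mean / (1 + Math.sqrt(variance));  // C_d = mean / (1 + sigma)
    if (mean >= 1 && score > bestScore) { bestScore = score; best = d; }
  }
  return best;
}
```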

Reference Data

Item              Character   Common Use                                  Auto-Detect / Handling
Comma             ,           Standard CSV (RFC 4180)                     Highest consistent count per line
Semicolon         ;           European locales (decimal comma conflict)   Fallback when comma count is inconsistent
Tab               \t          TSV files, database exports                 Detected if tab count ≥ 1 per line
Pipe              |           Legacy systems, mainframe exports           Detected if pipe count is consistent
Double Quote      "           Field enclosure (RFC 4180)                  N/A (enclosure, not delimiter)
Escaped Quote     ""          Literal quote inside quoted field           N/A (escape sequence)
CRLF              \r\n        Windows line ending                         Normalized to \n internally
LF                \n          Unix/macOS line ending                      Primary line break
BOM               \uFEFF      UTF-8 Byte Order Mark                       Stripped if found at position 0
XML Entity: &     &amp;       Escaped in XML output                       All 5 XML entities handled
XML Entity: <     &lt;        Escaped in XML output                       Prevents tag injection
XML Entity: >     &gt;        Escaped in XML output                       Prevents tag injection
XML Entity: "     &quot;      Escaped in XML attributes                   Attribute-safe encoding
XML Entity: '     &apos;      Escaped in XML attributes                   Attribute-safe encoding
JSON Indent: 2    Spaces      Standard readable JSON                      Default setting
JSON Indent: 4    Spaces      Verbose readable JSON                       Optional setting
JSON Indent: Tab  \t          Tab-indented JSON                           Optional setting
JSON Compact      None        Minified JSON (no whitespace)               Smallest file size
Max Safe Rows     500,000     Browser memory limit (~50 MB)               Warning shown above limit
RFC 4180          Standard    Formal CSV specification                    Full compliance implemented
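The five entity replacements listed above can be combined into a single function. A minimal sketch; the function name is an assumption:

```javascript
// Replace the five XML-significant characters with their entities.
function escapeXml(v) {
  return String(v)
    .replace(/&/g, "&amp;")   // must run first to avoid double-escaping
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}
```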

Frequently Asked Questions

How does delimiter auto-detection work?

The parser scans the first 5 lines and counts occurrences of each candidate delimiter (comma, semicolon, tab, pipe) per line. It calculates a consistency score by dividing the mean count by one plus the standard deviation. A true delimiter appears the same number of times on each line (low deviation), while a character appearing in field values shows irregular counts. The delimiter with the highest consistency score and a mean count of at least 1 is selected. If all candidates score equally, comma is used as the RFC 4180 default.
How does the parser handle commas or newlines inside quoted fields?

Per RFC 4180, any field containing the delimiter, a newline, or a double quote must be enclosed in double quotes. The parser's QUOTED state consumes all characters - including delimiters and newlines - until it encounters a closing double quote. A literal double quote inside a quoted field is represented as two consecutive double quotes (""), which the parser collapses to a single quote character in the output. This means a field like "Smith, John" correctly parses as a single value Smith, John rather than splitting into two columns.
How are CSV column headers converted to XML element names?

XML element names must start with a letter or underscore and contain only letters, digits, hyphens, underscores, and periods. The converter applies these transformations: spaces and special characters are replaced with underscores, leading digits are prefixed with an underscore (e.g., 3rd Quarter becomes _3rd_Quarter), empty headers receive a generic name column_N where N is the column index, and consecutive invalid characters are collapsed to a single underscore. This ensures the output is always well-formed XML that passes validation.
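The sanitization rules above can be sketched in a few lines. The helper name is an assumption, and this sketch covers only the rules the answer describes:

```javascript
// Turn a CSV header into a valid XML element name.
function sanitizeXmlName(header, index) {
  if (header.trim() === "") return `column_${index}`;   // empty header fallback
  let name = header.replace(/[^A-Za-z0-9._-]+/g, "_");  // runs of invalid chars -> one _
  if (/^[0-9]/.test(name)) name = "_" + name;           // no leading digit
  return name;
}
```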
Are numeric and boolean values converted automatically?

By default, all CSV values are strings. When the "Detect Types" option is enabled, the converter attempts to parse each value: numeric strings (matching the pattern /^-?\d+(\.\d+)?([eE][+-]?\d+)?$/) become JSON numbers, the literals true and false (case-insensitive) become JSON booleans, empty fields become null, and everything else remains a string. This heuristic covers most cases but may misinterpret values like zip codes (07001) or phone numbers. Disable type detection if numeric-looking strings must stay as strings.
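The heuristic can be sketched using the regex quoted above; the function name is an assumption. Note how the zip-code caveat shows up in practice:

```javascript
// Numeric pattern quoted in the answer above.
const NUMERIC = /^-?\d+(\.\d+)?([eE][+-]?\d+)?$/;

function coerce(value) {
  if (value === "") return null;                  // empty field -> null
  if (NUMERIC.test(value)) return Number(value);  // caveat: "07001" becomes 7001
  const lower = value.toLowerCase();
  if (lower === "true") return true;              // case-insensitive booleans
  if (lower === "false") return false;
  return value;                                   // everything else stays a string
}
```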
How large a file can the converter handle, and how fast is it?

The converter processes files up to 50 MB using a Web Worker to avoid blocking the main thread. For files under 50 KB, parsing runs on the main thread for faster response. Between 50 KB and 50 MB, the Web Worker handles parsing with progress updates. Above 50 MB, browser memory pressure may cause crashes depending on available RAM. A 10 MB CSV with 200,000 rows typically converts in under 3 seconds on modern hardware. The output file (especially XML) can be 3-5x larger than the input due to tag overhead.
Can the converter handle rows with missing or extra columns?

Yes. If a row has fewer fields than the header row, the missing fields are filled with empty strings (JSON) or empty elements (XML). If a row has more fields than the header, the extra fields are assigned generated column names (extra_1, extra_2, etc.) in the output. A warning toast notification appears indicating the row numbers with mismatched column counts so you can verify the source data integrity before using the output.
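The reconciliation rule can be sketched for the JSON path; the helper name is an assumption, and the XML path would emit empty elements instead of empty strings:

```javascript
// Map a parsed row onto the header names, padding short rows with ""
// and naming overflow fields extra_1, extra_2, ...
function normalizeRow(headers, row) {
  const out = {};
  headers.forEach((h, i) => { out[h] = i < row.length ? row[i] : ""; });
  for (let i = headers.length; i < row.length; i++) {
    out[`extra_${i - headers.length + 1}`] = row[i];
  }
  return out;
}
```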