About

CSV parsing fails silently. A misplaced quote, a delimiter inside a field value, a newline embedded in a cell - any of these corrupts your output if the parser lacks RFC 4180 compliance. Most naive implementations split on commas with regex. That approach breaks the moment a field contains delimiter characters within quoted strings. This tool implements a finite-state machine parser that handles quoted fields, escaped double-quotes (""), and embedded newlines correctly. It auto-detects the delimiter by frequency analysis across the first 5 rows, supporting comma, semicolon, tab, and pipe. Limitations: the parser assumes UTF-8 encoding. Binary-encoded CSV or files exceeding 10 MB are rejected to prevent browser memory issues.
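The failure mode described above is easy to reproduce. The following is a minimal sketch, not this tool's code: it contrasts a naive comma split with Python's standard-library `csv` reader on a made-up two-line sample.

```python
import csv
import io

# Hypothetical sample: the second field of the data row contains a comma.
sample = 'id,name\n1,"Doe, Jane"\n'

# Naive approach: split every line on commas, ignoring quoting.
naive_rows = [line.split(",") for line in sample.splitlines()]

# Quote-aware approach: the standard-library reader honors quoted fields.
proper_rows = list(csv.reader(io.StringIO(sample, newline="")))

print(naive_rows[1])   # three fragments, quotes left in place
print(proper_rows[1])  # two fields, as intended
```

The naive split yields three fragments for the data row, while the quote-aware reader returns the intended two fields.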

Output is a standard JSON array of objects (keyed by header names) or a nested array of arrays if no headers are specified. Indentation is configurable. Pro tip: if your CSV originates from Excel on European locales, the delimiter is almost certainly a semicolon, not a comma. Validate your delimiter choice before processing large datasets. This tool does not round, coerce, or interpret field values - every value remains a string, preserving data fidelity.
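The two output shapes can be illustrated with a short sketch. It assumes Python's standard `csv` and `json` modules rather than this tool's parser, and the function name `csv_to_json` and its parameters are hypothetical.

```python
import csv
import io
import json

def csv_to_json(text, has_header=True, delimiter=",", indent=2):
    """Convert CSV text to a JSON string; every value stays a string."""
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    if has_header and rows:
        header, *body = rows
        # Array of objects keyed by header names.
        data = [dict(zip(header, row)) for row in body]
    else:
        # Array of arrays when no header row is declared.
        data = rows
    return json.dumps(data, indent=indent)

# Semicolon-delimited input, as produced by Excel on many European locales.
print(csv_to_json("id;code\n1;007\n", delimiter=";"))
```

Note that `007` survives as the string `"007"`, since no coercion is applied.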


Formulas

The CSV parser operates as a finite-state machine with 4 states. For each character c in the input stream, the machine transitions between states to correctly segment fields.

$$S \in \{\texttt{FIELD\_START},\ \texttt{UNQUOTED},\ \texttt{QUOTED},\ \texttt{QUOTE\_IN\_QUOTED}\}$$

Transition rules:

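$$
\begin{cases}
\texttt{FIELD\_START} \to \texttt{QUOTED} & \text{if } c = \texttt{"} \\
\texttt{FIELD\_START} \to \texttt{UNQUOTED} & \text{if } c \neq \texttt{"} \land c \neq \text{delim} \\
\texttt{QUOTED} \to \texttt{QUOTE\_IN\_QUOTED} & \text{if } c = \texttt{"} \\
\texttt{QUOTE\_IN\_QUOTED} \to \texttt{QUOTED} & \text{if } c = \texttt{"} \text{ (escaped quote)} \\
\texttt{QUOTE\_IN\_QUOTED} \to \texttt{FIELD\_START} & \text{if } c = \text{delim} \lor c = \texttt{\textbackslash n}
\end{cases}
$$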
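As an illustration of the four-state machine, here is a minimal Python sketch. It is not this tool's source: the names `State` and `parse_csv` are hypothetical, and transitions not listed above (for example what happens in UNQUOTED, or after a stray character following a closing quote) are assumptions of the sketch.

```python
from enum import Enum, auto

class State(Enum):
    FIELD_START = auto()
    UNQUOTED = auto()
    QUOTED = auto()
    QUOTE_IN_QUOTED = auto()

def parse_csv(text, delim=","):
    rows, row, field = [], [], []
    state = State.FIELD_START

    def end_field():
        row.append("".join(field))
        field.clear()

    def end_row():
        end_field()
        rows.append(list(row))
        row.clear()

    # Assumption: normalize CRLF and bare CR to LF up front
    # (this also rewrites line endings inside quoted fields).
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    for c in text:
        if state is State.QUOTED:
            if c == '"':
                state = State.QUOTE_IN_QUOTED  # closing quote or first half of ""
            else:
                field.append(c)  # delimiters and newlines are plain content here
        elif state is State.QUOTE_IN_QUOTED and c == '"':
            field.append('"')  # escaped quote: "" emits one literal quote
            state = State.QUOTED
        elif c == delim:
            end_field()
            state = State.FIELD_START
        elif c == "\n":
            end_row()
            state = State.FIELD_START
        elif state is State.FIELD_START and c == '"':
            state = State.QUOTED
        else:
            # Assumption: any other character is kept leniently as unquoted content.
            field.append(c)
            state = State.UNQUOTED

    # Flush the last record when the input lacks a trailing newline.
    if state is not State.FIELD_START or field or row:
        end_row()
    return rows

print(parse_csv('a,"b,c","say ""hi"""\n1,"x\ny",\n'))
```

The sample input exercises a delimiter inside quotes, an escaped quote, an embedded newline, and a trailing empty field.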

Delimiter auto-detection counts the frequency of each candidate delimiter across the first n = 5 lines. The candidate with the lowest coefficient of variation (most consistent count per line) is selected:

$$\text{delim} = \arg\min_{d} \frac{\sigma(\text{freq}_d)}{\overline{\text{freq}_d}}$$

Where $\sigma(\text{freq}_d)$ is the standard deviation of the per-line counts of candidate delimiter $d$ across the sampled lines, and $\overline{\text{freq}_d}$ is the mean count per line. A delimiter with zero variance (identical count per row) is strongly preferred. Ties are broken by priority order: comma > semicolon > tab > pipe.
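The selection rule can be sketched in a few lines of Python. This is an illustrative approximation under several assumptions that the text does not confirm: it uses the population standard deviation, skips candidates that never appear (their coefficient of variation is undefined), and counts characters naively without regard to quoting. The name `detect_delimiter` is hypothetical.

```python
from statistics import mean, pstdev

# Candidates in tie-break priority order: comma > semicolon > tab > pipe.
CANDIDATES = [",", ";", "\t", "|"]

def detect_delimiter(text, sample_lines=5):
    lines = [ln for ln in text.splitlines() if ln][:sample_lines]
    if not lines:
        return CANDIDATES[0]  # assumption: fall back to comma on empty input
    best, best_cv = CANDIDATES[0], float("inf")
    for d in CANDIDATES:
        counts = [ln.count(d) for ln in lines]
        avg = mean(counts)
        if avg == 0:
            continue  # candidate never appears; its CV is undefined
        cv = pstdev(counts) / avg
        if cv < best_cv:  # strict comparison keeps the earlier candidate on ties
            best, best_cv = d, cv
    return best

print(repr(detect_delimiter("a;b;c\n1;2;3\n4;5;6\n")))
```

In the second test case below, semicolons appear inside some fields at an irregular rate, so the comma, with an identical count on every line, is selected.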

Reference Data

| Delimiter | Symbol | Common Source | Auto-Detect Pattern | RFC 4180 | Notes |
|---|---|---|---|---|---|
| Comma | , | US/UK Excel, Google Sheets export | Highest comma frequency | Yes (default) | Fails if decimal commas used |
| Semicolon | ; | European Excel (DE, FR, IT locales) | Highest semicolon frequency | Extension | Common in SAP exports |
| Tab | \t | TSV files, database dumps | Highest tab frequency | Extension | Rarely appears inside fields |
| Pipe | \| | Legacy mainframe systems, HL7 | Highest pipe frequency | Extension | Used in medical data (HL7v2) |
| Quoted Field | "..." | Any source with special chars | - | Yes | Encloses fields with delimiters/newlines |
| Escaped Quote | "" | Fields containing literal quotes | - | Yes | Two consecutive double-quotes = one literal |
| CRLF Line End | \r\n | Windows systems | - | Yes (required) | LF-only also accepted by most parsers |
| LF Line End | \n | Unix/macOS systems | - | Extension | De facto standard in web contexts |
| CR Line End | \r | Classic Mac OS (pre-X) | - | Extension | Extremely rare today |
| BOM Marker | \uFEFF | UTF-8 files from Windows Notepad | First byte check | Not specified | Stripped automatically by this tool |
| Empty Field | ,, | Sparse datasets | - | Yes | Produces empty string, not null |
| Newline in Field | "a\nb" | Address fields, descriptions | - | Yes | Must be inside quotes per RFC 4180 |
| Trailing Delimiter | a,b,c, | Some DB exports | - | Ambiguous | Creates extra empty field per row |
| Header Row | - | Most structured exports | First row analysis | Optional | Used as JSON object keys |
| No Header | - | Raw sensor data, logs | - | Optional | Output becomes array of arrays |

Frequently Asked Questions

The parser uses a finite-state machine with a dedicated QUOTED state. When it encounters an opening double-quote at field start, all subsequent characters (including delimiters and newlines) are treated as field content until a closing quote is found. A delimiter inside quotes does not trigger field separation. This follows RFC 4180 Section 2, Rule 6.
Per RFC 4180, a literal double-quote inside a quoted field must be escaped by preceding it with another double-quote. For example, the value She said "hello" must be encoded as "She said ""hello""" in CSV. The parser's QUOTE_IN_QUOTED state detects consecutive double-quotes and emits a single literal quote. Unescaped quotes in unquoted fields produce undefined behavior and may corrupt the parse.
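The escaping rule can be checked with a round trip through Python's standard `csv` module, used here only as an independent RFC 4180-style reference and not as this tool's parser.

```python
import csv
import io

value = 'She said "hello"'

# Encode: the writer quotes the field and doubles each embedded quote.
buf = io.StringIO(newline="")
csv.writer(buf).writerow([value])
encoded = buf.getvalue()

# Decode: the reader collapses each "" back into one literal quote.
decoded = next(csv.reader(io.StringIO(encoded, newline="")))

print(encoded.rstrip("\r\n"))
print(decoded[0])
```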
CSV is an untyped format. A field containing 007 could be a numeric value 7, a string identifier, or a James Bond reference. Type coercion (converting "true" to boolean true or "42" to integer 42) is a lossy transformation that destroys information. This tool preserves raw string values to maintain data fidelity. Apply type casting downstream in your application logic where the schema is known.
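Downstream casting might look like the sketch below. The schema, the field names, and the helper `cast_record` are all hypothetical; the point is only that conversion happens where the schema is known, and that unlisted fields stay strings.

```python
# Hypothetical schema: maps a field name to a converter for that field.
SCHEMA = {
    "id": int,
    "price": float,
    "active": lambda s: s.strip().lower() == "true",
}

def cast_record(record, schema):
    """Apply per-field converters; fields without one remain strings."""
    return {key: schema.get(key, str)(value) for key, value in record.items()}

raw = {"id": "42", "code": "007", "price": "9.50", "active": "true"}
print(cast_record(raw, SCHEMA))
```

Because `code` has no converter in the schema, the identifier `007` keeps its leading zeros.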
The tool samples the first 5 lines and counts occurrences of each candidate delimiter (comma, semicolon, tab, pipe). The delimiter with the most consistent count across lines (lowest coefficient of variation) wins. Override auto-detection when: your file has fewer than 2 rows, the first rows are atypical (e.g., comment lines), or the file uses a rare delimiter not in the candidate set.
The limit is 10 MB. By default, browser JavaScript runs on the main thread, so parsing a 50 MB CSV string character-by-character can freeze the UI for several seconds and may exceed available heap memory on mobile devices. For files beyond 10 MB, use a server-side tool or a streaming parser. The 10 MB limit is chosen to keep interaction responsive on devices with as little as 2 GB RAM.
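For files over the limit, a server-side streaming conversion could look like the following sketch. It assumes Python's standard `csv.DictReader` and writes JSON Lines (one object per line) instead of a single JSON array, so that only one record is held in memory at a time. The function name `stream_csv_to_jsonl` is hypothetical.

```python
import csv
import io
import json

def stream_csv_to_jsonl(src, dst, delimiter=","):
    """Read CSV records one at a time from src and write JSON Lines to dst."""
    reader = csv.DictReader(src, delimiter=delimiter)
    count = 0
    for record in reader:
        dst.write(json.dumps(record) + "\n")
        count += 1
    return count

# In practice src and dst would be files opened with newline="";
# in-memory buffers keep the example self-contained.
src = io.StringIO("a,b\n1,2\n3,4\n", newline="")
dst = io.StringIO()
written = stream_csv_to_jsonl(src, dst)
print(written, "records written")
```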
If a row has fewer fields than the header row, the missing fields are filled with empty strings in the JSON output. If a row has more fields than headers, the extra fields are assigned auto-generated keys (column_N where N is the 1-based index). This prevents data loss while keeping the output structurally valid. A warning toast is displayed when row widths are inconsistent.
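The padding and overflow rules can be sketched as follows. The function name `rows_to_objects` and the returned `ragged` flag (standing in for the warning toast) are hypothetical; the `column_N` key format follows the description above.

```python
def rows_to_objects(header, rows):
    """Map rows to dicts, padding short rows and naming overflow fields."""
    objects, ragged = [], False
    for row in rows:
        if len(row) != len(header):
            ragged = True  # a caller could surface this as a warning
        obj = {}
        for i, key in enumerate(header):
            # Missing trailing fields become empty strings.
            obj[key] = row[i] if i < len(row) else ""
        for i in range(len(header), len(row)):
            # Extra fields get auto-generated keys with a 1-based column index.
            obj[f"column_{i + 1}"] = row[i]
        objects.append(obj)
    return objects, ragged

objs, ragged = rows_to_objects(["a", "b", "c"], [["1", "2", "3"], ["4"], ["5", "6", "7", "8"]])
print(objs, ragged)
```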
Windows applications (notably Notepad) can prepend a 3-byte BOM (bytes EF BB BF, the UTF-8 encoding of U+FEFF) to UTF-8 files. If not stripped, this invisible character becomes part of the first field value or the first header name, causing silent key mismatch bugs in downstream JSON consumers. This tool automatically detects and removes BOM characters before parsing.
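The key-mismatch problem and its fix can be demonstrated in a few lines. This is a sketch in Python rather than this tool's code, and the helper name `strip_bom` is hypothetical.

```python
BOM = "\ufeff"

def strip_bom(text):
    """Drop a single leading U+FEFF if present."""
    return text[1:] if text.startswith(BOM) else text

# A UTF-8 file that starts with the 3-byte BOM (EF BB BF).
raw = b"\xef\xbb\xbfid,name\n1,Ada\n"

decoded = raw.decode("utf-8")           # the BOM survives as U+FEFF
first_key = decoded.splitlines()[0].split(",")[0]
clean_key = strip_bom(decoded).splitlines()[0].split(",")[0]

print(first_key == "id", clean_key == "id")
```

In Python specifically, decoding with the `utf-8-sig` codec removes the BOM in the same step.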