About

CSV and JSONL serve fundamentally different data ecosystems. CSV (RFC 4180) uses row-delimited, comma-separated flat text. JSONL stores one self-describing JSON object per line. Converting between them is not trivial. Quoted fields can contain the delimiter character itself. Newlines can appear inside double-quoted values. A naive split-by-comma approach will corrupt your data at the first embedded comma or multiline address field. This converter implements a finite-state-machine parser that correctly handles all RFC 4180 edge cases: escaped quotes (""), embedded delimiters, and multiline quoted fields. It auto-detects the delimiter by frequency analysis and supports optional type inference, converting numeric strings like 3.14 to actual JSON numbers rather than leaving them as strings.
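The failure mode of naive splitting is easy to reproduce. The sketch below is an illustration in Python rather than this tool's own browser code: it compares `str.split` with the standard-library `csv.reader` on a made-up sample line that contains an embedded comma.

```python
import csv
import io

# Hypothetical sample record: the second field contains the delimiter.
line = 'Ada Lovelace,"New York, NY",1815'

# Naive approach: splits inside the quoted field and keeps the quote characters.
naive_fields = line.split(",")

# Quote-aware approach: the standard-library reader honors double-quoted fields.
parsed_fields = next(csv.reader(io.StringIO(line)))

print(len(naive_fields), naive_fields)
print(len(parsed_fields), parsed_fields)
```

The naive split yields four fields because the address is cut in two, while the quote-aware reader yields the three fields the record actually contains.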

JSONL is the required input format for OpenAI fine-tuning and a common input format for BigQuery batch loads and many streaming data pipelines. Malformed conversion can silently shift columns, drop fields, or inject null bytes. This tool processes files up to several hundred megabytes in a Web Worker thread to keep the browser responsive. Limitations: binary data embedded in CSV cells is not supported, and the parser assumes UTF-8 encoding. For TSV or semicolon-delimited files, select the appropriate delimiter or use auto-detect.

Formulas

The CSV parser uses a Finite State Machine with four states. For each character c at position i, the transition function δ determines the next state:

δ: S × Σ → S

Where the state set is:

S = { FIELD_START, UNQUOTED, QUOTED, QUOTE_IN_QUOTED }

And the input alphabet is:

Σ = { delimiter, quote, newline, other }
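The four states and four input classes above can be sketched as a small character-by-character parser. This is a minimal Python illustration, not the converter's actual source. The function name `parse_csv`, the lenient handling of stray text after a closing quote, and the up-front normalization of CRLF to LF (which also rewrites line breaks inside quoted values) are all assumptions made for brevity.

```python
from enum import Enum, auto


class State(Enum):
    FIELD_START = auto()
    UNQUOTED = auto()
    QUOTED = auto()
    QUOTE_IN_QUOTED = auto()


def parse_csv(text: str, delimiter: str = ",", quote: str = '"') -> list[list[str]]:
    """Parse CSV text into rows of string fields with a four-state machine."""
    rows: list[list[str]] = []
    row: list[str] = []
    field: list[str] = []
    state = State.FIELD_START

    def end_field() -> None:
        nonlocal state
        row.append("".join(field))
        field.clear()
        state = State.FIELD_START

    def end_row() -> None:
        end_field()
        rows.append(row.copy())
        row.clear()

    # Assumption: treat CRLF and bare CR as LF so "newline" is a single symbol.
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    for c in text:
        if state is State.QUOTED:
            if c == quote:
                state = State.QUOTE_IN_QUOTED  # closing quote, or first half of ""
            else:
                field.append(c)  # delimiters and newlines are literal here
        elif state is State.QUOTE_IN_QUOTED:
            if c == quote:
                field.append(quote)  # "" inside quotes is one literal quote
                state = State.QUOTED
            elif c == delimiter:
                end_field()
            elif c == "\n":
                end_row()
            else:
                field.append(c)  # lenient: keep stray text after a closing quote
                state = State.UNQUOTED
        else:  # FIELD_START or UNQUOTED
            if c == quote and state is State.FIELD_START:
                state = State.QUOTED
            elif c == delimiter:
                end_field()
            elif c == "\n":
                end_row()
            else:
                field.append(c)
                state = State.UNQUOTED

    # Flush a final record that has no trailing newline.
    if state is not State.FIELD_START or row or field:
        end_row()
    return rows
```

Each branch corresponds to one (state, input class) pair of δ. A production parser would typically also report an error for a quoted field that is still open at end of input, which this sketch silently accepts.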

Delimiter auto-detection scores each candidate delimiter d using the mean μ and standard deviation σ of its per-line occurrence counts across the first n sample lines. A delimiter that appears often (high mean) and consistently (low σ, including σ = 0) scores highest:

score(d) = μ_d / (σ_d + 1)

Where μ_d is the mean occurrence count of delimiter d per line, and σ_d is its standard deviation. The + 1 prevents division by zero for perfectly consistent delimiters. For type inference, each string value v is tested against patterns:

Number(v)       if v matches /^-?\d+(\.\d+)?([eE][+-]?\d+)?$/
TRUE / FALSE    if v ∈ {true, false}
NULL            if v = ""
String(v)       otherwise
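The four cases can be sketched as a single function. The Python below is illustrative only. The name `infer_type`, the case-sensitive match on `true` and `false`, and the choice to keep digit-only values as integers rather than floats are assumptions, not documented behavior of the tool.

```python
import re

# Same pattern as above; re.ASCII restricts \d to the digits 0-9.
NUMBER_PATTERN = re.compile(r"^-?\d+(\.\d+)?([eE][+-]?\d+)?$", re.ASCII)


def infer_type(value: str):
    """Map a CSV string to a JSON-compatible Python value."""
    if NUMBER_PATTERN.fullmatch(value):
        # Assumption: digit-only values become int so 42 is not emitted as 42.0.
        if value.lstrip("-").isdigit():
            return int(value)
        return float(value)
    if value == "true":
        return True
    if value == "false":
        return False
    if value == "":
        return None
    return value


for sample in ["3.14", "42", "true", "", "00501", "N/A"]:
    print(repr(sample), "->", repr(infer_type(sample)))
```

Note that `00501` matches the numeric pattern and loses its leading zeros, which is why identifier-like columns are better left as strings.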

Reference Data

Feature	CSV (RFC 4180)	JSONL (JSON Lines)
Line Terminator	CRLF (\r\n)	LF (\n)
Field Delimiter	Comma (,) default	N/A (self-describing)
Quoting	Double-quote (")	N/A
Escaped Quote	"" (doubled)	\" (backslash)
Nested Structures	Not supported	Full JSON nesting
Data Types	All values are strings	String, Number, Boolean, Null, Object, Array
Schema	Implicit from header row	Per-object, self-describing
Encoding	Typically UTF-8 or ASCII	UTF-8 required
Streaming	Line-by-line possible	Line-by-line by design
Max File Size (this tool)	Limited by browser memory	Output scales linearly
Use Case	Spreadsheets, legacy ETL	ML pipelines, BigQuery, APIs
MIME Type	text/csv	application/jsonl (not IANA-registered)
Multiline Values	Allowed inside quotes	Not allowed (one object per line)
Comments	No standard (some use #)	Not supported
Header Row	Optional but conventional	N/A (keys in every object)
Empty Fields	,, → empty string	null or ""
Boolean Representation	true / false as text	true / false native (lowercase literals)
Numeric Precision	Arbitrary (text)	IEEE 754 double (15-17 sig. digits)
OpenAI Fine-Tuning	Not accepted	Required format
BigQuery Load	Supported	Preferred for nested data

Frequently Asked Questions

The parser implements a finite-state machine compliant with RFC 4180. Fields wrapped in double quotes can contain the delimiter character, newline characters, and even the quote character itself (escaped as ""). The parser enters a QUOTED state upon encountering an opening quote and only exits when it finds an unescaped closing quote followed by a delimiter or line terminator. This means a field like "New York, NY" correctly becomes the JSON string "New York, NY" rather than being split into two fields.

By default, all values are preserved as JSON strings to prevent data loss. When you enable Type Inference, the converter tests each value against numeric, boolean, and null patterns. A value like 3.14 becomes a JSON number, true becomes a JSON boolean, and empty fields become null. Be cautious with fields like ZIP codes (00501) or phone numbers: type inference converts them to numbers, which strips leading zeros. Disable it for such datasets.

The converter processes files in a Web Worker thread to prevent the browser from freezing. Practical limits depend on available RAM; most modern browsers can handle files of 200 to 500 MB. The output preview is limited to the first 50 lines regardless of file size, but the full download contains all records. For files exceeding 1 GB, consider command-line tools like jq or csvkit.

Auto-detection samples the first 5 lines and counts occurrences of each candidate delimiter (comma, tab, semicolon, pipe). It selects the delimiter with the highest consistency score, computed as the mean count divided by the sum of the standard deviation and 1. Override auto-detection when your data contains fewer than 5 lines, when multiple delimiters appear with similar frequency, or when using an uncommon delimiter like ^ or ~.
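A minimal Python sketch of this scoring follows. It assumes the population standard deviation (the text does not say which variant is used), counts raw occurrences without excluding delimiters inside quoted fields, and breaks ties in favor of the earlier candidate; the names `score` and `detect_delimiter` are hypothetical.

```python
from statistics import mean, pstdev

CANDIDATES = [",", "\t", ";", "|"]


def score(lines: list[str], delimiter: str) -> float:
    """Mean per-line count divided by (standard deviation + 1)."""
    counts = [line.count(delimiter) for line in lines]
    return mean(counts) / (pstdev(counts) + 1)


def detect_delimiter(text: str, sample_size: int = 5) -> str:
    """Return the candidate with the highest score over the first sample lines."""
    lines = [ln for ln in text.splitlines() if ln][:sample_size]
    if not lines:
        return ","  # Assumption: fall back to comma when there is nothing to sample.
    return max(CANDIDATES, key=lambda d: score(lines, d))


sample = "id;name;city\n1;Ada;London\n2;Bob, Jr.;Paris\n"
print(repr(detect_delimiter(sample)))
```

In the sample, the semicolon appears exactly twice on every line (mean 2, deviation 0, score 2), while the comma appears on only one line, so its low mean and non-zero deviation give it a much lower score.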

Column order is preserved. The header row defines the key order, and each JSONL object keeps that insertion order per the ECMAScript specification (ES2015+). The first column in CSV becomes the first key in each JSON object. If your CSV has no header row, disable the "First row is header" option and the converter will generate numbered keys (field_0, field_1, etc.).
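The key-ordering behavior can be illustrated with a short Python sketch. Python dictionaries also preserve insertion order, so the output mirrors what the text describes for JavaScript objects. The function `rows_to_jsonl` and its `first_row_is_header` parameter are hypothetical names modeled on the option above, and rows are assumed to have the same length as the header.

```python
import json


def rows_to_jsonl(rows: list[list[str]], first_row_is_header: bool = True) -> str:
    """Serialize parsed CSV rows as JSONL, keeping column order as key order."""
    if not rows:
        return ""
    if first_row_is_header:
        headers, data = rows[0], rows[1:]
    else:
        # No header row: generate field_0, field_1, ... from the column positions.
        headers = [f"field_{i}" for i in range(len(rows[0]))]
        data = rows
    lines = [
        json.dumps(dict(zip(headers, row)), ensure_ascii=False)
        for row in data
    ]
    return "\n".join(lines)


rows = [["id", "name"], ["1", "Ada"], ["2", "Bob"]]
print(rows_to_jsonl(rows))
print(rows_to_jsonl(rows, first_row_is_header=False))
```

With the header enabled, every output object lists `id` before `name`; with it disabled, the first row is treated as data and keyed by position.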

If a data row has fewer fields than headers, the missing fields are set to an empty string (or to null with type inference enabled). If a row has more fields than headers, the extra fields are assigned keys _extra_0, _extra_1, etc. Both cases generate a warning toast indicating the affected line numbers. This prevents the silent data loss that plagues naive converters.
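The padding and overflow rules can be sketched as follows. This Python illustration covers only the case with type inference disabled (missing fields become empty strings); `align_row` and the `warnings` list standing in for the warning toast are hypothetical.

```python
def align_row(headers: list[str], row: list[str],
              line_number: int, warnings: list[int]) -> dict[str, str]:
    """Build one record, padding short rows and keeping overflow fields."""
    record: dict[str, str] = {}
    for i, key in enumerate(headers):
        # Missing trailing fields become empty strings.
        record[key] = row[i] if i < len(row) else ""
    for j, value in enumerate(row[len(headers):]):
        # Fields beyond the header count are kept under _extra_N keys.
        record[f"_extra_{j}"] = value
    if len(row) != len(headers):
        warnings.append(line_number)  # collected for a later user-facing warning
    return record


headers = ["a", "b", "c"]
warnings: list[int] = []
short = align_row(headers, ["1", "2"], 2, warnings)
long_ = align_row(headers, ["1", "2", "3", "4", "5"], 3, warnings)
print(short, long_, warnings)
```

Keeping overflow values under explicit keys, rather than truncating the row to the header length, is what makes the mismatch visible in the output.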