About

CSV and JSONL serve fundamentally different data ecosystems. CSV (RFC 4180) uses row-delimited, comma-separated flat text. JSONL stores one self-describing JSON object per line. Converting between them is not trivial. Quoted fields can contain the delimiter character itself. Newlines can appear inside double-quoted values. A naive split-by-comma approach will corrupt your data at the first embedded comma or multiline address field. This converter implements a finite-state-machine parser that correctly handles all RFC 4180 edge cases: escaped quotes (""), embedded delimiters, and multiline quoted fields. It auto-detects the delimiter by frequency analysis and supports optional type inference, converting numeric strings like 3.14 to actual JSON numbers rather than leaving them as strings.
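
The quoting rules above can be captured in a compact state machine. The sketch below is an illustrative Python reimplementation, not the tool's actual code; it covers embedded delimiters, escaped quotes, and multiline quoted fields:

```python
# Illustrative RFC 4180-style CSV parser as a finite-state machine.
# State and function names are this sketch's own, not the tool's source.

FIELD_START, UNQUOTED, QUOTED, QUOTE_IN_QUOTED = range(4)

def parse_csv(text, delimiter=","):
    rows, row, field = [], [], []
    state = FIELD_START
    for c in text:
        if state == FIELD_START:
            if c == '"':
                state = QUOTED
            elif c == delimiter:
                row.append("")                     # empty field
            elif c == "\n":
                row.append("")
                rows.append(row); row = []
            elif c != "\r":                        # \r handled with trailing \n
                field.append(c); state = UNQUOTED
        elif state == UNQUOTED:
            if c == delimiter:
                row.append("".join(field)); field = []
                state = FIELD_START
            elif c == "\n":
                row.append("".join(field)); field = []
                rows.append(row); row = []
                state = FIELD_START
            elif c != "\r":
                field.append(c)
        elif state == QUOTED:
            if c == '"':
                state = QUOTE_IN_QUOTED            # closing quote or escape?
            else:
                field.append(c)                    # delimiters and newlines are data here
        elif state == QUOTE_IN_QUOTED:
            if c == '"':
                field.append('"'); state = QUOTED  # "" -> literal quote
            elif c == delimiter:
                row.append("".join(field)); field = []
                state = FIELD_START
            elif c == "\n":
                row.append("".join(field)); field = []
                rows.append(row); row = []
                state = FIELD_START
            elif c != "\r":
                field.append(c); state = UNQUOTED  # malformed input: lenient recovery (assumption)
    if field or row or state in (QUOTED, QUOTE_IN_QUOTED):
        row.append("".join(field))                 # flush a final row without trailing newline
        rows.append(row)
    return rows
```

With this machine, `a,"New York, NY",c` parses as three fields, and a quoted field spanning two lines stays one field.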

JSONL is the required input format for OpenAI fine-tuning, BigQuery batch loads, and many streaming data pipelines. Malformed conversion can silently shift columns, drop fields, or inject null bytes. This tool processes files up to several hundred megabytes in a Web Worker thread to keep the browser responsive. Limitations: binary data embedded in CSV cells is not supported. The parser assumes UTF-8 encoding. For TSV or semicolon-delimited files, select the appropriate delimiter or use auto-detect.


Formulas

The CSV parser uses a Finite State Machine with four states. For each character c at position i, the transition function δ determines the next state:

δ: S × Σ → S

Where the state set is:

S = { FIELD_START, UNQUOTED, QUOTED, QUOTE_IN_QUOTED }

And the input alphabet is:

Σ = { delimiter, quote, newline, other }
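
Spelled out, δ is a small lookup table. A sketch, using the state names from S and a classifier for Σ; the QUOTE_IN_QUOTED recovery transition on `other` is an assumption, since RFC 4180 leaves malformed input undefined:

```python
# Transition table for δ: S × Σ -> S.
DELTA = {
    ("FIELD_START",     "quote"):   "QUOTED",
    ("FIELD_START",     "delim"):   "FIELD_START",     # emit empty field
    ("FIELD_START",     "newline"): "FIELD_START",     # emit empty field, end row
    ("FIELD_START",     "other"):   "UNQUOTED",
    ("UNQUOTED",        "delim"):   "FIELD_START",     # emit field
    ("UNQUOTED",        "newline"): "FIELD_START",     # emit field, end row
    ("UNQUOTED",        "quote"):   "UNQUOTED",        # bare quote kept as data
    ("UNQUOTED",        "other"):   "UNQUOTED",
    ("QUOTED",          "quote"):   "QUOTE_IN_QUOTED",
    ("QUOTED",          "delim"):   "QUOTED",          # delimiter is data inside quotes
    ("QUOTED",          "newline"): "QUOTED",          # multiline quoted field
    ("QUOTED",          "other"):   "QUOTED",
    ("QUOTE_IN_QUOTED", "quote"):   "QUOTED",          # "" escape -> literal quote
    ("QUOTE_IN_QUOTED", "delim"):   "FIELD_START",     # field closed, emit
    ("QUOTE_IN_QUOTED", "newline"): "FIELD_START",     # field closed, end row
    ("QUOTE_IN_QUOTED", "other"):   "UNQUOTED",        # malformed input: lenient (assumption)
}

def classify(c, delimiter=","):
    """Map a character to its symbol class in Σ."""
    if c == delimiter:
        return "delim"
    if c == '"':
        return "quote"
    if c == "\n":
        return "newline"
    return "other"
```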

Delimiter auto-detection scores each candidate delimiter d by computing the mean μ and standard deviation σ of its per-line occurrence counts across the first n sample lines. The delimiter with the highest score (frequent and consistent occurrence) wins:

score(d) = μd / (σd + 1)

Where μd is the mean occurrence count of delimiter d per line, and σd is its standard deviation. The + 1 prevents division by zero for perfectly consistent delimiters. For type inference, each string value v is tested against patterns:
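
A minimal sketch of this scoring, assuming the four candidate delimiters the FAQ mentions. It counts occurrences naively, including any inside quoted fields, which is the usual simplification for a frequency heuristic:

```python
import statistics

CANDIDATES = [",", "\t", ";", "|"]  # candidate set named in the FAQ

def detect_delimiter(text, sample_lines=5):
    """Score each candidate as mean/(stdev+1) over the sample; highest wins."""
    lines = [ln for ln in text.splitlines() if ln][:sample_lines]
    best, best_score = ",", 0.0
    for d in CANDIDATES:
        counts = [ln.count(d) for ln in lines]
        mu = statistics.mean(counts)
        sigma = statistics.pstdev(counts)
        score = mu / (sigma + 1)
        if score > best_score:
            best, best_score = d, score
    return best
```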

infer(v) =
    Number(v)    if v matches /^-?\d+(\.\d+)?([eE][+-]?\d+)?$/
    Boolean(v)   if v ∈ {true, false}
    null         if v = ""
    String(v)    otherwise
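
The piecewise rule translates directly into code. An illustrative sketch, not the tool's implementation:

```python
import re

# Numeric pattern from the rule above.
NUMERIC = re.compile(r"^-?\d+(\.\d+)?([eE][+-]?\d+)?$")

def infer(v):
    """Map a CSV string to a typed JSON value per the piecewise rule."""
    if NUMERIC.match(v):
        # Integers stay ints; anything with a point or exponent becomes a float.
        return float(v) if any(ch in v for ch in ".eE") else int(v)
    if v in ("true", "false"):
        return v == "true"
    if v == "":
        return None
    return v
```

Note the leading-zero hazard the FAQ warns about: `infer("00501")` yields the number 501, which is why type inference should be disabled for ZIP codes and phone numbers.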

Reference Data

Feature | CSV (RFC 4180) | JSONL (JSON Lines)
Line Terminator | CRLF (\r\n) | LF (\n)
Field Delimiter | Comma (,) default | N/A (self-describing)
Quoting | Double-quote (") | N/A
Escaped Quote | "" (doubled) | \" (backslash)
Nested Structures | Not supported | Full JSON nesting
Data Types | All values are strings | String, Number, Boolean, Null, Object, Array
Schema | Implicit from header row | Per-object, self-describing
Encoding | Typically UTF-8 or ASCII | UTF-8 required
Streaming | Line-by-line possible | Line-by-line by design
Max File Size (this tool) | Limited by browser memory | Output scales linearly
Use Case | Spreadsheets, legacy ETL | ML pipelines, BigQuery, APIs
MIME Type | text/csv | application/jsonl
Multiline Values | Allowed inside quotes | Not allowed (one object per line)
Comments | No standard (some use #) | Not supported
Header Row | Optional but conventional | N/A (keys in every object)
Empty Fields | ,, → empty string | null or ""
Boolean Representation | true / false as text | true / false native
Numeric Precision | Arbitrary (text) | IEEE 754 double (15-17 sig. digits)
OpenAI Fine-Tuning | Not accepted | Required format
BigQuery Load | Supported | Preferred for nested data
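
The escaping difference in the table can be seen by round-tripping one value. A toy sketch; the slicing assumes an already-validated quoted field:

```python
import json

csv_field = '"say ""hi"""'                    # CSV quoting: "" escapes a quote
decoded = csv_field[1:-1].replace('""', '"')  # strip outer quotes, unescape
jsonl_value = json.dumps(decoded)             # JSON quoting: \" escapes a quote
```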

Frequently Asked Questions

How does the parser handle commas and newlines inside quoted fields?

The parser implements a finite-state machine compliant with RFC 4180. Fields wrapped in double quotes can contain the delimiter character, newline characters, and even the quote character itself (escaped as ""). The parser enters a QUOTED state upon encountering an opening quote and only exits when it finds an unescaped closing quote followed by a delimiter or line terminator. This means a field like "New York, NY" correctly becomes the JSON string "New York, NY" rather than being split into two fields.
Does type inference change my data?

By default, all values are preserved as JSON strings to prevent data loss. When you enable Type Inference, the converter tests each value against numeric, boolean, and null patterns. A value like 3.14 becomes a JSON number, true becomes a JSON boolean, and empty fields become null. Be cautious with fields like ZIP codes (00501) or phone numbers: type inference will strip leading zeros by converting them to numbers. Disable it for such datasets.
How large a file can I convert?

The converter processes files in a Web Worker thread to prevent the browser from freezing. Practical limits depend on available RAM; most modern browsers can handle files up to 200-500 MB. The output preview is limited to the first 50 lines regardless of file size, but the full download contains all records. For files exceeding 1 GB, consider command-line tools like jq or csvkit.
How does delimiter auto-detection work, and when should I override it?

Auto-detection samples the first 5 lines and counts occurrences of each candidate delimiter (comma, tab, semicolon, pipe). It selects the delimiter with the highest consistency score, computed as mean count divided by (standard deviation plus 1). Override auto-detection when your data contains fewer than 5 lines, when multiple delimiters appear with similar frequency, or when using an uncommon delimiter like ^ or ~.
Is column order preserved in the JSONL output?

Yes. The header row defines the key order, and each JSONL object preserves that insertion order per the ECMAScript specification (ES2015+). The first column in CSV becomes the first key in each JSON object. If your CSV has no header row, disable the "First row is header" option and the converter will generate numeric keys (field_0, field_1, etc.).
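
This key-ordering behavior can be sketched as follows. Illustrative Python; dicts preserve insertion order in Python 3.7+, mirroring the ECMAScript guarantee:

```python
import json

def rows_to_jsonl(rows, has_header=True):
    """Convert parsed rows to JSONL, preserving column order as key order."""
    if has_header:
        keys, data = rows[0], rows[1:]
    else:
        # No header: generate numeric keys field_0, field_1, ...
        keys = [f"field_{i}" for i in range(len(rows[0]))]
        data = rows
    return "\n".join(json.dumps(dict(zip(keys, r))) for r in data)
```
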
What happens when a row has more or fewer fields than the header?

If a data row has fewer fields than headers, the missing fields are set to empty string (or null with type inference enabled). If a row has more fields than headers, the extra fields are assigned keys _extra_0, _extra_1, etc. Both cases generate a warning toast indicating the affected line numbers. This prevents the silent data loss that plagues naive converters.
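
A sketch of the padding and _extra_ rules described above (illustrative, not the tool's code):

```python
def align_row(headers, row):
    """Pad short rows with "" and label surplus fields _extra_0, _extra_1, ..."""
    keys = list(headers)
    values = list(row)
    if len(values) < len(keys):
        values += [""] * (len(keys) - len(values))   # missing -> empty string
    elif len(values) > len(keys):
        keys += [f"_extra_{i}" for i in range(len(values) - len(keys))]
    return dict(zip(keys, values))
```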