About

CSV files encode tabular data as comma-delimited rows. Extracting a single column into a usable list requires correct handling of quoted fields, delimiters that appear inside quotes, and line-break variants (CRLF vs. LF). A naive split on commas fails when a field contains a literal comma wrapped in double quotes, a quoting convention formalized in RFC 4180. This tool implements a compliant state-machine parser that handles quoted fields, embedded newlines, and escaped quotes ("") before converting the selected column's data into your target format. Output formats include HTML ordered and unordered lists, Markdown, plain-text lists, JSON arrays, SQL values, and XML.
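The failure mode is easy to reproduce. This minimal TypeScript snippet uses a hypothetical sample row with three fields, splits it on every comma, and gets four pieces with the quote characters left attached:

```typescript
// A hypothetical row with three fields; the second is quoted because it contains a comma.
const line: string = 'Alice,"New York, NY",42';

// Naive approach: split on every comma and ignore quoting entirely.
const naive: string[] = line.split(",");

console.log(naive.length); // 4, although the row has only 3 fields
console.log(naive[1]);     // "New York  (opening quote kept, value cut at the comma)
console.log(naive[2]);     //  NY"      (closing quote kept, leading space kept)
```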

Incorrect CSV extraction leads to truncated data, merged fields, or broken markup. This matters when generating navigation menus from spreadsheet exports, populating CMS content, or feeding cleaned data into scripts. The tool preserves original field values without mutation. Limitation: files exceeding 50MB may cause browser memory pressure. For CSVs with non-UTF-8 encoding, convert to UTF-8 first.

Formulas

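The CSV parser operates as a finite state machine with three states: S₀ (field start), S₁ (inside a quoted field), and S₂ (inside an unquoted field). Transitions depend on the current character c, the character after it (c_next), and the configured delimiter d. Within S₁, the escaped pair "" is consumed as one literal quote, and the machine stays in S₁.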

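$$
\begin{cases}
S_0 \to S_1 & \text{if } c = \texttt{"} \\
S_0 \to S_2 & \text{if } c \neq d \land c \neq \mathtt{\backslash n} \\
S_1 \to S_0 & \text{if } c = \texttt{"} \land c_{\text{next}} \neq \texttt{"} \\
S_2 \to S_0 & \text{if } c = d \lor c = \mathtt{\backslash n}
\end{cases}
$$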
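The transitions above can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not the tool's source: the function name parseCsv, the string state labels, and the choice to treat a lone CR as a line break are hypothetical. The "" case is handled as a self-loop in S₁ that appends one literal quote, and a delimiter or newline seen in S₀ ends the current field or row (which also covers empty fields).

```typescript
// States from the transition table: S0 = field start,
// S1 = inside a quoted field, S2 = inside an unquoted field.
type State = "S0" | "S1" | "S2";

// Hypothetical function name; d is the configured delimiter.
function parseCsv(text: string, d: string = ","): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = "";
  let state: State = "S0";
  let pending = false; // true once the current row has consumed a character

  const endField = (): void => {
    row.push(field);
    field = "";
  };
  const endRow = (): void => {
    endField();
    rows.push(row);
    row = [];
    pending = false;
  };

  for (let i = 0; i < text.length; i++) {
    const c = text.charAt(i);
    const next = text.charAt(i + 1); // "" past the end of the input
    pending = true;

    if (state === "S1") {
      if (c === '"' && next === '"') {
        field += '"'; // escaped quote: emit one literal quote, stay in S1
        i++;
      } else if (c === '"') {
        state = "S0"; // S1 -> S0: closing quote not followed by another quote
      } else {
        field += c; // delimiters and newlines are literal inside quotes
      }
      continue;
    }

    // Outside quotes, CRLF, LF, and a lone CR all end the row.
    const isNewline = c === "\n" || c === "\r";
    if (c === "\r" && next === "\n") i++; // consume the LF of a CRLF pair

    if (c === d) {
      endField(); // S2 -> S0, or an empty field when already in S0
      state = "S0";
    } else if (isNewline) {
      endRow(); // S2 -> S0 on newline
      state = "S0";
    } else if (state === "S0" && c === '"') {
      state = "S1"; // S0 -> S1: opening quote
    } else {
      field += c; // S0 -> S2, or stay in S2
      state = "S2";
    }
  }
  if (pending) endRow(); // flush a last row that lacks a trailing newline
  return rows;
}
```

Checking S₁ first keeps every character inside quotes literal, which is what lets embedded delimiters and newlines survive as part of a single field.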

Given R data rows (excluding the header row when the header toggle is on) and a selected column index j, the number of extracted list items N is:

$$
N = \sum_{i=1}^{R} \operatorname{valid}(\mathit{row}_i[j])
$$

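where valid(field) returns 1 if the field is non-empty after trimming whitespace and 0 otherwise. This test applies only when "Skip empty items" is enabled; with the option disabled, every row counts, so N = R. R is the number of data rows, j is the zero-indexed column selection, and d is the input delimiter character used in the transitions above.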
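A minimal sketch of the counting rule, assuming rows have already been split into fields (string[][]). The helper names valid and extractColumn and the hasHeader/skipEmpty parameters are hypothetical stand-ins for the tool's header toggle and "Skip empty items" option. N is simply the length of the returned list.

```typescript
// valid(field) from the formula: 1 if non-empty after trimming, else 0.
const valid = (field: string): number => (field.trim() !== "" ? 1 : 0);

// Hypothetical helper: collect column j from every data row.
function extractColumn(
  rows: string[][],
  j: number,          // zero-indexed column selection
  hasHeader: boolean, // drop the first row when the header toggle is on
  skipEmpty: boolean, // mirrors the "Skip empty items" option
): string[] {
  const dataRows = hasHeader ? rows.slice(1) : rows; // R = dataRows.length
  const items: string[] = [];
  for (const row of dataRows) {
    const field = row[j] ?? ""; // a missing column counts as an empty field
    if (!skipEmpty || valid(field) === 1) {
      items.push(field); // the original, untrimmed value is kept
    }
  }
  return items;
}

const rows = [["name", "city"], ["Alice", "Paris"], ["Bob", "  "], ["Carol"]];
const n = extractColumn(rows, 1, true, true).length; // N = 1 + 0 + 0
```

With skipEmpty set to false, the same call returns all three data rows, matching N = R.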

Reference Data

Output Format	Syntax Example	Use Case	Supports Nesting	Machine-Readable
HTML Unordered List	`<ul><li>Item</li></ul>`	Web pages, CMS content	Yes	Yes (DOM)
HTML Ordered List	`<ol><li>Item</li></ol>`	Ranked lists, steps	Yes	Yes (DOM)
Markdown Bulleted	`- Item`	README files, docs	Yes (indent)	No
Markdown Numbered	`1. Item`	Procedures, rankings	Yes (indent)	No
Plain Bulleted	`• Item`	Emails, notes	No	No
Plain Numbered	`1. Item`	Task lists	No	No
Comma-Separated	`A, B, C`	Inline references	No	No
JSON Array	`["A","B"]`	API payloads, configs	Yes	Yes
Line-per-Item	`Item\nItem`	Seed files, imports	No	Partially
Custom Prefix	`→ Item`	Custom docs	No	No
SQL Values	`('A'),('B')`	Database inserts	No	Yes
XML List	`<item>A</item>`	Config files, SOAP	Yes	Yes

Character or Sequence	Meaning	Notes
`,` (Comma)	Field separator	Most common. RFC 4180 standard.
`;` (Semicolon)	Field separator	Common in European locales where comma is decimal separator.
`\t` (Tab)	Field separator	TSV format. Rarely needs quoting, since tabs seldom appear inside field values.
`|` (Pipe)	Field separator	Used in legacy systems and log files.
`""`	Escaped quote	A literal double quote inside a quoted field per RFC 4180.
BOM (`\uFEFF`)	Byte Order Mark	Invisible prefix in UTF-8 files from Excel. Must be stripped.
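The syntax column of the output-format table can be illustrated with a small formatter. This is a hedged sketch, not the tool's implementation: the Format labels and formatList are hypothetical, only seven of the listed formats are shown, and the escaping choices (&, <, > for HTML and XML; doubled single quotes for SQL) are assumptions about sensible output rather than documented behavior.

```typescript
// Hypothetical labels for a subset of the formats in the table.
type Format = "html-ul" | "html-ol" | "md-bullet" | "md-numbered" | "json" | "sql" | "xml";

// Escape characters that are special in HTML/XML text content.
const escapeXml = (s: string): string =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

function formatList(items: string[], format: Format): string {
  switch (format) {
    case "html-ul":
      return `<ul>${items.map((x) => `<li>${escapeXml(x)}</li>`).join("")}</ul>`;
    case "html-ol":
      return `<ol>${items.map((x) => `<li>${escapeXml(x)}</li>`).join("")}</ol>`;
    case "md-bullet":
      return items.map((x) => `- ${x}`).join("\n");
    case "md-numbered":
      return items.map((x, i) => `${i + 1}. ${x}`).join("\n");
    case "json":
      return JSON.stringify(items);
    case "sql":
      // Standard SQL escapes a single quote inside a literal by doubling it.
      return items.map((x) => `('${x.replace(/'/g, "''")}')`).join(",");
    case "xml":
      return items.map((x) => `<item>${escapeXml(x)}</item>`).join("");
  }
}
```

Markdown output is left unescaped here, so values containing characters such as * or _ may render differently than intended.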

Frequently Asked Questions

Per RFC 4180, any field containing the delimiter character, a newline, or a double quote must be enclosed in double quotes. The parser enters a quoted-field state (S₁) upon encountering an opening quote and only exits when it finds a closing quote not followed by another quote. A sequence "" inside a quoted field is interpreted as a literal double-quote character. This means a field like "New York, NY" is correctly parsed as a single value: New York, NY.
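The same rule can be read in the writing direction. The following hypothetical helper, quoteField (an illustration only; the tool is described here as a parser), quotes a value exactly when it contains the delimiter, a double quote, CR, or LF, and doubles any embedded quotes. Feeding its output through an RFC 4180 parser returns the original value.

```typescript
// Hypothetical writer-side helper for the RFC 4180 quoting rule.
function quoteField(value: string, d: string = ","): string {
  const needsQuotes =
    value.includes(d) || value.includes('"') || value.includes("\n") || value.includes("\r");
  // Embedded quotes become "" and the whole value is wrapped in quotes.
  return needsQuotes ? `"${value.replace(/"/g, '""')}"` : value;
}
```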

The tool determines column count from the first row (header or data). Rows with fewer columns than expected are padded with empty strings. Rows with more columns than expected retain all fields, but only the selected column index is extracted. If the selected column index exceeds a short row's field count, that row produces an empty value, which is either included as blank or skipped depending on the "Skip empty items" setting.
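The padding rule can be sketched as follows, assuming already-parsed rows. normalizeRows is a hypothetical name; it pads short rows to the first row's width, leaves longer rows untouched, and copies each row so the input is not mutated.

```typescript
// Hypothetical helper: pad each row to the first row's field count.
function normalizeRows(rows: string[][]): string[][] {
  const width = rows[0]?.length ?? 0; // column count comes from the first row
  return rows.map((row) => {
    const padded = row.slice(); // copy so the input is not mutated
    while (padded.length < width) padded.push(""); // pad short rows with ""
    return padded; // rows with extra fields keep all of them
  });
}
```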

Yes. The tool supports four input delimiters: comma (,), semicolon (;), tab (\t), and pipe (|). European-locale spreadsheets (Excel on German/French systems) often export with semicolons because the comma serves as the decimal separator. Select the matching delimiter before parsing. The auto-detect feature examines the first 5 rows and selects the delimiter with the most consistent occurrence count.
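A hedged sketch of one way to implement that heuristic; the tool's exact scoring is not documented here. The names detectDelimiter and countOutsideQuotes, the rule that a candidate must appear the same non-zero number of times on every sampled line, the tie-break in favor of the higher count, and the fallback to a comma are all assumptions. Sampling splits on raw line breaks, so a quoted field containing a newline can skew the counts.

```typescript
// Candidate delimiters named above: comma, semicolon, tab, pipe.
const CANDIDATES: readonly string[] = [",", ";", "\t", "|"];

// Count occurrences of d outside double-quoted sections of one line.
function countOutsideQuotes(line: string, d: string): number {
  let inQuotes = false;
  let n = 0;
  for (let i = 0; i < line.length; i++) {
    const c = line.charAt(i);
    if (c === '"') inQuotes = !inQuotes; // "" toggles twice, so it cancels out
    else if (c === d && !inQuotes) n++;
  }
  return n;
}

// Hypothetical heuristic: among candidates that occur the same non-zero number
// of times on every sampled line, prefer the one with the highest count.
function detectDelimiter(text: string, sampleRows: number = 5): string {
  const lines = text
    .split(/\r\n|\n|\r/)
    .filter((line) => line.length > 0)
    .slice(0, sampleRows);
  let best = ","; // assumed fallback when no candidate is consistent
  let bestCount = 0;
  for (const d of CANDIDATES) {
    const counts = lines.map((line) => countOutsideQuotes(line, d));
    const first = counts[0] ?? 0;
    const consistent = first > 0 && counts.every((n) => n === first);
    if (consistent && first > bestCount) {
      best = d;
      bestCount = first;
    }
  }
  return best;
}
```

Counting only outside quotes keeps a value such as "New York, NY" from adding a stray comma to one line's count.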

Yes. Files exported from Excel on Windows often begin with a Byte Order Mark (\uFEFF), an invisible character at position 0. The parser strips this character before processing. If it were not removed, it would stay attached to the first field of the first row, corrupting the header name and causing column selection failures.
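A minimal sketch of the stripping step, assuming the file has already been decoded to a JavaScript string (where a UTF-8 BOM appears as the single code unit U+FEFF). stripBom is a hypothetical name; the example shows why an unstripped BOM breaks a header lookup.

```typescript
// Hypothetical helper: remove a leading U+FEFF, if present.
function stripBom(text: string): string {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Without stripping, the BOM stays glued to the first header name.
const firstLine = "\uFEFFname,city";
console.log(firstLine.split(",")[0] === "name");           // false
console.log(stripBom(firstLine).split(",")[0] === "name"); // true
```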

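The tool processes files up to approximately 50MB directly in the browser. Files between 5MB and 50MB are parsed in a Web Worker to avoid blocking the UI thread. Beyond 50MB, browser memory constraints may crash the tab. For such files, consider splitting them first with a command-line tool such as the Unix split utility. Line-based splitting can cut through quoted fields that contain embedded newlines, and only the first piece keeps the header row, so check the pieces before processing.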

The JSON output uses JSON.stringify() on each extracted value, which escapes double quotes (\"), backslashes (\\), newlines (\n), tabs (\t), and the other control characters in the range U+0000 to U+001F. The resulting array is valid JSON that any compliant JSON parser can read directly.
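A short sketch of that approach, using hypothetical sample values that contain characters JSON must escape. Joining per-item JSON.stringify() results with commas inside brackets produces the same text as stringifying the whole array, and the result parses back to the original strings.

```typescript
// Hypothetical extracted values containing characters that JSON must escape.
const items: string[] = ['Say "hi"', "C:\\temp", "line1\nline2", "a\tb"];

// Stringify each value, then join the pieces into an array literal.
const perItem = "[" + items.map((v) => JSON.stringify(v)).join(",") + "]";

// Stringifying the whole array at once yields the same text.
const whole = JSON.stringify(items);

console.log(perItem === whole); // true
console.log(perItem);           // ["Say \"hi\"","C:\\temp","line1\nline2","a\tb"]
```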