About

Extracting a single column from a CSV file and reformatting it as a flat list is a frequent task in data migration, SQL query building, and API payload construction. Manual copy-paste from spreadsheets introduces invisible characters, broken encodings, and missed rows. This tool parses CSV input according to RFC 4180 rules, correctly handling quoted fields that contain embedded commas or newlines. It auto-detects the delimiter (comma, semicolon, tab, or pipe) by frequency analysis across the first 5 rows, then presents every detected column for selection.

The output list supports configurable item separators, optional quoting wrappers, prefix and suffix strings, and deduplication. Note: the parser assumes UTF-8 encoding, and a leading BOM (byte order mark) is stripped automatically. For files exceeding 10 MB, consider splitting before upload. Pro Tip: when building SQL IN clauses, set the wrapper to single quotes and the separator to comma to get a ready-to-paste value list.
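As a rough illustration of how these output options combine, here is a minimal Python sketch. The function name format_list and its parameters are hypothetical and not part of the tool, and it assumes the prefix and suffix frame the whole list rather than each item.

```python
def format_list(values, sep=", ", wrapper=("", ""), prefix="", suffix=""):
    """Wrap each item, join the items with sep, then frame the whole list.

    Assumption: prefix and suffix apply to the whole list, not to each item.
    """
    open_w, close_w = wrapper
    items = [f"{open_w}{v}{close_w}" for v in values]
    return prefix + sep.join(items) + suffix


# SQL IN clause: single-quote wrapper, comma separator.
ids = ["A100", "B200", "C300"]
in_clause = format_list(ids, sep=",", wrapper=("'", "'"), prefix="IN (", suffix=")")
print(in_clause)  # IN ('A100','B200','C300')
```

This sketch does not escape quote characters inside the values, so values that contain a single quote would still need escaping before being pasted into a SQL statement.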

Formulas

The delimiter auto-detection algorithm scores each candidate delimiter by counting occurrences per line and measuring consistency:

score_d = n_d / (1 + σ_d)

where d = candidate delimiter, n_d = mean count of d per line across the sample rows, and σ_d = standard deviation of the counts per line. A low σ_d (consistent count across rows) and a high n_d (frequent occurrence) yield the highest score. The delimiter with the maximum score is selected.
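The scoring idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's actual code: it takes the score to be the mean count divided by one plus the population standard deviation, counts raw occurrences without regard to quoting, and falls back to a comma when no candidate appears. The names detect_delimiter and CANDIDATES are hypothetical.

```python
from statistics import mean, pstdev

CANDIDATES = [",", ";", "\t", "|"]


def detect_delimiter(text, sample_size=5):
    """Return the candidate with the highest mean / (1 + stdev) score."""
    # Sample the first few non-empty lines only.
    lines = [ln for ln in text.splitlines() if ln.strip()][:sample_size]
    if not lines:
        return ","  # assumed fallback for empty input
    best, best_score = ",", 0.0
    for d in CANDIDATES:
        counts = [ln.count(d) for ln in lines]  # naive count, ignores quoting
        score = mean(counts) / (1 + pstdev(counts))
        if score > best_score:
            best, best_score = d, score
    return best


sample = "id;name;city\n1;Ann;Oslo\n2;Bob;Rome\n"
print(repr(detect_delimiter(sample)))  # ';'
```

A semicolon that appears in only one sampled row scores poorly here because its mean count is low and its spread is high, which is the behaviour the formula is meant to reward.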

The RFC 4180 field parser follows this state machine: if a field begins with a double-quote character ("), all characters until the next unescaped " are captured, where "" is treated as a literal embedded quote. Fields not beginning with " terminate at the next delimiter or line break. The deduplication pass, when enabled, uses a Set structure with O(1) average lookup per item, yielding total complexity O(n) for n rows.
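The state machine described above can be sketched as follows. This is a simplified Python illustration rather than the tool's implementation; the function name parse_csv is hypothetical, and the handling of CRLF and of a trailing line break is an assumption.

```python
def parse_csv(text, delimiter=","):
    """Parse CSV text into a list of rows, each a list of field strings."""
    rows, row, field = [], [], []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')  # "" inside quotes is a literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(ch)  # delimiters and newlines are literal here
        elif ch == '"' and not field:
            in_quotes = True  # opening quote at the start of a field
        elif ch == delimiter:
            row.append("".join(field))
            field = []
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1  # treat CRLF as a single line break
            row.append("".join(field))
            field = []
            rows.append(row)
            row = []
        else:
            field.append(ch)
        i += 1
    if field or row:  # flush a final row that has no trailing line break
        row.append("".join(field))
        rows.append(row)
    return rows


data = 'name,city\n"Smith, John","New\nYork"\n"say ""hi""",x\r\n'
parsed = parse_csv(data)
print(parsed[1])  # ['Smith, John', 'New\nYork']
```

Note that in this sketch a blank line produces a row holding a single empty field, which a caller may want to filter out.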

Reference Data

Delimiter	Common Name	Auto-Detect Symbol	Typical Source	RFC/Standard
,	Comma	,	Excel, Google Sheets (EN)	RFC 4180
;	Semicolon	;	Excel (EU locales), SAP exports	No formal RFC
\t	Tab (TSV)	⇥	Database dumps, Unix utilities	IANA TSV
|	Pipe	|	Legacy mainframe, HL7 data	HL7 v2.x
:	Colon	:	/etc/passwd, config files	POSIX
Output Wrapper Options
None	Plain value	`value`	Plain text lists	-
Single Quotes	SQL-safe	`'value'`	SQL IN clauses	SQL-92
Double Quotes	JSON-safe	`"value"`	JSON arrays, CSV re-export	RFC 8259
Backticks	MySQL identifiers	`value`	MySQL column/table names	MySQL dialect
Parentheses	Grouped	`(value)`	Mathematical notation	-
Brackets	Array-like	`[value]`	Markdown, Wiki syntax	-
Common Separator Patterns
Comma + Space	`, `	Inline lists	English prose, logs	-
Newline	`\n`	One item per line	Bulk import files	-
Pipe	`|`	Table cells	Markdown tables	GFM
Semicolon	`;`	Multi-value fields	vCard, iCal	RFC 6350
Tab	`\t`	Column-aligned	Spreadsheet paste	-

Frequently Asked Questions

The algorithm samples the first 5 non-empty lines and scores each candidate delimiter (comma, semicolon, tab, pipe) based on mean frequency and consistency (low standard deviation). If a file mixes delimiters, the one with the most consistent per-line count wins. You can always override the auto-detected choice by selecting a delimiter manually from the dropdown.

Yes. The parser follows RFC 4180 rules. A field wrapped in double quotes can contain commas, newlines, and literal double-quote characters (escaped as two consecutive double quotes). For example, the field "Smith, John" is parsed as the single value Smith, John.

The tool uses the header row (first row) to determine the column count. If a subsequent row has fewer fields, missing columns return empty strings. If a row has more fields than the header, the extra fields are ignored. A warning toast is displayed when inconsistencies are detected.
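The padding and truncation rule can be illustrated with a short Python sketch. The function name normalize_rows is hypothetical, and returning the numbers of the inconsistent rows stands in for the warning toast; rows are numbered from 1, with the header as row 1.

```python
def normalize_rows(rows):
    """Pad or truncate every data row to the header's column count.

    Returns the fixed rows and the 1-based numbers of rows that were adjusted.
    """
    if not rows:
        return [], []
    header = rows[0]
    width = len(header)
    fixed, adjusted = [header], []
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            adjusted.append(row_no)
        # Pad short rows with empty strings, then drop any extra fields.
        fixed.append((row + [""] * (width - len(row)))[:width])
    return fixed, adjusted


rows = [
    ["id", "name", "email"],
    ["1", "Ann"],
    ["2", "Bob", "b@x.io", "extra"],
    ["3", "Cy", "c@x.io"],
]
fixed, adjusted = normalize_rows(rows)
print(adjusted)  # [2, 3]
```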

The current tool extracts one column at a time to produce a clean flat list. To extract multiple columns, run the conversion once per column. Each result can be copied or downloaded independently.

The tool accepts files up to 10 MB. Input is read as UTF-8 text. A UTF-8 BOM (byte order mark, 0xEF 0xBB 0xBF) at the start of the file is stripped automatically. Files in other encodings (e.g., ISO-8859-1) may produce garbled characters for non-ASCII content - convert to UTF-8 before uploading.
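For reference, the same input handling can be approximated in Python as below. This is a sketch under assumptions: read_csv_bytes and MAX_BYTES are hypothetical names, 10 MB is taken to mean 10 × 1024 × 1024 bytes, and undecodable bytes are replaced with U+FFFD to mimic the garbled output described above.

```python
MAX_BYTES = 10 * 1024 * 1024  # assumed interpretation of the 10 MB limit


def read_csv_bytes(data: bytes) -> str:
    """Decode raw file bytes as UTF-8, dropping a leading BOM if present."""
    if len(data) > MAX_BYTES:
        raise ValueError("input exceeds 10 MB")
    # "utf-8-sig" decodes UTF-8 and strips an initial EF BB BF sequence.
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    return data.decode("utf-8-sig", errors="replace")


print(read_csv_bytes(b"\xef\xbb\xbfid,name"))  # id,name
```

A file saved as ISO-8859-1 shows the problem directly: the byte 0xE9 (é in that encoding) is not valid UTF-8 on its own, so it decodes to a replacement character.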

Deduplication uses a strict, case-sensitive string comparison: "Apple" and "apple" are treated as distinct values. Enable the "Trim whitespace" option so that leading and trailing whitespace is removed before values are compared.
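A Set-based deduplication pass of this kind might look like the following Python sketch. The function name dedupe and its trim flag are hypothetical, and keeping the first occurrence in its original position is an assumption about ordering.

```python
def dedupe(values, trim=True):
    """Drop repeated values, keeping the first occurrence of each.

    The comparison is case-sensitive; whitespace is stripped only when trim is True.
    """
    seen = set()
    result = []
    for v in values:
        key = v.strip() if trim else v
        if key not in seen:  # O(1) average lookup in a set
            seen.add(key)
            result.append(key)
    return result


print(dedupe(["Apple", "apple", " Apple ", "pear"]))  # ['Apple', 'apple', 'pear']
```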