About

Incorrect delimiter detection or mishandled quote escaping during CSV-to-matrix conversion silently corrupts data. A single misaligned column shifts every downstream calculation. This tool implements an RFC 4180-compliant state-machine parser that correctly resolves quoted fields containing commas, newlines, and escaped double-quotes. It auto-detects the delimiter among comma, semicolon, tab, and pipe by frequency analysis of the first 10 rows. Output dimensions are validated as m × n rectangular matrices before export. Non-rectangular inputs are flagged, then padded or truncated per user preference.

The converter exports to 8 target formats: JSON 2D array, Python NumPy, MATLAB, LaTeX bmatrix, Markdown table, C/C++ array initializer, R matrix(), and Wolfram Mathematica. Note: this tool treats all cell values as-is. Numeric validation is optional. Floating-point values using comma decimals (European notation) require semicolon or tab delimiters to avoid ambiguity.

Formulas

The delimiter auto-detection algorithm scores each candidate delimiter by computing its frequency consistency across the first k rows (default k = 10):

score_d = 1 / (1 + σ_d) ⋅ n_d

Where d is the candidate delimiter, σ_d is the standard deviation of delimiter count per row, and n_d is the mean count. A perfect CSV has σ = 0, yielding maximum score. The delimiter with the highest score wins.
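
The scoring can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: it assumes the score is n_d / (1 + σ_d) (consistent with σ = 0 giving the maximum score), uses the population standard deviation, and counts delimiters on raw lines without excluding those inside quoted fields. The names `score_delimiter`, `detect_delimiter`, and `CANDIDATES` are hypothetical.

```python
from statistics import mean, pstdev

# Candidate delimiters; list order doubles as the tie-break (an assumption).
CANDIDATES = [",", ";", "\t", "|"]

def score_delimiter(lines, delimiter):
    """Mean delimiter count per row divided by (1 + standard deviation)."""
    counts = [line.count(delimiter) for line in lines]
    if not counts:
        return 0.0
    return mean(counts) / (1.0 + pstdev(counts))

def detect_delimiter(text, k=10):
    """Return the highest-scoring candidate over the first k non-empty lines."""
    lines = [ln for ln in text.splitlines() if ln.strip()][:k]
    # max() returns the first maximal item, so earlier candidates win ties.
    return max(CANDIDATES, key=lambda d: score_delimiter(lines, d))

sample = "a;b;c\n1,5;2,25;3\n4;5,125;6,5\n"
print(repr(detect_delimiter(sample)))
```

For this sample the semicolon counts are 2, 2, 2 (score 2.0) while the comma counts are 0, 2, 2 (score about 0.69), so the semicolon is selected even though the file contains commas.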

Matrix dimensions are reported as m × n, where m is the row count and n is the column count. For rectangular validation, the tool checks that len(row_i) = n for all i ∈ [0, m). Pad mode fills short rows with empty strings up to the longest row length. Truncate mode clips every row to the shortest row length.
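
The rectangular check and the two repair modes can be sketched as follows. This is an illustrative sketch that assumes rows are already parsed into lists of strings; the function names and the `mode` and `fill` parameters are hypothetical, not the tool's interface.

```python
def is_rectangular(rows):
    """True when every row has the same length (an empty input counts as rectangular)."""
    return len({len(row) for row in rows}) <= 1

def make_rectangular(rows, mode="pad", fill=""):
    """Return a rectangular copy of rows using pad or truncate repair."""
    if not rows:
        return []
    widths = [len(row) for row in rows]
    if mode == "pad":
        n = max(widths)  # pad short rows up to the longest row
        return [list(row) + [fill] * (n - len(row)) for row in rows]
    if mode == "truncate":
        n = min(widths)  # clip every row down to the shortest row
        return [list(row[:n]) for row in rows]
    raise ValueError("mode must be 'pad' or 'truncate'")

ragged = [["1", "2", "3"], ["4"], ["5", "6"]]
print(make_rectangular(ragged, "pad"))
print(make_rectangular(ragged, "truncate"))
```

Both modes return new lists and leave the input untouched, so a caller can report the original and corrected dimensions side by side.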

The CSV parser uses a finite-state machine with 3 states: FIELD_START, UNQUOTED, and QUOTED. Transitions occur on delimiter character, quote character, or newline. The QUOTED state handles escaped quotes ("") by checking the next character before transitioning.
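
A three-state parser along these lines might look like the sketch below. It is an assumption-laden illustration rather than the tool's implementation: the function name `parse_csv` is hypothetical, characters that follow a closing quote are appended leniently, a bare quote inside an unquoted field is kept as a literal, and a blank line yields a row with one empty field.

```python
FIELD_START, UNQUOTED, QUOTED = "FIELD_START", "UNQUOTED", "QUOTED"

def parse_csv(text, delimiter=","):
    """Parse CSV text into a list of rows using a three-state machine."""
    rows, row, field = [], [], []
    state = FIELD_START
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if state == QUOTED:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')  # "" inside quotes is one literal quote
                    i += 1
                else:
                    state = UNQUOTED   # closing quote ends the quoted part
            else:
                field.append(ch)       # delimiters and newlines are literal here
        elif ch == '"' and state == FIELD_START:
            state = QUOTED             # opening quote
        elif ch == delimiter:
            row.append("".join(field))
            field = []
            state = FIELD_START
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1                 # treat CRLF as a single line break
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
            state = FIELD_START
        else:
            field.append(ch)
            state = UNQUOTED
        i += 1
    # Flush the last record when the input has no trailing line break.
    if state != FIELD_START or field or row:
        row.append("".join(field))
        rows.append(row)
    return rows

text = 'name,note\r\n"Smith, J","said ""hi""\nthen left"\r\nplain,x'
for record in parse_csv(text):
    print(record)
```

Looking one character ahead in the QUOTED state is what distinguishes an escaped quote from a closing quote, which matches the description of checking the next character before transitioning.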

Reference Data

Output Format	Language / System	Syntax Pattern	Numeric Only	Supports Strings	Max Practical Size
JSON 2D Array	JavaScript / Universal	`[[1,2],[3,4]]`	No	Yes	~50 MB
NumPy	Python	`np.array([[1,2],[3,4]])`	Recommended	dtype=object	~100k×100k
MATLAB	MATLAB / Octave	`[1 2; 3 4]`	Yes	No (cell array)	~10k×10k
LaTeX bmatrix	LaTeX / TeX	`\begin{bmatrix}...\end{bmatrix}`	Recommended	Yes	~50×50 (display)
Markdown Table	Markdown / GitHub	`| a | b |`	No	Yes	~1000 rows
C/C++ Array	C / C++	`int m[2][2] = {{1,2},{3,4}};`	Yes	No	Stack: ~1k×1k
R matrix()	R	`matrix(c(1,2,3,4), nrow=2, byrow=TRUE)`	Recommended	Yes	~50k×50k
Wolfram	Mathematica	`{{1,2},{3,4}}`	No	Yes	~10k×10k
CSV (Comma)	Universal	`1,2\n3,4`	No	Yes	~500 MB
TSV (Tab)	Spreadsheet	`1\t2\n3\t4`	No	Yes	~500 MB
Delimiter Frequency	Auto-detection ranks: comma > semicolon > tab > pipe by occurrence in first 10 rows
RFC 4180 Rule 1	Each record is on a separate line, delimited by a line break (CRLF)
RFC 4180 Rule 2	Last record may or may not have an ending line break
RFC 4180 Rule 3	First record may be a header (optional flag in this tool)
RFC 4180 Rule 4	Fields may be enclosed in double quotes; fields containing delimiters must be quoted
RFC 4180 Rule 5	Double quotes inside quoted fields are escaped by doubling: `""`

Frequently Asked Questions

The auto-detection algorithm computes frequency consistency across rows. In European-style CSVs using commas as decimal separators, the comma count per row varies erratically because numeric fields contain irregular decimal positions. A semicolon or tab delimiter will show far more consistent counts across rows, giving it a higher score. If you know your file uses European decimals, manually select semicolon or tab as the delimiter to avoid ambiguity.

The tool detects non-rectangular input and offers two repair strategies. Pad mode appends empty strings to short rows until every row matches the longest row length. Truncate mode clips all rows to match the shortest row length. The dimension indicator updates to show the corrected m × n size. Both modes preserve original data order. No data is discarded without notification.

LaTeX bmatrix is intended for display-quality typesetting and becomes impractical beyond approximately 50 × 50. The tool will generate valid LaTeX for any size, but TeX compilers may fail or produce illegible output for large matrices. For matrices exceeding 20 columns, consider using the MATLAB or NumPy format for computation, then selectively typesetting submatrices in LaTeX.
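
As a rough sketch of the submatrix approach, the following Python slices a block out of a larger matrix and typesets only that block. The helper names `submatrix` and `to_bmatrix` are hypothetical, and the sketch does not escape LaTeX special characters such as & or %, so it assumes numeric or otherwise safe cell values.

```python
def submatrix(rows, row_slice, col_slice):
    """Return the block of rows selected by two slice objects."""
    return [list(row[col_slice]) for row in rows[row_slice]]

def to_bmatrix(rows):
    """Render rows as a LaTeX bmatrix environment (no escaping of cell text)."""
    body = " \\\\\n".join(" & ".join(str(cell) for cell in row) for row in rows)
    return "\\begin{bmatrix}\n" + body + "\n\\end{bmatrix}"

big = [[r * 10 + c for c in range(10)] for r in range(10)]
print(to_bmatrix(submatrix(big, slice(0, 2), slice(0, 3))))
```

Note that amsmath limits matrix environments to 10 columns by default; wider output needs `\setcounter{MaxMatrixCols}{n}` in the LaTeX preamble.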

Yes. By default all cell values are treated as strings. The NumPy output uses dtype=object when non-numeric cells are detected. JSON wraps strings in quotes and leaves numbers unquoted. MATLAB output switches to cell array notation {'text', 1; 'more', 2} when strings are present. C/C++ output requires numeric data and will flag an error if text cells are found. The format-specific behavior is automatic, based on cell content analysis.
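
One way such content-based switching could work is sketched below for the JSON and MATLAB targets. The number pattern, the helper names (`coerce`, `to_json`, `to_matlab`), and the exact spacing of the output are assumptions for illustration; the tool's own detection rules may differ.

```python
import json
import re

# Assumed definition of "numeric": optional sign, digits, optional fraction and exponent.
_NUMBER = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")

def coerce(cell):
    """Return an int or float when the cell looks numeric, else the original string."""
    s = cell.strip()
    if _NUMBER.fullmatch(s):
        return float(s) if any(c in s for c in ".eE") else int(s)
    return cell

def to_json(rows):
    """Numbers are emitted unquoted, everything else as JSON strings."""
    return json.dumps([[coerce(c) for c in row] for row in rows])

def to_matlab(rows):
    """Numeric matrix [..] when all cells are numbers, cell array {..} otherwise."""
    typed = [[coerce(c) for c in row] for row in rows]
    has_text = any(isinstance(v, str) for row in typed for v in row)

    def fmt(v):
        if isinstance(v, str):
            return "'" + v.replace("'", "''") + "'"  # MATLAB doubles single quotes
        return repr(v)

    sep = ", " if has_text else " "
    body = "; ".join(sep.join(fmt(v) for v in row) for row in typed)
    return "{" + body + "}" if has_text else "[" + body + "]"

print(to_json([["1", "2.5"], ["x", "4"]]))
print(to_matlab([["1", "2"], ["3", "4"]]))
print(to_matlab([["text", "1"], ["more", "2"]]))
```

A strict numeric-only target such as the C/C++ initializer could reuse `coerce` and raise an error as soon as any cell remains a string.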

The RFC 4180 state-machine parser enters QUOTED state upon encountering an opening double-quote. While in QUOTED state, newline characters (both LF and CRLF) are treated as literal field content, not as row delimiters. The parser only exits QUOTED state when it encounters a closing double-quote followed by a delimiter or end-of-line. This correctly preserves multi-line cell values such as addresses or descriptions.
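
The same behavior can be observed with Python's standard `csv` module, which also keeps line breaks inside quoted fields. This is shown only as an independent point of comparison, not as the tool's parser; the wrapper name `parse_with_stdlib` is hypothetical.

```python
import csv
import io

def parse_with_stdlib(text, delimiter=","):
    """Parse CSV text with the standard library reader."""
    # newline="" passes embedded line breaks through to the reader untranslated.
    stream = io.StringIO(text, newline="")
    return list(csv.reader(stream, delimiter=delimiter))

text = 'id,address\r\n1,"12 Main St\nSpringfield"\r\n'
rows = parse_with_stdlib(text)
print(len(rows))   # two records, not three
print(rows[1][1])  # the address keeps its embedded line break
```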

The tool runs entirely in the browser using JavaScript string processing. Practical limits depend on available RAM. Files under 10 MB parse near-instantly. Files between 10 and 50 MB use chunked processing with a progress indicator to avoid freezing the UI. Files exceeding 50 MB may cause browser memory warnings on devices with less than 4 GB of RAM. For very large datasets, consider splitting the CSV or using a server-side tool.