About

CSV files lack visual structure. Pasting raw comma-separated data into a README or wiki page produces unreadable output. This converter parses CSV input using an RFC 4180-compliant finite state machine and generates GitHub Flavored Markdown (GFM) tables with configurable column alignment. It handles quoted fields containing embedded commas, literal newlines, and escaped double-quotes ("" → "). Delimiter detection is automatic: the parser performs frequency analysis of , \t ; and | across the first 5 lines to determine the separator without user input.

Incorrect CSV-to-table conversion commonly produces misaligned columns, dropped fields, or broken pipe characters. A field containing a literal | will fracture a Markdown table row if left unescaped. This tool escapes pipe characters within cell content and pads columns to produce consistent, human-readable output. By default, the tool assumes the first row contains headers. If your CSV lacks headers, the tool generates synthetic column labels (Col 1, Col 2, …). Maximum tested input: 10 MB or roughly 50,000 rows.

Formulas

The CSV parser operates as a finite state machine with three states. For each character c at position i in the input string of length n, the transition function is:

S → IN_QUOTED if c = " and state = FIELD_START
S → FIELD_START if c = delim or c = \n
S → IN_UNQUOTED otherwise
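
A minimal Python sketch of such a state machine follows. It assumes the three states named above and adds the usual RFC 4180 handling of a closing quote and the "" escape, which the transition rules do not spell out. The name parse_csv and its signature are illustrative assumptions, not the tool's actual API.

```python
from enum import Enum, auto


class State(Enum):
    FIELD_START = auto()
    IN_QUOTED = auto()
    IN_UNQUOTED = auto()


def parse_csv(text: str, delim: str = ",") -> list[list[str]]:
    """Parse CSV text into rows of fields (hypothetical helper)."""
    rows: list[list[str]] = []
    row: list[str] = []
    field: list[str] = []
    state = State.FIELD_START
    text = text.replace("\r\n", "\n")  # normalise CRLF line endings
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if state is State.IN_QUOTED:
            if c == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')  # "" inside quotes is one literal quote
                    i += 1
                else:
                    state = State.IN_UNQUOTED  # closing quote (assumed rule)
            else:
                field.append(c)  # delimiters and newlines are literal here
        elif c == '"' and state is State.FIELD_START:
            state = State.IN_QUOTED
        elif c == delim or c == "\n":
            row.append("".join(field))  # end of field
            field = []
            if c == "\n":
                rows.append(row)  # end of record
                row = []
            state = State.FIELD_START
        else:
            field.append(c)
            state = State.IN_UNQUOTED
        i += 1
    # Flush the last record when the input has no trailing newline.
    if field or row or state is not State.FIELD_START:
        row.append("".join(field))
        rows.append(row)
    return rows
```

Calling parse_csv on a record such as "x,1","say ""hi""" yields two fields, with the embedded comma kept and the doubled quotes collapsed to single ones.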

Column width for padded output is calculated per column j across all m rows:

w_j = max(len(cell_i,j) for i = 0 to m − 1)

The separator row for column j with alignment a is built by repeating the dash character - for w_j characters, then prepending or appending : based on a ∈ {LEFT, CENTER, RIGHT}.
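
The width and separator rules can be sketched in Python as below. The function names are hypothetical, and the sketch follows the description literally: the colons are added on top of the w_j dashes, so a separator cell is one or two characters wider than the column content. All rows are assumed to have the same number of cells.

```python
def column_widths(rows: list[list[str]]) -> list[int]:
    """w_j = max(len(cell_i,j)) over all rows i (assumes equal-length rows)."""
    return [max(len(cell) for cell in column) for column in zip(*rows)]


def separator_cell(width: int, align: str = "LEFT") -> str:
    """Build one separator cell for alignment LEFT, CENTER, or RIGHT."""
    dashes = "-" * width
    if align == "CENTER":
        return ":" + dashes + ":"
    if align == "RIGHT":
        return dashes + ":"
    return ":" + dashes  # LEFT


rows = [["Name", "Qty"], ["Widget", "12"]]
widths = column_widths(rows)
separator = [separator_cell(w, a) for w, a in zip(widths, ["LEFT", "RIGHT"])]
```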

Auto-detection scores each candidate delimiter d by computing the standard deviation of its occurrence count across the sample lines. Among candidates that appear at least once per line on average, the delimiter with the lowest standard deviation wins, because a consistent column count produces a uniform per-line frequency.

score(d) = σ(count(d, line_k) for k = 0..4)

Where count(d, line) tallies occurrences of d outside quoted regions. The chosen delimiter is argmin(score) among candidates with mean count ≥ 1.
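
The scoring rule can be sketched in Python as follows. This is a simplified illustration, not the tool's code: it uses the population standard deviation (statistics.pstdev) for σ, tracks quoted regions per line only (so a quoted field spanning several lines is not handled), and breaks ties by candidate order. The names detect_delimiter and count_outside_quotes are assumptions.

```python
from statistics import mean, pstdev
from typing import Optional

CANDIDATES = [",", "\t", ";", "|"]


def count_outside_quotes(line: str, delim: str) -> int:
    """Count occurrences of delim that are not inside double quotes."""
    in_quotes = False
    count = 0
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == delim and not in_quotes:
            count += 1
    return count


def detect_delimiter(text: str, sample_size: int = 5) -> Optional[str]:
    """Return the candidate with the lowest score, or None if none qualifies."""
    lines = text.splitlines()[:sample_size]
    best, best_score = None, None
    for d in CANDIDATES:
        counts = [count_outside_quotes(line, d) for line in lines]
        if not counts or mean(counts) < 1:
            continue  # candidate must appear at least once per line on average
        score = pstdev(counts)
        if best_score is None or score < best_score:
            best, best_score = d, score
    return best
```

For a semicolon-separated file with decimal commas, the comma shows up irregularly and is filtered out or outscored, so the semicolon is selected.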

Reference Data

Delimiter	Symbol	Common Source	Auto-Detected	Notes
Comma	,	Excel, Google Sheets export	Yes	RFC 4180 default
Tab	\t	TSV files, database dumps	Yes	Common in scientific data
Semicolon	;	European locale Excel	Yes	Used when comma is decimal separator
Pipe	|	Log files, custom exports	Yes	Must be escaped in Markdown output
Colon	:	/etc/passwd, config files	Manual	Set via custom delimiter option
Space	(space)	Fixed-width text files	Manual	Ambiguous; use with caution

Markdown Alignment Syntax

Left	:---	Default alignment for text columns
Center	:---:	Suitable for status codes, short labels
Right	---:	Recommended for numeric data columns

CSV Quoting Rules (RFC 4180)

Quoted field	"value"	Required when field contains delimiter, newline, or quotes
Escaped quote	""	Two consecutive double-quotes represent one literal quote
Empty field	,,	Produces empty cell in Markdown output
Trailing CRLF	\r\n	Stripped during parsing; does not create extra row
BOM marker	\uFEFF	UTF-8 BOM stripped automatically from the start of the input

GFM Table Limits (GitHub)

Max columns	250	GitHub renderer limit per table
Max cell length	500 chars	Longer content may be truncated in preview
Nested tables	No	Markdown does not support nested tables

Frequently Asked Questions

The parser implements RFC 4180 quoting rules. Any field wrapped in double quotes is treated as a single value regardless of embedded commas, newlines, or other delimiter characters. Embedded double quotes are represented as two consecutive quotes ("") in the source CSV and converted to a single quote in the Markdown output. If your CSV was exported from Excel or Google Sheets, quoting is applied automatically for problematic fields.
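
To check how a given file should parse under these quoting rules, Python's standard csv module can serve as an independent reference. The sketch below uses that module, not this converter's parser, and the sample data is made up for illustration.

```python
import csv
import io

# Three records exercising an embedded comma, doubled quotes, and a newline.
source = (
    "id,comment\r\n"
    '1,"Hello, world"\r\n'
    '2,"She said ""hi"""\r\n'
    '3,"two\nlines"\r\n'
)

# newline="" leaves line endings untouched so the csv module can handle them.
rows = list(csv.reader(io.StringIO(source, newline="")))

for row in rows:
    print(row)
```

Each quoted field comes back as a single value: the comma and the newline are preserved, and "" becomes one literal quote.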

A literal pipe (|) inside a GFM table cell breaks the column structure. This converter automatically escapes every pipe character in cell content by replacing | with \|. This is the standard GFM escaping method and renders correctly on GitHub, GitLab, Bitbucket, and most Markdown processors.
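
The escaping step amounts to a single string replacement. A minimal Python sketch, with escape_cell and render_row as hypothetical helper names:

```python
def escape_cell(value: str) -> str:
    """Replace each | with \\| so it cannot end the table cell."""
    return value.replace("|", "\\|")


def render_row(cells: list[str]) -> str:
    """Join escaped cells into one GFM table row."""
    return "| " + " | ".join(escape_cell(c) for c in cells) + " |"


line = render_row(["grep", "cat log | grep error"])
```

Here line holds a two-cell row in which the pipe inside the second cell is written as \| and therefore stays part of the cell.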

CSV files without a header row are supported: toggle the "First row is header" option off. The converter will generate synthetic headers labeled Col 1, Col 2, Col 3, etc., and treat every row as data. GFM tables require a header row, so synthetic headers are always inserted when the option is disabled.
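
Synthetic header generation can be sketched in Python as below. The function name and the first_row_is_header parameter are assumptions that mirror the option's label, and the column count is taken from the widest row.

```python
def with_headers(rows: list[list[str]], first_row_is_header: bool = True) -> list[list[str]]:
    """Return rows with a header row first, synthesising one if needed."""
    if first_row_is_header or not rows:
        return rows
    n_cols = max(len(row) for row in rows)
    header = [f"Col {j}" for j in range(1, n_cols + 1)]
    return [header] + rows


table = with_headers([["1", "2"], ["3", "4"]], first_row_is_header=False)
```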

Auto-detection analyzes the first 5 lines and selects the delimiter with the most consistent frequency (lowest standard deviation). It can fail when: (a) the file has fewer than 2 lines, (b) multiple delimiters appear with equal frequency, or (c) the file uses an uncommon delimiter like colon or space. In these cases, manually select the correct delimiter from the dropdown or enter a custom character.

The tool enforces a 10 MB limit on file uploads to prevent browser memory issues. For text pasted directly, practical limits depend on your browser but typically cap around 5 MB of text. Files exceeding 1,000 rows are processed in batches to keep the UI responsive. For very large datasets (50,000+ rows), expect a few seconds of processing time.

Column alignment is honored on GitHub: its Markdown renderer respects the colon syntax in the separator row. Left alignment (:---) is the default and typically unnecessary to specify. Right alignment (---:) is recommended for numeric columns like prices or counts. Center alignment (:---:) works for short labels or status indicators. The alignment only affects rendering; it does not alter the cell content.