About

CSV files lack visual structure. Opening a 10,000-row dataset in a text editor produces an unreadable wall of comma-separated values, and sharing raw CSV with stakeholders risks misinterpretation: shifted columns, encoding errors, or delimiter confusion can silently corrupt the data. This converter parses your CSV with an RFC 4180-compliant state machine, handles quoted fields containing commas and newlines, auto-detects the delimiter (comma, semicolon, tab, pipe, or colon), and generates a valid PDF 1.4 binary entirely in your browser. No data leaves your machine.

The PDF renderer sizes columns in proportion to content length, inserts a page break when the rows no longer fit the available vertical space, and sets all text in Helvetica, one of the 14 standard PDF fonts, so no font embedding or subsetting is needed. Limitation: characters outside the Latin-1 (ISO 8859-1) range render as placeholder glyphs. For CJK or Arabic datasets, convert the data to Latin-1-compatible text first. The largest tested input is approximately 50,000 rows at 8 columns, beyond which browser memory becomes the constraint.

Formulas

The converter calculates proportional column widths to fit the available page area. For each column j, the maximum content length L_j across all rows is measured in characters. The allocated width W_j in PDF points is:

$$ W_j = \frac{L_j}{\sum_{k=1}^{n} L_k} \times W_{\text{page}} $$

Where n is the number of columns and W_page = PageWidth − 2 × Margin is the usable content width in points. Each column is clamped to a minimum of 30 pt to prevent zero-width columns.
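
The width formula can be sketched in a few lines. This is an illustrative Python version, not the converter's own JavaScript; the function name `column_widths`, the 40 pt margin in the usage line, and the even split for all-empty input are assumptions. The source does not say whether widths are rescaled after the 30 pt clamp, so this sketch leaves them as they are.

```python
def column_widths(rows, page_width_pt, margin_pt, min_width_pt=30.0):
    """Allocate column widths in PDF points, proportional to the longest cell per column."""
    usable = page_width_pt - 2 * margin_pt              # W_page
    n_cols = max(len(row) for row in rows)

    # L_j: longest cell, in characters, found in column j across all rows.
    max_len = [0] * n_cols
    for row in rows:
        for j, cell in enumerate(row):
            max_len[j] = max(max_len[j], len(cell))

    total = sum(max_len)
    if total == 0:
        # Every cell is empty: split the width evenly (assumption, not from the source).
        return [max(usable / n_cols, min_width_pt)] * n_cols

    # W_j = L_j / sum(L_k) * W_page, clamped to the minimum width.
    return [max(length / total * usable, min_width_pt) for length in max_len]


# A4 portrait with an assumed 40 pt margin.
widths = column_widths([["id", "name"], ["1", "Alexander"]], 595.28, 40)
```

Because the clamp is applied after the proportional split, a table with many very narrow columns can end up wider than the usable area under this sketch.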

Rows per page R determines page-break positions:

$$ R = \left\lfloor \frac{H_{\text{page}} - 2 \times \text{Margin} - H_{\text{header}}}{H_{\text{row}}} \right\rfloor $$

Where H_row = fontSize × 1.8 accounts for line height plus cell padding. The header row height H_header uses a 1.2× multiplier for visual weight.
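
As a rough Python sketch of the rows-per-page formula (illustrative only; the converter itself is JavaScript): the function name, the 40 pt margin used in the example call, and the reading of the header height as 1.2 × H_row are all assumptions, so the result need not match the approximate figures in the reference tables.

```python
import math


def rows_per_page(page_height_pt, margin_pt, font_size_pt,
                  row_factor=1.8, header_factor=1.2):
    """Number of data rows that fit on one page below the header row."""
    row_height = font_size_pt * row_factor        # H_row = fontSize x 1.8
    header_height = row_height * header_factor    # H_header (assumed: 1.2 x H_row)
    available = page_height_pt - 2 * margin_pt - header_height
    return max(0, math.floor(available / row_height))


# A4 portrait, assumed 40 pt margin, 8 pt font.
r = rows_per_page(841.89, 40, 8)
```

A page break is then due after every R data rows, with the header repeated at the top of each new page if that option is enabled.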

CSV delimiter auto-detection counts occurrences of each candidate delimiter (comma, semicolon, tab, pipe, colon) in the first 5 lines. The delimiter whose per-line count is most consistent across those lines (lowest coefficient of variation) wins.

Reference Data

Delimiter	Symbol	Common Use	Auto-Detected
Comma	,	Standard CSV (RFC 4180)	Yes
Semicolon	;	European locale CSV (Excel EU)	Yes
Tab	\t	TSV files, database exports	Yes
Pipe	|	Unix log files, legacy systems	Yes
Colon	:	/etc/passwd, config files	Yes

PDF Page Size	Width mm	Height mm	Width pt	Height pt
A4 Portrait	210	297	595.28	841.89
A4 Landscape	297	210	841.89	595.28
Letter Portrait	215.9	279.4	612	792
Letter Landscape	279.4	215.9	792	612
Legal Portrait	215.9	355.6	612	1008
Legal Landscape	355.6	215.9	1008	612
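
The point values in the page-size table follow from the definition of the PDF point (1/72 inch) and 25.4 mm per inch. A short Python check, with `mm_to_pt` as an illustrative helper name:

```python
def mm_to_pt(mm):
    """Convert millimetres to PDF points (1 pt = 1/72 inch, 1 inch = 25.4 mm)."""
    return mm * 72 / 25.4


for name, width_mm, height_mm in [("A4", 210, 297),
                                  ("Letter", 215.9, 279.4),
                                  ("Legal", 215.9, 355.6)]:
    print(f"{name}: {mm_to_pt(width_mm):.2f} x {mm_to_pt(height_mm):.2f} pt")
# A4: 595.28 x 841.89 pt
# Letter: 612.00 x 792.00 pt
# Legal: 612.00 x 1008.00 pt
```

Landscape sizes simply swap width and height.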

Font Size pt	Approx. Chars per A4 Width	Rows per A4 Page	Best For
6	~160	~95	Dense data, many columns
7	~135	~82	Compact reports
8	~120	~72	Standard readability
9	~105	~64	Comfortable reading
10	~95	~57	Presentations
11	~85	~52	Large print, few columns
12	~78	~48	Title rows, emphasis

Frequently Asked Questions

The parser implements the RFC 4180 specification as a finite state machine. Fields wrapped in double quotes are treated as literal strings - commas, newlines (both CRLF and LF), and other delimiters inside quotes are preserved as field content. Escaped double-quotes (two consecutive "") are collapsed to a single quote character. This means a field like "New York, NY" remains intact as one column value rather than splitting into two columns.
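
The quoting rules described above can be illustrated with a compact state machine. This is a minimal Python sketch under simplifying assumptions (two states, no error reporting for malformed quotes), not the converter's actual JavaScript parser; `parse_csv` is an illustrative name.

```python
def parse_csv(text, delimiter=","):
    """Parse CSV text into a list of rows (lists of strings)."""
    rows, row, field = [], [], []
    in_quotes = False     # state: inside a quoted field or not
    started = False       # True once the current line has any content
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        started = True
        if in_quotes:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')      # "" inside quotes -> one literal quote
                    i += 1
                else:
                    in_quotes = False      # closing quote
            else:
                field.append(ch)           # delimiters and newlines are literal here
        elif ch == '"':
            in_quotes = True
        elif ch == delimiter:
            row.append("".join(field))
            field = []
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1                     # treat CRLF as a single line break
            row.append("".join(field))
            rows.append(row)
            row, field, started = [], [], False
        else:
            field.append(ch)
        i += 1
    if started:                            # last line without a trailing newline
        row.append("".join(field))
        rows.append(row)
    return rows


sample = 'city,pop\r\n"New York, NY",8\n"say ""hi""","multi\nline"'
print(parse_csv(sample))
# [['city', 'pop'], ['New York, NY', '8'], ['say "hi"', 'multi\nline']]
```

Keeping the quote state explicit is what lets a comma or newline inside quotes pass through as field content instead of ending the field or the row.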

The converter uses the first row (or header row if enabled) as the canonical column count. Rows with fewer fields are padded with empty strings to match. Rows with excess fields have trailing fields truncated. A warning toast is displayed noting the inconsistency and affected row numbers. This prevents column-shift errors in the generated PDF table.
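
The padding and truncation rule is simple enough to show directly. A Python sketch, where `normalize_rows` is an illustrative name and the returned list of row numbers stands in for the data behind the warning toast:

```python
def normalize_rows(rows):
    """Pad or truncate every row to the column count of the first row.

    Returns the adjusted rows and the 1-based numbers of rows that were changed.
    """
    if not rows:
        return [], []
    width = len(rows[0])                  # canonical column count
    fixed, affected = [], []
    for number, row in enumerate(rows, start=1):
        if len(row) != width:
            affected.append(number)
        # Pad with empty strings, then cut off anything beyond the canonical width.
        fixed.append((row + [""] * width)[:width])
    return fixed, affected


fixed, affected = normalize_rows([["a", "b", "c"], ["1"], ["1", "2", "3", "4"]])
print(fixed)     # [['a', 'b', 'c'], ['1', '', ''], ['1', '2', '3']]
print(affected)  # [2, 3]
```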

The PDF is generated using Helvetica, one of the 14 standard PDF base fonts defined in the PDF 1.0 specification. These fonts only support the Latin-1 (ISO 8859-1) character set - 256 code points covering Western European languages. Characters outside this range (CJK, Cyrillic, Arabic, emoji) cannot be rendered without font embedding and subsetting, which requires a font parsing library. For such datasets, pre-convert your CSV to Latin-1 compatible text or use a desktop tool that supports font embedding.
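
Before converting, it can help to find out which cells fall outside Latin-1. The following Python sketch is one possible pre-check, not part of the converter; both function names and the "?" placeholder are assumptions.

```python
def non_latin1_cells(rows):
    """Return (row, column, text) for every cell that cannot be encoded as ISO 8859-1."""
    problems = []
    for r, row in enumerate(rows, start=1):
        for c, cell in enumerate(row, start=1):
            try:
                cell.encode("latin-1")
            except UnicodeEncodeError:
                problems.append((r, c, cell))
    return problems


def to_latin1(text, placeholder="?"):
    """Replace every character above code point 255 with a placeholder."""
    return "".join(ch if ord(ch) < 256 else placeholder for ch in text)


rows = [["café", "naïve"], ["東京", "ok"]]
print(non_latin1_cells(rows))      # [(2, 1, '東京')]
print(to_latin1("Zürich 東京"))     # Zürich ??
```

Accented Western European characters such as é and ü pass through unchanged, while CJK, Cyrillic, Arabic, and emoji are flagged.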

The converter processes files up to 50 MB. For files exceeding 500 KB, parsing is offloaded to a Web Worker to prevent UI freezing. Practical row limits depend on available browser memory: approximately 50,000 rows with 8 columns is a tested ceiling on devices with 4 GB RAM. Each PDF page holds between 48 and 95 data rows depending on font size (12 pt to 6 pt). A 50,000-row dataset at 8 pt font produces roughly 700 pages.
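
The page estimate is a ceiling division of the row count by the rows-per-page figure. A short Python check using the approximate per-page values from the font-size table (`page_count` is an illustrative name):

```python
import math


def page_count(data_rows, rows_per_page):
    """Pages needed when every page holds at most rows_per_page data rows."""
    return math.ceil(data_rows / rows_per_page)


for font_pt, per_page in [(6, 95), (8, 72), (12, 48)]:
    print(f"{font_pt} pt: {page_count(50_000, per_page)} pages")
# 6 pt: 527 pages
# 8 pt: 695 pages
# 12 pt: 1042 pages
```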

The algorithm samples the first 5 lines of the file and counts occurrences of 5 candidate delimiters: comma, semicolon, tab, pipe, and colon. It selects the delimiter with the most consistent count per line (lowest coefficient of variation). It can fail on single-column CSVs (no delimiters at all), files where the actual delimiter appears inside quoted fields more often than as a separator, or files shorter than 2 lines. In ambiguous cases, comma is the default fallback. You can always override by selecting the delimiter manually.
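
The detection rule can be sketched as follows. This is an illustrative Python version under stated assumptions, not the converter's code: blank lines are skipped when sampling, the population standard deviation is used for the coefficient of variation, and ties go to the earlier candidate in the list.

```python
import statistics

CANDIDATES = [",", ";", "\t", "|", ":"]


def detect_delimiter(text, sample_lines=5, fallback=","):
    """Pick the candidate whose per-line count varies least across the sample."""
    lines = [line for line in text.splitlines() if line][:sample_lines]
    if len(lines) < 2:
        return fallback                    # too short to judge consistency

    best, best_cv = fallback, None
    for delim in CANDIDATES:
        counts = [line.count(delim) for line in lines]
        mean = statistics.mean(counts)
        if mean == 0:
            continue                       # candidate never appears
        cv = statistics.pstdev(counts) / mean   # coefficient of variation
        if best_cv is None or cv < best_cv:
            best, best_cv = delim, cv
    return best


print(repr(detect_delimiter("a;b;c\n1;2;3\n4;5;6")))                          # ';'
print(repr(detect_delimiter("name\tnote\nAnn\thello, world\nBob\tone: two")))  # '\t'
```

Because the sketch counts raw characters without honoring quotes, it shares the failure mode described above: a delimiter that appears mostly inside quoted fields can skew the counts.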

No data is uploaded. The entire process runs in your browser, using the File API for reading, a JavaScript state-machine parser for CSV, and a raw PDF 1.4 binary generator that constructs the file byte-by-byte. No network requests are made, so your data never leaves your device. The generated PDF is created as an in-memory Blob and offered for download via a temporary object URL that is revoked after use.
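
To give a sense of what building a PDF byte-by-byte involves, here is a minimal single-page sketch in Python. It is not the converter's generator: the function name, the five-object layout, the 40 pt margin, and the plain text lines instead of a drawn table are all simplifying assumptions. It shows the pieces any such writer needs: numbered objects, a content stream, a cross-reference table of byte offsets, and a trailer.

```python
def build_minimal_pdf(lines, font_size=8, page_width=595.28,
                      page_height=841.89, margin=40):
    """Return the bytes of a one-page PDF 1.4 file showing `lines` in Helvetica."""
    def escape(text):
        # Backslash and parentheses are special inside PDF literal strings.
        return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

    leading = font_size * 1.8
    ops = ["BT", f"/F1 {font_size} Tf", f"{leading:.2f} TL",
           f"{margin} {page_height - margin - font_size:.2f} Td"]
    ops += [f"({escape(line)}) Tj T*" for line in lines]
    ops.append("ET")
    # Characters outside Latin-1 become "?" here.
    stream = "\n".join(ops).encode("latin-1", errors="replace")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {page_width} {page_height}] "
         f"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>").encode("ascii"),
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica "
        b"/Encoding /WinAnsiEncoding >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))           # byte offset recorded for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"        # entry for the reserved object 0
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += (b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (len(objects) + 1, xref_start))
    return bytes(out)


pdf_bytes = build_minimal_pdf(["id,name", "1,Ann (test)"])
```

The font object only names Helvetica; no font file is included, which is why the output stays small and why characters outside the font's encoding cannot be drawn.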

Header styling and page layout are configurable. The settings panel lets you toggle header row highlighting, which applies a configurable colored background fill and bold text to the first row. You can also set font size (6-12 pt), page orientation (portrait or landscape), page size (A4, Letter, Legal), and margin width. These settings persist in localStorage across sessions.