CSV Trimmer
Trim whitespace, remove empty rows and columns, and strip duplicate rows from CSV files. Clean messy CSV data instantly in your browser.
About
Malformed CSV files silently corrupt data pipelines. A single trailing space in a join key causes failed lookups. An invisible empty row triggers off-by-one errors in row counters. An empty column inflates storage and breaks fixed-schema importers.
This tool performs deterministic, cell-level trimming on RFC 4180-compliant CSV data. It strips leading and trailing whitespace from every cell value, removes structurally empty rows (where all fields are blank after trimming), eliminates fully empty columns, and optionally deduplicates rows by content hash. The parser handles quoted fields containing commas, embedded newlines (CRLF within double quotes), and escaped quote characters ("" sequences) per the RFC specification. It does not guess or infer; it tokenizes character by character.
Note: this tool assumes UTF-8 encoding. Files with BOM markers are handled, but mixed encodings (e.g., Latin-1 fields inside a UTF-8 file) may produce garbled output for non-ASCII characters.
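The character-by-character tokenizing described above can be sketched as follows. This is a minimal Python illustration, not the tool's own browser code; the function name `parse_csv` is hypothetical, and the sketch is deliberately lenient about stray quote characters inside unquoted fields.

```python
def parse_csv(text: str) -> list[list[str]]:
    """Tokenize RFC 4180-style CSV one character at a time (illustrative sketch)."""
    if text.startswith("\ufeff"):
        text = text[1:]                      # drop a leading BOM if present
    rows, row, field = [], [], []
    in_quotes = False
    pending = False                          # True while a record is unfinished
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        pending = True
        if in_quotes:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')        # "" inside quotes is an escaped quote
                    i += 1
                else:
                    in_quotes = False        # closing quote
            else:
                field.append(ch)             # commas and newlines are literal here
        elif ch == '"':
            in_quotes = True                 # opening quote
        elif ch == ",":
            row.append("".join(field))
            field = []
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1                       # treat CRLF as one record separator
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
            pending = False
        else:
            field.append(ch)
        i += 1
    if pending:                              # last record without a trailing newline
        row.append("".join(field))
        rows.append(row)
    return rows
```

For example, the input `a,"b,1","say ""hi"""` yields one row with the three fields `a`, `b,1`, and `say "hi"`.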
Formulas
The trimming pipeline applies operations in a fixed order to avoid interaction effects between steps:
$$\text{output} = \text{Deduplicate}(\text{RemoveEmptyCols}(\text{RemoveEmptyRows}(\text{TrimCells}(\text{Parse}(\text{raw})))))$$
Where raw is the input text after BOM removal and line-ending normalization. Parse tokenizes per RFC 4180. TrimCells applies the regex /^\s+|\s+$/g to each unquoted cell value. RemoveEmptyRows filters out rows where every cell satisfies $\text{cell} = \texttt{""}$. RemoveEmptyCols identifies column indices $j$ where $\forall i \in \{0, \dots, n\}: \text{cell}_{i,j} = \texttt{""}$, and removes them. Deduplicate hashes each row as a joined string and retains only the first occurrence.
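A minimal Python sketch of these steps, applied to rows that have already been parsed, might look like the following. All function names are hypothetical, and the unit-separator character used to join a row into a deduplication key is an assumption, since the text does not say how rows are joined. Ragged rows are padded with empty cells when columns are removed, which is also an assumption.

```python
import re

# Same pattern as the regex quoted in the text: leading or trailing whitespace.
_EDGE_WS = re.compile(r"^\s+|\s+$")

def trim_cells(rows):
    return [[_EDGE_WS.sub("", cell) for cell in row] for row in rows]

def remove_empty_rows(rows):
    # Keep a row only if at least one cell is non-empty.
    return [row for row in rows if any(cell != "" for cell in row)]

def remove_empty_cols(rows):
    width = max((len(row) for row in rows), default=0)
    # Keep column j only if some row has a non-empty cell there.
    keep = [j for j in range(width)
            if any(j < len(row) and row[j] != "" for row in rows)]
    return [[row[j] if j < len(row) else "" for j in keep] for row in rows]

def deduplicate(rows):
    seen, out = set(), []
    for row in rows:
        key = "\x1f".join(row)   # assumed separator for the joined-string key
        if key not in seen:
            seen.add(key)
            out.append(row)      # first occurrence wins
    return out

def trim_pipeline(rows, dedupe=True):
    rows = remove_empty_cols(remove_empty_rows(trim_cells(rows)))
    return deduplicate(rows) if dedupe else rows
```

Running the steps in this order matters: a row containing only spaces becomes empty only after trimming, and two rows that differ only in padding become duplicates only after trimming.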
Row reduction ratio: $$\frac{R_{\text{original}} - R_{\text{trimmed}}}{R_{\text{original}}} \times 100\%$$
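As a worked example of the ratio, a file that goes from 200 rows to 150 rows after trimming has a reduction of (200 − 150) / 200 × 100% = 25%. The helper below is a hypothetical sketch; returning 0 for an empty input is an assumption, since the ratio is undefined when the original row count is zero.

```python
def row_reduction_ratio(r_original: int, r_trimmed: int) -> float:
    """Percentage of rows removed by trimming."""
    if r_original == 0:
        return 0.0  # assumption: treat an empty input as 0% reduction
    return (r_original - r_trimmed) / r_original * 100.0
```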
Reference Data
| Trim Operation | Description | Risk if Skipped | RFC 4180 Safe |
|---|---|---|---|
| Cell Whitespace Trim | Removes leading/trailing spaces and tabs from each cell | Join key mismatches, sort errors | Yes |
| Empty Row Removal | Deletes rows where all cells are blank after trim | Off-by-one row count errors | Yes |
| Empty Column Removal | Deletes columns where all cells (incl. header) are blank | Schema inflation, wasted storage | Yes |
| Duplicate Row Removal | Removes rows with identical content (keeps first occurrence) | Double-counted records, inflated aggregates | Yes |
| Trailing Delimiter Strip | Removes trailing commas producing phantom empty columns | Extra NULL columns in parsers | Yes |
| BOM Removal | Strips UTF-8 BOM (0xEF 0xBB 0xBF) from file start | First header field unreadable | N/A |
| Consistent Line Endings | Normalizes CR, LF, CRLF to CRLF | Parsers split or merge rows incorrectly | Yes (CRLF required) |
| Quote Normalization | Ensures fields with delimiters/newlines are properly quoted | Downstream parsers break on unquoted commas | Yes |
| Header Trim | Trims header names independently of data rows | Column name lookup failures in code | Yes |
| Carriage Return in Cell | Preserves CRLF inside quoted fields during trim | Data loss if naively stripped | Yes |
| Tab-to-Space Collapse | Optionally replaces inner tabs with single space | Misaligned data in fixed-width consumers | N/A |
| Numeric Whitespace | Trims spaces around numbers (" 42 " → "42") | Type casting failures (NaN) | Yes |
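Three of the operations in the table (BOM removal, quote normalization, and CRLF line endings) concern how bytes are read in and how rows are written back out. A minimal Python sketch, assuming rows are lists of strings and using the hypothetical helper names `strip_bom`, `quote_field`, and `to_csv`:

```python
UTF8_BOM = b"\xef\xbb\xbf"

def strip_bom(data: bytes) -> bytes:
    # Remove the UTF-8 BOM only when it sits at the very start of the file.
    return data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data

def quote_field(value: str) -> str:
    # Quote only fields containing a delimiter, a quote, or a line break,
    # doubling any embedded quote characters as RFC 4180 describes.
    if any(ch in value for ch in ',"\r\n'):
        return '"' + value.replace('"', '""') + '"'
    return value

def to_csv(rows) -> str:
    # Every record ends with CRLF; line breaks inside quoted fields are left as-is.
    return "".join(",".join(quote_field(c) for c in row) + "\r\n" for row in rows)
```

Quoting only when needed keeps the output compact while still letting downstream parsers read commas and line breaks inside a field correctly.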