Compress CSV File
Compress CSV files online for free. Reduce file size with GZIP, whitespace removal, quote optimization & dictionary encoding. 100% client-side.
Supports .csv and .tsv files up to 500 MB
About
CSV files grow deceptively large. A 50 MB export from a database can contain 40% redundant whitespace, quoting that RFC 4180 does not require, and thousands of repeated cell values. Uploading, transferring, or archiving such files wastes bandwidth and storage. This tool applies several size-reduction steps: whitespace stripping, quote optimization per RFC 4180 rules, empty row/column removal, and standards-compliant GZIP via the browser's native CompressionStream API. For tabular data with high value repetition, dictionary encoding alone can reduce size by 30 - 60% before binary compression. All processing runs in your browser. No data is uploaded to any server.
Limitations: dictionary encoding assumes repetition exists. Files with entirely unique cell values see minimal gain from that method. GZIP output (.csv.gz) requires decompression before use in spreadsheet software. The compression ratios the tool reports are approximate; actual ratios depend on data entropy. For files exceeding 500 MB, consider command-line tools due to browser memory constraints.
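The cleaning and GZIP steps described above can be sketched in a few lines of Python. This is an illustration only, not the tool's browser code: the function names `clean_csv` and `clean_and_gzip` are hypothetical, and the sketch assumes that padding around cells carries no meaning and that the input uses the default comma delimiter.

```python
import csv
import gzip
import io


def clean_csv(text: str) -> str:
    """Strip padding around cells and drop rows whose cells are all empty."""
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        cells = [cell.strip() for cell in row]
        if any(cells):  # skip rows that are empty after stripping
            writer.writerow(cells)
    return out.getvalue()


def clean_and_gzip(text: str) -> bytes:
    """Clean the CSV text, then compress it as a standard GZIP stream."""
    return gzip.compress(clean_csv(text).encode("utf-8"), compresslevel=9)


sample = "id , name ,status\r\n1 ,  Alice , active \r\n , , \r\n2 , Bob , active \r\n"
cleaned = clean_csv(sample)
packed = clean_and_gzip(sample)

# The GZIP step is reversible: decompressing yields the cleaned text again.
assert gzip.decompress(packed).decode("utf-8") == cleaned
```

On an input this small the GZIP header overhead can outweigh the savings; the reductions quoted on this page apply to realistically sized files.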
Formulas
Compression ratio quantifies the effectiveness of size reduction. The ratio R is defined as:

$$R = \left(1 - \frac{S_{\text{compressed}}}{S_{\text{original}}}\right) \times 100\%$$

Where $S_{\text{original}}$ = original file size in bytes, $S_{\text{compressed}}$ = compressed file size in bytes. A ratio of 80% means the file is 5× smaller.
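As a quick check of the arithmetic, the ratio can be computed as the percentage reduction relative to the original size. The helper names below (`compression_ratio`, `shrink_factor`) are hypothetical and chosen for this sketch only.

```python
def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Percentage size reduction: R = (1 - S_compressed / S_original) * 100."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return (1 - compressed_bytes / original_bytes) * 100


def shrink_factor(ratio_percent: float) -> float:
    """How many times smaller the output is for a given reduction percentage."""
    if ratio_percent >= 100:
        raise ValueError("a 100% reduction has no finite shrink factor")
    return 100 / (100 - ratio_percent)


# A 50 MB file compressed to 10 MB: 80% reduction, i.e. 5x smaller.
ratio = compression_ratio(50_000_000, 10_000_000)
print(f"{ratio:.1f}% smaller, {shrink_factor(ratio):.1f}x")
```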
GZIP uses the DEFLATE algorithm, which combines LZ77 (sliding window dictionary matching) with Huffman coding. The theoretical entropy limit for compression of a byte stream is given by Shannon entropy:

$$H = -\sum_{i} p_i \log_2 p_i$$

Where $p_i$ is the probability of the i-th symbol and $H$ is measured in bits per symbol. Data with low entropy (many repeated values) compresses better. CSV files with categorical columns (e.g., country codes, status flags) typically exhibit low entropy per column, making them excellent candidates for both dictionary encoding and DEFLATE.
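To get a feel for the entropy figure, the sketch below estimates it empirically from byte frequencies, treating each byte as an independent symbol. The function name `shannon_entropy` is an assumption of this example, not part of the tool.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Empirical Shannon entropy of a byte string, in bits per byte."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


low = shannon_entropy(b"US,US,US,US,US,US")   # few distinct bytes, heavy repetition
high = shannon_entropy(bytes(range(256)))      # every byte value exactly once
print(f"{low:.2f} vs {high:.2f} bits per byte")
```

This is a per-byte estimate that ignores ordering. The LZ77 stage of DEFLATE can also exploit repeated multi-byte strings, so real GZIP output may come in below what this figure alone would suggest.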
Quote optimization follows RFC 4180: a field requires quoting only if it contains the delimiter (,), a double-quote ("), or a newline (CRLF). All other quotes are redundant overhead.
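One way to apply this rule is to parse the file and re-emit it with minimal quoting. The sketch below uses Python's standard `csv` module with `csv.QUOTE_MINIMAL`; the function name `minimize_quotes` is hypothetical, and the tool's own browser implementation may differ.

```python
import csv
import io


def minimize_quotes(text: str) -> str:
    """Re-emit CSV so that only fields that need quoting stay quoted."""
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    # QUOTE_MINIMAL quotes a field only if it contains the delimiter,
    # the quote character, or a line-break character.
    writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
    writer.writerows(reader)
    return out.getvalue()


over_quoted = (
    '"id","name","note"\r\n'
    '"1","Smith, J.","said ""hi"""\r\n'
    '"2","Lee","plain"\r\n'
)
optimized = minimize_quotes(over_quoted)
print(len(over_quoted), "->", len(optimized), "characters")
```

Fields such as `Smith, J.` keep their quotes because they contain the delimiter, while `id`, `Lee`, and `plain` lose them.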
Reference Data
| Compression Method | Type | Typical Reduction | Output Format | Reversible | Best For |
|---|---|---|---|---|---|
| Whitespace Removal | Lossless | 5 - 15% | .csv | N/A (cleaned) | Padded exports |
| Quote Optimization | Lossless | 3 - 10% | .csv | N/A (cleaned) | Over-quoted CSVs |
| Empty Row Removal | Lossless | 1 - 20% | .csv | N/A (cleaned) | Sparse datasets |
| Line Ending Normalization | Lossless | 0 - 2% | .csv | N/A | Cross-platform files |
| BOM Removal | Lossless | 3 bytes | .csv | N/A | UTF-8 with BOM |
| GZIP (DEFLATE) | Lossless | 60 - 90% | .csv.gz | Yes | Archival, transfer |
| Dictionary Encoding | Lossless | 30 - 60% | .csv | Yes (with header) | High-repetition data |
| Column Removal | Lossy | Varies | .csv | No | Unneeded columns |
| GZIP Level 1 (Fast) | Lossless | 50 - 75% | .csv.gz | Yes | Speed priority |
| GZIP Level 9 (Max) | Lossless | 65 - 92% | .csv.gz | Yes | Size priority |
| Combined (Clean + GZIP) | Lossless | 70 - 95% | .csv.gz | Partial | Maximum compression |
| TSV → CSV (tab removal) | Lossless | 0 - 5% | .csv | N/A | Tab-delimited input |
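
The dictionary encoding row can be illustrated with a minimal per-column scheme: each distinct value is stored once, and cells are replaced by small integer codes. The on-disk format this tool actually uses is not documented here, so the layout and the names `dictionary_encode` and `dictionary_decode` are assumptions made for this sketch, which also assumes every row has the same number of columns.

```python
from typing import List, Tuple


def dictionary_encode(rows: List[List[str]]) -> Tuple[List[List[str]], List[List[int]]]:
    """Replace each cell with an index into a per-column list of distinct values."""
    n_cols = len(rows[0]) if rows else 0
    dictionaries: List[List[str]] = [[] for _ in range(n_cols)]
    lookups = [{} for _ in range(n_cols)]  # value -> code, one dict per column
    encoded: List[List[int]] = []
    for row in rows:
        codes = []
        for col, value in enumerate(row):
            if value not in lookups[col]:
                lookups[col][value] = len(dictionaries[col])
                dictionaries[col].append(value)
            codes.append(lookups[col][value])
        encoded.append(codes)
    return dictionaries, encoded


def dictionary_decode(dictionaries: List[List[str]], encoded: List[List[int]]) -> List[List[str]]:
    """Invert dictionary_encode by looking each code up in its column's dictionary."""
    return [[dictionaries[col][code] for col, code in enumerate(codes)] for codes in encoded]


rows = [["US", "active"], ["US", "inactive"], ["DE", "active"], ["US", "active"]]
dictionaries, encoded = dictionary_encode(rows)
assert dictionary_decode(dictionaries, encoded) == rows  # lossless round trip
```

The gain comes from repetition: a column with a handful of distinct values shrinks to short codes, while a column of unique values gains nothing, which matches the limitation noted in the About section.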