HTML Table to CSV/XLSX Converter
Convert HTML tables to CSV or XLSX spreadsheet files. Paste HTML or upload a file, preview tables, handle colspan/rowspan, and download instantly.
About
Extracting tabular data from HTML is error-prone. Tables use colspan and rowspan attributes that create merged cell regions. A naive row-by-row copy loses alignment: column j in row i may actually map to column j + k due to a preceding span. This tool parses the full merge grid, reconstructing a rectangular matrix of m × n cells before export. It generates RFC 4180-compliant CSV with proper quoting and a valid OOXML XLSX binary (ZIP archive of XML parts) - not an HTML file renamed to .xlsx.
Limitations: nested tables (a <table> inside a <td>) are flattened to text. Formatting (colors, fonts, borders) is not preserved in the output, only raw cell text. The XLSX writer uses shared strings and stores the archive uncompressed, so tables with more than 50,000 cells may produce larger-than-expected files. For production spreadsheets, validate column alignment against the original after export.
Formulas
The core challenge is resolving merged cells. Given a table with R rows, a fill-grid algorithm constructs a rectangular matrix G of dimensions R × C, where C is the effective column count.
For each cell in the source HTML, the algorithm skips to the first unoccupied column in the current row, then fills an rs × cs block, where rs is the cell's rowspan and cs is its colspan. Only the top-left cell of a merged region receives the text value; the remaining cells are set to empty strings.
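The fill step can be sketched as follows. This is a minimal illustration, not the tool's actual code: the `(text, rowspan, colspan)` input shape and the `fill_grid` name are assumptions, and the sketch grows each row on demand rather than pre-allocating the grid, so that spans carried down from earlier rows are accounted for.

```python
from typing import List, Optional, Tuple

# Hypothetical input shape: each source cell is (text, rowspan, colspan).
Cell = Tuple[str, int, int]


def fill_grid(rows: List[List[Cell]]) -> List[List[str]]:
    """Expand spanned cells into a rectangular matrix of strings."""
    # None marks a position that no cell has claimed yet.
    grid: List[List[Optional[str]]] = [[] for _ in rows]

    def claim(r: int, c: int, value: str) -> None:
        row = grid[r]
        while len(row) <= c:  # grow the row on demand
            row.append(None)
        row[c] = value

    for r, cells in enumerate(rows):
        col = 0
        for text, rs, cs in cells:
            rs, cs = max(1, rs), max(1, cs)  # treat invalid spans as 1
            # Skip to the first unoccupied column in the current row.
            while col < len(grid[r]) and grid[r][col] is not None:
                col += 1
            # Fill the rs x cs block with empty strings, clipped to the table height.
            for dr in range(min(rs, len(rows) - r)):
                for dc in range(cs):
                    claim(r + dr, col + dc, "")
            claim(r, col, text)  # only the top-left cell keeps the text
            col += cs

    # Pad every row to the widest row so the result is rectangular.
    width = max((len(row) for row in grid), default=0)
    return [[v or "" for v in row] + [""] * (width - len(row)) for row in grid]
```

For example, a first row holding a cell with rowspan 2 followed by a cell with colspan 2, and a second row with two plain cells, yields a 2 × 3 matrix in which the second row's cells are shifted one column to the right.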
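CSV encoding per RFC 4180: a field is enclosed in double quotes if it contains a comma, a double quote, or a line break; each embedded double quote is doubled; and every record ends with CRLF.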
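A minimal sketch of RFC 4180 quoting, combined with the UTF-8 BOM and CRLF line endings listed in the reference table. The function names `csv_field` and `to_csv` are illustrative assumptions, not the tool's API.

```python
from typing import List


def csv_field(value: str) -> str:
    """Quote a field only when it contains a comma, a double quote, or a line break."""
    if any(ch in value for ch in (",", '"', "\r", "\n")):
        # Embedded double quotes are escaped by doubling them.
        return '"' + value.replace('"', '""') + '"'
    return value


def to_csv(grid: List[List[str]], bom: bool = True) -> str:
    """Join rows with commas and terminate every record with CRLF."""
    body = "".join(",".join(csv_field(v) for v in row) + "\r\n" for row in grid)
    # A leading BOM helps Excel detect UTF-8 when the text is saved as UTF-8.
    return ("\ufeff" if bom else "") + body
```

Python's standard `csv` module produces equivalent output with its default dialect; the hand-written version is shown only to make the quoting rule explicit.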
XLSX structure follows the OOXML standard (ECMA-376). The minimal valid archive contains 7 XML parts packed into a ZIP container using the store method (no deflate compression). Cell references use the column-letter system: column index c maps to letters via bijective base-26 conversion (there is no zero digit), where 0 → A, 25 → Z, 26 → AA.
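The column-letter mapping can be sketched as below, with c as the zero-based column index. The helper names `column_letters` and `cell_ref` are assumptions for illustration.

```python
def column_letters(c: int) -> str:
    """Map a zero-based column index to spreadsheet letters (0 -> A, 26 -> AA)."""
    if c < 0:
        raise ValueError("column index must be non-negative")
    letters = ""
    n = c + 1  # work in 1-based terms: A=1 ... Z=26, with no zero digit
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def cell_ref(row: int, c: int) -> str:
    """Build an A1-style reference from zero-based row and column indices."""
    return f"{column_letters(c)}{row + 1}"
```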
In the column mapping, c is the zero-based column index. The ZIP local file header uses signature 0x04034b50, with CRC-32 computed per ISO 3309 for each file entry.
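As a rough sketch of one stored (uncompressed) entry, the following packs a local file header followed by the file name and data. The function name `stored_entry` and the fixed 1980-01-01 timestamp are assumptions; a complete archive also needs the central directory and end-of-central-directory records, which are omitted here.

```python
import struct
import zlib


def stored_entry(name: str, data: bytes) -> bytes:
    """Return a ZIP local file header plus payload using method 0 (store)."""
    name_bytes = name.encode("utf-8")
    crc = zlib.crc32(data) & 0xFFFFFFFF  # same CRC-32 polynomial ZIP uses
    header = struct.pack(
        "<IHHHHHIIIHH",
        0x04034B50,       # local file header signature ("PK\x03\x04" on disk)
        20,               # version needed to extract (2.0)
        0,                # general purpose flags
        0,                # compression method 0 = store
        0,                # last modified time (00:00:00)
        0x21,             # last modified date (1980-01-01), an arbitrary choice
        crc,              # CRC-32 of the uncompressed data
        len(data),        # compressed size (equals raw size when stored)
        len(data),        # uncompressed size
        len(name_bytes),  # file name length
        0,                # extra field length
    )
    return header + name_bytes + data
```

Because the store method leaves data uncompressed, the compressed and uncompressed size fields are identical, which is what keeps the writer simple at the cost of larger files.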
Reference Data
| Feature | CSV Output | XLSX Output |
|---|---|---|
| File Format | Plain text (RFC 4180) | OOXML ZIP archive |
| Excel Compatible | Yes (with BOM) | Yes (native) |
| Google Sheets Compatible | Yes | Yes |
| LibreOffice Compatible | Yes | Yes |
| Unicode Support | UTF-8 with BOM | UTF-8 XML |
| Multiple Sheets | No (single file per table) | Single sheet per file |
| Colspan Handling | Empty cells inserted | Empty cells inserted |
| Rowspan Handling | Value repeated / empty fill | Value repeated / empty fill |
| Cell Formatting | None (text only) | None (text only) |
| Max Rows (practical) | Unlimited | ~100000 (browser memory) |
| Max Columns (XLSX spec) | N/A | 16384 (XFD) |
| Delimiter | Comma (,) | N/A (XML cells) |
| Quoting Rule | Double-quote if field contains , " or newline | N/A |
| File Extension | .csv | .xlsx |
| MIME Type | text/csv | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet |
| Nested Tables | Flattened to text | Flattened to text |
| Header Detection | <thead> / <th> treated as regular cells | Same |
| Empty Cells | Empty field (,,) | Omitted cell element (Excel reads as blank) |
| Line Endings | CRLF (per RFC 4180) | N/A |
Frequently Asked Questions
Nested tables (a <table> inside a <td>) are flattened: the inner table's text content is extracted as plain text and concatenated into the parent cell's value. The structural rows and columns of the inner table are lost. If you need to preserve nested table structure, extract each table separately; the converter lists all tables found in the HTML and lets you select which one to export.
Practical limits depend on available browser memory. Tables up to approximately 100,000 cells (e.g., 1000 rows × 100 columns) convert reliably. Beyond that, XLSX generation may cause memory pressure because the ZIP archive is assembled in memory as an ArrayBuffer. CSV output is more lightweight and can handle larger datasets. If the browser tab crashes, reduce the table size or split it into multiple exports.
It is a genuine OOXML spreadsheet. The tool constructs a valid ZIP archive containing the required XML parts: [Content_Types].xml, workbook.xml, sheet1.xml, sharedStrings.xml, styles.xml, and relationship files. Excel, Google Sheets, and LibreOffice all open it natively. Unlike some exporters that wrap HTML in an .xls extension (which triggers compatibility warnings), this produces a standards-compliant .xlsx binary.
The parser first scans all rows to compute the effective column count by summing colspan values per row and taking the maximum. The fill-grid is then pre-allocated to this width. Rows with fewer cells than the maximum simply leave trailing grid positions empty, which become blank cells in the output. This matches how browsers render ragged tables.
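The column-count pre-scan described above can be sketched in a few lines. The input shape (one list of colspan values per row) and the function name are assumptions for illustration.

```python
from typing import List


def effective_column_count(colspans_per_row: List[List[int]]) -> int:
    """Sum the colspan values in each row and return the largest total."""
    return max((sum(max(1, cs) for cs in row) for row in colspans_per_row), default=0)
```

For instance, rows with colspans `[1, 2]`, `[1]`, and `[3, 1]` give totals of 3, 1, and 4, so the grid is 4 columns wide and the shorter rows are padded with blanks. Note that a per-row sum does not count columns occupied by a rowspan from an earlier row, so a grid sized this way may still need to grow during the fill step.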