About

Extracting tabular data from HTML is error-prone. Tables use colspan and rowspan attributes that create merged cell regions. A naive row-by-row copy loses alignment: column j in row i may actually map to column j + k due to a preceding span. This tool parses the full merge grid, reconstructing a rectangular matrix of m × n cells before export. It generates RFC 4180-compliant CSV with proper quoting and a valid OOXML XLSX binary (ZIP archive of XML parts) - not an HTML file renamed to .xlsx.

Limitations: nested tables (a <table> inside a <td>) are flattened to text. Formatting (colors, fonts, borders) is not preserved in output - only raw cell text. The XLSX writer stores shared strings without compression, so tables with more than 50,000 cells may produce larger-than-expected files. For production spreadsheets, validate column alignment against the original after export.


Formulas

The core challenge is resolving merged cells. Given a table with R rows, a fill-grid algorithm constructs a rectangular matrix G of dimensions R × C, where C is the effective column count.

fillGrid(row, cell):
  cs = cell.colspan || 1, rs = cell.rowspan || 1
  col = first unoccupied column in G[row]
  for dr = 0 to rs − 1, dc = 0 to cs − 1:
    G[row + dr][col + dc] = value if dr = 0 and dc = 0, else ""

For each cell in the source HTML, the algorithm skips to the first unoccupied column in the current row, then fills a rs × cs block. Only the top-left cell of a merged region receives the text value; remaining cells are set to empty strings.
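The fill-grid pass can be sketched in JavaScript as follows. This is a minimal illustration rather than the tool's actual source: the input shape (an array of rows, each an array of `{ text, colspan, rowspan }` objects) is an assumption made so the example runs without a DOM, and clipping a rowspan at the last source row is a choice of this sketch.

```javascript
// Hypothetical input: rows is an array of rows; each row is an array of
// { text, colspan, rowspan } objects in source order.
function fillGrid(rows) {
  const grid = rows.map(() => []);
  rows.forEach((cells, r) => {
    let col = 0;
    for (const cell of cells) {
      const cs = cell.colspan || 1;
      const rs = cell.rowspan || 1;
      // Skip positions already claimed by a rowspan from an earlier row.
      while (grid[r][col] !== undefined) col++;
      // Fill an rs x cs block; a rowspan is clipped at the last source row.
      for (let dr = 0; dr < rs && r + dr < rows.length; dr++) {
        for (let dc = 0; dc < cs; dc++) {
          // Only the top-left position keeps the text; the rest are empty.
          grid[r + dr][col + dc] = dr === 0 && dc === 0 ? cell.text : "";
        }
      }
      col += cs;
    }
  });
  // Pad ragged rows so the result is a rectangular matrix.
  const width = Math.max(0, ...grid.map((row) => row.length));
  return grid.map((row) =>
    Array.from({ length: width }, (_, c) => row[c] ?? "")
  );
}

const example = fillGrid([
  [{ text: "A", colspan: 2, rowspan: 2 }, { text: "B" }],
  [{ text: "C" }],
  [{ text: "D" }, { text: "E" }, { text: "F" }],
]);
console.log(example);
```

In the example, "C" is the only cell in the second source row, yet it lands in the third column because the first two positions are occupied by the block that "A" spans.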

CSV encoding per RFC 4180:

escapeCSV(field) = if field contains , or " or a line break: '"' + field.replaceAll('"', '""') + '"', else: field
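A small JavaScript sketch of this quoting rule is shown below. The `toCSV` helper and its grid-of-strings input are assumptions added for illustration; records are joined with CRLF as RFC 4180 specifies.

```javascript
// Quote a field only when it contains a comma, a double quote, or a line
// break; embedded double quotes are doubled.
function escapeCSV(field) {
  const s = String(field);
  return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Hypothetical helper: serialize a rectangular grid of strings,
// using CRLF between records.
function toCSV(grid) {
  return grid.map((row) => row.map(escapeCSV).join(",")).join("\r\n");
}

console.log(toCSV([["name", "note"], ["Widget, large", 'said "ok"']]));
```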

XLSX structure follows the OOXML standard (ECMA-376). The minimal valid archive contains 7 XML parts packed into a ZIP container using the store method (no deflate compression). Cell references use the column-letter system: column index c maps to letters via base-26 conversion where 0 → A, 25 → Z, 26 → AA.

colToLetter(c) = if c < 26: String.fromCharCode(65 + c), else: colToLetter(⌊c / 26⌋ − 1) + String.fromCharCode(65 + (c mod 26))
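Written out as runnable JavaScript, the recursion looks like this. The `cellRef` helper is a hypothetical addition showing how a zero-based row and column could be combined into an A1-style reference.

```javascript
// Convert a zero-based column index to spreadsheet column letters.
function colToLetter(c) {
  return c < 26
    ? String.fromCharCode(65 + c)
    : colToLetter(Math.floor(c / 26) - 1) + String.fromCharCode(65 + (c % 26));
}

// Hypothetical helper: zero-based (row, col) to an A1-style cell reference.
function cellRef(row, col) {
  return colToLetter(col) + (row + 1);
}

console.log(colToLetter(0), colToLetter(25), colToLetter(26)); // A Z AA
console.log(colToLetter(16383)); // XFD, the last column allowed in XLSX
```

The subtraction of 1 before recursing is what makes this a bijective base-26 scheme: there is no zero digit, so "AA" follows "Z" directly.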

Where c = zero-based column index. The ZIP local file header uses signature 0x04034b50, with CRC-32 computed per ISO 3309 for each file entry.
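The following sketch shows one way to compute the CRC-32 and lay out a local file header for a stored entry. It is illustrative only: the `localFileHeader` helper, the zeroed timestamp fields, and the "version needed" value of 20 are assumptions of this example, not details confirmed for this tool.

```javascript
// Lookup table for the reflected CRC-32 polynomial 0xEDB88320 used by ZIP.
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  CRC_TABLE[n] = c;
}

// CRC-32 over a Uint8Array, returned as an unsigned 32-bit number.
function crc32(bytes) {
  let crc = 0xffffffff;
  for (const b of bytes) crc = CRC_TABLE[(crc ^ b) & 0xff] ^ (crc >>> 8);
  return (crc ^ 0xffffffff) >>> 0;
}

// Hypothetical helper: local file header for one stored (uncompressed) entry.
function localFileHeader(name, data) {
  const nameBytes = new TextEncoder().encode(name);
  const header = new Uint8Array(30 + nameBytes.length);
  const view = new DataView(header.buffer);
  view.setUint32(0, 0x04034b50, true);   // signature; bytes on disk: 50 4B 03 04
  view.setUint16(4, 20, true);           // version needed to extract (assumed 2.0)
  view.setUint16(6, 0, true);            // general purpose flags
  view.setUint16(8, 0, true);            // compression method 0 = store
  view.setUint16(10, 0, true);           // modification time (zero in this sketch)
  view.setUint16(12, 0, true);           // modification date (zero in this sketch)
  view.setUint32(14, crc32(data), true); // CRC-32 of the entry data
  view.setUint32(18, data.length, true); // compressed size (same as raw when stored)
  view.setUint32(22, data.length, true); // uncompressed size
  view.setUint16(26, nameBytes.length, true); // file name length
  view.setUint16(28, 0, true);           // extra field length
  header.set(nameBytes, 30);
  return header;
}

const sample = new TextEncoder().encode("123456789");
console.log(crc32(sample).toString(16)); // cbf43926, the standard check value
```

A complete archive also needs a central directory and an end-of-central-directory record after the entries; those are omitted here.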

Reference Data

| Feature | CSV Output | XLSX Output |
| --- | --- | --- |
| File Format | Plain text (RFC 4180) | OOXML ZIP archive |
| Excel Compatible | Yes (with BOM) | Yes (native) |
| Google Sheets Compatible | Yes | Yes |
| LibreOffice Compatible | Yes | Yes |
| Unicode Support | UTF-8 with BOM | UTF-8 XML |
| Multiple Sheets | No (single file per table) | Single sheet per file |
| Colspan Handling | Empty cells inserted | Empty cells inserted |
| Rowspan Handling | Empty cells inserted (text kept in top-left cell only) | Empty cells inserted (text kept in top-left cell only) |
| Cell Formatting | None (text only) | None (text only) |
| Max Size (practical) | Limited by browser memory | ~100,000 cells (browser memory) |
| Max Columns (XLSX spec) | N/A | 16,384 (XFD) |
| Delimiter | Comma (,) | N/A (XML cells) |
| Quoting Rule | Double-quote if field contains , " or newline | N/A |
| File Extension | .csv | .xlsx |
| MIME Type | text/csv | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet |
| Nested Tables | Flattened to text | Flattened to text |
| Header Detection | <thead> / <th> treated as regular cells | Same |
| Empty Cells | Empty field (,,) | Omitted cell element (Excel reads as blank) |
| Line Endings | CRLF (per RFC 4180) | N/A |

Frequently Asked Questions

The converter builds a fill-grid: a 2D matrix sized to the effective row × column dimensions of the table. When a cell has colspan=3 and rowspan=2, the algorithm writes the cell text to the top-left position and fills the remaining 5 positions with empty strings. This preserves column alignment in the output. The original merged cell's text appears once; it is not duplicated across the spanned region.
Excel on Windows defaults to the system locale encoding (often Windows-1252) when opening CSV files. This converter prepends a UTF-8 Byte Order Mark (BOM: EF BB BF) to signal UTF-8 encoding to Excel. If characters still appear broken, use Excel's Data → From Text/CSV import wizard and explicitly select UTF-8 (65001) as the encoding. The XLSX format avoids this issue entirely since it uses XML with declared UTF-8 encoding.
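As a rough sketch, prepending the BOM to CSV text before download could look like this. The function name `withUtf8Bom` is hypothetical; `TextEncoder` always encodes to UTF-8.

```javascript
// Return the CSV text as UTF-8 bytes with the BOM (EF BB BF) in front.
function withUtf8Bom(csvText) {
  const bom = new Uint8Array([0xef, 0xbb, 0xbf]);
  const body = new TextEncoder().encode(csvText);
  const out = new Uint8Array(bom.length + body.length);
  out.set(bom, 0);
  out.set(body, bom.length);
  return out;
}

const bytes = withUtf8Bom("café,1\r\n");
console.log(Array.from(bytes.slice(0, 3), (b) => b.toString(16))); // ef, bb, bf
```

In a browser, the resulting bytes could be wrapped in a `Blob` with type `text/csv` and offered as a download.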
Nested tables (a <table> element inside a <td>) are flattened: the inner table's text content is extracted as plain text and concatenated into the parent cell's value. The structural rows and columns of the inner table are lost. If you need to preserve nested table structure, extract each table separately - the converter lists all tables found in the HTML and lets you select which one to export.
Practical limits depend on available browser memory. Tables up to approximately 100,000 cells (e.g., 1000 rows × 100 columns) convert reliably. Beyond that, XLSX generation may cause memory pressure because the ZIP archive is assembled in memory as an ArrayBuffer. CSV output is more lightweight and can handle larger datasets. If the browser tab crashes, reduce the table size or split it into multiple exports.
The XLSX output is a genuine OOXML spreadsheet. The tool constructs a valid ZIP archive containing the required XML parts: [Content_Types].xml, workbook.xml, sheet1.xml, sharedStrings.xml, styles.xml, and relationship files. Excel, Google Sheets, and LibreOffice all open it natively. Unlike some exporters that wrap HTML in an .xls extension (which triggers compatibility warnings), this produces a standards-compliant .xlsx binary.
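To make the relationship between sheet1.xml and sharedStrings.xml concrete, here is a hedged sketch of how cell text might be turned into shared-string references. The directory layout in `XLSX_PARTS` is the conventional one for XLSX packages and is assumed here; `buildSheet` and its return shape are hypothetical, and only the inner XML fragments are produced, not complete parts.

```javascript
// Conventional part paths inside an XLSX package (assumed layout).
const XLSX_PARTS = [
  "[Content_Types].xml",
  "_rels/.rels",
  "xl/workbook.xml",
  "xl/_rels/workbook.xml.rels",
  "xl/worksheets/sheet1.xml",
  "xl/sharedStrings.xml",
  "xl/styles.xml",
];

function escapeXml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function colToLetter(c) {
  return c < 26
    ? String.fromCharCode(65 + c)
    : colToLetter(Math.floor(c / 26) - 1) + String.fromCharCode(65 + (c % 26));
}

// Hypothetical helper: grid is a rectangular array of strings.
function buildSheet(grid) {
  const strings = [];      // shared string table, in first-seen order
  const index = new Map(); // text -> position in the table
  const rowsXml = grid.map((row, r) => {
    const cells = row.map((text, c) => {
      if (text === "") return ""; // empty cells are omitted entirely
      if (!index.has(text)) {
        index.set(text, strings.length);
        strings.push(text);
      }
      // t="s" marks the value as an index into the shared string table.
      return `<c r="${colToLetter(c)}${r + 1}" t="s"><v>${index.get(text)}</v></c>`;
    });
    return `<row r="${r + 1}">${cells.join("")}</row>`;
  });
  return {
    sheetData: `<sheetData>${rowsXml.join("")}</sheetData>`,
    sharedItems: strings.map((s) => `<si><t>${escapeXml(s)}</t></si>`).join(""),
  };
}

console.log(buildSheet([["A", ""], ["B & C", "A"]]));
```

Repeated text is stored once and referenced by index, which is why the second "A" reuses index 0.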
The parser first scans all rows to compute the effective column count by summing colspan values per row and taking the maximum. The fill-grid is then pre-allocated to this width. Rows with fewer cells than the maximum simply leave trailing grid positions empty, which become blank cells in the output. This matches how browsers render ragged tables.
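The column-count scan described above can be sketched as follows. The input shape (rows as arrays of objects with an optional `colspan`) and the function name are assumptions for illustration. The sketch follows the description literally and counts only each row's own cells.

```javascript
// Hypothetical input: each row is an array of cells with an optional colspan.
function effectiveColumnCount(rows) {
  return rows.reduce((max, cells) => {
    // Width of this row: the sum of its cells' colspans (default 1).
    const width = cells.reduce((sum, cell) => sum + (cell.colspan || 1), 0);
    return Math.max(max, width);
  }, 0);
}

const width = effectiveColumnCount([
  [{ colspan: 2 }, {}], // 2 + 1 = 3 columns
  [{}],                 // ragged row with a single cell
  [{}, {}],
]);
console.log(width); // 3
```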