HTML Table to JavaScript Array Converter
Convert HTML tables to JavaScript arrays or JSON objects instantly. Handles colspan, rowspan, thead detection, and exports clean structured data.
About
HTML tables encode structured data in a format optimized for rendering, not programmatic access. Extracting that data manually introduces transcription errors, especially when colspan and rowspan attributes create irregular cell grids. A 5×10 table with merged cells can produce 50 logical cells mapped to far fewer DOM nodes. This tool parses raw HTML table markup through the browser's native DOMParser, constructs a normalized grid matrix that expands all span attributes, and outputs clean JavaScript arrays or JSON objects. Header rows are auto-detected from <thead> or <th> elements. Duplicate header names receive numeric suffixes to guarantee unique object keys.
Limitations: a nested table is flattened to the text content of the cell that contains it. The parser reads textContent by default, which strips inner HTML. Malformed markup is handled by the browser's error-tolerant parser, but results may vary across edge cases.
Pro tip: paste markup directly from browser DevTools (Elements panel) for the cleanest source. Tables copied from spreadsheet applications often carry inline styles that inflate input size but do not affect extraction.
Formulas
The core algorithm constructs a 2D grid matrix G of dimensions R × C, where R is the total logical row count and C is the maximum logical column count after span expansion.
For each DOM cell at row r, the algorithm finds the first unoccupied column index c in G[r]. If the cell has colspan = cs and rowspan = rs, its value v is written to all positions:
G[r + i][c + j] = v, for 0 ≤ i < rs and 0 ≤ j < cs
Where G = grid matrix, r = current row index, c = resolved column index, cs = colspan attribute value (default 1), rs = rowspan attribute value (default 1), v = extracted cell value.
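The span expansion can be sketched in a few lines of JavaScript. This is a minimal illustration, not the tool's actual source: the function name expandGrid and the input shape (rows of { text, colspan, rowspan } objects in DOM order) are assumptions made for the example, and rowspans are clipped at the last row.

```javascript
// Hypothetical input: one array per table row, cells in DOM order.
// Each cell is { text, colspan?, rowspan? }; spans default to 1.
function expandGrid(rows) {
  const grid = rows.map(() => []);
  rows.forEach((cells, r) => {
    let c = 0;
    for (const cell of cells) {
      // Skip columns already occupied by a rowspan from an earlier row.
      while (c in grid[r]) c++;
      const cs = cell.colspan ?? 1;
      // Assumption: a rowspan that runs past the table is clipped.
      const rs = Math.min(cell.rowspan ?? 1, rows.length - r);
      // Write the value to every position G[r + i][c + j].
      for (let i = 0; i < rs; i++) {
        for (let j = 0; j < cs; j++) {
          grid[r + i][c + j] = cell.text;
        }
      }
      c += cs;
    }
  });
  // Pad each row to the widest logical column count, using null for gaps.
  const width = Math.max(0, ...grid.map((row) => row.length));
  return grid.map((row) =>
    Array.from({ length: width }, (_, j) => (j in row ? row[j] : null))
  );
}

const grid = expandGrid([
  [{ text: "Name", rowspan: 2 }, { text: "Score", colspan: 2 }],
  [{ text: "Math" }, { text: "Art" }],
  [{ text: "Ann" }, { text: "90" }, { text: "85" }],
]);
console.log(grid);
// [ [ 'Name', 'Score', 'Score' ],
//   [ 'Name', 'Math', 'Art' ],
//   [ 'Ann', '90', '85' ] ]
```

The sketch does not guard against overlapping spans in malformed markup; a later cell simply overwrites an earlier one.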
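Header key deduplication uses a frequency map F. For each header string h:
key(h) = h if F[h] = 0, otherwise h + "_" + F[h]
After each lookup, F[h] is incremented by 1, so repeated headers become h, h_1, h_2, and so on.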
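A frequency-map deduplication of this kind might look like the sketch below. The function name dedupeHeaders is hypothetical, and the suffix scheme (first occurrence unchanged, later ones suffixed _1, _2) is inferred from the Duplicate Key Resolution row in the reference table rather than from the tool's source.

```javascript
// Hypothetical helper: returns unique keys in the same order as the input.
function dedupeHeaders(headers) {
  const seen = new Map(); // frequency map: header -> occurrences so far
  return headers.map((h) => {
    const count = seen.get(h) ?? 0;
    seen.set(h, count + 1);
    // First occurrence keeps its name; later ones get _1, _2, ...
    return count === 0 ? h : `${h}_${count}`;
  });
}

console.log(dedupeHeaders(["id", "name", "name", "name"]));
// [ 'id', 'name', 'name_1', 'name_2' ]
```

This sketch does not handle a table that already contains a literal header such as name_1 alongside repeated name columns; a production version would need a second uniqueness check.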
Number coercion applies the test: if isNaN(v) = FALSE and v.trim() ≠ "", then v is cast to Number(v).
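The coercion test translates almost directly into code. In the sketch below, coerceNumber is a hypothetical name; the blank-string check matters because Number("") evaluates to 0, so isNaN alone would turn empty cells into zeros.

```javascript
// Hypothetical helper: converts numeric-looking strings, leaves the rest alone.
function coerceNumber(v) {
  // isNaN("") is false because Number("") is 0, hence the trim() check.
  if (typeof v === "string" && v.trim() !== "" && !isNaN(v)) {
    return Number(v);
  }
  return v;
}

console.log(coerceNumber("42"));    // 42
console.log(coerceNumber(" 3.5 ")); // 3.5
console.log(coerceNumber(""));      // ""
console.log(coerceNumber("12px"));  // "12px"
console.log(coerceNumber("1,234")); // "1,234"
```

Because the test relies on Number(), strings such as "1e3" and "0x10" also convert (to 1000 and 16), while values with thousands separators stay strings.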
Reference Data
| Feature | Description | Default |
|---|---|---|
| Header Detection | Auto-detects <thead> or first row of <th> elements | Auto |
| Colspan Handling | Expands merged columns into repeated values in the grid matrix | Enabled |
| Rowspan Handling | Propagates cell values downward across spanned rows | Enabled |
| Duplicate Key Resolution | Appends _1, _2, etc. to duplicate header names | Enabled |
| Output: Array of Objects | Each row becomes {key: value} using headers as keys | Selected |
| Output: Array of Arrays | Each row is a flat array of cell values, no keys | Optional |
| Output: Nested (Grouped) | First column becomes group key, remaining columns nested | Optional |
| Content Mode: Text | Extracts textContent only, strips all HTML tags | Selected |
| Content Mode: HTML | Preserves inner HTML of each cell as a string value | Optional |
| Indentation | JSON output indentation: 2 or 4 spaces, or tab | 2 spaces |
| Number Coercion | Converts purely numeric strings to JavaScript numbers | Optional |
| Trim Whitespace | Removes leading/trailing whitespace from each cell | Enabled |
| Empty Cell Value | Fills empty cells with null, empty string, or custom value | null |
| Max Supported Rows | Browser memory-limited, tested up to 10,000 rows | - |
| Multiple Tables | If input contains multiple <table> elements, all are parsed sequentially | All tables |
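To make the output modes in the table concrete, the sketch below builds the array-of-objects and grouped shapes from a header list and an already expanded grid (the array-of-arrays mode is the grid itself). The names toObjects and toGrouped are hypothetical, and the grouped layout, where rows sharing a first-column value are collected into an array, is one plausible reading of "remaining columns nested", not a confirmed description of the tool's output.

```javascript
// Array of objects: each row becomes { header: value }.
function toObjects(headers, rows) {
  return rows.map((row) =>
    Object.fromEntries(headers.map((key, i) => [key, row[i] ?? null]))
  );
}

// Grouped (assumed layout): first column is the group key,
// remaining columns become objects collected under that key.
function toGrouped(headers, rows) {
  const groups = {};
  for (const row of rows) {
    const [groupKey, ...rest] = row;
    const entry = Object.fromEntries(
      headers.slice(1).map((key, i) => [key, rest[i] ?? null])
    );
    if (!(groupKey in groups)) groups[groupKey] = [];
    groups[groupKey].push(entry);
  }
  return groups;
}

const headers = ["Region", "City", "Sales"];
const rows = [
  ["East", "Boston", "10"],
  ["East", "NYC", "20"],
  ["West", "LA", "30"],
];

// The third argument of JSON.stringify sets indentation: 2, 4, or "\t".
console.log(JSON.stringify(toObjects(headers, rows), null, 2));
console.log(JSON.stringify(toGrouped(headers, rows), null, 2));
```

Missing cells fall back to null here, matching the default listed for Empty Cell Value.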