About

HTML tables encode structured data in a format optimized for rendering, not programmatic access. Extracting that data manually introduces transcription errors, especially when colspan and rowspan attributes create irregular cell grids. A 5×10 table with merged cells can produce 50 logical cells mapped to far fewer DOM nodes. This tool parses raw HTML table markup through the browser's native DOMParser, constructs a normalized grid matrix that expands all span attributes, and outputs clean JavaScript arrays or JSON objects. Header rows are auto-detected from <thead> or <th> elements. Duplicate header names receive numeric suffixes to guarantee unique object keys.

Limitations: nested tables are flattened to the text content of the enclosing outer cell. The parser reads textContent by default, which strips inner HTML. Malformed markup is handled by the browser's error-tolerant parser, but results may vary across edge cases.

Pro tip: paste markup directly from browser DevTools (Elements panel) for the cleanest source. Tables copied from spreadsheet applications often carry inline styles that inflate input size but do not affect extraction.

Formulas

The core algorithm constructs a 2D grid matrix G of dimensions R × C, where R is the total logical row count and C is the maximum logical column count after span expansion.

G[r][c] = cell(r, c)

For each DOM cell at row r, the algorithm finds the first unoccupied column index c in G[r]. If the cell has colspan = cs and rowspan = rs, the value is written to all positions:

fill(G, r … r + rs − 1, c … c + cs − 1)

Where G = grid matrix, r = current row index, c = resolved column index, cs = colspan attribute value (default 1), rs = rowspan attribute value (default 1).
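
The span expansion described above can be sketched as follows. This is a minimal illustration, not the tool's actual source: the function name expandGrid and the input shape (rows of { value, colspan, rowspan } descriptors) are assumptions made for the example, as is the choice to clip a rowspan at the last row and to pad short rows with null.

```javascript
// Hypothetical input: an array of rows, each row an array of cell
// descriptors { value, colspan, rowspan }. Spans default to 1.
function expandGrid(rows) {
  const grid = [];
  rows.forEach((cells, r) => {
    grid[r] = grid[r] || [];
    let c = 0;
    for (const cell of cells) {
      const cs = cell.colspan || 1;
      // Assumption: a rowspan reaching past the last row is clipped.
      const rs = Math.min(cell.rowspan || 1, rows.length - r);
      // Find the first unoccupied column index c in G[r].
      while (grid[r][c] !== undefined) c++;
      // fill(G, r … r + rs − 1, c … c + cs − 1)
      for (let dr = 0; dr < rs; dr++) {
        const target = (grid[r + dr] = grid[r + dr] || []);
        for (let dc = 0; dc < cs; dc++) {
          target[c + dc] = cell.value === undefined ? null : cell.value;
        }
      }
      c += cs;
    }
  });
  // Pad every row to the maximum width C so the matrix is rectangular.
  const width = Math.max(0, ...grid.map((row) => row.length));
  return grid.map((row) =>
    Array.from({ length: width }, (_, i) => (row[i] === undefined ? null : row[i]))
  );
}

const expanded = expandGrid([
  [{ value: "A", colspan: 2 }, { value: "B", rowspan: 2 }],
  [{ value: "c" }, { value: "d" }],
]);
// expanded is [["A", "A", "B"], ["c", "d", "B"]]
```

Because a cell with rowspan > 1 writes into later rows ahead of time, the "first unoccupied column" search in those rows skips the positions it already claimed.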

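Header key deduplication uses a frequency map F, where F[h] counts how many times the header string h has already been seen. For each header string h:

key = h, if F[h] = 0
key = h + "_" + F[h], otherwise

F[h] is then incremented by 1, so repeated headers yield h, h_1, h_2, and so on.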
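
A minimal sketch of this deduplication rule, assuming the headers arrive as an array of strings. The function name dedupeHeaders is hypothetical and not taken from the tool.

```javascript
// Hypothetical helper: applies the frequency-map rule to a header list.
function dedupeHeaders(headers) {
  const seen = new Map(); // frequency map F
  return headers.map((h) => {
    const count = seen.get(h) || 0; // F[h] before this occurrence
    seen.set(h, count + 1);
    return count === 0 ? h : `${h}_${count}`;
  });
}

const uniqueKeys = dedupeHeaders(["Name", "Age", "Name", "Name"]);
// uniqueKeys is ["Name", "Age", "Name_1", "Name_2"]
```

Note that this sketch does not guard against a source table that already contains a literal header such as "Name_1"; whether the tool handles that case is not stated here.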

Number coercion applies the test: if isNaN(v) = FALSE and v.trim() ≠ "", then v is cast to Number(v).
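
The coercion test can be written out directly. The sketch below assumes cell values are strings at this stage; the function name coerceNumber is hypothetical.

```javascript
// Hypothetical helper: applies the isNaN / trim test to one cell value.
function coerceNumber(v) {
  if (typeof v !== "string") return v; // leave null or non-strings alone
  return !isNaN(v) && v.trim() !== "" ? Number(v) : v;
}

coerceNumber("42");    // 42
coerceNumber("00742"); // 742, the leading zeros are lost
coerceNumber("12px");  // "12px", isNaN is true so the string is kept
coerceNumber("");      // "", kept by the trim check (Number("") would be 0)
coerceNumber("1e3");   // 1000
coerceNumber("0x1A");  // 26
```

The last two calls show that this test accepts anything Number() can parse, including exponent and hexadecimal notation, not only plain digit strings.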

Reference Data

Feature	Description	Default
Header Detection	Auto-detects <thead> or first row of <th> elements	Auto
Colspan Handling	Expands merged columns into repeated values in the grid matrix	Enabled
Rowspan Handling	Propagates cell values downward across spanned rows	Enabled
Duplicate Key Resolution	Appends _1, _2, etc. to duplicate header names	Enabled
Output: Array of Objects	Each row becomes {key: value} using headers as keys	Selected
Output: Array of Arrays	Each row is a flat array of cell values, no keys	Optional
Output: Nested (Grouped)	First column becomes group key, remaining columns nested	Optional
Content Mode: Text	Extracts textContent only, strips all HTML tags	Selected
Content Mode: HTML	Preserves inner HTML of each cell as a string value	Optional
Indentation	JSON output indentation: 2 or 4 spaces, or tab	2 spaces
Number Coercion	Converts purely numeric strings to JavaScript numbers	Optional
Trim Whitespace	Removes leading/trailing whitespace from each cell	Enabled
Empty Cell Value	Fills empty cells with null, empty string, or custom value	null
Max Supported Rows	Browser memory-limited, tested up to 10,000 rows	-
Multiple Tables	If input contains multiple <table> elements, all are parsed sequentially	All tables
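
As an illustration of the "Output: Nested (Grouped)" and "Indentation" rows, here is one possible reading of the grouped shape. The exact structure the tool emits is not specified in this table, so the helper name groupByFirstColumn and the choice to collect each group's rows in an array are assumptions. The indentation part relies only on the standard third argument of JSON.stringify.

```javascript
// Hypothetical helper: the first column becomes the group key, the
// remaining columns are nested under it as objects.
function groupByFirstColumn(headers, rows) {
  const rest = headers.slice(1);
  const groups = Object.create(null); // no inherited keys to collide with
  for (const row of rows) {
    const name = String(row[0]);
    const entry = {};
    rest.forEach((key, i) => {
      entry[key] = row[i + 1];
    });
    (groups[name] = groups[name] || []).push(entry);
  }
  return groups;
}

const grouped = groupByFirstColumn(
  ["Region", "City", "Sales"],
  [
    ["EU", "Paris", 10],
    ["EU", "Rome", 7],
    ["US", "Austin", 4],
  ]
);

// Indentation options: a number of spaces (2 or 4) or a tab character.
const twoSpaces = JSON.stringify(grouped, null, 2);
const tabbed = JSON.stringify(grouped, null, "\t");
```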

Frequently Asked Questions

The converter builds a 2D grid matrix and iterates over each DOM cell. When a cell has a colspan or rowspan greater than 1, the cell's value is written to all corresponding positions in the matrix. This means a cell with colspan="3" and rowspan="2" fills 6 grid positions with the same value. The result is a fully rectangular matrix regardless of how irregularly the original table was merged.

Duplicate header names break object key uniqueness. The converter maintains a frequency counter for each header string. The first occurrence uses the original name. Subsequent duplicates receive a numeric suffix: "Name", "Name_1", "Name_2". This guarantees every key in the output objects is unique and no data is silently overwritten.

Enabling number coercion converts any non-empty string for which isNaN returns false into a JavaScript Number, and this can change the data. The string "00742" becomes the number 742, losing the leading zeros. For datasets containing ZIP codes, phone numbers, or product IDs with leading zeros, disable the number coercion option to preserve the original string values.

Nested tables (a table inside a td element) are not recursively parsed as separate structures. The converter extracts either the textContent or innerHTML of the outer cell. In text mode, all nested table content is flattened into a single string. In HTML mode, the full inner markup including the nested table tags is preserved as a string value.

Tables without header markup are supported. If no thead section and no th elements are detected, the converter treats all rows as data rows. In Array of Objects mode, it generates generic keys: "col_0", "col_1", "col_2", etc. You can also enable the "Use first row as header" option to force the first tr to be interpreted as the header row regardless of tag type.
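
The fallback to generic keys can be sketched like this, assuming the table has already been expanded into a rectangular grid of values. The function name gridToObjects and the useFirstRowAsHeader option name are hypothetical stand-ins for the behavior described above.

```javascript
// Hypothetical helper: turns a grid into an array of objects, using
// either the first row or generated col_N keys as property names.
function gridToObjects(grid, { useFirstRowAsHeader = false } = {}) {
  const width = Math.max(0, ...grid.map((row) => row.length));
  const headerRow = useFirstRowAsHeader && grid.length > 0 ? grid[0] : null;
  const keys = headerRow
    ? headerRow.map(String)
    : Array.from({ length: width }, (_, i) => `col_${i}`);
  const dataRows = headerRow ? grid.slice(1) : grid;
  return dataRows.map((row) =>
    Object.fromEntries(keys.map((key, i) => [key, row[i] ?? null]))
  );
}

const generic = gridToObjects([["a", "b"], ["c", "d"]]);
// generic is [{ col_0: "a", col_1: "b" }, { col_0: "c", col_1: "d" }]

const forced = gridToObjects([["a", "b"], ["c", "d"]], { useFirstRowAsHeader: true });
// forced is [{ a: "c", b: "d" }]
```

This sketch does not deduplicate a forced header row; in practice the keys would also go through the duplicate-key resolution step.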

The converter runs entirely in the browser using the native DOMParser. Testing confirms reliable performance up to approximately 10,000 rows with 20 columns. Beyond that, the JSON serialization step may cause brief UI freezes. The output textarea uses lazy rendering. For extremely large tables (50,000+ rows), consider splitting the HTML input.

The browser's DOMParser is error-tolerant and will attempt to construct a valid DOM tree from malformed input. Missing closing tags, improperly nested elements, or stray text nodes are handled according to the HTML5 parsing specification. The converter then traverses whatever table structure the parser produced. A warning toast is shown if zero table elements are found in the parsed result.
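
For orientation, the parse-and-traverse step might look roughly like the following browser-only sketch. DOMParser.parseFromString, querySelectorAll, table.rows, row.cells, colSpan, and rowSpan are standard DOM APIs; the function name extractTables, the returned object shape, and the warning text are assumptions for illustration and are not taken from the tool.

```javascript
// Browser-only sketch: DOMParser is not defined in plain Node.js.
function extractTables(html) {
  const doc = new DOMParser().parseFromString(html, "text/html");
  // Keep only top-level tables; nested tables stay inside their outer cell.
  const tables = Array.from(doc.querySelectorAll("table")).filter(
    (t) => !t.parentElement.closest("table")
  );
  if (tables.length === 0) {
    // The tool shows a warning toast here; this sketch just returns a message.
    return { tables: [], warning: "No <table> elements found in input." };
  }
  return {
    tables: tables.map((table) =>
      Array.from(table.rows, (row) =>
        Array.from(row.cells, (cell) => ({
          value: cell.textContent.trim(), // text mode; HTML mode would read innerHTML
          colspan: cell.colSpan,
          rowspan: cell.rowSpan,
          isHeader: cell.tagName === "TH",
        }))
      )
    ),
    warning: null,
  };
}
```

Because the HTML parser repairs missing closing tags before this code runs, the traversal only ever sees whatever table structure the browser ended up building.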