About

HTML tables encode structured data in a format optimized for rendering, not programmatic access. Extracting that data manually introduces transcription errors, especially when colspan and rowspan attributes create irregular cell grids. A 5×10 table with merged cells can produce 50 logical cells mapped to far fewer DOM nodes. This tool parses raw HTML table markup through the browser's native DOMParser, constructs a normalized grid matrix that expands all span attributes, and outputs clean JavaScript arrays or JSON objects. Header rows are auto-detected from <thead> or <th> elements. Duplicate header names receive numeric suffixes to guarantee unique object keys.
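
As a rough illustration of the first step, the sketch below reads each cell's text and span values from a table element. The helper name `readCells` and the output shape are assumptions made for this example, not the tool's actual code; only the standard DOM properties `rows`, `cells`, `colSpan`, `rowSpan`, and `textContent` are relied on.

```javascript
// Hypothetical helper (not the tool's own code): turn a table element into
// plain cell descriptors that later steps can expand into a grid.
function readCells(table) {
  return Array.from(table.rows, (row) =>
    Array.from(row.cells, (cell) => ({
      text: cell.textContent.trim(),
      // Simplification: a span of 0 or a missing value is treated as 1.
      colspan: cell.colSpan || 1,
      rowspan: cell.rowSpan || 1,
    }))
  );
}
```

In a browser, a table element could be obtained with `new DOMParser().parseFromString(html, "text/html").querySelector("table")` and passed to `readCells`.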

Limitations: nested tables are flattened to text content of the outermost cell. The parser processes textContent by default, stripping inner HTML. Malformed markup is handled by the browser's error-tolerant parser, but results may vary across edge cases. Pro tip: paste markup directly from browser DevTools (Elements panel) for the cleanest source. Tables copied from spreadsheet applications often carry inline styles that inflate input size but do not affect extraction.


Formulas

The core algorithm constructs a 2D grid matrix G of dimensions R × C, where R is the total logical row count and C is the maximum logical column count after span expansion.

G[r][c] = cell(r, c)

For each DOM cell at row r, the algorithm finds the first unoccupied column index c in G[r]. If the cell has colspan = cs and rowspan = rs, the value is written to all positions:

fill(G, r … r + rs − 1, c … c + cs − 1)

Where G = grid matrix, r = current row index, c = resolved column index, cs = colspan attribute value (default 1), rs = rowspan attribute value (default 1).
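
The span expansion described above can be sketched as follows. This is a minimal version under stated assumptions, not the tool's implementation: the input is an array of rows, each an array of `{ text, colspan, rowspan }` descriptors (a hypothetical shape), rowspans are clipped to the number of rows supplied, and positions that no cell covers are filled with `emptyValue`.

```javascript
// Sketch: expand colspan/rowspan into a rectangular grid G.
function expandGrid(rows, emptyValue = null) {
  const grid = rows.map(() => []);
  rows.forEach((cells, r) => {
    let c = 0;
    for (const cell of cells) {
      // Find the first unoccupied column in row r (earlier rowspans may
      // already have filled some positions).
      while (grid[r][c] !== undefined) c++;
      const cs = cell.colspan || 1;
      // Assumption: a rowspan never extends past the last supplied row.
      const rs = Math.min(cell.rowspan || 1, rows.length - r);
      // fill(G, r … r + rs − 1, c … c + cs − 1)
      for (let dr = 0; dr < rs; dr++) {
        for (let dc = 0; dc < cs; dc++) grid[r + dr][c + dc] = cell.text;
      }
      c += cs;
    }
  });
  // Pad every row to the widest logical column count C.
  const width = grid.reduce((max, row) => Math.max(max, row.length), 0);
  return grid.map((row) =>
    Array.from({ length: width }, (_, i) =>
      row[i] === undefined ? emptyValue : row[i]
    )
  );
}
```

For example, a single cell with colspan 3 and rowspan 2 in a two-row table yields two rows of three identical values, which are the six grid positions mentioned in the FAQ.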

Header key deduplication uses a frequency map F. For each header string h:

key = h                 if F[h] = 0
key = h + "_" + F[h]    otherwise

Here F[h] is the number of earlier occurrences of h; it is incremented after each key is assigned.
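
A minimal sketch of this deduplication rule, assuming the frequency map is incremented after each header is processed. The function name `dedupeHeaders` is hypothetical, and the sketch does not guard against a header that already happens to equal a generated key such as "Name_1".

```javascript
// Sketch: suffix repeated header names so object keys stay unique.
function dedupeHeaders(headers) {
  const seen = new Map(); // frequency map F
  return headers.map((h) => {
    const count = seen.get(h) || 0;
    seen.set(h, count + 1);
    return count === 0 ? h : `${h}_${count}`;
  });
}
```

With the input `["Name", "Age", "Name", "Name"]` this returns `["Name", "Age", "Name_1", "Name_2"]`.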

Number coercion applies the test: if isNaN(v) = false and v.trim() ≠ "", then v is cast to Number(v).
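
The coercion test can be written as a small function. This is a sketch of the rule as stated, with the hypothetical name `coerceNumber`; non-string inputs are passed through unchanged, which is an assumption of this example.

```javascript
// Sketch: cast a cell value to a number only when it is a non-blank
// string that the global isNaN() does not reject.
function coerceNumber(v) {
  if (typeof v !== "string") return v;
  return v.trim() !== "" && !isNaN(v) ? Number(v) : v;
}
```

Because the global `isNaN` follows JavaScript's numeric string parsing, this rule also converts forms such as "1e3" (to 1000) and "0x1A" (to 26), and "00742" becomes 742. Columns holding identifiers are therefore better left uncoerced.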

Reference Data

| Feature | Description | Default |
| --- | --- | --- |
| Header Detection | Auto-detects <thead> or first row of <th> elements | Auto |
| Colspan Handling | Expands merged columns into repeated values in the grid matrix | Enabled |
| Rowspan Handling | Propagates cell values downward across spanned rows | Enabled |
| Duplicate Key Resolution | Appends _1, _2, etc. to duplicate header names | Enabled |
| Output: Array of Objects | Each row becomes {key: value} using headers as keys | Selected |
| Output: Array of Arrays | Each row is a flat array of cell values, no keys | Optional |
| Output: Nested (Grouped) | First column becomes group key, remaining columns nested | Optional |
| Content Mode: Text | Extracts textContent only, strips all HTML tags | Selected |
| Content Mode: HTML | Preserves inner HTML of each cell as a string value | Optional |
| Indentation | JSON output indentation: 2 or 4 spaces, or tab | 2 spaces |
| Number Coercion | Converts purely numeric strings to JavaScript numbers | Optional |
| Trim Whitespace | Removes leading/trailing whitespace from each cell | Enabled |
| Empty Cell Value | Fills empty cells with null, empty string, or custom value | null |
| Max Supported Rows | Browser memory-limited, tested up to 10,000 rows | - |
| Multiple Tables | If input contains multiple <table> elements, all are parsed sequentially | All tables |
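
To make the output modes in the table concrete, here is a hedged sketch of two of them. The function names are hypothetical, and the shape of the grouped output (an object mapping each first-column value to an array of row objects) is one plausible reading of "Nested (Grouped)", not a confirmed description of the tool's format.

```javascript
// Sketch: "Array of Objects" mode. Empty cells are replaced by emptyValue.
function rowsToObjects(keys, dataRows, emptyValue = null) {
  return dataRows.map((row) =>
    Object.fromEntries(
      keys.map((key, i) => [
        key,
        row[i] === undefined || row[i] === "" ? emptyValue : row[i],
      ])
    )
  );
}

// Sketch: one possible "Nested (Grouped)" shape, keyed by the first column.
function groupByFirstColumn(keys, dataRows) {
  const groups = Object.create(null); // no inherited keys to collide with
  for (const row of dataRows) {
    const [groupKey, ...rest] = row;
    const entry = Object.fromEntries(keys.slice(1).map((k, i) => [k, rest[i]]));
    (groups[groupKey] = groups[groupKey] || []).push(entry);
  }
  return groups;
}
```

The indentation setting corresponds to the third argument of `JSON.stringify`: for example `JSON.stringify(result, null, 2)` for two spaces or `JSON.stringify(result, null, "\t")` for a tab.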

Frequently Asked Questions

The converter builds a 2D grid matrix and iterates over each DOM cell. When a cell has a colspan or rowspan greater than 1, the cell's value is written to all corresponding positions in the matrix. This means a cell with colspan="3" and rowspan="2" fills 6 grid positions with the same value. The result is a fully rectangular matrix regardless of how irregularly the original table was merged.
Duplicate header names break object key uniqueness. The converter maintains a frequency counter for each header string. The first occurrence uses the original name. Subsequent duplicates receive a numeric suffix: "Name", "Name_1", "Name_2". This guarantees every key in the output objects is unique and no data is silently overwritten.
Yes, enabling number coercion converts any non-empty string for which isNaN returns false to a JavaScript Number. The string "00742" becomes the number 742, losing the leading zeros. For datasets containing ZIP codes, phone numbers, or product IDs with leading zeros, disable the number coercion option to preserve the original string values.
Nested tables (a table inside a td element) are not recursively parsed as separate structures. The converter extracts either the textContent or innerHTML of the outer cell. In text mode, all nested table content is flattened into a single string. In HTML mode, the full inner markup including the nested table tags is preserved as a string value.
Yes. If no thead section and no th elements are detected, the converter treats all rows as data rows. In Array of Objects mode, it generates generic keys: "col_0", "col_1", "col_2", etc. You can also enable the "Use first row as header" option to force the first tr to be interpreted as the header row regardless of tag type.
The converter runs entirely in the browser using the native DOMParser. Testing confirms reliable performance up to approximately 10,000 rows with 20 columns. Beyond that, the JSON serialization step may cause brief UI freezes. The output textarea uses lazy rendering. For extremely large tables (50,000+ rows), consider splitting the HTML input.
The browser's DOMParser is error-tolerant and will attempt to construct a valid DOM tree from malformed input. Missing closing tags, improperly nested elements, or stray text nodes are handled according to the HTML5 parsing specification. The converter then traverses whatever table structure the parser produced. A warning toast is shown if zero table elements are found in the parsed result.
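
The header fallback mentioned in the answers above (generic keys "col_0", "col_1", and so on when no header row exists) can be sketched as follows. The function name `splitHeader` and its boolean parameter are assumptions for this example; how the tool itself decides whether a header row is present is not shown here.

```javascript
// Sketch: separate keys from data rows in an already expanded grid.
// hasHeaderRow stands in for the result of header detection or the
// "Use first row as header" option.
function splitHeader(grid, hasHeaderRow) {
  if (grid.length === 0) return { keys: [], dataRows: [] };
  if (hasHeaderRow) {
    return { keys: grid[0].map(String), dataRows: grid.slice(1) };
  }
  // No header row: every row is data and keys are generated by position.
  const keys = grid[0].map((_, i) => `col_${i}`);
  return { keys, dataRows: grid };
}
```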