Category HTML/XML Utilities

About

HTML documents form a tree of nodes. Each element carries a tag name, a map of attributes, and zero or more children. Manually reconstructing this hierarchy into a data-interchange format like JSON is error-prone: missed closing tags, misread nesting depth, or dropped attribute values. This converter uses the browser's native DOMParser API to build a spec-compliant parse tree, then performs a depth-first recursive walk to emit a clean JSON object. The output preserves node type (ELEMENT, TEXT, COMMENT), all attribute key-value pairs, and the full child hierarchy. Note: the parser follows HTML5 error-recovery rules, so malformed markup will be silently corrected rather than rejected.

Typical use cases include automated testing fixtures, CMS migration scripts, and accessibility audits where you need a machine-readable snapshot of a page fragment. The tool parses inputs under roughly 50 KB on the main thread and offloads larger inputs, up to about 500 KB, to a Web Worker. Pro tip: character entities and inline style attributes in your source HTML carry through to the JSON output (entities as their decoded characters, style values as plain strings). Filter them downstream if your pipeline requires clean data.

Formulas

The conversion algorithm performs a recursive depth-first traversal. For each node N in the DOM tree, the mapping function f produces a JSON object:

f(N) = { tag: N.tagName, attributes: mapAttrs(N), children: [f(c₀), f(c₁), …, f(c_n)] }

Where c₀ … c_n are the child nodes of N. The attribute mapping function iterates the NamedNodeMap:

mapAttrs(N) = { a_i.name → a_i.value | i ∈ [0, N.attributes.length) }
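
The two mappings above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the tool's actual source: the names toJson and mockParagraph are hypothetical, and the functions only assume DOM-like fields (nodeType, tagName, attributes, childNodes), so they accept real DOM nodes in a browser as well as plain mock objects.

```javascript
// Sketch of f(N) and mapAttrs(N) over DOM-like objects.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// mapAttrs(N): copy each attribute's name/value pair into a plain object.
function mapAttrs(node) {
  const out = {};
  const attrs = node.attributes || [];
  for (let i = 0; i < attrs.length; i++) {
    out[attrs[i].name] = attrs[i].value;
  }
  return out;
}

// f(N): depth-first conversion of an element and its descendants.
function toJson(node) {
  if (node.nodeType === TEXT_NODE) {
    return { type: "text", content: node.nodeValue };
  }
  if (node.nodeType !== ELEMENT_NODE) {
    return null; // other node types are out of scope for this sketch
  }
  const children = [];
  for (const child of Array.from(node.childNodes || [])) {
    const converted = toJson(child);
    if (converted !== null) children.push(converted);
  }
  return { tag: node.tagName, attributes: mapAttrs(node), children };
}

// Hypothetical mock standing in for <p class="lead">Hi<br></p>.
const mockParagraph = {
  nodeType: 1,
  tagName: "P",
  attributes: [{ name: "class", value: "lead" }],
  childNodes: [
    { nodeType: 3, nodeValue: "Hi" },
    { nodeType: 1, tagName: "BR", attributes: [], childNodes: [] },
  ],
};
const paragraphJson = toJson(mockParagraph);
console.log(JSON.stringify(paragraphJson, null, 2));
```

In a real HTML document, tagName is reported in upper case for HTML elements, so a caller wanting lowercase tags would normalize it after the walk.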

Text node handling applies a whitespace filter predicate P:

P(t) = t.trim().length > 0

Where t is the text content of the node. Only text nodes satisfying P are included unless the "include whitespace" option is enabled. The total node count in the output is bounded by the recurrence T(N) = 1 + ∑_{i=0}^{k−1} T(c_i), where k is the number of children of N (k = n + 1 in the notation above). Because each node is visited exactly once, the running time is linear in the total node count.
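
As a worked illustration of the predicate and the recurrence, the sketch below counts nodes in an already-converted JSON tree. The names isSignificantText, countNodes, and sampleTree are hypothetical, and the tree shape follows the { tag, attributes, children } layout described above.

```javascript
// P(t): true when the text has at least one non-whitespace character.
function isSignificantText(text) {
  return text.trim().length > 0;
}

// T(N) = 1 + sum of T(c_i) over the children of N.
// Nodes without a children array (text nodes) count as 1.
function countNodes(node) {
  const children = node.children || [];
  return children.reduce((total, child) => total + countNodes(child), 1);
}

// Hypothetical converted tree for <ul><li>one</li><li>two</li></ul>.
const sampleTree = {
  tag: "ul",
  attributes: {},
  children: [
    { tag: "li", attributes: {}, children: [{ type: "text", content: "one" }] },
    { tag: "li", attributes: {}, children: [{ type: "text", content: "two" }] },
  ],
};

console.log(countNodes(sampleTree)); // 5: one ul, two li, two text nodes
console.log(isSignificantText("  \n\t")); // false
console.log(isSignificantText(" a ")); // true
```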

Reference Data

Node Type	nodeType Value	JSON Representation	Included by Default
Element	1	`{ tag, attributes, children }`	Yes
Text	3	`{ type: "text", content }`	Yes (non-empty)
Comment	8	`{ type: "comment", content }`	Optional
CDATA Section	4	`{ type: "cdata", content }`	Optional
Document	9	Root wrapper	Skipped (children used)
DocumentType	10	`{ type: "doctype", name }`	Optional
DocumentFragment	11	Root wrapper	Skipped (children used)
Attribute	2	Merged into parent `attributes`	Always
Processing Instruction	7	`{ type: "pi", target, data }`	Optional
Entity Reference	5	Resolved to text	Automatic
Void Elements (br, img, hr, input)	1	`{ tag, attributes, children: [] }`	Yes
SVG Elements	1	Namespace preserved in tag	Yes
Custom Elements (web components)	1	Hyphenated tag preserved	Yes
Template Content	11	Fragment children extracted	Optional
Whitespace-only Text	3	Filtered out	No (configurable)
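
The non-element rows of the table can be read as a dispatch on nodeType. The sketch below is one possible shape for that dispatch, not the tool's actual code: leafToJson and the option names (includeWhitespace, includeComments, includeCdata, includePi, includeDoctype) are assumptions made for illustration.

```javascript
// Convert a single non-element node to its JSON form, or return null when
// the node is excluded by the (hypothetical) options.
function leafToJson(node, options = {}) {
  switch (node.nodeType) {
    case 3: // Text: dropped when whitespace-only unless explicitly kept
      if (!options.includeWhitespace && node.nodeValue.trim().length === 0) {
        return null;
      }
      return { type: "text", content: node.nodeValue };
    case 4: // CDATA section
      return options.includeCdata
        ? { type: "cdata", content: node.nodeValue }
        : null;
    case 7: // Processing instruction
      return options.includePi
        ? { type: "pi", target: node.target, data: node.data }
        : null;
    case 8: // Comment
      return options.includeComments
        ? { type: "comment", content: node.nodeValue }
        : null;
    case 10: // DocumentType
      return options.includeDoctype
        ? { type: "doctype", name: node.name }
        : null;
    default: // Elements, documents, and fragments are handled by the tree walk
      return null;
  }
}

console.log(leafToJson({ nodeType: 8, nodeValue: " note " }, { includeComments: true }));
console.log(leafToJson({ nodeType: 10, name: "html" }, { includeDoctype: true }));
```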

Frequently Asked Questions

The browser's native DOMParser follows the HTML5 error-recovery rules defined in the WHATWG spec. Unclosed tags are auto-closed, misnested elements are reparented, and unknown tags are kept as elements rather than rejected. The resulting JSON reflects the corrected tree, not the original source. If you need to detect parse errors, check the output for a <parsererror> element, which browsers inject when parsing malformed markup in XML mode (but not in HTML mode).
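
A parse-error check for XML mode might look like the sketch below. The helper name hasParserError is hypothetical; it relies only on getElementsByTagName, so it accepts a real parsed document in the browser or any object with that method.

```javascript
// True when the parsed document contains a <parsererror> element, which is
// how XML-mode parsing reports malformed input.
function hasParserError(doc) {
  return doc.getElementsByTagName("parsererror").length > 0;
}

// Browser usage (DOMParser is not available in plain Node.js):
//   const doc = new DOMParser().parseFromString("<a><b></a>", "application/xml");
//   if (hasParserError(doc)) { /* report the problem to the user */ }
```

HTML mode never produces this element, so the check is only meaningful when the input was parsed as XML.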

By default, whitespace-only text nodes (containing only spaces, tabs, or newlines) are filtered out. These nodes exist between elements due to HTML source formatting but carry no semantic meaning. Enable the "Include Whitespace Nodes" option to preserve them. Note that &nbsp; entities decode to non-breaking spaces (U+00A0), which are not standard HTML whitespace and are meant to survive the filter. Be aware that JavaScript's trim() also strips U+00A0, so a filter implemented literally as t.trim().length > 0 would drop text nodes that contain only non-breaking spaces.
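
The non-breaking-space case deserves a concrete check, because JavaScript's trim() treats U+00A0 as whitespace. The sketch below contrasts a trim-based test with a narrower one limited to the five HTML ASCII whitespace characters (tab, line feed, form feed, carriage return, space). The function name isAsciiWhitespaceOnly is hypothetical, and whether the tool uses either test is not stated in the text.

```javascript
// True when the text consists solely of HTML ASCII whitespace
// (tab, LF, FF, CR, space). U+00A0 does not match.
function isAsciiWhitespaceOnly(text) {
  return /^[\t\n\f\r ]*$/.test(text);
}

const nbspOnly = "\u00A0"; // what a lone non-breaking-space entity decodes to
const formatting = "\n    ";

// A trim-based predicate drops the non-breaking space...
console.log(nbspOnly.trim().length > 0); // false
// ...while the ASCII-only test keeps it and still drops source formatting.
console.log(!isAsciiWhitespaceOnly(nbspOnly)); // true
console.log(!isAsciiWhitespaceOnly(formatting)); // false
```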

When parsing in HTML mode, SVG and MathML elements are recognized, and their mixed-case names are preserved (e.g., the clipPath element and the viewBox attribute). However, explicit namespace URIs are not included in the JSON output by default. The tag name alone is sufficient for most use cases. If you need full namespace data, switch to XML parse mode in the options.

Inputs under 50 KB are parsed synchronously in the main thread with no perceptible delay. For inputs between 50 KB and 500 KB, parsing is offloaded to a Web Worker to prevent UI freezing. Beyond 500 KB, memory pressure from the JSON stringification may cause issues depending on browser and device. For extremely large documents, consider splitting the HTML into fragments before conversion.
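
The size tiers described above could be expressed as a small dispatch function. This is a sketch under stated assumptions: 1 KB is taken as 1024 bytes, size is measured as UTF-8 bytes, and the names chooseParseStrategy, SYNC_LIMIT_BYTES, and WORKER_LIMIT_BYTES are hypothetical.

```javascript
const SYNC_LIMIT_BYTES = 50 * 1024; // assumed meaning of "50 KB"
const WORKER_LIMIT_BYTES = 500 * 1024; // assumed meaning of "500 KB"

// Pick a processing strategy from the UTF-8 size of the input markup.
function chooseParseStrategy(html) {
  const bytes = new TextEncoder().encode(html).length;
  if (bytes < SYNC_LIMIT_BYTES) return "main-thread";
  if (bytes <= WORKER_LIMIT_BYTES) return "worker";
  return "split-first"; // very large input: split into fragments beforehand
}

console.log(chooseParseStrategy("<p>hi</p>")); // "main-thread"
console.log(chooseParseStrategy("a".repeat(60 * 1024))); // "worker"
console.log(chooseParseStrategy("a".repeat(600 * 1024))); // "split-first"
```

Measuring bytes rather than string length keeps the thresholds meaningful for markup with many non-ASCII characters.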

Inline event handlers like onclick or onload appear as regular string attributes in the JSON. Script element content appears as a text child node. No code is executed during parsing because DOMParser creates an inert document context. This makes the tool safe for analyzing untrusted HTML. The output is purely structural data with no executable side effects.

The JSON output can be converted back to HTML; the structure is bidirectional by design. Each element object contains its tag name, full attribute map, and ordered children array. A simple recursive function that creates elements with document.createElement, sets attributes, appends children, and handles text nodes can reconstruct the DOM. However, the round trip is not byte-exact: original whitespace formatting, attribute quote style, self-closing tag syntax, and comment positions may differ from the source.
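
The recursive reconstruction described above can be sketched as follows. The function name jsonToDom is hypothetical, and the document is passed in as a parameter so the same code works with the browser's document or with a stand-in object.

```javascript
// Rebuild a DOM subtree from the { tag, attributes, children } JSON shape.
// `doc` is any object offering createElement, createTextNode, and
// createComment, such as the browser's `document`.
function jsonToDom(node, doc) {
  if (node.type === "text") return doc.createTextNode(node.content);
  if (node.type === "comment") return doc.createComment(node.content);

  const element = doc.createElement(node.tag);
  for (const [name, value] of Object.entries(node.attributes || {})) {
    element.setAttribute(name, value);
  }
  for (const child of node.children || []) {
    element.appendChild(jsonToDom(child, doc));
  }
  return element;
}

// Browser usage:
//   const tree = { tag: "p", attributes: { class: "lead" },
//                  children: [{ type: "text", content: "Hi" }] };
//   document.body.appendChild(jsonToDom(tree, document));
```

Because createElement places elements in the HTML namespace, SVG or MathML subtrees would need document.createElementNS with the appropriate namespace URI instead; the sketch leaves that case out.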