About

HTML documents form a tree of nodes. Each element carries a tag name, a map of attributes, and zero or more children. Manually reconstructing this hierarchy into a data-interchange format like JSON is error-prone: missed closing tags, misread nesting depth, or dropped attribute values. This converter uses the browser's native DOMParser API to build a spec-compliant parse tree, then performs a depth-first recursive walk to emit a clean JSON object. The output preserves node type (ELEMENT, TEXT, COMMENT), all attribute key-value pairs, and the full child hierarchy. Note: the parser follows HTML5 error-recovery rules, so malformed markup will be silently corrected rather than rejected.

Typical use cases include automated testing fixtures, CMS migration scripts, and accessibility audits where you need a machine-readable snapshot of a page fragment. Inputs under roughly 50 KB are parsed on the main thread; larger inputs, up to about 500 KB, are offloaded to a Web Worker. Pro tip: if your source HTML contains &nbsp; entities or inline style attributes, they carry through to the JSON output (&nbsp; as a non-breaking space character, style as a plain string attribute). Filter them downstream if your pipeline requires clean data.
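The downstream filtering mentioned above could look roughly like the following. This is a minimal sketch, not part of the tool: the helper name stripAttributes and the sample tree are hypothetical, and the code assumes the { tag, attributes, children } and { type, content } shapes described in this document.

```javascript
// Hypothetical helper: returns a copy of a converted tree with selected attributes removed.
// Assumes element objects look like { tag, attributes, children } and leaves like { type, content }.
function stripAttributes(node, shouldDrop) {
  if (!node || !Array.isArray(node.children)) {
    return node; // text, comment, and other leaf objects pass through unchanged
  }
  const attributes = {};
  for (const [name, value] of Object.entries(node.attributes || {})) {
    if (!shouldDrop(name, value)) attributes[name] = value;
  }
  return {
    ...node,
    attributes,
    children: node.children.map((child) => stripAttributes(child, shouldDrop)),
  };
}

// Illustrative input only; not real tool output.
const styledTree = {
  tag: 'div',
  attributes: { id: 'card', style: 'color:red' },
  children: [
    { tag: 'span', attributes: { style: 'margin:0' }, children: [{ type: 'text', content: 'Hi' }] },
  ],
};
const cleanedTree = stripAttributes(styledTree, (name) => name === 'style');
```

The function copies rather than mutates, so the original converter output stays available for comparison.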


Formulas

The conversion algorithm performs a recursive depth-first traversal. For each node N in the DOM tree, the mapping function f produces a JSON object:

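$$f(N) = \{\ \text{tag}: N.\text{tagName},\ \text{attributes}: \text{mapAttrs}(N),\ \text{children}: [f(c_0), f(c_1), \dots, f(c_{k-1})]\ \}$$

Where $c_0, \dots, c_{k-1}$ are the $k$ child nodes of $N$. The attribute mapping function iterates the NamedNodeMap, writing $a_i$ for N.attributes[i]:

$$\text{mapAttrs}(N) = \{\ a_i.\text{name} : a_i.\text{value} \ \mid\ i \in [0, N.\text{attributes.length})\ \}$$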
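The two mappings above can be sketched in JavaScript as follows. This is an illustration under assumptions, not the tool's source: the function name elementToJson and the plain-object mock nodes are hypothetical, and the sketch only follows element children (text handling is covered by the predicate described next).

```javascript
const ELEMENT_NODE = 1;

// mapAttrs: copy each attribute's name/value pair into a plain object.
function mapAttrs(node) {
  const result = {};
  for (const attr of Array.from(node.attributes || [])) {
    result[attr.name] = attr.value;
  }
  return result;
}

// f: depth-first mapping of an element and its element children.
// Note: in HTML documents, tagName reports upper-case names such as "UL".
function elementToJson(node) {
  return {
    tag: node.tagName,
    attributes: mapAttrs(node),
    children: Array.from(node.childNodes || [])
      .filter((child) => child.nodeType === ELEMENT_NODE)
      .map(elementToJson),
  };
}

// Plain-object stand-in for a DOM subtree so the sketch also runs outside a browser.
const mockList = {
  nodeType: 1,
  tagName: 'UL',
  attributes: [{ name: 'class', value: 'menu' }],
  childNodes: [
    { nodeType: 1, tagName: 'LI', attributes: [], childNodes: [] },
    { nodeType: 1, tagName: 'LI', attributes: [{ name: 'id', value: 'last' }], childNodes: [] },
  ],
};
const listJson = elementToJson(mockList);
```

In a browser the same functions accept real nodes, because Array.from works on both NamedNodeMap and NodeList.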

Text node handling applies a whitespace filter predicate P:

P(t) = t.trim().length > 0

Where t = the text content of the node. Only text nodes satisfying P are included unless the "include whitespace" option is enabled. The total node count in the output is bounded by the recursive relation $T(N) = 1 + \sum_{i=0}^{k-1} T(c_i)$, where $k$ is the number of children of $N$. Time complexity is O(n), where n is the total node count.
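To make the recurrence concrete, here is a small sketch that applies P while counting nodes. The names isSignificantText and countNodes, the includeWhitespace parameter, and the mock nodes are assumptions for illustration.

```javascript
const TEXT_NODE = 3;

// P(t): true when the text still has content after trimming.
function isSignificantText(text) {
  return text.trim().length > 0;
}

// T(N) = 1 + sum of T(c_i), skipping text nodes that fail P unless includeWhitespace is set.
function countNodes(node, includeWhitespace = false) {
  let total = 1;
  for (const child of Array.from(node.childNodes || [])) {
    const isBlankText =
      child.nodeType === TEXT_NODE && !isSignificantText(child.nodeValue || '');
    if (isBlankText && !includeWhitespace) continue;
    total += countNodes(child, includeWhitespace);
  }
  return total;
}

// Mock of <p>\n  <b>Hello</b>\n</p> using plain objects.
const mockParagraph = {
  nodeType: 1,
  childNodes: [
    { nodeType: 3, nodeValue: '\n  ', childNodes: [] },
    { nodeType: 1, childNodes: [{ nodeType: 3, nodeValue: 'Hello', childNodes: [] }] },
    { nodeType: 3, nodeValue: '\n', childNodes: [] },
  ],
};
const filteredCount = countNodes(mockParagraph);   // 3: root, inner element, "Hello"
const fullCount = countNodes(mockParagraph, true); // 5: both whitespace nodes included
```

Each node is visited once, which matches the stated O(n) cost.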

Reference Data

| Node Type | nodeType Value | JSON Representation | Included by Default |
|---|---|---|---|
| Element | 1 | { tag, attributes, children } | Yes |
| Text | 3 | { type: "text", content } | Yes (non-empty) |
| Comment | 8 | { type: "comment", content } | Optional |
| CDATA Section | 4 | { type: "cdata", content } | Optional |
| Document | 9 | Root wrapper | Skipped (children used) |
| DocumentType | 10 | { type: "doctype", name } | Optional |
| DocumentFragment | 11 | Root wrapper | Skipped (children used) |
| Attribute | 2 | Merged into parent attributes | Always |
| Processing Instruction | 7 | { type: "pi", target, data } | Optional |
| Entity Reference | 5 | Resolved to text | Automatic |
| Void Elements (br, img, hr, input) | 1 | { tag, attributes, children: [] } | Yes |
| SVG Elements | 1 | Namespace preserved in tag | Yes |
| Custom Elements (web components) | 1 | Hyphenated tag preserved | Yes |
| Template Content | 11 | Fragment children extracted | Optional |
| Whitespace-only Text | 3 | Filtered out | No (configurable) |
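The leaf rows of the table can be read as a dispatch on nodeType. The sketch below is one possible reading, with a hypothetical helper name (leafToJson); the numeric constants, and the target, data, and name properties, come from the DOM standard.

```javascript
// nodeType constants as defined by the DOM standard.
const NodeType = { TEXT: 3, CDATA: 4, PI: 7, COMMENT: 8, DOCTYPE: 10 };

// Hypothetical helper: converts a non-element node to the JSON shape listed in the table,
// or returns null when the type has no leaf representation (elements, documents, fragments).
function leafToJson(node) {
  switch (node.nodeType) {
    case NodeType.TEXT:
      return { type: 'text', content: node.nodeValue };
    case NodeType.CDATA:
      return { type: 'cdata', content: node.nodeValue };
    case NodeType.COMMENT:
      return { type: 'comment', content: node.nodeValue };
    case NodeType.PI:
      return { type: 'pi', target: node.target, data: node.data };
    case NodeType.DOCTYPE:
      return { type: 'doctype', name: node.name };
    default:
      return null;
  }
}

// Mock nodes standing in for real DOM objects.
const commentJson = leafToJson({ nodeType: 8, nodeValue: ' nav starts ' });
const doctypeJson = leafToJson({ nodeType: 10, name: 'html' });
const elementResult = leafToJson({ nodeType: 1, tagName: 'DIV' });
```

Whether optional types are emitted at all would be decided by the caller, based on the options in the rightmost column.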

Frequently Asked Questions

The browser's native DOMParser follows the HTML5 error-recovery rules defined in the WHATWG spec. Unclosed tags are auto-closed, misnested elements are reparented, and unknown tags are kept as ordinary elements rather than rejected. The resulting JSON reflects the corrected tree, not the original source. HTML mode never reports parse errors; to detect them, parse in XML mode and check the output for a <parsererror> element, which browsers inject when the XML is malformed.
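A check for that error element might look like this. The helper name findParseError is hypothetical; DOMParser.parseFromString and getElementsByTagName are standard browser APIs, and the browser-only part is guarded so the sketch does nothing where DOMParser is missing.

```javascript
// Returns the error text if the parsed document contains a <parsererror> element, else null.
function findParseError(doc) {
  const errors = doc.getElementsByTagName('parsererror');
  return errors.length > 0 ? errors[0].textContent : null;
}

// Browser-only usage; skipped where DOMParser is unavailable (for example in Node.js).
if (typeof DOMParser !== 'undefined') {
  const xmlDoc = new DOMParser().parseFromString('<a><b></a>', 'application/xml');
  console.log(findParseError(xmlDoc)); // the message wording differs between browsers
}
```

The same input parsed as 'text/html' would yield a repaired tree and no error element.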
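By default, whitespace-only text nodes (containing only spaces, tabs, or newlines) are filtered out. These nodes exist between elements due to HTML source formatting but carry no semantic meaning. Enable the "Include Whitespace Nodes" option to preserve them. Note that &nbsp; entities decode to non-breaking spaces (U+00A0), which are not part of source formatting. Be aware that JavaScript's trim() also strips U+00A0, so a filter built on t.trim() drops text nodes that consist solely of non-breaking spaces; they survive only if the filter tests for ASCII whitespace specifically.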
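One detail worth checking when implementing this filter: String.prototype.trim() treats U+00A0 as whitespace, so keeping non-breaking spaces needs a narrower test than t.trim(). The sketch below shows such a test; the function name is hypothetical, and limiting "blank" to ASCII whitespace is an assumption about the intended behaviour.

```javascript
// Treats only space, tab, line feed, form feed, and carriage return as blank,
// so a text node made of non-breaking spaces (U+00A0) counts as content.
function hasContentBeyondAsciiWhitespace(text) {
  return /[^ \t\n\f\r]/.test(text);
}

const nbspOnly = '\u00A0\u00A0';
const keptByTrim = nbspOnly.trim().length > 0;                  // false: trim() strips U+00A0
const keptByAscii = hasContentBeyondAsciiWhitespace(nbspOnly);  // true
const blankKept = hasContentBeyondAsciiWhitespace(' \n\t');     // false
```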
When parsing in HTML mode, SVG and MathML elements are recognized, and the parser restores the mixed-case names that SVG defines (e.g., the clipPath element and the viewBox attribute). However, explicit namespace URIs are not included in the JSON output by default. The tag name alone is sufficient for most use cases. If you need full namespace data, switch to XML parse mode in the options.
Inputs under 50 KB are parsed synchronously in the main thread with no perceptible delay. For inputs between 50 KB and 500 KB, parsing is offloaded to a Web Worker to prevent UI freezing. Beyond 500 KB, memory pressure from the JSON stringification may cause issues depending on browser and device. For extremely large documents, consider splitting the HTML into fragments before conversion.
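The size tiers above amount to a simple threshold check. The following sketch is illustrative only: the function name and the returned labels are hypothetical, and it assumes "KB" means 1024 bytes measured on the UTF-8 encoding of the input.

```javascript
const KB = 1024; // assumption: the thresholds are in units of 1024 bytes

// Picks a processing tier from the UTF-8 byte length of the markup.
function chooseParseStrategy(html) {
  const bytes = new TextEncoder().encode(html).length;
  if (bytes < 50 * KB) return 'main-thread';
  if (bytes <= 500 * KB) return 'worker';
  return 'split-first'; // beyond 500 KB: split into fragments before converting
}

const smallTier = chooseParseStrategy('<p>hi</p>');
const mediumTier = chooseParseStrategy('a'.repeat(60 * KB));
const largeTier = chooseParseStrategy('a'.repeat(600 * KB));
```

Measuring bytes rather than string length matters for markup with many non-ASCII characters, which occupy more than one byte each in UTF-8.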
Inline event handlers like onclick or onload appear as regular string attributes in the JSON. Script element content appears as a text child node. No code is executed during parsing because DOMParser creates an inert document context. This makes the tool safe for analyzing untrusted HTML. The output is purely structural data with no executable side effects.
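Because handlers arrive as ordinary string attributes, an audit can list them without running anything. This is a rough sketch with a hypothetical helper name; matching on the "on" prefix is a heuristic and may also catch unrelated attributes that happen to start with those letters.

```javascript
// Walks a converted tree and lists attributes whose names start with "on".
// Assumes element objects look like { tag, attributes, children }.
function collectInlineHandlers(node, found = []) {
  if (!node || !Array.isArray(node.children)) return found; // leaf: nothing to inspect
  for (const [name, value] of Object.entries(node.attributes || {})) {
    if (/^on/i.test(name)) found.push({ tag: node.tag, name, code: value });
  }
  for (const child of node.children) collectInlineHandlers(child, found);
  return found;
}

// Illustrative input only; not real tool output.
const untrustedTree = {
  tag: 'body',
  attributes: { onload: 'init()' },
  children: [
    { tag: 'a', attributes: { href: '#', onclick: 'track()' }, children: [{ type: 'text', content: 'Go' }] },
  ],
};
const handlers = collectInlineHandlers(untrustedTree);
```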
Yes. The JSON structure is bidirectional by design. Each element object contains its tag name, full attribute map, and ordered children array. A simple recursive function that creates elements with document.createElement, sets attributes, appends children, and handles text nodes can reconstruct the DOM. However, some information is lost: original whitespace formatting, attribute quote style, self-closing tag syntax, and comment positions may differ from the source.
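A reconstruction function along those lines could be sketched as follows. The name buildDom is hypothetical, and the document object is passed in as a parameter so the sketch can be exercised with a stand-in outside a browser; createElement, createTextNode, createComment, setAttribute, and appendChild are standard DOM methods.

```javascript
// Rebuilds a DOM subtree from the converter's JSON shape using the supplied document.
function buildDom(json, doc) {
  if (json.type === 'text') return doc.createTextNode(json.content);
  if (json.type === 'comment') return doc.createComment(json.content);
  if (json.type) {
    // doctype, cdata, and pi objects are outside the scope of this sketch
    throw new Error('Unsupported node type: ' + json.type);
  }
  const element = doc.createElement(json.tag);
  for (const [name, value] of Object.entries(json.attributes || {})) {
    element.setAttribute(name, value);
  }
  for (const child of json.children || []) {
    element.appendChild(buildDom(child, doc));
  }
  return element;
}

// Browser usage (not run here): document.body.appendChild(buildDom(someTree, document));
```

Note that createElement places every element in the HTML namespace, so SVG or MathML subtrees would need createElementNS with the appropriate namespace URI to render correctly.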