About

Web pages contain implicit document structure through heading elements (h1 through h6), semantic landmarks, and nested lists. Most users never see this structure. This tool fetches a page, parses its DOM tree, and reconstructs the logical outline as a collapsible hierarchy. It applies a depth-first traversal algorithm where each heading of level n creates a new scope that captures all subsequent content until a heading of level ≤ n appears. Malformed heading hierarchies (e.g., jumping from h1 to h4) are normalized but flagged. The output approximates what screen readers and search engine crawlers actually interpret, not what the visual design suggests.

Limitations: this tool cannot process pages behind authentication walls, JavaScript-rendered SPAs that ship no server-side HTML, or sites that block CORS proxy requests. Pages with no heading elements produce a flat paragraph-level outline.

Pro tip: compare your outline against a competitor's page to identify structural SEO gaps. SEO audit studies have reported that a page with a broken heading hierarchy can lose up to 20% of its potential featured-snippet eligibility.

Formulas

The outline extraction follows a deterministic heading-nesting algorithm. For a sequence of heading elements encountered during DOM traversal:

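\[
\mathrm{nest}(h_i) =
\begin{cases}
\text{child of current scope} & \text{if } \mathrm{level}(h_i) > \mathrm{level}(h_{i-1}) \\
\text{sibling of current scope} & \text{if } \mathrm{level}(h_i) = \mathrm{level}(h_{i-1}) \\
\text{ancestor pop to matching level} & \text{if } \mathrm{level}(h_i) < \mathrm{level}(h_{i-1})
\end{cases}
\]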

Where h_i is the i-th heading element encountered in document order and level(h) returns the numeric heading rank (1 to 6). A stack-based approach maintains the current ancestor chain. When a heading of level n is encountered, the stack pops until the top element has level < n; the new heading is then attached as a child of that element and pushed onto the stack.
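The stack procedure described above can be sketched in Python as follows. This is a minimal illustration, not the tool's actual source: the names OutlineNode, build_outline, and render are hypothetical, and the synthetic level-0 root is an assumption added so that the stack is never empty.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OutlineNode:
    """One heading in the outline tree (hypothetical structure for illustration)."""
    level: int   # heading rank 1-6; 0 is reserved for the synthetic root
    text: str
    children: list[OutlineNode] = field(default_factory=list)


def build_outline(headings: list[tuple[int, str]]) -> OutlineNode:
    """Nest (level, text) pairs, given in document order, into a tree."""
    root = OutlineNode(level=0, text="(document)")  # assumed sentinel root
    stack = [root]  # current ancestor chain, root at the bottom
    for level, text in headings:
        # Pop until the top of the stack has a strictly lower level.
        # The root has level 0, so it is never popped.
        while stack[-1].level >= level:
            stack.pop()
        node = OutlineNode(level=level, text=text)
        stack[-1].children.append(node)  # attach as child of the new top
        stack.append(node)               # the new heading becomes the current scope
    return root


def render(node: OutlineNode, depth: int = 0) -> list[str]:
    """Return the tree as indented lines, skipping the synthetic root."""
    lines = []
    for child in node.children:
        lines.append("  " * depth + f"h{child.level} {child.text}")
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    sample = [(1, "Guide"), (2, "Setup"), (2, "Usage"), (3, "Flags")]
    print("\n".join(render(build_outline(sample))))
```

Because the comparison is always against the top of the stack, a skipped level such as h1 followed by h4 still nests the h4 directly under the h1 rather than failing.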

Content nodes (paragraphs, lists, images) between headings are attached to the most recent heading scope. The outline depth d for any node satisfies 0 ≤ d ≤ 6. Structural validity is checked by verifying that no heading level is skipped (e.g., h1 → h3 without an intervening h2). Violations are reported but do not prevent outline generation.
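The skipped-level check can be sketched as a single pass over the heading ranks. The function name find_skipped_levels is hypothetical, and treating the start of the document as rank 0 (so that a first heading other than h1 is also flagged) is an assumption the text above does not state.

```python
def find_skipped_levels(levels: list[int]) -> list[tuple[int, int, int]]:
    """Return (index, previous_level, level) for each heading that skips a rank.

    A skip is a step deeper into the hierarchy by more than one rank,
    such as h1 -> h3. Moving back up (h4 -> h2) is not a skip.
    """
    warnings = []
    previous = 0  # assumption: document start counts as rank 0
    for index, level in enumerate(levels):
        if level > previous + 1:
            warnings.append((index, previous, level))
        previous = level
    return warnings


if __name__ == "__main__":
    # h1 -> h3 skips h2, and h2 -> h4 skips h3; the return to h1 is fine.
    for index, prev, level in find_skipped_levels([1, 3, 2, 4, 1]):
        print(f"heading #{index}: h{level} follows h{prev}")
```

The function only collects warnings and never raises, matching the behaviour described above where violations are reported without blocking outline generation.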

Reference Data

HTML Element	Outline Role	SEO Weight	Expected Count	Common Mistakes
h1	Page Title / Primary Topic	Highest	1 per page	Multiple h1 tags dilute topic signal
h2	Major Section	High	2 - 8	Using for styling instead of structure
h3	Subsection	Medium	2 - 5 per h2	Skipping h2 and jumping to h3
h4	Detail Point	Low-Medium	As needed	Nesting too deep without content
h5	Sub-detail	Low	Rare	Overuse creates visual noise
h6	Minor annotation	Minimal	Very Rare	Almost never needed in practice
nav	Navigation Landmark	Structural	1 - 3	Missing aria-label on multiple navs
main	Primary Content Area	Structural	1	Omitting entirely
article	Self-contained Content	Medium	Varies	Using div instead of article for posts
section	Thematic Grouping	Medium	Varies	Missing heading inside section
aside	Tangential Content	Low	0 - 3	Placing primary content in aside
ul / ol	List Structure	Medium (featured snippets)	Varies	Using br tags instead of proper lists
figure	Media with Caption	Low-Medium	Varies	Missing figcaption
header	Introductory Content	Structural	1 - 2	Confusing with head element
footer	Footer Landmark	Structural	1	Stuffing SEO links in footer
dl	Definition / Key-Value List	Low-Medium	As needed	Rarely used despite being semantically ideal for glossaries

Frequently Asked Questions

Many modern web applications render content entirely via JavaScript (React, Vue, Angular SPAs). The initial HTML response contains only a shell div and script tags. Because this tool parses the server-delivered HTML without executing JavaScript, it sees only that shell, which contains no heading elements. The same applies to pages that build their content inside Shadow DOM at runtime. Try the tool on server-rendered or static pages for best results.
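To illustrate why an SPA shell yields nothing, here is a minimal sketch that collects headings from raw HTML using Python's standard html.parser module, without executing any scripts. HeadingCollector and collect_headings are hypothetical names, and the tool's own parser may behave differently.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect (level, text) for every heading in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []   # list of (level, text) tuples
        self._level = None   # rank of the heading currently open, if any
        self._parts = []     # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            # Collapse runs of whitespace inside the heading text.
            text = " ".join("".join(self._parts).split())
            self.headings.append((self._level, text))
            self._level = None


def collect_headings(html: str) -> list:
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return parser.headings


if __name__ == "__main__":
    spa_shell = '<div id="root"></div><script src="/app.js"></script>'
    static_page = "<h1>Title</h1><p>Intro</p><h2>Set <em>up</em></h2>"
    print(collect_headings(spa_shell))    # no headings in the shell
    print(collect_headings(static_page))  # headings found in static markup
```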

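The tool flags skipped heading levels (e.g., an h1 followed directly by an h4) as structural warnings. Skipping levels goes against the HTML specification's heading-level guidance, under which a heading should be at most one level deeper than the heading before it. The outline is still generated by treating the h4 as a child of the h1 scope, but the warning indicates that screen readers and crawlers may misinterpret the content hierarchy. WCAG 2.1 Success Criterion 1.3.1 (Info and Relationships) requires that structure conveyed visually also be programmatically determinable, which skipped headings can undermine.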

The tool attempts to fetch the target URL through a sequence of public CORS proxy services (allorigins, corsproxy.io). These proxies relay the HTTP request to bypass browser same-origin restrictions. Your URL is sent to these third-party services as a query parameter. No authentication tokens or cookies from the target site are forwarded. For sensitive internal URLs, consider copying the page source HTML directly into the manual input field instead.
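The fallback sequence can be sketched as a loop that tries each proxy in turn and returns the first successful response. This is an illustration of the ordering logic only, written in Python rather than browser code. The proxy URL patterns below are assumptions that should be checked against each service's current documentation, and fetch_via_proxies and http_get are hypothetical names.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed URL patterns; verify against each service before relying on them.
PROXY_TEMPLATES = [
    "https://api.allorigins.win/raw?url={target}",
    "https://corsproxy.io/?url={target}",
]


def http_get(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and decode the body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_via_proxies(target_url: str, fetch=http_get, templates=PROXY_TEMPLATES) -> str:
    """Try each proxy in order; return the first body that loads."""
    errors = []
    # The target URL travels to the proxy as a percent-encoded query value,
    # so the third-party service sees the full address being requested.
    encoded = quote(target_url, safe="")
    for template in templates:
        proxy_url = template.format(target=encoded)
        try:
            return fetch(proxy_url)
        except Exception as exc:  # any failure moves on to the next proxy
            errors.append(f"{proxy_url}: {exc}")
    raise RuntimeError("all proxies failed:\n" + "\n".join(errors))
```

Passing the fetch function as a parameter keeps the fallback logic testable without network access. No cookies or authorization headers are attached in this sketch, in line with the behaviour described above.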

Outline structure affects featured-snippet eligibility only indirectly. Pages with well-structured heading hierarchies, proper list elements (ol/ul), and table markup are more likely to qualify for featured snippets. If your outline shows a clean h1 → h2 → h3 cascade with list items under relevant headings, the structure supports snippet eligibility. A flat outline with no nesting suggests poor structural optimization.

The Heading List is a flat, sequential dump of every heading tag in document order. The Document Outline reconstructs the implied nesting hierarchy using the heading-level algorithm. A page with h1, h2, h2, h3 produces a flat list of four items but an outline tree where the h3 is nested under the second h2. The outline view reveals structural intent; the flat list reveals structural errors.

Certain websites actively block known proxy IP ranges, return CAPTCHAs, or require specific headers (User-Agent, Referer) that proxies strip. Government sites, banking portals, and major platforms with aggressive bot detection commonly trigger these blocks. The tool provides a manual HTML paste fallback for these cases. Paste the page source (Ctrl+U in most browsers) into the text area to bypass network restrictions entirely.