Web Page to Outline Converter
Convert any web page into a clean, hierarchical outline. Extract headings, lists, and structure from URLs. Export to Markdown, text, or JSON.
About
Web pages contain implicit document structure through heading elements (h1 through h6), semantic landmarks, and nested lists. Most users never see this structure. This tool fetches a page, parses its DOM tree, and reconstructs the logical outline as a collapsible hierarchy. It applies a depth-first traversal algorithm where each heading of level n creates a new scope that captures all subsequent content until a heading of level ≤ n appears. Malformed heading hierarchies (e.g., jumping from h1 to h4) are normalized but flagged. The output approximates what screen readers and search engine crawlers actually interpret, not what the visual design suggests.
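As a rough illustration of the parsing step described above, the sketch below collects heading elements in document order from an HTML string. It uses Python's standard `html.parser` module; the names `HeadingCollector` and `extract_headings` are hypothetical and chosen for this example, and the tool's actual implementation may differ. Fetching the page is left out.

```python
from html.parser import HTMLParser

# Map heading tag names to their numeric rank.
HEADING_TAGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class HeadingCollector(HTMLParser):
    """Collects (level, text) pairs for h1-h6 in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []   # finished (level, text) pairs
        self._level = None   # rank of the heading currently open, if any
        self._parts = []     # text fragments seen inside the open heading

    def handle_starttag(self, tag, attrs):
        # Ignore headings nested inside another heading (invalid HTML anyway).
        if tag in HEADING_TAGS and self._level is None:
            self._level = HEADING_TAGS[tag]
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and HEADING_TAGS.get(tag) == self._level:
            # Collapse runs of whitespace so inline markup does not leave gaps.
            text = " ".join("".join(self._parts).split())
            self.headings.append((self._level, text))
            self._level = None


def extract_headings(html):
    """Return the list of (level, text) headings found in an HTML string."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.headings


sample = "<h1>Title</h1><p>Intro</p><h2>Sec <em>one</em></h2><h4>Deep</h4>"
print(extract_headings(sample))
# [(1, 'Title'), (2, 'Sec one'), (4, 'Deep')]
```

The result is a flat list; nesting it into a hierarchy is a separate step.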
Limitations: this tool cannot access pages behind authentication walls, JavaScript-rendered SPAs that produce no server-side HTML, or sites that block CORS proxy requests. Pages with no heading elements produce a flat paragraph-level outline.
Pro tip: compare your outline against a competitor's page to identify structural SEO gaps. A page with a broken heading hierarchy can lose up to 20% of its potential featured-snippet eligibility, according to multiple SEO audit studies.
Formulas
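The outline extraction follows a deterministic heading-nesting algorithm. For a sequence of heading elements $h_1, h_2, \dots, h_k$ encountered during DOM traversal, each heading is nested under the nearest preceding heading of strictly lower level:
$$\operatorname{parent}(h_i) = h_j, \qquad j = \max\{\, m < i : \operatorname{level}(h_m) < \operatorname{level}(h_i) \,\}$$
Here $h_i$ is the i-th heading element encountered in document order, and $\operatorname{level}(h)$ returns the numeric heading rank (1 to 6). If no such $j$ exists, $h_i$ becomes a top-level node. A stack-based approach maintains the current ancestor chain. When a heading of level n is encountered, the stack pops until the top element has level < n, then pushes the new heading as a child of that element.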
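The stack-based nesting rule can be sketched in a few lines. This is a minimal illustration, not the tool's actual code: the function names `build_outline` and `render`, the dictionary node shape, and the level-0 sentinel root are assumptions made for this example.

```python
def build_outline(headings):
    """Nest (level, text) pairs into a tree using an ancestor stack."""
    # Hypothetical level-0 root: it is never popped, because real levels are 1-6.
    root = {"level": 0, "text": None, "children": []}
    stack = [root]  # current ancestor chain, outermost first
    for level, text in headings:
        node = {"level": level, "text": text, "children": []}
        # Pop until the top of the stack has a strictly lower level.
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)  # attach as child of the new top
        stack.append(node)                  # the new heading opens a scope
    return root["children"]


def render(nodes, depth=0):
    """Flatten the tree into indented lines for display."""
    lines = []
    for node in nodes:
        lines.append("  " * depth + f"h{node['level']} {node['text']}")
        lines.extend(render(node["children"], depth + 1))
    return lines


headings = [(1, "Guide"), (2, "Setup"), (3, "Install"), (2, "Usage"), (4, "Flags")]
print("\n".join(render(build_outline(headings))))
# h1 Guide
#   h2 Setup
#     h3 Install
#   h2 Usage
#     h4 Flags
```

Note that the h4 in this example still nests under the preceding h2 even though h3 is skipped, so outline depth reflects the ancestor chain rather than the raw heading rank.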
Content nodes (paragraphs, lists, images) between headings are attached to the most recent heading scope. The outline depth d for any node satisfies 0 ≤ d ≤ 6. Structural validity is checked by verifying that no heading level is skipped (e.g., h1 → h3 without an intervening h2). Violations are reported but do not prevent outline generation.
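The skipped-level check can be sketched as a single pass over the heading sequence. The function name `find_skipped_levels` and the shape of the returned tuples are hypothetical. One assumption to note: the sketch treats the start of the document as level 0, so a page whose first heading is an h2 is also reported; the text above does not say how the tool handles that case.

```python
def find_skipped_levels(headings):
    """Return (index, previous_level, level) for each heading that skips a level."""
    violations = []
    previous = 0  # assumption: document start counts as level 0
    for index, (level, _text) in enumerate(headings):
        # Going deeper by more than one rank skips at least one level.
        if level > previous + 1:
            violations.append((index, previous, level))
        previous = level
    return violations


headings = [(1, "A"), (3, "B"), (2, "C"), (3, "D"), (6, "E")]
print(find_skipped_levels(headings))
# [(1, 1, 3), (4, 3, 6)]
```

Moving back up by any number of levels (for example h4 followed by h2) is not a violation, since that simply closes the open scopes. The function only reports problems, matching the behavior described above where violations do not prevent outline generation.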
Reference Data
| HTML Element | Outline Role | SEO Weight | Expected Count | Common Mistakes |
|---|---|---|---|---|
| h1 | Page Title / Primary Topic | Highest | 1 per page | Multiple h1 tags dilute topic signal |
| h2 | Major Section | High | 2 - 8 | Using for styling instead of structure |
| h3 | Subsection | Medium | 2 - 5 per h2 | Skipping h2 and jumping to h3 |
| h4 | Detail Point | Low-Medium | As needed | Nesting too deep without content |
| h5 | Sub-detail | Low | Rare | Overuse creates visual noise |
| h6 | Minor annotation | Minimal | Very Rare | Almost never needed in practice |
| nav | Navigation Landmark | Structural | 1 - 3 | Missing aria-label on multiple navs |
| main | Primary Content Area | Structural | 1 | Omitting entirely |
| article | Self-contained Content | Medium | Varies | Using div instead of article for posts |
| section | Thematic Grouping | Medium | Varies | Missing heading inside section |
| aside | Tangential Content | Low | 0 - 3 | Placing primary content in aside |
| ul / ol | List Structure | Medium (featured snippets) | Varies | Using br tags instead of proper lists |
| figure | Media with Caption | Low-Medium | Varies | Missing figcaption |
| header | Introductory Content | Structural | 1 - 2 | Confusing with head element |
| footer | Footer Landmark | Structural | 1 | Stuffing SEO links in footer |
| dl | Definition / Key-Value List | Low-Medium | As needed | Rarely used despite being semantically ideal for glossaries |