    About

    Web pages contain implicit document structure through heading elements (h1 through h6), semantic landmarks, and nested lists. Most users never see this structure. This tool fetches a page, parses its DOM tree, and reconstructs the logical outline as a collapsible hierarchy. It applies a depth-first traversal in which each heading of level n opens a new scope that captures all subsequent content until the next heading of the same or a higher rank (level n or numerically lower) appears. Malformed heading hierarchies (e.g., jumping from h1 to h4) are normalized but flagged. The output approximates what screen readers and search engine crawlers actually interpret, not what the visual design suggests.
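
    As a rough illustration of the parsing step, the sketch below collects headings from raw HTML in document order using Python's standard html.parser module. The names HeadingCollector and extract_headings are hypothetical, and the tool's actual parser is not described in this page, so treat this as one possible approach rather than the tool's implementation.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collects (level, text) pairs for h1-h6 elements in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []   # finished headings as (level, text)
        self._level = None   # level of the heading currently open, if any
        self._parts = []     # text fragments seen inside the open heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            # Collapse runs of whitespace inside the heading text.
            text = " ".join("".join(self._parts).split())
            self.headings.append((self._level, text))
            self._level = None


def extract_headings(html):
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.headings


sample = "<h1>Guide</h1><p>Intro</p><h2>Setup <em>steps</em></h2><h4>Note</h4>"
print(extract_headings(sample))
# [(1, 'Guide'), (2, 'Setup steps'), (4, 'Note')]
```

    Inline markup inside a heading (the em element above) is flattened to plain text, and non-heading content is ignored at this stage.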

    Limitations: This tool cannot access pages behind authentication walls, JavaScript-rendered SPAs that produce no server-side HTML, or sites that block CORS proxy requests. Pages with no heading elements produce a flat paragraph-level outline. Pro tip: compare your outline against a competitor's page to identify structural SEO gaps. Some SEO audit studies report that a page with a broken heading hierarchy can lose up to 20% of its potential featured-snippet eligibility.


    Formulas

    The outline extraction follows a deterministic heading-nesting algorithm. For a sequence of heading elements encountered during DOM traversal:

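    \[
    \operatorname{nest}(h_i) =
    \begin{cases}
    \text{child of current scope} & \text{if } \operatorname{level}(h_i) > \operatorname{level}(h_{i-1}) \\
    \text{sibling of current scope} & \text{if } \operatorname{level}(h_i) = \operatorname{level}(h_{i-1}) \\
    \text{ancestor: pop to matching level} & \text{if } \operatorname{level}(h_i) < \operatorname{level}(h_{i-1})
    \end{cases}
    \]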

    Here, h_i is the i-th heading element encountered in document order, and level(h) returns the numeric heading rank (1 to 6). A stack-based approach maintains the current ancestor chain. When a heading of level n is encountered, the stack pops until the top element has level < n, then pushes the new heading as a child of that element.
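
    The stack procedure described above can be sketched as follows. This is a minimal illustration, assuming headings arrive as (level, text) pairs; the function name build_outline, the dictionary node layout, and the synthetic level-0 root are assumptions made for the example, not details taken from the tool.

```python
def build_outline(headings):
    """Nest (level, text) pairs into a tree using a stack of open scopes."""
    # Assumption: a synthetic root at level 0 sits below every real heading,
    # so it is never popped and always catches top-level headings.
    root = {"level": 0, "text": "ROOT", "children": []}
    stack = [root]
    for level, text in headings:
        node = {"level": level, "text": text, "children": []}
        # Pop until the top of the stack has a strictly lower level.
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)  # attach as child of the new top
        stack.append(node)                  # the new heading opens a scope
    return root


tree = build_outline([(1, "Guide"), (2, "Install"), (2, "Usage"), (3, "Flags")])
guide = tree["children"][0]
print([child["text"] for child in guide["children"]])
# ['Install', 'Usage']
print(guide["children"][1]["children"][0]["text"])
# Flags
```

    Because the pop condition uses >=, a heading of equal level closes its predecessor and becomes a sibling, which matches the middle case of the nesting rule.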

    Content nodes (paragraphs, lists, images) between headings are attached to the most recent heading scope. The outline depth d for any node satisfies 0 ≤ d ≤ 6. Structural validity is checked by verifying that no heading level is skipped (e.g., h1 → h3 without an intervening h2). Violations are reported but do not prevent outline generation.
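
    One way to express the skipped-level check is shown below. It is a sketch under stated assumptions: the function name find_skipped_levels is hypothetical, and treating the document root as level 0 (so that a first heading deeper than h1 is also reported) is a choice made for this example, not something the description above specifies.

```python
def find_skipped_levels(levels):
    """Return (index, previous_level, level) for each downward jump > 1."""
    warnings = []
    previous = 0  # assumption: the document root counts as level 0
    for index, level in enumerate(levels):
        # Going deeper by more than one rank skips at least one level.
        if level > previous + 1:
            warnings.append((index, previous, level))
        previous = level
    return warnings


print(find_skipped_levels([1, 3, 2, 4]))
# [(1, 1, 3), (3, 2, 4)]
print(find_skipped_levels([1, 2, 3, 2]))
# []
```

    Moving back up by several ranks at once (for example h4 followed by h2) is not a violation, so only increases are checked.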

    Reference Data

    | HTML Element | Outline Role | SEO Weight | Expected Count | Common Mistakes |
    |---|---|---|---|---|
    | h1 | Page Title / Primary Topic | Highest | 1 per page | Multiple h1 tags dilute topic signal |
    | h2 | Major Section | High | 2-8 | Using for styling instead of structure |
    | h3 | Subsection | Medium | 2-5 per h2 | Skipping h2 and jumping to h3 |
    | h4 | Detail Point | Low-Medium | As needed | Nesting too deep without content |
    | h5 | Sub-detail | Low | Rare | Overuse creates visual noise |
    | h6 | Minor annotation | Minimal | Very rare | Almost never needed in practice |
    | nav | Navigation Landmark | Structural | 1-3 | Missing aria-label on multiple navs |
    | main | Primary Content Area | Structural | 1 | Omitting entirely |
    | article | Self-contained Content | Medium | Varies | Using div instead of article for posts |
    | section | Thematic Grouping | Medium | Varies | Missing heading inside section |
    | aside | Tangential Content | Low | 0-3 | Placing primary content in aside |
    | ul / ol | List Structure | Medium (featured snippets) | Varies | Using br tags instead of proper lists |
    | figure | Media with Caption | Low-Medium | Varies | Missing figcaption |
    | header | Introductory Content | Structural | 1-2 | Confusing with head element |
    | footer | Footer Landmark | Structural | 1 | Stuffing SEO links in footer |
    | dl | Definition / Key-Value List | Low-Medium | As needed | Rarely used despite being semantically ideal for glossaries |

    Frequently Asked Questions

    Pages rendered entirely via JavaScript may yield an outline with no headings. Many modern web applications (React, Vue, Angular SPAs) return an initial HTML response that contains only a shell div and script tags. Because this tool parses the server-delivered HTML without executing JavaScript, it finds no heading elements in the raw markup. The same applies to content rendered inside Shadow DOM. For best results, use the tool on server-rendered or static pages.
    Skipped heading levels (for example, an h1 followed directly by an h4) are flagged as structural warnings. Skipping heading ranks is discouraged by W3C accessibility guidance. The outline is still generated by treating the h4 as a child of the h1 scope, but the warning indicates that screen readers and crawlers may misinterpret the content hierarchy. WCAG 2.1 Success Criterion 1.3.1 (Info and Relationships) requires that structure conveyed through presentation be programmatically determinable, which skipped headings can undermine.
    The tool attempts to fetch the target URL through a sequence of public CORS proxy services (allorigins, corsproxy.io). These proxies relay the HTTP request to bypass browser same-origin restrictions. Your URL is sent to these third-party services as a query parameter. No authentication tokens or cookies from the target site are forwarded. For sensitive internal URLs, consider copying the page source HTML directly into the manual input field instead.
    Outline structure affects featured-snippet eligibility only indirectly. Pages with well-structured heading hierarchies, proper list elements (ol/ul), and table markup are more likely to qualify for featured snippets. If your outline shows a clean h1 → h2 → h3 cascade with list items under relevant headings, the structure supports snippet eligibility. A flat outline with no nesting suggests poor structural optimization.
    The Heading List is a flat, sequential dump of every heading tag in document order. The Document Outline reconstructs the implied nesting hierarchy using the heading-level algorithm. A page with h1, h2, h2, h3 produces a flat list of four items but an outline tree where the h3 is nested under the second h2. The outline view reveals structural intent; the flat list reveals structural errors.
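
    To make the contrast concrete, the sketch below renders the same four headings both ways. The function names flat_list and indented_outline are hypothetical, and the two-space indent per nesting step is an arbitrary display choice for this example.

```python
def flat_list(headings):
    """Heading List view: every heading in document order, no nesting."""
    return [f"h{level} {text}" for level, text in headings]


def indented_outline(headings):
    """Document Outline view: indent each heading by its nesting depth."""
    lines = []
    stack = []  # levels of the scopes that are currently open
    for level, text in headings:
        # Close scopes of the same or a deeper level before nesting.
        while stack and stack[-1] >= level:
            stack.pop()
        lines.append("  " * len(stack) + text)
        stack.append(level)
    return lines


headings = [(1, "Guide"), (2, "Install"), (2, "Usage"), (3, "Flags")]
print(flat_list(headings))
# ['h1 Guide', 'h2 Install', 'h2 Usage', 'h3 Flags']
print("\n".join(indented_outline(headings)))
```

    In the second view the indent comes from the number of open ancestor scopes rather than the raw heading rank, so "Flags" appears under "Usage", the second h2.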
    Certain websites actively block known proxy IP ranges, return CAPTCHAs, or require specific headers (User-Agent, Referer) that proxies strip. Government sites, banking portals, and major platforms with aggressive bot detection commonly trigger these blocks. The tool provides a manual HTML paste fallback for these cases. Paste the page source (Ctrl+U in most browsers) into the text area to bypass network restrictions entirely.
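
    The proxy sequence mentioned in this FAQ can be pictured as a simple fallback loop: try each proxy in order and return the first successful response. The sketch below is illustrative only. The proxy URL templates use placeholder example domains and are not the real endpoints of any proxy service, and the names proxied_urls and fetch_with_fallback, along with the injected fetch callable, are assumptions made so the example runs without network access.

```python
from urllib.parse import quote

# Hypothetical templates on placeholder domains; real proxy services
# define their own URL formats, which are not reproduced here.
PROXY_TEMPLATES = [
    "https://proxy-one.example/raw?url={target}",
    "https://proxy-two.example/?url={target}",
]


def proxied_urls(target_url, templates=PROXY_TEMPLATES):
    """Build one request URL per proxy, with the target percent-encoded."""
    encoded = quote(target_url, safe="")
    return [template.format(target=encoded) for template in templates]


def fetch_with_fallback(target_url, fetch, templates=PROXY_TEMPLATES):
    """Try each proxy in order and return the first successful body."""
    errors = []
    for url in proxied_urls(target_url, templates):
        try:
            return fetch(url)
        except Exception as exc:  # a blocked or failing proxy: try the next
            errors.append((url, exc))
    raise RuntimeError(f"all {len(errors)} proxies failed")


def fake_fetch(url):
    # Stand-in for a real HTTP GET: the first proxy fails, the second answers.
    if "proxy-one" in url:
        raise ConnectionError("blocked")
    return "<h1>Hello</h1>"


print(fetch_with_fallback("https://example.com/page", fake_fetch))
# <h1>Hello</h1>
```

    When every proxy fails, the function raises, which is the point at which a manual HTML paste fallback like the one described above would take over.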