About

Migrating content from legacy Content Management Systems to modern static site generators often presents significant formatting challenges. Technical writers and developers frequently encounter issues where proprietary HTML classes and inline styles pollute the raw content during export. This tool addresses the specific need for a clean conversion that preserves semantic structure while discarding unnecessary metadata. It provides a reliable method for transforming complex Document Object Model elements into CommonMark syntax without losing data integrity.

The parsing engine handles nested structures and specialized elements that regular expression replacements typically corrupt. It pays particular attention to tabular data, converting standard HTML tables into ASCII pipe tables (a GitHub Flavored Markdown extension rather than part of core CommonMark). Code blocks retain their language identifiers so that syntax highlighting keeps working in the destination environment. This utility serves as a bridge for teams moving documentation from platforms like WordPress to Git-based workflows.

Formulas

The conversion process follows a recursive DOM traversal strategy. We define the transformation function T acting on an HTML node n. For simple text nodes, the function returns the sanitized text content.

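$$
T(n) =
\begin{cases}
\text{content}(n) & \text{if } n \text{ is a text node} \\
[\,T_{\text{inner}}\,](\text{href}) & \text{if } n \text{ is } \texttt{<a>} \\
\texttt{\#}\ T_{\text{inner}} & \text{if } n \text{ is } \texttt{<h1>}
\end{cases}
$$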
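The recursive traversal can be sketched in Python. This is a minimal illustration, not the tool's actual implementation: the names `Node`, `TreeBuilder`, `transform`, and `html_to_markdown` are hypothetical, the standard-library `html.parser` stands in for a browser DOM, and the input is assumed to be well nested with no void or unclosed tags. Text sanitization (such as escaping Markdown metacharacters) is left out.

```python
from html.parser import HTMLParser


class Node:
    """A minimal element node: tag name, attributes, and children."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []  # each child is a Node or a plain str (text node)


class TreeBuilder(HTMLParser):
    """Builds a small tree from well-nested HTML (assumption: every tag is closed)."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def transform(node):
    """T(n): text nodes return their content; elements wrap their transformed children."""
    if isinstance(node, str):
        return node
    inner = "".join(transform(child) for child in node.children)
    if node.tag == "a":
        return f"[{inner}]({node.attrs.get('href', '')})"
    if node.tag == "h1":
        return f"# {inner}"
    return inner  # tags without a rule fall through to their content


def html_to_markdown(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()  # flush any buffered trailing text
    return transform(builder.root)
```

Because `transform` recurses into the children before wrapping them, a link nested inside a heading is converted first and the heading prefix is applied to the result.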

For tabular data, the system calculates column widths to align the ASCII pipes. Let W_i be the maximum character width of column i. The separator line is generated by repeating the dash character W_i times.
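The width calculation can be sketched as follows. The function name `format_pipe_table` is hypothetical, and the sketch assumes every row has the same number of cells as the header and that no cell contains a literal pipe character.

```python
def format_pipe_table(header, rows):
    """Render a header and data rows as an aligned ASCII pipe table."""
    all_rows = [header] + rows
    # W_i: the maximum character width found in column i.
    widths = [max(len(row[i]) for row in all_rows) for i in range(len(header))]

    def render(cells):
        # Pad each cell on the right so the pipes line up vertically.
        padded = [cell.ljust(width) for cell, width in zip(cells, widths)]
        return "| " + " | ".join(padded) + " |"

    # The separator repeats the dash character W_i times per column.
    separator = "| " + " | ".join("-" * width for width in widths) + " |"
    return "\n".join([render(header), separator] + [render(row) for row in rows])
```

For a header `["Name", "Role"]` with rows `["Ada", "Engineer"]` and `["Grace", "Admiral"]`, the widths are 5 and 8, so the separator line is `| ----- | -------- |`.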

Reference Data

HTML Element	Markdown Syntax	Description
`<h1>Title</h1>`	`# Title`	Top-level heading
`<strong>Bold</strong>`	`**Bold**`	Strong emphasis
`<a href="url">Link</a>`	`[Link](url)`	Hyperlink anchor
`<blockquote>Text</blockquote>`	`> Text`	Block quotation
`<code>var x</code>`	`` `var x` ``	Inline code
`<pre><code>...</code></pre>`	`` ```...``` ``	Fenced code block
`<ul><li>Item</li></ul>`	`* Item`	Unordered list
`<hr />`	`---`	Horizontal rule
`<table>...</table>`	`| A | B |` followed by `|---|---|` on the next line	Pipe table structure

Frequently Asked Questions

How does the converter handle HTML tables?

The converter parses the table rows and cells to construct a GitHub Flavored Markdown (GFM) table. It calculates the padding each column needs so that the ASCII pipe output is visually aligned in the source text. It supports `thead`, `tbody`, and `th` elements, but it may simplify cells that use `rowspan` or `colspan`, since these attributes have no direct Markdown equivalent.
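As a rough illustration of the row and cell parsing, the sketch below collects cell text with Python's standard-library `html.parser`. The names `TableExtractor` and `extract_rows` are hypothetical, the input is assumed to be well formed, and spanning attributes are ignored here, which is only one possible way to simplify them.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects cell text row by row; thead and tbody wrappers need no special handling."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.current_row = None
        self.current_cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.current_row = []
        elif tag in ("th", "td") and self.current_row is not None:
            self.current_cell = []  # colspan and rowspan attributes are ignored

    def handle_data(self, data):
        if self.current_cell is not None:
            self.current_cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self.current_cell is not None:
            # Collapse internal whitespace so each cell fits on one line.
            text = " ".join("".join(self.current_cell).split())
            self.current_row.append(text)
            self.current_cell = None
        elif tag == "tr" and self.current_row is not None:
            self.rows.append(self.current_row)
            self.current_row = None


def extract_rows(html):
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

A cell with `colspan="2"` comes out as a single cell, so a row that contains one ends up shorter than the header and would need padding with empty cells before it is rendered as a pipe table.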

Does the tool execute scripts contained in the HTML I paste?

No. The tool uses `DOMParser` to read the input string. It parses the HTML structure in order to traverse the nodes, but it does not execute embedded JavaScript or load external resources such as images or scripts during conversion. Pasting untrusted HTML into the input area is therefore safe for the user.

Why are attributes such as `class`, `id`, and `style` removed?

Markdown is designed to be content-centric rather than style-centric. Consequently, standard attributes like `class`, `id`, and `style` are stripped during conversion to produce clean, portable text. The main exception is the class on `code` elements (e.g., `class="language-js"`), which is preserved in the output as the language identifier for syntax highlighting.
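One way to keep only the language hint is sketched below. The function names `fence_language` and `to_fenced_block` are hypothetical, and the sketch assumes the hint always uses the `language-` prefix shown above and that the code itself contains no triple-backtick run.

```python
import re


def fence_language(class_attr):
    """Return the language from a class like 'language-js', or '' if none is present."""
    for name in (class_attr or "").split():
        match = re.fullmatch(r"language-([\w+#-]+)", name)
        if match:
            return match.group(1)
    return ""  # every other class name is discarded


def to_fenced_block(code, class_attr=None):
    """Wrap code in a fence, keeping only the language hint from the class attribute."""
    fence = "`" * 3
    body = code.rstrip("\n")  # avoid a blank line before the closing fence
    return f"{fence}{fence_language(class_attr)}\n{body}\n{fence}"
```

With `class="hljs language-js"`, the opening fence carries `js` and the `hljs` class is dropped. A block with no `language-` class gets a bare fence.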

How are relative links handled?

By default, the converter preserves the `href` attribute exactly as it appears in the source HTML, so a relative path such as `/docs/page` stays relative in the Markdown. If you supply a base domain in the settings, an option forces absolute URLs, which keeps links valid when the content moves to a new repository or domain.
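The behavior described above can be approximated with Python's standard `urllib.parse`. The function name `resolve_href` and its `base_url` parameter are hypothetical stand-ins for the tool's setting, and `https://example.com` is a placeholder domain.

```python
from urllib.parse import urljoin, urlsplit


def resolve_href(href, base_url=None):
    """Return href unchanged unless a base URL is given and href is relative."""
    if not base_url:
        return href  # default: preserve the link exactly as written
    if urlsplit(href).scheme:
        return href  # already absolute (https:, mailto:, ...)
    return urljoin(base_url, href)
```

Without a base URL, `/docs/page` is returned as is. With `https://example.com/blog/` as the base, the same link resolves to `https://example.com/docs/page`, while a path-relative link such as `guide.html` resolves against the `/blog/` directory.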