HTML to Markdown Converter
Convert HTML content to clean, CommonMark-compliant Markdown. Ideal for migrating CMS documentation to static site generators like Jekyll or Hugo.
About
Migrating content from legacy content management systems (CMSs) to modern static site generators often causes significant formatting problems. Technical writers and developers frequently find that proprietary HTML classes and inline styles pollute the raw content during export. This tool produces a clean conversion that preserves semantic structure while discarding unnecessary presentational metadata. It transforms complex Document Object Model (DOM) elements into CommonMark syntax without losing content.
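One way to discard presentational metadata is an attribute whitelist. The Python sketch below is a minimal example that assumes the export is well-formed XHTML, so the standard-library `xml.etree.ElementTree` parser accepts it. The function name `strip_presentational` and the `KEEP` whitelist are hypothetical and are not this tool's API. The sketch drops every attribute except those that Markdown can express, such as `href` on links. It keeps `language-*` class tokens so that a later step can still recover a code block's language.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist: attributes that Markdown can express, per tag.
KEEP = {"a": {"href", "title"}, "img": {"src", "alt", "title"}}


def strip_presentational(fragment):
    """Remove class, style, id and other non-semantic attributes from a
    well-formed XHTML fragment, keeping language-* class tokens."""
    root = ET.fromstring(fragment)
    for el in root.iter():
        allowed = KEEP.get(el.tag, set())
        # Remember language hints before the class attribute is deleted.
        lang = [c for c in el.get("class", "").split() if c.startswith("language-")]
        for name in list(el.attrib):
            if name not in allowed:
                del el.attrib[name]
        if lang:
            el.set("class", " ".join(lang))
    return ET.tostring(root, encoding="unicode")
```

For example, `strip_presentational('<p class="wp-block" style="color:red">Hi</p>')` returns `'<p>Hi</p>'`. Real CMS exports are often not well-formed XML (for example, `&nbsp;` or unclosed `<br>`), so a production converter would need a forgiving HTML parser instead.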
The parsing engine handles nested structures and specialized elements that regular-expression replacements typically corrupt. It pays particular attention to tabular data, converting standard HTML tables into pipe-table syntax (a GitHub Flavored Markdown extension, since core CommonMark defines no tables). Code blocks retain their language identifiers so that syntax highlighting keeps working in the destination environment. This utility serves as a bridge for teams moving documentation from platforms like WordPress to Git-based workflows.
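Preserving a language identifier usually relies on the `class="language-xyz"` convention that the HTML specification suggests for `<code>` elements. The following is a minimal sketch. It assumes well-formed markup and uses a hypothetical `pre_to_fence` helper that reads the language from that class and emits a fenced block. In CommonMark, a closing fence must be at least as long as the opening one. For that reason, the sketch makes the fence one backtick longer than any backtick run inside the code, so the code can never close the block early.

```python
import re
import xml.etree.ElementTree as ET


def pre_to_fence(pre_html):
    """Convert <pre><code class="language-xyz">...</code></pre> to a fenced block."""
    pre = ET.fromstring(pre_html)
    code = pre.find("code")
    source = code if code is not None else pre
    # itertext() joins all text, and ElementTree has already decoded entities.
    text = "".join(source.itertext()).rstrip("\n")
    # Take the language from the first language-* class token, if any.
    lang = ""
    for cls in source.get("class", "").split():
        if cls.startswith("language-"):
            lang = cls[len("language-"):]
            break
    # Fence length: at least 3, and longer than any backtick run in the code.
    longest = max((len(run) for run in re.findall(r"`+", text)), default=0)
    fence = "`" * max(3, longest + 1)
    return f"{fence}{lang}\n{text}\n{fence}"
```

With `<pre><code class="language-python">print(1)</code></pre>` as input, the sketch produces a three-backtick fence with the info string `python`. If the code itself contains three backticks, the fence grows to four.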
Formulas
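The conversion follows a recursive DOM traversal. Define the transformation function $T$ acting on an HTML node $n$. For a text node, $T(n)$ returns the sanitized text content. For an element node with children $c_1, \dots, c_k$, the function recurses and concatenates the results in document order, wrapping them in the element's Markdown delimiters: $T(n) = p(n)\,T(c_1)\,T(c_2)\cdots T(c_k)\,s(n)$, where $p(n)$ and $s(n)$ are the opening and closing delimiters (for example, `**` and `**` for `<strong>`).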
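The recursion for $T$ can be sketched in Python. This sketch assumes well-formed XHTML parsed with the standard-library `xml.etree.ElementTree`. In that parser, text nodes appear as an element's `.text` (before its first child) and as each child's `.tail` (after that child). The names `DELIMITERS`, `sanitize`, `transform` and `convert` are hypothetical. Treating "sanitized" as "whitespace collapsed" is also an assumption, and only a handful of elements are covered.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical tag -> (p(n), s(n)) delimiter table for a few elements.
DELIMITERS = {
    "h1": ("# ", "\n\n"),
    "p": ("", "\n\n"),
    "strong": ("**", "**"),
    "em": ("*", "*"),
    "code": ("`", "`"),
}


def sanitize(text):
    """Text-node case of T: collapse whitespace runs (None becomes '')."""
    return re.sub(r"\s+", " ", text or "")


def transform(node):
    """T(n): recurse into children in document order and wrap the result."""
    inner = sanitize(node.text)  # text node before the first child element
    for child in node:
        inner += transform(child)      # T(c_i) for each child element
        inner += sanitize(child.tail)  # text node that follows the child
    if node.tag == "a":
        # Links carry an attribute, so s(n) depends on the href.
        return f"[{inner}]({node.get('href', '')})"
    prefix, suffix = DELIMITERS.get(node.tag, ("", ""))
    return prefix + inner + suffix


def convert(fragment):
    # A synthetic root lets fragments with several top-level elements parse.
    root = ET.fromstring(f"<root>{fragment}</root>")
    return transform(root).strip()
```

For instance, `convert('<p>See <a href="https://example.com">docs</a>.</p>')` yields `See [docs](https://example.com).`. Unknown tags fall back to empty delimiters, so their text survives without formatting.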
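For tabular data, the converter computes column widths so that the pipes line up. Let $W_i = \max_r \lvert c_{r,i} \rvert$ be the maximum character width of column $i$, where $c_{r,i}$ is the cell in row $r$ and column $i$. Each cell is padded to $W_i$ characters, and the separator line is generated by repeating the dash character $W_i$ times.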
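The width calculation can be sketched as follows. `pipe_table` is a hypothetical helper that takes rows of already-converted cell strings, with the first row as the header. It escapes literal pipes as `\|` before measuring, because GitHub Flavored Markdown otherwise reads them as cell boundaries. As an assumption of this sketch, an all-empty column gets a width of 1 so that its separator is never empty. `len` counts code points, so wide CJK characters or emoji will not align visually.

```python
def pipe_table(rows):
    """Render rows (first row = header) as a pipe table with aligned columns."""
    n_cols = max(len(row) for row in rows)
    # Escape literal pipes and pad short rows with empty cells.
    grid = [[cell.replace("|", r"\|") for cell in row] + [""] * (n_cols - len(row))
            for row in rows]
    # W_i: maximum character width of column i (floor of 1, see lead-in).
    widths = [max(1, max(len(row[i]) for row in grid)) for i in range(n_cols)]

    def render(row):
        # Pad each cell to W_i so the pipes line up.
        return "| " + " | ".join(cell.ljust(w) for cell, w in zip(row, widths)) + " |"

    # Separator: the dash character repeated W_i times per column.
    separator = "| " + " | ".join("-" * w for w in widths) + " |"
    return "\n".join([render(grid[0]), separator] + [render(row) for row in grid[1:]])
```

For example, `pipe_table([["Name", "Qty"], ["apple", "3"]])` gives widths 5 and 3, so the separator row is `| ----- | --- |`.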
Reference Data
| HTML Element | Markdown Syntax | Description |
|---|---|---|
| `<h1>Title</h1>` | `# Title` | Top-level heading |
| `<strong>Bold</strong>` | `**Bold**` | Strong emphasis |
| `<a href="url">Link</a>` | `[Link](url)` | Hyperlink anchor |
| `<blockquote>Text</blockquote>` | `> Text` | Block quotation |
| `<code>var x</code>` | `` `var x` `` | Inline code |
| `<pre><code>...</code></pre>` | `` ```...``` `` | Fenced code block |
| `<ul><li>Item</li></ul>` | `* Item` | Unordered list |
| `<hr />` | `---` | Horizontal rule |
| `<table>...</table>` | `\| A \| B \|` | Pipe table structure |
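The blockquote and list rows in the table involve a detail that simple prefix/suffix wrapping misses: a container block applies to every line of its content, not just the first. The sketch below uses hypothetical helpers, `blockquote` and `bullet_list`, which operate on content that has already been converted to Markdown. The quote marker is repeated on every line, and a bare `>` on blank lines keeps multi-paragraph quotes intact. List continuation lines are indented to the item's content column, which is how CommonMark keeps nested content inside the same item.

```python
def blockquote(markdown):
    """Prefix every line of converted content with '>' so the whole block,
    including blank lines between paragraphs, stays inside the quote."""
    return "\n".join("> " + line if line else ">" for line in markdown.split("\n"))


def bullet_list(items, marker="*"):
    """Render converted <li> contents as list items. Continuation lines are
    indented to the content column so they remain part of the same item."""
    pad = " " * (len(marker) + 1)  # width of "* " for the default marker
    lines = []
    for item in items:
        first, *rest = item.split("\n")
        lines.append(f"{marker} {first}")
        lines.extend(pad + line if line else "" for line in rest)
    return "\n".join(lines)
```

Because both helpers take Markdown strings, they compose for nested structures. For example, `bullet_list(["Parent\n" + bullet_list(["Child"])])` indents the inner item by two spaces under its parent.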