About

Markdown remains the de facto standard for technical documentation, README files, and content management systems. Yet most content originates in rich-text editors, email clients, or web pages that output HTML, and manual conversion introduces errors: broken link references, lost heading hierarchy, mangled nested lists.

This tool parses the HTML DOM tree recursively, mapping each node to its Markdown equivalent using GFM (GitHub Flavored Markdown) rules. It handles edge cases such as nested blockquotes, mixed list types, inline code within headings, and HTML tables with alignment. Before processing, the converter strips unsafe content: script and style elements and inline event-handler attributes.

Paste rich text directly from Google Docs, Confluence, Notion, or any browser source. Conversion accuracy depends on the semantic quality of the source HTML: purely visual formatting (e.g., a larger font-size without a heading tag) cannot be inferred.

Formulas

The converter operates as a recursive DOM tree walker. Each node N is evaluated by type and tag name, producing a Markdown string M.

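$$
\mathrm{convert}(N) =
\begin{cases}
N.\mathrm{textContent} & \text{if } N \text{ is a TEXT\_NODE} \\
\mathrm{rule}(N.\mathrm{tagName},\ \mathrm{children}(N)) & \text{if } N \text{ is an ELEMENT\_NODE} \\
\texttt{""} & \text{otherwise}
\end{cases}
$$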

where children(N) concatenates the conversions of all n child nodes, with n = N.childNodes.length and ⊕ denoting string concatenation:

$$
\mathrm{children}(N) = \bigoplus_{i=0}^{n-1} \mathrm{convert}(N.\mathrm{childNodes}[i])
$$

The rule function maps tag names to Markdown wrappers; for example, rule("STRONG", c) = "**" + c + "**". List depth d determines indentation: prefix = "    ".repeat(d), i.e. four spaces per nesting level. Table alignment is read from style.textAlign or the align attribute and mapped to separator patterns: :--- (left), :---: (center), ---: (right).
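The piecewise definition can be sketched as a small recursive walker. To keep it runnable outside a browser, this sketch uses a hypothetical MdNode interface in place of the real DOM Node (the nodeType values 1 and 3 match the DOM constants), and its rule() covers only a few tags. It illustrates the recursion only and is not the converter's actual source.

```typescript
// Node-type values mirror the DOM constants Node.ELEMENT_NODE and Node.TEXT_NODE.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Hypothetical stand-in for a DOM node so the sketch runs without a browser.
interface MdNode {
  nodeType: number;
  tagName?: string;     // upper-case tag name for element nodes, e.g. "STRONG"
  textContent?: string; // text for text nodes
  childNodes: MdNode[];
}

// children(N): concatenate convert(childNodes[0]) .. convert(childNodes[n-1]).
function children(n: MdNode): string {
  let out = "";
  for (let i = 0; i < n.childNodes.length; i++) {
    out += convert(n.childNodes[i]);
  }
  return out;
}

// rule(tag, c): wrap the already-converted child content c.
// Only a few tags are handled; unknown tags pass their content through.
function rule(tag: string, c: string): string {
  switch (tag) {
    case "STRONG":
    case "B":
      return "**" + c + "**";
    case "EM":
    case "I":
      return "*" + c + "*";
    case "CODE":
      return "`" + c + "`";
    case "P":
      return c + "\n\n";
    default:
      return c;
  }
}

// convert(N): dispatch on node type, matching the three cases of the definition.
function convert(n: MdNode): string {
  if (n.nodeType === TEXT_NODE) return n.textContent ?? "";
  if (n.nodeType === ELEMENT_NODE) return rule(n.tagName ?? "", children(n));
  return ""; // comments, processing instructions, and other node types
}

// Helpers for building small trees by hand.
const text = (t: string): MdNode => ({ nodeType: TEXT_NODE, textContent: t, childNodes: [] });
const el = (tag: string, ...kids: MdNode[]): MdNode => ({ nodeType: ELEMENT_NODE, tagName: tag, childNodes: kids });
```

For example, convert(el("P", text("Hello "), el("STRONG", text("world")))) yields "Hello **world**\n\n": the STRONG child is converted first, and the paragraph rule then wraps the concatenated result.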

Reference Data

HTML Element	Markdown Output	GFM Extension	Notes
<h1> - <h6>	# to ######	No	ATX-style headings
<strong> / <b>	**text**	No	Bold wrapping
<em> / <i>	*text*	No	Italic wrapping
<del> / <s>	~~text~~	Yes	Strikethrough
<a href>	[text](url "title")	No	Title attribute preserved when present
<img>	![alt](src)	No	Alt text required
<ul> / <li>	- item	No	Nested with 4-space indent
<ol> / <li>	1. item	No	Sequential numbering
<input type=checkbox>	- [x] / - [ ]	Yes	Task lists inside <li>
<blockquote>	> text	No	Nested with >>
<code>	`code`	No	Inline code
<pre><code>	```lang ... ```	No	Fenced code blocks (core CommonMark); lang from class
<table>	Pipe table	Yes	Alignment via :--- syntax
<hr>	---	No	Thematic break
<br>	Two trailing spaces or <br>	No	Configurable
<p>	Double newline	No	Paragraph separation
<sub>	<sub>text</sub>	No	Passed through as HTML
<sup>	<sup>text</sup>	No	Passed through as HTML
<abbr>	<abbr>text</abbr>	No	No Markdown equivalent
<details>	<details>...</details>	No	HTML passthrough
<mark>	<mark>text</mark>	No	No native Markdown
<kbd>	<kbd>text</kbd>	No	Semantic HTML preserved
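The inline rows of this table could be encoded as a lookup from tag name to wrapper, as in the sketch below. RULES, PASSTHROUGH, and wrapInline are hypothetical names; the sketch drops attributes (for example an <abbr> title), does not escape backticks inside code, and leaves block elements such as <details> out.

```typescript
type Wrap = (content: string) => string;

// Tags with a native Markdown (or GFM) form.
const RULES: Record<string, Wrap> = {
  STRONG: (c) => `**${c}**`,
  B: (c) => `**${c}**`,
  EM: (c) => `*${c}*`,
  I: (c) => `*${c}*`,
  DEL: (c) => `~~${c}~~`, // GFM strikethrough extension
  S: (c) => `~~${c}~~`,
  CODE: (c) => "`" + c + "`",
};

// Tags with no Markdown equivalent are emitted as raw HTML.
const PASSTHROUGH = new Set(["SUB", "SUP", "ABBR", "MARK", "KBD"]);

function wrapInline(tagName: string, content: string): string {
  const tag = tagName.toUpperCase();
  const rule = RULES[tag];
  if (rule) return rule(content);
  if (PASSTHROUGH.has(tag)) {
    const lower = tag.toLowerCase();
    return `<${lower}>${content}</${lower}>`;
  }
  return content; // unknown or presentational tags: keep only the text
}
```

Keeping native mappings and HTML passthrough tags in separate structures makes the table's "no Markdown equivalent" rows an explicit decision rather than a fallthrough.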

Frequently Asked Questions

Markdown has no equivalent for many HTML constructs such as font color, background highlights, custom font sizes, or complex layouts. The converter maps semantic HTML tags (headings, bold, italic, links, lists) to Markdown. Purely presentational CSS styling without semantic tags is discarded. For best results, use source editors that produce clean semantic HTML (Google Docs, Notion, Confluence) rather than visual-only formatters.

Each nesting level adds a 4-space indentation prefix to list items, which CommonMark parses as a nested list under both bullet (-) and ordered (1.) markers. Mixed ordered and unordered lists are preserved. However, if the source HTML uses non-standard nesting (e.g., <ul> directly inside <ul> without a parent <li>), the output may produce unexpected indentation. The converter normalizes these cases but cannot infer intent from malformed markup.
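A minimal sketch of the indentation rule, assuming the list has already been extracted from the DOM into a hypothetical List/ListItem shape (item text plus any nested lists, mirroring <li>text<ul>...</ul></li>). It does not model the normalization of malformed nesting.

```typescript
// Hypothetical list model extracted from <ul>/<ol>/<li> elements.
interface ListItem { text: string; sublists?: List[] }
interface List { ordered: boolean; items: ListItem[] }

const INDENT = "    "; // four spaces per nesting level

function renderList(list: List, depth = 0): string {
  const prefix = INDENT.repeat(depth);
  const lines: string[] = [];
  list.items.forEach((item, i) => {
    // Ordered lists number sequentially from 1; unordered lists use "-".
    const marker = list.ordered ? `${i + 1}.` : "-";
    lines.push(`${prefix}${marker} ${item.text}`);
    // Nested lists, of either type, render one level deeper.
    for (const sub of item.sublists ?? []) {
      lines.push(renderList(sub, depth + 1));
    }
  });
  return lines.join("\n");
}
```

An ordered list whose first item contains a two-item bullet list renders as "1. First", "    - a", "    - b", "2. Second", so mixed list types survive because each sublist chooses its own marker.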

HTML <table> elements are converted to GFM pipe tables with header separators. Column alignment is detected from the align attribute or text-align CSS property on <th> / <td> elements. Colspan and rowspan are not supported in GFM and are flattened. Tables without a <thead> row use the first <tr> as the header.
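The table rules can be sketched as follows, assuming cells have already been read into a hypothetical Cell shape whose alignment came from the align attribute or text-align. In this sketch alignment is taken from the header row only, short rows are padded with empty cells, literal pipes are escaped as \|, and colspan/rowspan are not modeled.

```typescript
type Align = "left" | "center" | "right" | "";

// Hypothetical cell model: text plus alignment already read from the DOM.
interface Cell { text: string; align?: Align }

// Map an alignment to its GFM separator pattern.
function separator(align: Align | undefined): string {
  switch (align) {
    case "left": return ":---";
    case "center": return ":---:";
    case "right": return "---:";
    default: return "---";
  }
}

// rows[0] is the header row (the <thead> row, or else the first <tr>).
function toPipeTable(rows: Cell[][]): string {
  if (rows.length === 0) return "";
  const width = Math.max(...rows.map((r) => r.length));
  const line = (cells: string[]) => "| " + cells.join(" | ") + " |";
  // Pad short rows and escape literal pipes so they do not split cells.
  const pad = (r: Cell[]) =>
    Array.from({ length: width }, (_, i) => (r[i]?.text ?? "").replace(/\|/g, "\\|"));
  const header = rows[0];
  const out = [
    line(pad(header)),
    line(Array.from({ length: width }, (_, i) => separator(header[i]?.align))),
    ...rows.slice(1).map((r) => line(pad(r))),
  ];
  return out.join("\n");
}
```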

All <script> and <style> elements are stripped before conversion. Inline event handlers (onclick, onload, etc.) and javascript: URIs are removed. The converter processes only safe, content-bearing elements. This sanitization prevents XSS vectors from appearing in the output.
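A sketch of the three sanitization rules above, applied to a hypothetical El tree rather than a live DOM. It is deliberately narrow: a hand-written rule list like this is not a complete XSS defense, and production code would normally rely on a maintained sanitizer such as DOMPurify.

```typescript
// Hypothetical element model with attributes and mixed element/text children.
interface El {
  tag: string; // upper-case tag name
  attrs: Record<string, string>;
  children: (El | string)[];
}

const DROP_TAGS = new Set(["SCRIPT", "STYLE"]);

// javascript: URIs can hide behind whitespace, control characters, or mixed
// case, so strip those characters and lower-case the value before checking.
function isJavascriptUri(value: string): boolean {
  return value.replace(/[\u0000-\u0020]/g, "").toLowerCase().startsWith("javascript:");
}

// Returns a cleaned copy, or null when the element itself must be dropped.
function sanitize(el: El): El | null {
  if (DROP_TAGS.has(el.tag)) return null;
  const attrs: Record<string, string> = {};
  for (const name in el.attrs) {
    const value = el.attrs[name];
    const lname = name.toLowerCase();
    if (lname.startsWith("on")) continue; // onclick, onload, ...
    if ((lname === "href" || lname === "src") && isJavascriptUri(value)) continue;
    attrs[name] = value;
  }
  const children: (El | string)[] = [];
  for (const child of el.children) {
    if (typeof child === "string") {
      children.push(child);
    } else {
      const clean = sanitize(child);
      if (clean) children.push(clean);
    }
  }
  return { tag: el.tag, attrs, children };
}
```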

Word and Google Docs place rich HTML on the clipboard when you copy, and the converter captures this HTML via the paste event. However, Word often produces deeply nested, non-semantic HTML with excessive <span> wrappers and MsoNormal classes. The converter ignores these presentation-only wrappers and extracts the underlying content. Some Word-specific constructs (SmartArt, embedded OLE objects) are not convertible.
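Unwrapping presentation-only markup might look like the sketch below. It assumes a hypothetical WNode tree, treats every <span> and <font> as a wrapper to dissolve into its children, and removes class names starting with Mso. The converter's actual heuristics may differ.

```typescript
// Hypothetical node model for pasted clipboard HTML.
interface WNode { tag: string; attrs: Record<string, string>; children: (WNode | string)[] }

const UNWRAP = new Set(["SPAN", "FONT"]);

// Returns a list because an unwrapped element is replaced by its children.
function cleanWord(node: WNode | string): (WNode | string)[] {
  if (typeof node === "string") return [node];
  const kids: (WNode | string)[] = [];
  for (const child of node.children) kids.push(...cleanWord(child));
  if (UNWRAP.has(node.tag)) return kids;
  // Copy every attribute except class, then re-add only non-Mso class names.
  const attrs: Record<string, string> = {};
  for (const name in node.attrs) {
    if (name !== "class") attrs[name] = node.attrs[name];
  }
  const kept = (node.attrs["class"] ?? "")
    .split(/\s+/)
    .filter((c) => c !== "" && c.indexOf("Mso") !== 0);
  if (kept.length > 0) attrs["class"] = kept.join(" ");
  return [{ tag: node.tag, attrs, children: kids }];
}
```

Because semantic elements such as <b> are kept while their inner <span> wrappers dissolve, the walker afterwards sees the same tree it would get from clean HTML.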

If the source HTML uses <pre><code class="language-python"> (or similar class patterns like lang-*, highlight-*), the converter extracts the language identifier and produces a fenced code block with the language hint: ```python. If no language class is found, a plain fenced block is produced. Inline <code> elements become backtick-wrapped inline code.
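A sketch of the language detection and fencing described above. languageFromClass and fencedBlock are hypothetical helpers; the regular expression covers only the language-*, lang-*, and highlight-* patterns named in this answer, and the fence grows beyond three backticks when the code itself contains a backtick run, so the block cannot close early.

```typescript
// Class-name patterns: language-*, lang-*, highlight-*.
const LANG_PATTERN = /^(?:language|lang|highlight)-([A-Za-z0-9_+#.-]+)$/;

// Return the first language identifier found in a class attribute, or "".
function languageFromClass(className: string): string {
  for (const cls of className.split(/\s+/)) {
    const m = LANG_PATTERN.exec(cls);
    if (m) return m[1];
  }
  return "";
}

function fencedBlock(code: string, className: string): string {
  // The fence must be longer than any backtick run inside the code.
  const runs = code.match(/`+/g) ?? [];
  let longest = 0;
  for (const r of runs) longest = Math.max(longest, r.length);
  const fence = "`".repeat(Math.max(3, longest + 1));
  const body = code.endsWith("\n") ? code : code + "\n";
  return fence + languageFromClass(className) + "\n" + body + fence;
}
```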