About

Markdown remains the de facto standard for technical documentation, README files, and content management systems. Yet most content originates in rich-text editors, email clients, or web pages that output HTML. Manual conversion introduces errors: broken link references, lost heading hierarchy, mangled nested lists. This tool parses the HTML DOM tree recursively, mapping each node to its Markdown equivalent using GFM (GitHub Flavored Markdown) rules. It handles edge cases such as nested blockquotes, mixed list types, inline code within headings, and HTML tables with alignment. The converter strips unsafe elements (script, style, event attributes) before processing. Paste rich text directly from Google Docs, Confluence, Notion, or any browser source. Conversion accuracy depends on the semantic quality of the source HTML. Purely visual formatting (e.g., font-size without heading tags) cannot be inferred.

Formulas

The converter operates as a recursive DOM tree walker. Each node N is evaluated by type and tag name, producing a Markdown string M.

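$$
\mathrm{convert}(N) =
\begin{cases}
N.\mathrm{textContent} & \text{if } N \text{ is TEXT\_NODE} \\
\mathrm{rule}(N.\mathrm{tagName},\ \mathrm{children}(N)) & \text{if } N \text{ is ELEMENT\_NODE} \\
\texttt{""} & \text{otherwise}
\end{cases}
$$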

where $\mathrm{children}(N) = \sum_{i=0}^{n-1} \mathrm{convert}(N.\mathrm{childNodes}[i])$, with $n$ the number of child nodes and the sum read as string concatenation, joins all child conversions in document order. The rule function maps tag names to Markdown wrappers. For example, rule("STRONG", c) = "**" + c + "**". List depth d determines indentation: prefix = "    ".repeat(d), that is, four spaces per nesting level. Table alignment is read from style.textAlign or the align attribute and mapped to separator patterns: :--- (left), :---: (center), ---: (right).
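The recursion above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the tool's actual source: the `RULES` table, the `text` and `el` helpers, and the choice to pass unknown tags through unchanged are all assumptions, and plain objects stand in for DOM nodes so the snippet runs outside a browser.

```javascript
// DOM nodeType constants (same values as Node.ELEMENT_NODE / Node.TEXT_NODE).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Hypothetical rule table: tag name -> function wrapping the converted children.
const RULES = {
  STRONG: (c) => "**" + c + "**",
  B: (c) => "**" + c + "**",
  EM: (c) => "*" + c + "*",
  I: (c) => "*" + c + "*",
  DEL: (c) => "~~" + c + "~~",
  CODE: (c) => "`" + c + "`",
  P: (c) => c + "\n\n",
};

function rule(tagName, content) {
  const fn = RULES[tagName];
  // Assumption: tags without a rule contribute only their converted children.
  return fn ? fn(content) : content;
}

// Concatenate the conversions of all child nodes, in document order.
function children(node) {
  return Array.from(node.childNodes || []).map(convert).join("");
}

function convert(node) {
  if (node.nodeType === TEXT_NODE) return node.textContent;
  if (node.nodeType === ELEMENT_NODE) return rule(node.tagName, children(node));
  return ""; // comments and other node types produce nothing
}

// Plain-object stand-ins for <p>Hello <strong>world</strong></p>.
const text = (s) => ({ nodeType: TEXT_NODE, textContent: s });
const el = (tagName, ...childNodes) => ({ nodeType: ELEMENT_NODE, tagName, childNodes });

const sample = el("P", text("Hello "), el("STRONG", text("world")));
const markdown = convert(sample); // "Hello **world**" followed by a blank line
```

In a browser, the same `convert` function would accept real DOM nodes, since it only reads `nodeType`, `tagName`, `textContent`, and `childNodes`.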

Reference Data

| HTML Element | Markdown Output | GFM Extension | Notes |
| --- | --- | --- | --- |
| `<h1>` - `<h6>` | `#` to `######` | No | ATX-style headings |
| `<strong>` / `<b>` | `**text**` | No | Bold wrapping |
| `<em>` / `<i>` | `*text*` | No | Italic wrapping |
| `<del>` / `<s>` | `~~text~~` | Yes | Strikethrough |
| `<a href>` | `[text](url)` | No | Title attr preserved |
| `<img>` | `![alt](src)` | No | Alt text required |
| `<ul>` / `<li>` | `- item` | No | Nested with 4-space indent |
| `<ol>` / `<li>` | `1. item` | No | Sequential numbering |
| `<input type=checkbox>` | `- [x]` / `- [ ]` | Yes | Task lists inside `<li>` |
| `<blockquote>` | `> text` | No | Nested with `>>` |
| `<code>` | `` `code` `` | No | Inline code |
| `<pre><code>` | ```` ```lang ... ``` ```` | No | Fenced code blocks (part of core CommonMark); lang from class |
| `<table>` | Pipe table | Yes | Alignment via `:---` syntax |
| `<hr>` | `---` | No | Thematic break |
| `<br>` | Two trailing spaces or `<br>` | No | Configurable |
| `<p>` | Double newline | No | Paragraph separation |
| `<sub>` | `<sub>text</sub>` | No | Passed through as HTML |
| `<sup>` | `<sup>text</sup>` | No | Passed through as HTML |
| `<abbr>` | `<abbr>text</abbr>` | No | No Markdown equivalent |
| `<details>` | `<details>...</details>` | No | HTML passthrough |
| `<mark>` | `<mark>text</mark>` | No | No native Markdown |
| `<kbd>` | `<kbd>text</kbd>` | No | Semantic HTML preserved |

Frequently Asked Questions

Markdown has no equivalent for many HTML constructs such as font color, background highlights, custom font sizes, or complex layouts. The converter maps semantic HTML tags (headings, bold, italic, links, lists) to Markdown. Purely presentational CSS styling without semantic tags is discarded. For best results, use source editors that produce clean semantic HTML (Google Docs, Notion, Confluence) rather than visual-only formatters.
Each nesting level adds a 4-space indentation prefix to list items. CommonMark does not fix an indentation width; it requires nested content to be indented at least to the parent item's content column, and four spaces satisfies that for - markers and for ordered markers up to two digits (1. through 99.). Mixed ordered and unordered lists are preserved. However, if the source HTML uses non-standard nesting (e.g., <ul> directly inside <ul> without a parent <li>), the output may contain unexpected indentation. The converter normalizes these cases but cannot infer intent from malformed markup.
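As a rough sketch of the indentation rule, the following function renders a nested list with four spaces per level. The input shape (`ordered`, `items`, `text`, `sublists`) is hypothetical and chosen only for illustration; the converter itself works on `<ul>`, `<ol>`, and `<li>` elements.

```javascript
const INDENT = "    "; // four spaces per nesting level

// Hypothetical list shape: { ordered: boolean, items: [{ text, sublists: [...] }] }
function renderList(list, depth = 0) {
  const lines = [];
  list.items.forEach((item, i) => {
    // Ordered lists are numbered sequentially; unordered lists use "-".
    const marker = list.ordered ? `${i + 1}.` : "-";
    lines.push(INDENT.repeat(depth) + marker + " " + item.text);
    // Nested lists are rendered one level deeper, directly under their item.
    for (const sub of item.sublists || []) {
      lines.push(renderList(sub, depth + 1));
    }
  });
  return lines.join("\n");
}

const nested = {
  ordered: false,
  items: [
    {
      text: "fruit",
      sublists: [{ ordered: true, items: [{ text: "apple" }, { text: "pear" }] }],
    },
    { text: "bread" },
  ],
};

const listMarkdown = renderList(nested);
// - fruit
//     1. apple
//     2. pear
// - bread
```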
HTML tables are supported: <table> elements are converted to GFM pipe tables with header separators. Column alignment is detected from the align attribute or text-align CSS property on <th> / <td> elements. Colspan and rowspan are not supported in GFM and are flattened. Tables without a <thead> row use the first <tr> as the header.
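The table output can be illustrated with a small function that takes already-extracted cell text. This is a sketch under assumptions: `renderTable`, `separatorFor`, and `escapeCell` are hypothetical names, the first row is treated as the header, and the escaping of literal pipes is an addition needed for valid GFM cells rather than something the description above states.

```javascript
// Alignment keyword -> GFM separator pattern; anything else gets the default.
const SEPARATORS = new Map([
  ["left", ":---"],
  ["center", ":---:"],
  ["right", "---:"],
]);

function separatorFor(align) {
  return SEPARATORS.get(align) || "---";
}

// A literal pipe would end the cell, so escape it; newlines cannot appear in a row.
function escapeCell(value) {
  return String(value).replace(/\|/g, "\\|").replace(/\n/g, " ");
}

// rows: array of arrays of cell text, first row is the header.
// aligns: optional per-column alignment ("left" | "center" | "right").
function renderTable(rows, aligns = []) {
  const [header, ...body] = rows;
  const line = (cells) => "| " + cells.map(escapeCell).join(" | ") + " |";
  const separator = "| " + header.map((_, i) => separatorFor(aligns[i])).join(" | ") + " |";
  return [line(header), separator, ...body.map(line)].join("\n");
}

const tableMarkdown = renderTable(
  [
    ["Name", "Qty"],
    ["Apple", "3"],
  ],
  ["left", "right"]
);
// | Name | Qty |
// | :--- | ---: |
// | Apple | 3 |
```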
All <script> and <style> elements are stripped before conversion. Inline event handlers (onclick, onload, etc.) and javascript: URIs are removed. The converter processes only safe, content-bearing elements. This sanitization prevents XSS vectors from appearing in the output.
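A minimal sketch of this sanitization pass is shown below, assuming a simplified node shape with an `attributes` object rather than a real DOM. The names `sanitize`, `DROPPED_TAGS`, and `URL_ATTRS` are hypothetical, and limiting the javascript: check to `href` and `src` is an assumption. It illustrates the three rules only and is not a substitute for a vetted HTML sanitizer.

```javascript
const DROPPED_TAGS = new Set(["SCRIPT", "STYLE"]);
const URL_ATTRS = new Set(["href", "src"]);

function isJavascriptUri(value) {
  // Remove whitespace and control characters before checking the scheme,
  // so obfuscated forms such as "java\tscript:" are still caught.
  const compact = String(value).replace(/[\u0000-\u0020]/g, "");
  return /^javascript:/i.test(compact);
}

// Returns a cleaned copy of the node, or null if the node must be dropped.
function sanitize(node) {
  if (node.nodeType !== 1) return node; // non-element nodes pass through
  if (DROPPED_TAGS.has(node.tagName)) return null;

  const attributes = {};
  for (const [name, value] of Object.entries(node.attributes || {})) {
    const key = name.toLowerCase();
    if (key.startsWith("on")) continue; // onclick, onload, ...
    if (URL_ATTRS.has(key) && isJavascriptUri(value)) continue;
    attributes[key] = value;
  }

  const childNodes = (node.childNodes || [])
    .map(sanitize)
    .filter((child) => child !== null);
  return { ...node, attributes, childNodes };
}

const dirty = {
  nodeType: 1,
  tagName: "DIV",
  attributes: {},
  childNodes: [
    { nodeType: 1, tagName: "SCRIPT", attributes: {}, childNodes: [] },
    {
      nodeType: 1,
      tagName: "A",
      attributes: { href: "JavaScript:alert(1)", onclick: "steal()", title: "docs" },
      childNodes: [{ nodeType: 3, textContent: "link" }],
    },
  ],
};

const clean = sanitize(dirty);
// The <script> child is gone; the <a> keeps only its title attribute.
```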
Content copied from Word or Google Docs can be pasted directly. Both applications place rich HTML on the clipboard when you copy, and the converter captures this HTML via the paste event. However, Word often produces deeply nested, non-semantic HTML with excessive <span> wrappers and MsoNormal classes. The converter ignores these presentation-only wrappers and extracts the underlying content. Some Word-specific constructs (SmartArt, embedded OLE objects) are not convertible.
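The two steps described here, reading HTML from the paste event and discarding presentation-only wrappers, might look roughly like the sketch below. `clipboardData.getData("text/html")` is the standard clipboard API; the function names, the plain-text fallback, and the choice of `SPAN` and `FONT` as wrapper tags are assumptions made for illustration.

```javascript
// Read the richest available payload from a paste event.
function readPaste(event) {
  const data = event.clipboardData;
  if (!data) return { kind: "none", value: "" };
  const html = data.getData("text/html");
  if (html) return { kind: "html", value: html };
  // Assumed fallback: plain text when the source offered no HTML.
  return { kind: "text", value: data.getData("text/plain") };
}

// Tags treated as presentation-only wrappers (an assumption).
const WRAPPERS = new Set(["SPAN", "FONT"]);

// Replace each wrapper element with its children, recursively.
// Always returns an array, because one wrapper may expand to several nodes.
function unwrap(node) {
  if (node.nodeType !== 1) return [node];
  const kids = (node.childNodes || []).flatMap(unwrap);
  return WRAPPERS.has(node.tagName) ? kids : [{ ...node, childNodes: kids }];
}

// Stand-in for <p class="MsoNormal"><span><span>Hi</span></span></p>.
const wordParagraph = {
  nodeType: 1,
  tagName: "P",
  childNodes: [
    {
      nodeType: 1,
      tagName: "SPAN",
      childNodes: [
        { nodeType: 1, tagName: "SPAN", childNodes: [{ nodeType: 3, textContent: "Hi" }] },
      ],
    },
  ],
};

const [flattened] = unwrap(wordParagraph);
// flattened is the <p> with the text node "Hi" as its only child.

// In a browser the reader would be attached like this:
// document.addEventListener("paste", (event) => console.log(readPaste(event)));
```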
If the source HTML uses <pre><code class="language-python"> (or similar class patterns like lang-*, highlight-*), the converter extracts the language identifier and produces a fenced code block with the language hint: ```python. If no language class is found, a plain fenced block is produced. Inline <code> elements become backtick-wrapped inline code.
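The class-based language detection can be sketched as follows. The regular expression covers the three prefixes named above; the function names are hypothetical, and lengthening the fence when the code itself contains backtick runs is an addition that follows the CommonMark fencing rule rather than something the description states.

```javascript
// Matches class tokens such as language-python, lang-js, highlight-ruby.
const LANG_CLASS = /^(?:language|lang|highlight)-([\w+#.-]+)$/;

// Return the first language identifier found in a class attribute, or "".
function languageFromClass(className) {
  for (const token of (className || "").split(/\s+/)) {
    const match = LANG_CLASS.exec(token);
    if (match) return match[1].toLowerCase();
  }
  return "";
}

// Wrap code in a fenced block, with the language hint when one is found.
function fencedBlock(code, className) {
  // The fence must be longer than any backtick run inside the code.
  const runs = (code.match(/`+/g) || []).map((run) => run.length);
  const fence = "`".repeat(Math.max(3, Math.max(0, ...runs) + 1));
  const body = code.replace(/\n$/, ""); // avoid a blank line before the closing fence
  return fence + languageFromClass(className) + "\n" + body + "\n" + fence;
}

const block = fencedBlock("print(1)\n", "hljs language-python");
// The result opens with a three-backtick fence followed by "python",
// then the code, then a closing three-backtick fence.
```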