DOCX to Markdown Converter
Convert DOCX files to clean Markdown syntax instantly in your browser. No upload to servers, no install needed. Supports tables, lists, links, and formatting.
About
DOCX files store content as compressed XML (OOXML, ISO/IEC 29500). Converting them to Markdown requires parsing ZIP archives, resolving XML namespaces, and mapping WordprocessingML elements to plain-text syntax. Getting this wrong produces broken formatting: lost headings, collapsed lists, stripped hyperlinks. This tool performs real client-side ZIP decompression and XML tree traversal. No file leaves your browser. It handles w:pStyle mappings for headings (1 - 6), nested w:numPr structures for ordered and unordered lists, w:tbl grids for pipe tables, and inline run properties for bold, italic, strikethrough, and code spans.
Limitations: embedded images are extracted as base64 data URIs, which inflates output size. Complex layouts (text boxes, SmartArt, equations via OMML) are approximated as plain text. Footnotes and endnotes are appended at the document end. The parser assumes well-formed OOXML. Files produced by non-Microsoft editors (LibreOffice, Google Docs export) may use different namespace prefixes; the tool normalizes these before traversal.
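To make the size cost of data URIs concrete, here is a minimal sketch of how image bytes could be turned into a Markdown image. The function names (`bytesToBase64`, `imageToMarkdown`, `base64Length`) are illustrative assumptions, not the tool's actual API.

```javascript
// Sketch only: helper names are hypothetical, not taken from the tool.

// Encode raw bytes as base64 using the built-in btoa().
function bytesToBase64(bytes) {
  let binary = "";
  const CHUNK = 0x8000; // convert in chunks to keep argument lists small
  for (let i = 0; i < bytes.length; i += CHUNK) {
    binary += String.fromCharCode(...bytes.subarray(i, i + CHUNK));
  }
  return btoa(binary);
}

// Build a Markdown image whose source is a data URI.
function imageToMarkdown(bytes, mimeType, altText = "") {
  return `![${altText}](data:${mimeType};base64,${bytesToBase64(bytes)})`;
}

// Base64 emits 4 output characters for every 3 input bytes (padded).
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}
```

Because every 3 bytes become 4 characters, an embedded image adds roughly a third more than its original size to the Markdown output.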
Formulas
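DOCX files conform to the Office Open XML (OOXML) standard, and each file is a ZIP archive. The conversion pipeline follows a deterministic sequence: read the file into buffer, decompress the archive into entries, parse the document XML into dom, then walk dom to emit Markdown.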
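The ZIP local file header defines where each entry's data begins: a 30-byte fixed header (starting with the signature 0x04034b50) is followed by the filename and an optional extra field, and the entry data starts immediately after them.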
Each compressed entry uses DEFLATE (method 8). Stored entries (method 0) are read directly. The browser's DecompressionStream("deflate-raw") handles inflation natively without external libraries.
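The header layout and the two compression methods can be sketched as follows. This is a simplified illustration, not the tool's actual code: `readLocalHeader` and `inflateEntry` are hypothetical names, and the sketch assumes the sizes are present in the local header. When general-purpose flag bit 3 is set, the sizes are stored in a data descriptor after the data instead, so a robust reader would take them from the central directory.

```javascript
// Sketch only: function names are hypothetical, not taken from the tool.
const LOCAL_HEADER_SIGNATURE = 0x04034b50;

// Parse one local file header from a Uint8Array, starting at `offset`.
function readLocalHeader(bytes, offset = 0) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (view.getUint32(offset, true) !== LOCAL_HEADER_SIGNATURE) {
    throw new Error("Not a local file header at offset " + offset);
  }
  const method = view.getUint16(offset + 8, true);           // 0 = stored, 8 = DEFLATE
  const compressedSize = view.getUint32(offset + 18, true);
  const uncompressedSize = view.getUint32(offset + 22, true);
  const nameLength = view.getUint16(offset + 26, true);
  const extraLength = view.getUint16(offset + 28, true);
  const nameStart = offset + 30;                              // fixed part is 30 bytes
  const name = new TextDecoder().decode(
    bytes.subarray(nameStart, nameStart + nameLength)
  );
  const dataStart = nameStart + nameLength + extraLength;
  return {
    name,
    method,
    compressedSize,
    uncompressedSize,
    dataStart,
    data: bytes.subarray(dataStart, dataStart + compressedSize),
  };
}

// Return the entry's uncompressed bytes.
async function inflateEntry(header) {
  if (header.method === 0) return header.data; // stored: read directly
  if (header.method !== 8) {
    throw new Error("Unsupported compression method " + header.method);
  }
  const stream = new Blob([header.data])
    .stream()
    .pipeThrough(new DecompressionStream("deflate-raw"));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}
```

All multi-byte fields in a ZIP archive are little-endian, which is why every DataView read passes `true` as its second argument.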
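Heading level mapping: a paragraph whose styleName is Heading1 through Heading6 becomes a Markdown heading with the matching number of # characters.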
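One way to express this mapping is shown below, as a sketch. The function names are illustrative, and the case-insensitive match is an assumption rather than documented behavior of the tool.

```javascript
// Sketch only: names and the case-insensitive match are assumptions.

// Return the heading level (1-6) for a w:pStyle value, or 0 if it is not a heading.
function headingLevel(styleName) {
  const match = /^Heading([1-6])$/i.exec(styleName ?? "");
  return match ? Number(match[1]) : 0;
}

// Return the Markdown prefix for a paragraph style ("" for non-headings).
function headingPrefix(styleName) {
  const level = headingLevel(styleName);
  return level > 0 ? "#".repeat(level) + " " : "";
}
```

For example, `headingPrefix("Heading3")` yields `"### "`, while a style such as `Quote` yields an empty prefix and is handled elsewhere.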
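List indentation depth: an item's indentation is proportional to ilvl, so a top-level item (ilvl = 0) is not indented and each deeper level adds one indent step.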
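A minimal sketch of the indentation rule follows. The indent width of 4 spaces per level is an assumption (the source does not state the width); it is chosen here because it nests correctly under both `- ` and `1. ` markers in CommonMark. `listItemLine` is a hypothetical name.

```javascript
// Sketch only: INDENT_WIDTH and listItemLine are assumptions, not the tool's API.
const INDENT_WIDTH = 4; // spaces added per nesting level (assumed)

// Render one list item at nesting depth `ilvl` (0-based).
function listItemLine(text, ilvl, ordered = false, counter = 1) {
  const indent = " ".repeat(ilvl * INDENT_WIDTH);
  const marker = ordered ? `${counter}.` : "-";
  return `${indent}${marker} ${text}`;
}
```

With these assumptions, `listItemLine("b", 1)` produces the item text preceded by four spaces and a `-` marker, and an ordered item at depth 2 is preceded by eight spaces.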
Where ilvl = indent level from the w:val attribute of w:ilvl (0-based). styleName = value of the w:val attribute of w:pStyle. buffer = raw ArrayBuffer of the uploaded DOCX file. entries = map of filename → decompressed Uint8Array. dom = parsed XML document from DOMParser.
Reference Data
| OOXML Element | XML Path | Markdown Output | Notes |
|---|---|---|---|
| Heading 1 | w:pStyle val="Heading1" | # Text | Mapped via pStyle name matching |
| Heading 2 | w:pStyle val="Heading2" | ## Text | Levels 1 - 6 supported |
| Bold | w:b / w:b val="true" | **text** | Handles toggle & explicit |
| Italic | w:i / w:i val="true" | *text* | Combined: ***text*** |
| Strikethrough | w:strike | ~~text~~ | GFM extension |
| Hyperlink | w:hyperlink r:id | [text](url) | Resolved via .rels file |
| Unordered List | w:numPr + bullet numFmt | - item | Indent via w:ilvl |
| Ordered List | w:numPr + decimal numFmt | 1. item | Counter resets per list |
| Table | w:tbl → w:tr → w:tc | Pipe table: \| a \| b \| | Alignment from w:jc |
| Code (Inline) | w:rFonts monospace family | `code` | Courier, Consolas, monospace detection |
| Block Quote | w:pStyle val="Quote" | > text | Also IntenseQuote |
| Horizontal Rule | w:pBdr bottom border only | --- | Paragraph border detection |
| Line Break | w:br | Two trailing spaces + newline | Soft break within paragraph |
| Page Break | w:br type="page" | --- | Converted to thematic break |
| Image | w:drawing → a:blip r:embed | `![alt](src)` | Base64 embedded or downloadable |
| Footnote Ref | w:footnoteReference | [^1] | Collected and appended at end |
| Superscript | w:vertAlign val="superscript" | <sup>text</sup> | HTML fallback in Markdown |
| Subscript | w:vertAlign val="subscript" | <sub>text</sub> | HTML fallback in Markdown |
| Underline | w:u | <u>text</u> | No native MD; HTML used |
| Highlight/Color | w:highlight | Plain text (stripped) | Color info discarded |
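
The inline rows of the table can be combined into a single run formatter. The sketch below is one possible arrangement, not the tool's actual implementation: `formatRun` and the shape of its `props` object are hypothetical, and the choice to emit code spans without any other markers is an assumption.

```javascript
// Sketch only: formatRun and the props shape are hypothetical.
// props: { bold, italic, strike, code, underline, vertAlign }
function formatRun(text, props = {}) {
  if (text === "") return "";
  // Assumption: a code span is emitted alone, without emphasis markers.
  if (props.code) return "`" + text + "`";

  let out = text;
  if (props.bold && props.italic) out = `***${out}***`;
  else if (props.bold) out = `**${out}**`;
  else if (props.italic) out = `*${out}*`;

  if (props.strike) out = `~~${out}~~`;                 // GFM extension
  if (props.underline) out = `<u>${out}</u>`;           // no native Markdown syntax
  if (props.vertAlign === "superscript") out = `<sup>${out}</sup>`;
  if (props.vertAlign === "subscript") out = `<sub>${out}</sub>`;
  return out;                                           // highlight and color are dropped
}
```

A real converter would also need to move leading and trailing whitespace outside the markers, since emphasis that begins or ends with a space is not recognized by most Markdown parsers.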