DOCX to Markdown Converter

About

DOCX files store content as compressed XML (OOXML, ISO/IEC 29500). Converting them to Markdown requires parsing ZIP archives, resolving XML namespaces, and mapping WordprocessingML elements to plain-text syntax. Getting this wrong produces broken formatting: lost headings, collapsed lists, stripped hyperlinks. This tool performs ZIP decompression and XML tree traversal entirely client-side, so no file leaves your browser. It maps w:pStyle heading styles (levels 1 to 6) to Markdown headings, nested w:numPr structures to ordered and unordered lists, w:tbl grids to pipe tables, and inline run properties to bold, italic, strikethrough, and code spans.

Limitations: embedded images are extracted as base64 data URIs, which inflates output size (base64 encodes every 3 bytes as 4 characters, about 33% larger). Complex layouts (text boxes, SmartArt, equations via OMML) are approximated as plain text. Footnotes and endnotes are appended at the document end. The parser assumes well-formed OOXML; files produced by non-Microsoft editors (LibreOffice, Google Docs export) may use non-standard namespace prefixes, which this tool normalizes.

Formulas

DOCX files conform to the Office Open XML (OOXML) standard. The file is a ZIP archive. The conversion pipeline follows a deterministic sequence:

parse(file) → unzip(buffer) → extractXML(entries) → walkTree(dom) → markdown

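Each entry in the archive is preceded by a local file header. The header begins with the signature 0x04034b50 ("PK\x03\x04" read little-endian) and has a fixed part of 30 bytes, followed by the file name and an optional extra field. The entry's data therefore starts at:

dataOffset = headerOffset + 30 + filenameLen + extraLen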

Each compressed entry uses DEFLATE (method 8). Stored entries (method 0) are read directly. The browser's DecompressionStream("deflate-raw") handles inflation natively without external libraries.
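A minimal TypeScript sketch of reading one local file header, assuming the whole DOCX is already in memory as a Uint8Array and that `offset` points at a header (a complete reader would locate headers through the central directory). `parseLocalHeader` and its field names are hypothetical, not this tool's API. Stored data (method 0) can be used as-is; DEFLATE data (method 8) would then be handed to `DecompressionStream("deflate-raw")` as described above.

```typescript
// Hypothetical helper: parse a ZIP local file header (fixed part is 30 bytes).
interface LocalHeader {
  fileName: string;
  method: number;           // 0 = stored, 8 = DEFLATE
  compressedSize: number;
  uncompressedSize: number;
  dataOffset: number;       // where the entry's (possibly compressed) bytes begin
  data: Uint8Array;         // view over those bytes, no copy
}

const LOCAL_HEADER_SIGNATURE = 0x04034b50; // "PK\x03\x04" read little-endian
const LOCAL_HEADER_FIXED_SIZE = 30;

function parseLocalHeader(bytes: Uint8Array, offset: number): LocalHeader {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (view.getUint32(offset, true) !== LOCAL_HEADER_SIGNATURE) {
    throw new Error(`No local file header at offset ${offset}`);
  }
  // All multi-byte ZIP fields are little-endian.
  const method = view.getUint16(offset + 8, true);
  // Caveat: if general-purpose flag bit 3 is set, these sizes may be 0 here and
  // the real values live in a data descriptor / the central directory.
  const compressedSize = view.getUint32(offset + 18, true);
  const uncompressedSize = view.getUint32(offset + 22, true);
  const fileNameLen = view.getUint16(offset + 26, true);
  const extraLen = view.getUint16(offset + 28, true);

  const nameStart = offset + LOCAL_HEADER_FIXED_SIZE;
  // OOXML part names are ASCII, so UTF-8 decoding is safe for them.
  const fileName = new TextDecoder().decode(bytes.subarray(nameStart, nameStart + fileNameLen));
  const dataOffset = nameStart + fileNameLen + extraLen;

  return {
    fileName,
    method,
    compressedSize,
    uncompressedSize,
    dataOffset,
    data: bytes.subarray(dataOffset, dataOffset + compressedSize),
  };
}
```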

Heading level mapping:

match = styleName.match(/^heading\s*(\d)$/i)   (null means the style is not a heading)
level = parseInt(match[1], 10)
prefix = "#".repeat(clamp(level, 1, 6))
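A minimal sketch of the heading rule with the no-match case handled, assuming `styleName` is the w:pStyle w:val string. The pattern accepts both "Heading1" and the "heading 1" variant mentioned in the FAQ; `headingPrefix` is a hypothetical name, and returning `null` for non-heading styles is our choice.

```typescript
// Hypothetical helper: map a paragraph style ID to a Markdown heading prefix.
function headingPrefix(styleName: string): string | null {
  // Case-insensitive, optional whitespace: matches "Heading1" and "heading 1".
  const match = /^heading\s*(\d)$/i.exec(styleName.trim());
  if (match === null) return null; // not a heading style
  // clamp(level, 1, 6): Word has Heading1..Heading9, Markdown only # to ######
  const level = Math.min(Math.max(parseInt(match[1], 10), 1), 6);
  return "#".repeat(level);
}
```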

List indentation depth:

indent = "  ".repeat(ilvl)   (two spaces per level)

Where: ilvl is the list indent level from the w:ilvl attribute (0-based); styleName is the value of the w:pStyle element's w:val attribute; buffer is the raw ArrayBuffer of the uploaded DOCX file; entries is a map from filename → decompressed Uint8Array; dom is the XML document produced by DOMParser.
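A minimal sketch of the indentation rule together with per-level counters, assuming list paragraphs have already been reduced to text, ilvl, and an ordered/unordered flag (`ListItem` and `renderList` are hypothetical names). Truncating the counter array whenever a shallower item appears is one way to make nested numbering restart.

```typescript
// Hypothetical list item shape: text already converted to inline Markdown.
interface ListItem {
  text: string;
  ilvl: number;      // 0-based level from w:ilvl
  ordered: boolean;  // true for numbered formats, false for bullets
}

// Render consecutive list paragraphs, indenting two spaces per level and
// keeping one counter per level so deeper counters restart after a shallower item.
function renderList(items: ListItem[]): string[] {
  const counters: number[] = [];
  return items.map(({ text, ilvl, ordered }) => {
    counters.length = ilvl + 1; // drop counters of deeper levels
    counters[ilvl] = (counters[ilvl] ?? 0) + 1;
    const marker = ordered ? `${counters[ilvl]}.` : "-";
    return `${"  ".repeat(ilvl)}${marker} ${text}`;
  });
}
```

One design caveat: under strict CommonMark, a child of a `1. ` item must be indented to the parent's content column (three spaces), so a fixed two-space indent can detach children of ordered items in some renderers. Deriving the indent from the parent marker's width is one alternative.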

Reference Data

| Word Feature | OOXML Element / Path | Markdown Output | Notes |
| --- | --- | --- | --- |
| Heading 1 | `w:pStyle w:val="Heading1"` | `# Text` | Mapped via pStyle name matching |
| Heading 2 | `w:pStyle w:val="Heading2"` | `## Text` | Levels 1 to 6 supported |
| Bold | `w:b` / `w:b w:val="true"` | `**text**` | Bare element and explicit w:val both handled |
| Italic | `w:i` / `w:i w:val="true"` | `*text*` | Combined with bold: `***text***` |
| Strikethrough | `w:strike` | `~~text~~` | GFM extension |
| Hyperlink | `w:hyperlink r:id` | `[text](url)` | Target resolved via the .rels file |
| Unordered List | `w:numPr` + bullet numFmt | `- item` | Indent via w:ilvl |
| Ordered List | `w:numPr` + decimal numFmt | `1. item` | Counter resets per list |
| Table | `w:tbl` → `w:tr` → `w:tc` | Pipe table `\| a \| b \|` | Alignment from w:jc |
| Code (Inline) | `w:rFonts` monospace family | `` `code` `` | Courier, Consolas, monospace detection |
| Block Quote | `w:pStyle w:val="Quote"` | `> text` | Also IntenseQuote |
| Horizontal Rule | `w:pBdr` bottom border only | `---` | Paragraph border detection |
| Line Break | `w:br` | Two trailing spaces + newline | Manual line break within a paragraph |
| Page Break | `w:br w:type="page"` | `---` | Converted to thematic break |
| Image | `w:drawing` → `a:blip r:embed` | `![alt](data:...)` | Base64 embedded or downloadable |
| Footnote Ref | `w:footnoteReference` | `[^1]` | Collected and appended at end |
| Superscript | `w:vertAlign w:val="superscript"` | `<sup>text</sup>` | HTML fallback in Markdown |
| Subscript | `w:vertAlign w:val="subscript"` | `<sub>text</sub>` | HTML fallback in Markdown |
| Underline | `w:u` | `<u>text</u>` | No native Markdown syntax; HTML used |
| Highlight/Color | `w:highlight` | Plain text (stripped) | Color info discarded |
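A minimal sketch of how the inline rows of the table might compose for a single run, assuming run properties have already been read from w:rPr into boolean flags (`RunProps` and `renderRun` are hypothetical names). The wrapping order and the whitespace handling are our assumptions, not documented behavior of this tool.

```typescript
// Hypothetical run-property flags, e.g. collected from w:b, w:i, w:strike, w:u,
// w:vertAlign, and a monospace w:rFonts inside w:rPr.
interface RunProps {
  bold?: boolean;
  italic?: boolean;
  strike?: boolean;
  code?: boolean;
  underline?: boolean;
  vertAlign?: "superscript" | "subscript";
}

function renderRun(text: string, p: RunProps): string {
  // Keep edge whitespace outside the markers: "** bold**" would not parse as bold.
  const [, lead, core, trail] = /^(\s*)([\s\S]*?)(\s*)$/.exec(text)!;
  if (core === "") return text;

  if (p.code) {
    // The backtick fence must be longer than any backtick run inside the span.
    const longest = Math.max(0, ...(core.match(/`+/g) ?? []).map((s) => s.length));
    const fence = "`".repeat(longest + 1);
    const pad = core.startsWith("`") || core.endsWith("`") ? " " : "";
    return `${lead}${fence}${pad}${core}${pad}${fence}${trail}`;
  }

  let out = core;
  if (p.bold && p.italic) out = `***${out}***`;
  else if (p.bold) out = `**${out}**`;
  else if (p.italic) out = `*${out}*`;
  if (p.strike) out = `~~${out}~~`;       // GFM extension
  if (p.underline) out = `<u>${out}</u>`; // HTML fallback
  if (p.vertAlign === "superscript") out = `<sup>${out}</sup>`;
  if (p.vertAlign === "subscript") out = `<sub>${out}</sub>`;
  return lead + out + trail;
}
```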

Frequently Asked Questions

Your file is not uploaded. All processing happens entirely in your browser using the File API, native DEFLATE decompression (DecompressionStream), and DOMParser. Your DOCX never leaves your device, and no network requests are made during conversion.
Markdown has no native image embedding format beyond URL references. Since the images exist only inside the DOCX ZIP archive, they are extracted and encoded as base64 data URIs (e.g., data:image/png;base64,...). For large documents with many images, consider downloading the Markdown file and using a post-processor to extract images into separate files.
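A minimal sketch of turning extracted image bytes into a data URI, assuming the media part name (e.g. word/media/image1.png) is known and its extension identifies the format. `toDataUri`, `toBase64`, and the MIME table are hypothetical; in a browser, `btoa` or `FileReader.readAsDataURL` would also work, but a hand-written encoder keeps the sketch self-contained.

```typescript
// Hypothetical mapping from media file extension to MIME type.
const MIME_BY_EXT: Record<string, string> = {
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  gif: "image/gif",
  bmp: "image/bmp",
  svg: "image/svg+xml",
};

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 (RFC 4648): every 3 input bytes become 4 output characters.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const n = (bytes[i] << 16) | (b1 << 8) | b2;
    out += B64[(n >> 18) & 63] + B64[(n >> 12) & 63];
    out += i + 1 < bytes.length ? B64[(n >> 6) & 63] : "="; // pad if 1 byte left
    out += i + 2 < bytes.length ? B64[n & 63] : "=";        // pad if 1 or 2 bytes left
  }
  return out;
}

function toDataUri(bytes: Uint8Array, fileName: string): string {
  const ext = (fileName.split(".").pop() ?? "").toLowerCase();
  const mime = MIME_BY_EXT[ext] ?? "application/octet-stream";
  return `data:${mime};base64,${toBase64(bytes)}`;
}
```

The result can be dropped straight into image syntax, e.g. `![alt](${toDataUri(bytes, "word/media/image1.png")})`.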
The converter parses word/numbering.xml to determine list type (bullet vs. decimal) per numId and ilvl combination. Nested levels are indented with two spaces per level. If numbering.xml is missing or malformed (common in Google Docs exports), the converter falls back to unordered list syntax (- item) for all list items.
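A minimal sketch of the numId/ilvl lookup with the bullet fallback, assuming word/numbering.xml has already been parsed into two maps (w:num → w:abstractNumId, and w:abstractNum → per-level w:numFmt). `listKind` and the map shapes are hypothetical; this sketch ignores w:lvlOverride and treats every non-bullet format as ordered, since Markdown ordered lists only use numbers.

```typescript
type ListKind = "ordered" | "unordered";

// Hypothetical pre-parsed views of word/numbering.xml:
//   numToAbstract:  w:num/@w:numId -> w:abstractNumId/@w:val
//   abstractLevels: w:abstractNum/@w:abstractNumId -> (w:lvl/@w:ilvl -> w:numFmt/@w:val)
function listKind(
  numId: string,
  ilvl: number,
  numToAbstract: Map<string, string>,
  abstractLevels: Map<string, Map<number, string>>,
): ListKind {
  const abstractId = numToAbstract.get(numId);
  const numFmt = abstractId === undefined ? undefined : abstractLevels.get(abstractId)?.get(ilvl);
  // Missing numbering data falls back to unordered; any non-bullet format
  // (decimal, lowerLetter, upperRoman, ...) is treated as ordered here.
  if (numFmt === undefined || numFmt === "bullet") return "unordered";
  return "ordered";
}
```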
Markdown pipe tables do not support cell spanning (colspan/rowspan). Merged cells detected via w:gridSpan or w:vMerge are expanded into individual cells with duplicated content. The converter adds an HTML comment to flag these for manual review. For complex table layouts, consider using the HTML table output option instead.
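A minimal sketch of the merged-cell expansion, assuming each w:tc has already been reduced to single-line text plus its w:gridSpan value and w:vMerge state. `Cell`, `tableToPipe`, and the wording of the review comment are hypothetical. Literal pipes in cell text are escaped as `\|`, which GFM requires inside table cells.

```typescript
// Hypothetical cell shape after text extraction from w:tc.
interface Cell {
  text: string;
  gridSpan?: number;               // from w:tcPr/w:gridSpan/@w:val
  vMerge?: "restart" | "continue"; // from w:tcPr/w:vMerge (no w:val means continue)
}

function tableToPipe(rows: Cell[][]): string[] {
  const grid: string[][] = [];
  let merged = false;
  for (const row of rows) {
    const out: string[] = [];
    for (const cell of row) {
      const span = Math.max(1, cell.gridSpan ?? 1);
      if (span > 1 || cell.vMerge !== undefined) merged = true;
      for (let k = 0; k < span; k++) {
        // A vertical-merge continuation copies the value from the row above;
        // a horizontal span repeats the same text in each grid column.
        const above = grid.length > 0 ? grid[grid.length - 1][out.length] : undefined;
        out.push(cell.vMerge === "continue" ? (above ?? "") : cell.text);
      }
    }
    grid.push(out);
  }
  if (grid.length === 0) return [];

  const escape = (s: string) => s.replace(/\|/g, "\\|");
  const lines = grid.map((r) => `| ${r.map(escape).join(" | ")} |`);
  const width = Math.max(...grid.map((r) => r.length));
  // GFM requires a delimiter row directly after the header row.
  lines.splice(1, 0, `| ${Array(width).fill("---").join(" | ")} |`);
  if (merged) lines.unshift("<!-- merged cells expanded; review manually -->");
  return lines;
}
```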
Files exported from LibreOffice and Google Docs are supported, with caveats. These applications sometimes use non-standard style names (e.g., "heading 1" instead of "Heading1") or omit the word/ prefix in relationship targets. The converter normalizes style names via case-insensitive matching and resolves relative paths in .rels files. However, proprietary extensions specific to LibreOffice (e.g., lo:custom-style) are ignored.
The converter accepts files up to 50 MB; larger files may cause memory pressure in the browser tab. Even within this limit, the ZIP parser streams decompression where possible, but the full decompressed XML must still fit in memory. A typical 50 MB DOCX contains roughly 200 to 300 pages with embedded images.
Footnote references in the body text are converted to Markdown footnote syntax [^N] where N is the footnote number. The actual footnote content is parsed from word/footnotes.xml and appended at the end of the document as [^N]: content. Endnotes from word/endnotes.xml follow the same pattern. Self-referential or nested footnotes are flattened to a single level.
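A minimal sketch of the append step, assuming the body already contains `[^N]` markers and that footnote text from word/footnotes.xml has been collected into a map keyed by the same id. `appendFootnotes` is a hypothetical name; collapsing multi-line note text to one line is a simplification in the spirit of the flattening described above.

```typescript
// Append [^id]: definitions for every footnote referenced in the body,
// in order of first reference. Unreferenced notes are omitted.
function appendFootnotes(body: string, notes: Map<string, string>): string {
  const order: string[] = [];
  const ref = /\[\^([^\]\s]+)\]/g;
  let m: RegExpExecArray | null;
  while ((m = ref.exec(body)) !== null) {
    if (!order.includes(m[1])) order.push(m[1]);
  }
  const defs = order
    .filter((id) => notes.has(id))
    .map((id) => `[^${id}]: ${notes.get(id)!.replace(/\s*\n\s*/g, " ").trim()}`);
  return defs.length > 0 ? `${body}\n\n${defs.join("\n")}\n` : body;
}
```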