WordPress XML to Octopress Converter
Convert WordPress WXR XML export files to Octopress/Jekyll Markdown posts with YAML front matter. Batch download as ZIP.
About
WordPress WXR (WordPress eXtended RSS) export files use a namespaced XML schema in which each item node carries its body in a content:encoded CDATA block. Migrating to Octopress (a Jekyll-based framework) requires each post to become a separate Markdown file named YYYY-MM-DD-slug.markdown, with YAML front matter specifying the layout, title, date, categories, and comments fields. A wrong date format breaks Jekyll's build, and unescaped colons in titles cause silent YAML parse failures that surface only at deploy time.

This tool parses the raw WXR XML with the browser's native DOMParser, walks each item node to extract the WordPress-specific namespaced elements, converts the embedded HTML content to clean Markdown by recursive DOM traversal, and packages the result as downloadable files. It handles draft vs. published status, preserves category and tag taxonomies, and strips WordPress shortcodes that have no Octopress equivalent. The conversion assumes UTF-8 encoding and the standard WXR 1.2 schema. Custom post types and attachments are excluded by default.
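As a rough illustration of the extraction step, the sketch below reads the same namespaced elements with Python's standard xml.etree.ElementTree rather than the browser DOMParser the tool itself uses. The function name extract_items, the include_pages flag, and the dictionary keys are hypothetical names chosen for this example; the namespace URIs are the ones declared by WXR 1.2 exports.

```python
import xml.etree.ElementTree as ET

# Namespace URIs declared at the top of a WXR 1.2 export.
NS = {
    "wp": "http://wordpress.org/export/1.2/",
    "content": "http://purl.org/rss/1.0/modules/content/",
    "excerpt": "http://wordpress.org/export/1.2/excerpt/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def extract_items(wxr_xml: str, include_pages: bool = False) -> list[dict]:
    """Return one dict per post (and optionally page) found in a WXR document."""
    root = ET.fromstring(wxr_xml)
    wanted = {"post", "page"} if include_pages else {"post"}
    items = []
    for item in root.iter("item"):
        post_type = item.findtext("wp:post_type", default="", namespaces=NS)
        if post_type not in wanted:
            continue  # attachments and custom post types are skipped

        def terms(domain: str) -> list[str]:
            # <category domain="category"> and <category domain="post_tag">
            return [c.text or "" for c in item.findall("category")
                    if c.get("domain") == domain]

        items.append({
            "title": item.findtext("title", default=""),
            "date": item.findtext("wp:post_date", default="", namespaces=NS),
            "slug": item.findtext("wp:post_name", default="", namespaces=NS),
            "status": item.findtext("wp:status", default="", namespaces=NS),
            "author": item.findtext("dc:creator", default="", namespaces=NS),
            "excerpt": item.findtext("excerpt:encoded", default="", namespaces=NS),
            "html": item.findtext("content:encoded", default="", namespaces=NS),
            "categories": terms("category"),
            "tags": terms("post_tag"),
        })
    return items
```

The parser returns CDATA content as ordinary text, so the html field holds the raw post markup ready for conversion.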
Formulas
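The conversion pipeline follows a deterministic transformation sequence from WXR XML to Octopress-compatible Markdown files: parse the WXR document, filter item nodes by post_type, extract the front matter fields, convert content:encoded with HTMLtoMD, and write one Markdown file per post.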
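The filename generation follows the Octopress convention:

$$\text{filename} = \mathrm{date}_{\text{YYYY-MM-DD}}(\text{post\_date}) + \texttt{-} + \mathrm{slugify}(\text{post\_name}) + \texttt{.markdown}$$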
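A minimal Python sketch of the YYYY-MM-DD-slug.markdown convention follows. The function names are hypothetical. It assumes wp:post_date arrives as YYYY-MM-DD HH:MM:SS and that wp:post_name is non-empty; an empty slug or a zeroed date would need separate handling that is not shown here.

```python
import re
from datetime import datetime


def slugify(text: str) -> str:
    """Lowercase, collapse runs of non-alphanumerics into one hyphen, trim hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def post_filename(post_date: str, post_name: str) -> str:
    """Build YYYY-MM-DD-slug.markdown from wp:post_date and wp:post_name."""
    # strptime raises ValueError on a malformed date, which surfaces bad
    # input at conversion time instead of at Jekyll build time.
    day = datetime.strptime(post_date, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m-%d")
    return f"{day}-{slugify(post_name)}.markdown"
```

For example, post_filename("2014-03-05 09:30:00", "Hello_World!") yields 2014-03-05-hello-world.markdown.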
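The YAML title escaping rule prevents parser breakage: a title that contains a colon is wrapped in double quotes before it is written to the front matter.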
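The quoting rule can be sketched in Python as below. The function name yaml_title is hypothetical. Only the colon check comes from this page; escaping embedded backslashes and double quotes inside the quoted value is an added assumption, based on how YAML double-quoted scalars work. Other YAML-sensitive characters are not handled.

```python
def yaml_title(title: str) -> str:
    """Return a value that is safe to place after 'title: ' in front matter."""
    if ":" not in title:
        return title  # no colon, so the plain scalar is left untouched
    # Assumption: inside double quotes, backslashes and quotes must be escaped.
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'
```

A title such as Jekyll: A Primer becomes "Jekyll: A Primer", while a title without a colon passes through unchanged.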
Where WXR = WordPress eXtended RSS export format (XML), post_type = WordPress content type discriminator, post_date = publication timestamp from wp:post_date element, post_name = URL slug from wp:post_name element, slugify = lowercase alphanumeric + hyphen normalization function, HTMLtoMD = recursive DOM tree walker converting HTML nodes to Markdown syntax.
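To make the HTMLtoMD idea concrete, here is a small recursive walker in Python. It is a sketch under several assumptions, not the tool's browser code: the node tree is built with the standard html.parser module, the class and function names are hypothetical, only a handful of tags are covered, and the input is assumed to be reasonably well nested. Tables and shortcodes are not handled.

```python
from html.parser import HTMLParser

FENCE = "`" * 3              # fenced code block delimiter
VOID = {"img", "br", "hr"}   # tags that never have children


class Node:
    def __init__(self, tag, attrs=None, text=""):
        self.tag = tag                  # None marks a text node
        self.attrs = dict(attrs or [])
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a minimal node tree from an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(Node(None, text=data))


def to_md(node: Node) -> str:
    """Convert a node to Markdown by first converting its children."""
    if node.tag is None:
        return node.text
    inner = "".join(to_md(child) for child in node.children)
    if node.tag == "a":
        return f"[{inner}]({node.attrs.get('href') or ''})"
    if node.tag == "img":
        return f"![{node.attrs.get('alt') or ''}]({node.attrs.get('src') or ''})"
    if node.tag in ("strong", "b"):
        return f"**{inner}**"
    if node.tag in ("em", "i"):
        return f"*{inner}*"
    if node.tag == "p":
        return inner.strip() + "\n\n"
    if node.tag == "blockquote":
        # Prefixing already-converted lines makes nested quotes become "> > ".
        lines = inner.strip().splitlines()
        return "\n".join(("> " + line).rstrip() for line in lines) + "\n\n"
    if node.tag == "pre":
        return FENCE + "\n" + inner.strip("\n") + "\n" + FENCE + "\n\n"
    return inner  # unknown tags (including <code>) pass their content through


def html_to_md(html: str) -> str:
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return to_md(builder.root).strip() + "\n"
```

Because each node converts its children before wrapping them, inline markup inside a link or a quote is handled without special cases.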
Reference Data
| WordPress WXR Element | Octopress Front Matter | Notes |
|---|---|---|
| title | title | Wrapped in quotes if contains colons |
| wp:post_date | date | Format: YYYY-MM-DD HH:MM |
| content:encoded | Body (below ---) | HTML converted to Markdown |
| excerpt:encoded | description | Plain text, truncated to 160 chars |
| wp:status = publish | published: true | Drafts set to false |
| wp:status = draft | published: false | File still generated |
| category domain="category" | categories | YAML array format |
| category domain="post_tag" | tags (custom) | Optional inclusion |
| wp:post_name | Filename slug | Sanitized to lowercase alphanumeric + hyphens |
| wp:post_type = post | Included | Processed by default |
| wp:post_type = page | Optional | Toggled via checkbox |
| wp:post_type = attachment | Excluded | Media files not downloadable from XML |
| wp:comment | Excluded | Octopress uses Disqus; comments not migrated |
| wp:post_id | Not mapped | WordPress internal ID discarded |
| dc:creator | author | Included in front matter |
| link | Not mapped | Original URL preserved in comment only |
| wp:meta_key | Excluded | Custom fields not portable |
| HTML <a> | Markdown [text](url) | Relative URLs preserved |
| HTML <img> | Markdown ![alt](src) | WordPress media URLs kept as-is |
| HTML <pre><code> | Fenced code block ``` | Language detection not attempted |
| HTML <blockquote> | Markdown > prefix | Nested blockquotes supported |
| WordPress [shortcode] | Stripped | No Octopress equivalent; logged in console |
| HTML <table> | Raw HTML preserved | Markdown tables unreliable for complex layouts |
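The front matter rows of the table can be read as a small rendering function. The Python sketch below is one possible reading, with hypothetical function names and dictionary keys. The values layout: post and comments: true are assumptions, since this page names the fields but not their values, and quoting list entries with json.dumps is a convenience that is not taken from the tool.

```python
import json


def yaml_str(value: str) -> str:
    """Quote a scalar only when it contains a colon, per the title rule."""
    if ":" not in value:
        return value
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def yaml_list(values: list[str]) -> str:
    """Render a YAML flow sequence; JSON strings are valid YAML scalars."""
    return "[" + ", ".join(json.dumps(v, ensure_ascii=False) for v in values) + "]"


def front_matter(post: dict, include_tags: bool = False) -> str:
    """Render Octopress front matter from fields extracted from one WXR item."""
    lines = [
        "---",
        "layout: post",      # assumed default layout
        f"title: {yaml_str(post['title'])}",
        f"date: {post['date'][:16]}",   # trims seconds: YYYY-MM-DD HH:MM
        f"author: {yaml_str(post['author'])}",
        f"published: {'true' if post['status'] == 'publish' else 'false'}",
        "comments: true",    # assumed value
    ]
    # Excerpt becomes a plain-text description, truncated to 160 characters.
    excerpt = " ".join(post.get("excerpt", "").split())
    if excerpt:
        lines.append(f"description: {yaml_str(excerpt[:160])}")
    lines.append(f"categories: {yaml_list(post['categories'])}")
    if include_tags and post.get("tags"):
        lines.append(f"tags: {yaml_list(post['tags'])}")
    lines.append("---")
    return "\n".join(lines) + "\n"
```

A draft still produces a complete header, with published: false as the only difference from a published post.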