About

WordPress WXR (WordPress eXtended RSS) export files use a namespaced XML schema in which each item node carries its post body in a content:encoded CDATA block. Migrating to Octopress (a Jekyll-based framework) requires each post to become a discrete Markdown file named YYYY-MM-DD-slug.markdown, with YAML front matter specifying the layout, title, date, categories, and comments fields. Getting the date format wrong breaks Jekyll's build, and mis-escaped YAML colons in titles cause silent parse failures that surface only at deploy time.

This tool parses the raw WXR XML using the browser's native DOMParser, walks each item node to extract WordPress-specific namespaced elements, converts embedded HTML content to clean Markdown via recursive DOM traversal, and packages the result as downloadable files. It handles draft vs. published status, preserves category and tag taxonomies, and strips WordPress shortcodes that have no Octopress equivalent.

The conversion assumes UTF-8 encoding and the standard WXR 1.2 schema. Custom post types and attachments are excluded by default.
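
To make the namespaced extraction step concrete, here is a minimal sketch in Python. The tool itself runs in the browser with DOMParser, so this is an illustration of the idea rather than its actual code. The namespace URIs are the ones commonly declared by WXR 1.2 exports, and the names extract_posts, NS, and SAMPLE are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as commonly declared in WXR 1.2 exports (assumption:
# check the xmlns attributes at the top of your own export file).
NS = {
    "wp": "http://wordpress.org/export/1.2/",
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A tiny hand-written export used only for this example.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:wp="http://wordpress.org/export/1.2/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>Node.js: A Guide</title>
      <dc:creator>alice</dc:creator>
      <content:encoded><![CDATA[<p>Hello <em>world</em></p>]]></content:encoded>
      <wp:post_date>2014-03-09 18:45:12</wp:post_date>
      <wp:post_name>nodejs-a-guide</wp:post_name>
      <wp:status>publish</wp:status>
      <wp:post_type>post</wp:post_type>
      <category domain="category" nicename="dev"><![CDATA[Dev]]></category>
      <category domain="post_tag" nicename="js"><![CDATA[js]]></category>
    </item>
    <item>
      <title>Logo</title>
      <wp:post_type>attachment</wp:post_type>
    </item>
  </channel>
</rss>"""


def extract_posts(wxr_text):
    """Return one dict per item whose wp:post_type is 'post'."""
    root = ET.fromstring(wxr_text.encode("utf-8"))
    posts = []
    for item in root.iter("item"):
        if item.findtext("wp:post_type", default="", namespaces=NS) != "post":
            continue  # pages, attachments, and custom types are skipped
        cats = item.findall("category")
        posts.append({
            "title": item.findtext("title", default=""),
            "author": item.findtext("dc:creator", default="", namespaces=NS),
            "date": item.findtext("wp:post_date", default="", namespaces=NS),
            "slug": item.findtext("wp:post_name", default="", namespaces=NS),
            "status": item.findtext("wp:status", default="", namespaces=NS),
            "content": item.findtext("content:encoded", default="", namespaces=NS),
            "categories": [c.text or "" for c in cats if c.get("domain") == "category"],
            "tags": [c.text or "" for c in cats if c.get("domain") == "post_tag"],
        })
    return posts
```

Calling extract_posts(SAMPLE) yields a single post dictionary; the attachment item is filtered out, which mirrors the default exclusion described above.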


Formulas

The conversion pipeline follows a deterministic transformation sequence from WXR XML to Octopress-compatible Markdown files.

Parse(WXR) → Filter(post_type = post) → Extract(metadata + content) → HTMLtoMD(content) → YAML(front matter) → File(date-slug.markdown)

The filename generation follows the Octopress convention:

filename = format(post_date, YYYY-MM-DD) + "-" + slugify(post_name) + ".markdown"
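
The filename rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's own code: the function names are hypothetical, and the input is assumed to be a wp:post_date string in YYYY-MM-DD HH:MM:SS form.

```python
import re
from datetime import datetime


def slugify(text):
    """Lowercase the text and collapse every non-alphanumeric run to one hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return slug.strip("-")


def octopress_filename(post_date, post_name):
    """Build YYYY-MM-DD-slug.markdown from wp:post_date and wp:post_name."""
    # strptime raises ValueError on a malformed date, so a bad value fails
    # here instead of producing a file that Jekyll later rejects.
    parsed = datetime.strptime(post_date, "%Y-%m-%d %H:%M:%S")
    return f"{parsed:%Y-%m-%d}-{slugify(post_name)}.markdown"
```

For example, octopress_filename("2014-03-09 18:45:12", "Hello_World!") gives 2014-03-09-hello-world.markdown.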

The YAML title escaping rule prevents parser breakage:

safe_title = "\"" + title + "\""   if title contains ":" or "#" or "\""
safe_title = title                 otherwise

Where WXR = WordPress eXtended RSS export format (XML), post_type = WordPress content type discriminator, post_date = publication timestamp from wp:post_date element, post_name = URL slug from wp:post_name element, slugify = lowercase alphanumeric + hyphen normalization function, HTMLtoMD = recursive DOM tree walker converting HTML nodes to Markdown syntax.
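
As a rough illustration of the title escaping rule, the following Python sketch wraps a title in double quotes when it contains one of the trigger characters from the formula. Escaping backslashes as well as internal quotes is an assumption added here, since both are special inside a YAML double-quoted scalar; the function name simply mirrors the formula.

```python
def safe_title(title):
    """Quote a title for YAML front matter only when it needs quoting."""
    triggers = (":", "#", '"')
    if not any(ch in title for ch in triggers):
        return title
    # Escape backslashes first so the quote escapes added next stay intact.
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'
```

A plain title such as Hello passes through unchanged, while Node.js: A Guide comes back wrapped in double quotes.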

Reference Data

| WordPress WXR Element | Octopress Front Matter | Notes |
|---|---|---|
| title | title | Wrapped in quotes if it contains colons |
| wp:post_date | date | Format: YYYY-MM-DD HH:MM |
| content:encoded | Body (below ---) | HTML converted to Markdown |
| excerpt:encoded | description | Plain text, truncated to 160 chars |
| wp:status = publish | published: true | Drafts set to false |
| wp:status = draft | published: false | File still generated |
| category domain="category" | categories | YAML array format |
| category domain="post_tag" | tags (custom) | Optional inclusion |
| wp:post_name | Filename slug | Sanitized to lowercase alphanumeric + hyphens |
| wp:post_type = post | Included | Only post type processed by default |
| wp:post_type = page | Optional | Toggled via checkbox |
| wp:post_type = attachment | Excluded | Media files not downloadable from XML |
| wp:comment | Excluded | Octopress uses Disqus; comments not migrated |
| wp:post_id | Not mapped | WordPress internal ID discarded |
| dc:creator | author | Included in front matter |
| link | Not mapped | Original URL preserved in comment only |
| wp:meta_key | Excluded | Custom fields not portable |
| HTML <a> | Markdown [text](url) | Relative URLs preserved |
| HTML <img> | Markdown ![alt](src) | WordPress media URLs kept as-is |
| HTML <pre><code> | Fenced code block ``` | Language detection not attempted |
| HTML <blockquote> | Markdown > prefix | Nested blockquotes supported |
| WordPress [shortcode] | Stripped | No Octopress equivalent; logged in console |
| HTML <table> | Raw HTML preserved | Markdown tables unreliable for complex layouts |
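
The field mapping in the table can be turned into a front matter block roughly as follows. This Python sketch makes several assumptions: the post dictionary keys are hypothetical, comments is hard-coded to true, and every string is quoted with json.dumps (JSON strings are valid YAML double-quoted scalars) instead of being quoted only when needed.

```python
import json


def build_front_matter(post):
    """Render an Octopress-style YAML front matter block for one post."""
    published = "true" if post["status"] == "publish" else "false"
    lines = [
        "---",
        "layout: post",
        f"title: {json.dumps(post['title'], ensure_ascii=False)}",
        # wp:post_date is YYYY-MM-DD HH:MM:SS; the first 16 chars drop seconds.
        f"date: {post['date'][:16]}",
        "comments: true",  # assumption: comments enabled for every post
        f"categories: {json.dumps(post['categories'], ensure_ascii=False)}",
        f"author: {json.dumps(post['author'], ensure_ascii=False)}",
        f"published: {published}",
        "---",
    ]
    return "\n".join(lines) + "\n"


example = build_front_matter({
    "title": "Node.js: A Guide",
    "date": "2014-03-09 18:45:12",
    "categories": ["Dev", "Tools"],
    "author": "alice",
    "status": "draft",
})
```

Here example is a draft, so its block ends with published: false, and the categories line uses the YAML flow-array form ["Dev", "Tools"].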

Frequently Asked Questions

WordPress shortcodes have no equivalent in Octopress or Jekyll. The converter removes every shortcode tag matching the pattern [shortcode attr="val"]...[/shortcode] and logs each removed shortcode to the conversion report. If a shortcode wraps meaningful content (like [caption] around an image), the inner content is preserved and only the shortcode wrapper is removed. Review the conversion log to identify posts requiring manual shortcode replacement.
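
One way to approximate this behavior is a regular expression over a list of known shortcode names, sketched below in Python. The default name list and the function name are assumptions for illustration; the converter's real pattern is not shown on this page. Restricting the match to known names avoids stripping ordinary bracketed text such as [sic], and attribute values that contain a closing bracket are not handled.

```python
import re

# Assumed default list: shortcodes that ship with WordPress core.
DEFAULT_NAMES = ("caption", "gallery", "embed", "audio", "video", "playlist")


def strip_shortcodes(text, names=DEFAULT_NAMES):
    """Remove opening and closing shortcode tags but keep the wrapped content.

    Returns (cleaned_text, removed_tags) so the caller can log what was removed.
    """
    alternation = "|".join(re.escape(name) for name in names)
    pattern = re.compile(r"\[/?(?:" + alternation + r")\b[^\]]*\]")
    removed = []

    def _drop(match):
        removed.append(match.group(0))
        return ""

    return pattern.sub(_drop, text), removed


cleaned, log = strip_shortcodes(
    '[caption id="a1" width="300"]<img src="x.png" /> A cat[/caption] and [sic]'
)
```

In this example the [caption] wrapper is removed, the image and its caption text remain, and both removed tags are collected in log for the conversion report.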
YAML front matter treats a colon followed by a space as a key-value delimiter, so a title like Node.js: A Guide breaks parsing if left unquoted. The converter detects titles containing colons (:), hash symbols (#), square brackets, or existing quotes, wraps them in double quotes, and escapes any internal double quotes. This follows the YAML 1.2 specification for scalar quoting.
The converter processes the XML on the browser's main thread using DOMParser, which loads the entire document into memory. For exports exceeding approximately 100 MB, browser memory limits may cause failures. Exports produced with WP-CLI's wp export command are split into multiple files (15 MB each by default); process each split file individually. The converter displays a file size warning above 50 MB.
Draft posts are included by default and are converted with published: false in their YAML front matter. Octopress and Jekyll will not render these posts during a build unless you pass Jekyll's --unpublished flag. You can toggle draft inclusion off in the converter settings to exclude them entirely from the output.
Image tags (<img>) are converted to Markdown image syntax ![alt](src). The src URL is preserved as-is from the WordPress export, meaning it still points to your original WordPress media uploads directory (typically /wp-content/uploads/YYYY/MM/). If your WordPress site is offline, these links will break. You must manually download and re-host images, then find-and-replace the URLs in the generated files.
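
The find-and-replace step for re-hosted images can be scripted. The Python sketch below rewrites only Markdown image URLs that start with a given base; the function name, the example host, and the /images/ target directory are hypothetical.

```python
import re

# Captures the "![alt](" prefix and the URL that follows it.
IMAGE = re.compile(r"(!\[[^\]]*\]\()([^)\s]+)")


def rehost_images(markdown, old_base, new_base):
    """Point image URLs under old_base at new_base, leaving other links alone."""
    def _swap(match):
        url = match.group(2)
        if url.startswith(old_base):
            url = new_base + url[len(old_base):]
        return match.group(1) + url

    return IMAGE.sub(_swap, markdown)


before = (
    "![cat](https://old.example.com/wp-content/uploads/2014/03/cat.png) "
    "and [link](https://old.example.com/wp-content/uploads/x)"
)
after = rehost_images(
    before, "https://old.example.com/wp-content/uploads/", "/images/"
)
```

Only the image URL changes in after; the ordinary link keeps its original target because the pattern requires the leading exclamation mark.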
The converter uses the wp:post_date element which stores local time without timezone offset, formatted as YYYY-MM-DD HH:MM:SS. The output front matter uses YYYY-MM-DD HH:MM format (seconds dropped). If your WordPress installation was configured for a non-UTC timezone, the times reflect that local timezone. Octopress treats dates as local time by default. No timezone conversion is applied.
Page-builder layouts are not preserved. Page builders (Elementor, WPBakery, Divi) inject deeply nested <div> structures with proprietary CSS classes. The converter strips all <div> wrappers and extracts only semantic content elements (<p>, <a>, <img>, headings, lists, code blocks). Posts built entirely with page builders will produce structurally correct but visually simplified Markdown. Manual review is recommended for such posts.
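
The wrapper-stripping idea can be illustrated with a small recursive walker. This Python sketch assumes the post body is a well-formed XHTML fragment, which real WordPress content often is not (the browser's HTML parser is far more forgiving), and it covers only a handful of tags; lists, code blocks, and blockquotes are omitted. All names are hypothetical.

```python
import xml.etree.ElementTree as ET

INLINE = {"em": "*", "i": "*", "strong": "**", "b": "**"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def render(node):
    """Return Markdown for one element, recursing into its children."""
    inner = (node.text or "") + "".join(
        render(child) + (child.tail or "") for child in node
    )
    tag = node.tag
    if tag in INLINE:
        return INLINE[tag] + inner + INLINE[tag]
    if tag == "a":
        return f"[{inner}]({node.get('href', '')})"
    if tag == "img":
        return f"![{node.get('alt', '')}]({node.get('src', '')})"
    if tag == "p":
        return inner.strip() + "\n\n"
    if tag in HEADINGS:
        return "#" * int(tag[1]) + " " + inner.strip() + "\n\n"
    # div, span, and any unrecognized wrapper: keep the content, drop the tag.
    return inner


def html_to_markdown(html):
    """Convert a well-formed HTML fragment to Markdown."""
    root = ET.fromstring(f"<root>{html}</root>")
    return render(root).strip()


sample = (
    '<div class="elementor-widget"><div><h2>Title</h2>'
    '<p>Hello <strong>big</strong> <a href="/x">world</a></p></div></div>'
)
result = html_to_markdown(sample)
```

Both nested <div> wrappers disappear from result, leaving a level-two heading followed by one paragraph, which is the structurally correct but visually simplified output described above.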