About

TTML (Timed Text Markup Language) is a W3C standard, formerly known as DFXP, that encodes subtitle data in XML with a styling model that does not map directly to WebVTT. A careless conversion loses italic markers, drops alignment cues, or miscalculates end times by treating dur as an absolute timestamp instead of adding it to begin. This tool parses the full TTML DOM, including <styling> blocks and nested <span> elements, resolves style references by id, and emits spec-compliant WebVTT with the corresponding cue tags. It handles offset-time formats (12.5s, 500ms) and clock-time with frames. The conversion runs entirely in the browser with no server round-trip. Limitation: TTML region-based positioning is only approximated, because WebVTT's positioning semantics differ from TTML's region model.


Formulas

The core timing computation converts a TTML duration-based cue into an absolute end timestamp:

$$t_{\text{end}} = t_{\text{begin}} + t_{\text{dur}}$$

where $t_{\text{begin}}$ is the parsed begin attribute in milliseconds and $t_{\text{dur}}$ is the parsed dur attribute in milliseconds. If an end attribute is present instead, it is used directly as $t_{\text{end}}$.
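As a minimal sketch of this addition, the following Python snippet computes the end time and renders both values as WebVTT timestamps. The function names `format_vtt_timestamp` and `cue_timing_line` are hypothetical and used only for illustration; they are not taken from the tool's source.

```python
def format_vtt_timestamp(total_ms: int) -> str:
    """Render a millisecond count as a WebVTT timestamp (HH:MM:SS.mmm)."""
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def cue_timing_line(begin_ms: int, dur_ms: int) -> str:
    """Build a WebVTT timing line from a begin time and a duration."""
    end_ms = begin_ms + dur_ms  # end = begin + dur
    return f"{format_vtt_timestamp(begin_ms)} --> {format_vtt_timestamp(end_ms)}"


# A cue with begin="00:00:10.000" and dur="5s", already parsed to milliseconds.
print(cue_timing_line(10_000, 5_000))  # 00:00:10.000 --> 00:00:15.000
```

Treating dur as an absolute timestamp here would yield an end time of 00:00:05.000, which precedes the start of the cue.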

Timestamp parsing normalizes all TTML time expressions to milliseconds:

$$\operatorname{parse}(\text{clock}) = H \times 3600000 + M \times 60000 + S \times 1000 + \text{ms}$$

For frame-based timestamps (HH:MM:SS:FF), the frame count F is converted assuming a default frame rate of 30 fps:

$$\text{ms} = \frac{F}{30} \times 1000$$

For offset-time expressions, the numeric value is multiplied by the unit factor: h = 3600000, m = 60000, s = 1000, ms = 1.
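The three parsing rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's actual code: the name `parse_ttml_time`, the regular expressions, and the `fps` parameter are hypothetical, and only the h, m, s, and ms units listed above are handled.

```python
import re

# Unit factors for offset-time expressions, in milliseconds.
OFFSET_FACTORS_MS = {"h": 3_600_000, "m": 60_000, "s": 1_000, "ms": 1}

# "ms" is listed before "m" and "s" so that "500ms" is not read as minutes.
_OFFSET_RE = re.compile(r"^(\d+(?:\.\d+)?)(h|ms|m|s)$")
# HH:MM:SS, optionally followed by ".fraction" or ":frames".
_CLOCK_RE = re.compile(r"^(\d+):(\d{2}):(\d{2})(?:\.(\d+)|:(\d+))?$")


def parse_ttml_time(value: str, fps: float = 30.0) -> int:
    """Normalize a TTML time expression to integer milliseconds."""
    value = value.strip()
    match = _OFFSET_RE.match(value)
    if match:
        number, unit = match.groups()
        return round(float(number) * OFFSET_FACTORS_MS[unit])
    match = _CLOCK_RE.match(value)
    if not match:
        raise ValueError(f"unrecognized time expression: {value!r}")
    hours, minutes, seconds, fraction, frames = match.groups()
    total = int(hours) * 3_600_000 + int(minutes) * 60_000 + int(seconds) * 1_000
    if fraction is not None:
        total += round(float("0." + fraction) * 1_000)
    elif frames is not None:
        total += round(int(frames) / fps * 1_000)  # default assumes 30 fps
    return total


print(parse_ttml_time("12.5s"))        # 12500
print(parse_ttml_time("00:01:23:15"))  # 83500
```

Passing a different `fps` value shows how much the 30 fps assumption matters: `parse_ttml_time("00:01:23:15", fps=25)` gives 83600 rather than 83500.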

Style resolution follows a lookup chain: each style attribute value on a <p> or <span> is matched against the id of <style> elements in the TTML <head>. The resolved properties are then mapped to WebVTT cue tags: tts:fontStyle="italic" → <i>, tts:fontWeight="bold" → <b>, tts:textDecoration="underline" → <u>.
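The lookup chain can be sketched with Python's standard `xml.etree.ElementTree`. The helper names `build_style_table` and `cue_tags_for` are hypothetical, and the sketch assumes the standard TTML namespaces and that style ids are written as xml:id (a plain id attribute is accepted as a fallback).

```python
import xml.etree.ElementTree as ET

TT = "http://www.w3.org/ns/ttml"
TTS = "http://www.w3.org/ns/ttml#styling"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# (tts property, required value, WebVTT cue tag)
TAG_RULES = [
    ("fontStyle", "italic", "i"),
    ("fontWeight", "bold", "b"),
    ("textDecoration", "underline", "u"),
]


def build_style_table(root):
    """Map each <style> id to its tts:* properties (namespace stripped)."""
    table = {}
    for style in root.iter(f"{{{TT}}}style"):
        style_id = style.get(XML_ID) or style.get("id")
        if style_id:
            table[style_id] = {
                name.split("}", 1)[1]: value
                for name, value in style.attrib.items()
                if name.startswith(f"{{{TTS}}}")
            }
    return table


def cue_tags_for(element, table):
    """Resolve the element's style references and return matching cue tags."""
    props = {}
    for ref in element.get("style", "").split():
        props.update(table.get(ref, {}))  # unknown ids are ignored
    return [tag for prop, wanted, tag in TAG_RULES if props.get(prop) == wanted]


SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:tts="http://www.w3.org/ns/ttml#styling">
  <head><styling>
    <style xml:id="emph" tts:fontStyle="italic" tts:fontWeight="bold"/>
  </styling></head>
  <body><div><p begin="0s" dur="2s" style="emph">Hello</p></div></body>
</tt>"""

root = ET.fromstring(SAMPLE)
table = build_style_table(root)
p = next(root.iter(f"{{{TT}}}p"))
print(cue_tags_for(p, table))  # ['i', 'b']
```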

Reference Data

| TTML Feature | TTML Syntax | WebVTT Equivalent | Support Status |
|---|---|---|---|
| Italic text | tts:fontStyle="italic" | <i>...</i> | Full |
| Bold text | tts:fontWeight="bold" | <b>...</b> | Full |
| Underline | tts:textDecoration="underline" | <u>...</u> | Full |
| Text alignment | tts:textAlign="left" (or "center", "right") | align:left (or center, right) | Full |
| Font color | tts:color="#RRGGBB" | <c.colorname> or inline | Mapped |
| Background color | tts:backgroundColor | <c> with class | Approximated |
| Duration attribute | dur="00:00:05.000" | Computed end time | Full |
| End attribute | end="00:00:10.000" | Direct end time | Full |
| Clock-time format | HH:MM:SS.mmm | HH:MM:SS.mmm | Full |
| Clock-time with frames | HH:MM:SS:FF | Converted to .mmm | Full (assumes 30 fps) |
| Offset-time seconds | 12.5s | Converted to HH:MM:SS.mmm | Full |
| Offset-time milliseconds | 500ms | Converted to HH:MM:SS.mmm | Full |
| Offset-time hours | 2.5h | Converted to HH:MM:SS.mmm | Full |
| Offset-time minutes | 5m | Converted to HH:MM:SS.mmm | Full |
| Nested <span> | Inline style spans | Nested WebVTT tags | Full |
| Line breaks | <br /> | Newline character | Full |
| Region positioning | <region> with origin/extent | position/line settings | Approximated |
| Font size | tts:fontSize | Not supported in WebVTT | Dropped |
| Font family | tts:fontFamily | Not supported in WebVTT | Dropped |
| Writing mode | tts:writingMode | vertical cue setting | Partial |
| Opacity | tts:opacity | Not supported in WebVTT | Dropped |
| Multiple <div> blocks | Separate content divisions | Sequential cues | Full |

Frequently Asked Questions

When a TTML <p> element has a dur attribute, the converter adds that duration to the begin timestamp to compute the end time. When an end attribute is present, it is used directly. If both are present, end takes precedence. If neither exists, the cue is skipped with a warning.
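The precedence rule described here can be sketched as a small Python function. The name `resolve_cue_times` and its signature are hypothetical; the sketch assumes all three attributes have already been parsed to milliseconds and that a missing attribute is passed as `None`.

```python
import warnings
from typing import Optional, Tuple


def resolve_cue_times(
    begin_ms: int,
    dur_ms: Optional[int] = None,
    end_ms: Optional[int] = None,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) in milliseconds, or None if the cue is skipped."""
    if end_ms is not None:
        return begin_ms, end_ms  # end wins, even when dur is also present
    if dur_ms is not None:
        return begin_ms, begin_ms + dur_ms
    warnings.warn("cue has neither end nor dur; skipping")
    return None


print(resolve_cue_times(1_000, dur_ms=5_000))                # (1000, 6000)
print(resolve_cue_times(1_000, dur_ms=5_000, end_ms=4_000))  # (1000, 4000)
```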
The converter detects the colon-separated frame component and converts it to milliseconds assuming a default frame rate of 30 fps. For example, 00:01:23:15 becomes 00:01:23.500. If your source uses a different frame rate (24, 25, 29.97), the converted times will be off by a fraction of a second. For exact timing, convert the source timestamps to fractional clock-time (HH:MM:SS.mmm) before running the conversion.
Nested <span> styles are preserved. The converter recursively walks the child nodes of each <p> element. A <span> referencing a style with tts:fontStyle="italic" will produce <i>...</i> in the WebVTT output. Multiple nesting levels (e.g., bold inside italic) produce nested tags like <i><b>text</b></i>.
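A recursive walk of this kind might look like the Python sketch below, which uses the standard `xml.etree.ElementTree` rather than the browser DOM. The names `style_table` and `render` are hypothetical, style ids are assumed to be written as xml:id, and escaping text with `html.escape` is an addition made here so the output stays valid cue text.

```python
import xml.etree.ElementTree as ET
from html import escape

TT = "http://www.w3.org/ns/ttml"
TTS = "http://www.w3.org/ns/ttml#styling"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
TAG_RULES = [("fontStyle", "italic", "i"), ("fontWeight", "bold", "b"),
             ("textDecoration", "underline", "u")]


def style_table(root):
    """Map each <style> xml:id to its tts:* properties."""
    return {
        s.get(XML_ID): {k.split("}", 1)[1]: v for k, v in s.attrib.items()
                        if k.startswith(f"{{{TTS}}}")}
        for s in root.iter(f"{{{TT}}}style")
    }


def render(node, styles):
    """Return cue text for node, wrapping styled elements in WebVTT tags."""
    out = escape(node.text or "", quote=False)
    for child in node:
        # <br/> becomes a newline; any other child is rendered recursively.
        out += "\n" if child.tag == f"{{{TT}}}br" else render(child, styles)
        out += escape(child.tail or "", quote=False)
    props = {}
    for ref in node.get("style", "").split():
        props.update(styles.get(ref, {}))
    tags = [tag for prop, wanted, tag in TAG_RULES if props.get(prop) == wanted]
    for tag in reversed(tags):  # innermost tag is applied first
        out = f"<{tag}>{out}</{tag}>"
    return out


SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:tts="http://www.w3.org/ns/ttml#styling">
  <head><styling>
    <style xml:id="it" tts:fontStyle="italic"/>
    <style xml:id="bd" tts:fontWeight="bold"/>
  </styling></head>
  <body><div>
    <p begin="0s" dur="2s">She said <span style="it">wait, <span style="bd">now</span>!</span> ok</p>
  </div></body>
</tt>"""

root = ET.fromstring(SAMPLE)
styles = style_table(root)
p = next(root.iter(f"{{{TT}}}p"))
print(render(p, styles))  # She said <i>wait, <b>now</b>!</i> ok
```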
WebVTT does not support font-size, font-family, opacity, or background-color with the same fidelity as TTML. These properties are silently dropped. Text alignment (left, center, right) is preserved via the align cue setting. Region-based positioning is approximated but may not match the TTML layout exactly.
Multiple <div> blocks are supported. The converter queries all <p> elements across all <div> containers within the <body>. Cues are emitted in document order. A <div> does not create a separate WebVTT file; all cues are merged into a single output.
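As a rough illustration of document-order collection, the Python sketch below gathers every <p> under <body> regardless of which <div> contains it. The function name `collect_cues` is hypothetical, and timing attributes are returned unparsed to keep the example short.

```python
import xml.etree.ElementTree as ET

TT = "http://www.w3.org/ns/ttml"


def collect_cues(ttml_text):
    """Return (begin, end, text) for every <p> in <body>, in document order."""
    root = ET.fromstring(ttml_text)
    body = root.find(f"{{{TT}}}body")
    if body is None:
        return []
    # Element.iter() walks the tree depth-first, which is document order.
    return [
        (p.get("begin"), p.get("end"), "".join(p.itertext()))
        for p in body.iter(f"{{{TT}}}p")
    ]


SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml"><body>
  <div><p begin="0s" end="2s">First</p></div>
  <div><p begin="2s" end="4s">Second</p><p begin="4s" end="6s">Third</p></div>
</body></tt>"""

for cue in collect_cues(SAMPLE):
    print(cue)
```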
The tool uses the browser's native DOMParser with an XML content type. If the input contains malformed XML (unclosed tags, invalid entities), the parser will return an error document. The converter detects this and displays a specific error message indicating the parse failure location when available.
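The same detect-and-report pattern can be sketched outside the browser. The Python example below is an analogue, not the tool's DOMParser code: it relies on `xml.etree.ElementTree`, whose `ParseError` exposes the failure location as a `(line, column)` tuple in its `position` attribute. The function name `parse_ttml` and the message format are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def parse_ttml(text: str) -> Tuple[Optional[ET.Element], Optional[str]]:
    """Return (root, None) on success or (None, error message) on failure."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as exc:
        line, column = exc.position  # location reported by the XML parser
        return None, f"Invalid TTML (line {line}, column {column}): {exc}"


root, error = parse_ttml("<tt><body></tt>")  # <body> is never closed
if error:
    print(error)
```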