Supports Base64 and percent-encoded Data URIs
About

Data URIs embed file content directly into documents using the data: scheme, encoding binary or text as Base64 or percent-encoded strings. Incorrect decoding corrupts non-ASCII characters: é becomes Ã©, breaking internationalized content. This tool parses the URI structure, detects the encoding type (the ;base64 flag, or implicit percent-encoding when the flag is absent), extracts the MIME type, and reconstructs the original text from its UTF-8 byte sequence using the TextDecoder API. Multi-byte characters (Chinese, Arabic, emoji) require proper byte-level reconstruction, not naive character-by-character conversion.

Common failure modes: truncated Base64 padding causes InvalidCharacterError, unescaped + symbols decode as spaces in URL contexts, and BOM markers (0xEF 0xBB 0xBF) may prefix UTF-8 payloads. The tool handles these edge cases and displays the extracted MIME type for verification. Useful for debugging embedded images in CSS, decoding API responses, or extracting text from data URLs in email templates.


Formulas

Data URI structure follows RFC 2397 specification. The parser extracts components using pattern matching:

dataURI = "data:" + [mimeType] + [";base64"] + "," + payload

where mimeType defaults to text/plain;charset=US-ASCII if omitted. Base64 decoding converts the ASCII payload to binary bytes:

bytes[i] = atob(payload).charCodeAt(i) ∀ i ∈ [0, n)

UTF-8 reconstruction processes the byte array through TextDecoder:

utf8String = new TextDecoder("utf-8").decode(new Uint8Array(bytes))
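The two steps above can be combined into one helper. This is a minimal sketch, not the tool's actual source; the function name base64ToUtf8 is a placeholder chosen for illustration.

```javascript
// Decode a standard Base64 payload into a UTF-8 string.
// base64ToUtf8 is a hypothetical name, not part of any library.
function base64ToUtf8(payload) {
  // atob returns a "binary string": one character per byte, code units 0-255.
  const binary = atob(payload);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  // Interpret the whole byte array as UTF-8 in one pass.
  return new TextDecoder("utf-8").decode(bytes);
}

console.log(base64ToUtf8("SGVsbG8=")); // Hello
console.log(base64ToUtf8("w6k="));     // é (bytes 0xC3 0xA9)
```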

For percent-encoded URIs (without ;base64 flag), decoding applies the inverse transformation:

decoded = decodeURIComponent(payload)
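A short sketch of the percent-decoding path follows. decodeURIComponent throws a URIError when the escapes do not form valid UTF-8, so the wrapper below (percentToUtf8, a hypothetical name) returns null in that case; how the tool itself reports such errors is not stated in this page.

```javascript
// Decode a percent-encoded payload, returning null for malformed input.
// percentToUtf8 is an illustrative name, not a built-in.
function percentToUtf8(payload) {
  try {
    return decodeURIComponent(payload);
  } catch (err) {
    if (err instanceof URIError) return null; // escapes are not valid UTF-8
    throw err;
  }
}

console.log(percentToUtf8("Hello%20World")); // Hello World
console.log(percentToUtf8("caf%C3%A9"));     // café
console.log(percentToUtf8("%E9"));           // null (lone 0xE9 is not valid UTF-8)
console.log(percentToUtf8("a+b"));           // a+b (a literal + is left unchanged)
```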

Multi-byte UTF-8 sequences follow the encoding pattern where byte count n is determined by leading bits:

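n = 1 if byte < 0x80, n = 2 if 0xC2 ≤ byte < 0xE0, n = 3 if 0xE0 ≤ byte < 0xF0, n = 4 if 0xF0 ≤ byte ≤ 0xF4. Bytes 0x80 to 0xBF are continuation bytes and never start a sequence; 0xC0, 0xC1, and 0xF5 to 0xFF do not occur in valid UTF-8.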
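The lead-byte rule can be written as a small function. This sketch assumes a strict reading of UTF-8: continuation bytes (0x80 to 0xBF) and bytes that never appear in valid UTF-8 return 0 rather than a length. The name utf8SequenceLength is hypothetical.

```javascript
// Return the length of the UTF-8 sequence that starts with `byte`,
// or 0 if `byte` cannot start a sequence.
function utf8SequenceLength(byte) {
  if (byte < 0x80) return 1;                  // 0xxxxxxx: ASCII
  if (byte >= 0xc2 && byte < 0xe0) return 2;  // 110xxxxx (0xC0, 0xC1 are overlong)
  if (byte >= 0xe0 && byte < 0xf0) return 3;  // 1110xxxx
  if (byte >= 0xf0 && byte <= 0xf4) return 4; // 11110xxx, capped at U+10FFFF
  return 0;                                   // continuation or invalid byte
}

console.log(utf8SequenceLength(0x41)); // 1
console.log(utf8SequenceLength(0xc3)); // 2
console.log(utf8SequenceLength(0xa9)); // 0
```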

Reference Data

| MIME Type | Common Use | Typical Encoding | Max Practical Size |
|---|---|---|---|
| text/plain | Plain text embedding | Base64 / Percent | ~2KB (URL limits) |
| text/html | Inline HTML frames | Base64 | ~32KB |
| text/css | Embedded stylesheets | Base64 | ~64KB |
| text/javascript | Inline scripts | Base64 | ~32KB |
| application/json | Configuration data | Base64 / Percent | ~128KB |
| application/xml | SVG, config files | Base64 | ~64KB |
| image/svg+xml | Vector graphics | Base64 / Percent | ~96KB |
| image/png | Raster images | Base64 | ~128KB |
| image/jpeg | Photos | Base64 | ~96KB |
| image/gif | Animations | Base64 | ~64KB |
| image/webp | Modern images | Base64 | ~128KB |
| font/woff | Web fonts | Base64 | ~256KB |
| font/woff2 | Compressed fonts | Base64 | ~192KB |
| audio/mpeg | Sound effects | Base64 | ~512KB |
| audio/wav | Uncompressed audio | Base64 | ~256KB |
| video/mp4 | Short clips | Base64 | ~1MB |
| application/pdf | Documents | Base64 | ~512KB |
| application/octet-stream | Binary fallback | Base64 | ~256KB |
| text/csv | Spreadsheet data | Base64 / Percent | ~64KB |
| application/x-www-form-urlencoded | Form data | Percent | ~8KB |

Frequently Asked Questions

Corruption occurs when Base64 payloads are decoded character-by-character instead of byte-by-byte. UTF-8 uses multi-byte sequences for non-ASCII characters. The character é requires bytes 0xC3 0xA9. Naive decoding treats each byte as a separate character, producing Ã©. This tool reconstructs the proper byte array before applying TextDecoder, preserving multi-byte sequences correctly.
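The difference is easy to reproduce. The snippet below is an illustrative comparison using the payload w6k= (the Base64 form of bytes 0xC3 0xA9); the variable names are placeholders.

```javascript
// atob yields one character per byte: U+00C3 followed by U+00A9.
const binary = atob("w6k=");

// Naive approach: use the binary string as if it were text.
const naive = binary; // "Ã©" (two characters)

// Byte-level approach: rebuild the bytes, then decode them as UTF-8.
const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
const correct = new TextDecoder("utf-8").decode(bytes); // "é" (one character)

console.log(naive.length, correct.length); // 2 1
```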
The detection relies on the presence of the ;base64 flag before the comma separator. Data URIs structured as data:text/plain;base64,SGVsbG8= use Base64. URIs like data:text/plain,Hello%20World use percent encoding. If the flag is absent, percent decoding via decodeURIComponent is applied. Some edge cases exist where payload content could superficially resemble Base64 without the flag - these decode as literal percent-encoded text.
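A minimal sketch of this detection logic, assuming the simple data:[mimeType][;base64],payload shape and the RFC 2397 default when the media type is omitted. parseDataUri and the shape of its return value are hypothetical; the tool's real parser may differ.

```javascript
// Split a Data URI into its MIME type, encoding flag, and raw payload.
function parseDataUri(uri) {
  if (uri.slice(0, 5).toLowerCase() !== "data:") {
    throw new Error("not a data URI");
  }
  const comma = uri.indexOf(",");
  if (comma === -1) throw new Error("missing comma separator");

  let header = uri.slice(5, comma);     // everything between "data:" and ","
  const payload = uri.slice(comma + 1); // still encoded at this point
  const isBase64 = /;base64$/i.test(header);
  if (isBase64) header = header.slice(0, -";base64".length);

  // An omitted media type falls back to the RFC 2397 default.
  const mimeType = header || "text/plain;charset=US-ASCII";
  return { mimeType, isBase64, payload };
}

console.log(parseDataUri("data:text/plain;base64,SGVsbG8="));
console.log(parseDataUri("data:,Hello%20World"));
```

Splitting on the first comma matters because the payload itself may contain commas.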
Base64 encodes data in groups of 4 characters, using = to pad the final group. Truncated URIs may carry incomplete = padding or end mid-group, causing atob() to throw InvalidCharacterError. Additionally, URL-safe Base64 variants use - and _ instead of + and /, which standard atob() rejects. This tool normalizes URL-safe characters and adds missing padding before decoding, handling both variants transparently.
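One way to perform this normalization is sketched below. The function name normalizeBase64 is an assumption for illustration, and the choice to reject a payload whose length leaves a remainder of 1 (which cannot encode whole bytes) is a design decision of this sketch, not documented tool behavior.

```javascript
// Convert URL-safe Base64 to the standard alphabet and repair padding.
function normalizeBase64(payload) {
  let s = payload
    .replace(/-/g, "+")   // URL-safe 62nd character
    .replace(/_/g, "/")   // URL-safe 63rd character
    .replace(/=+$/, "");  // drop any existing (possibly partial) padding

  const remainder = s.length % 4;
  if (remainder === 1) throw new Error("truncated Base64 payload");
  if (remainder > 0) s += "=".repeat(4 - remainder);
  return s;
}

console.log(normalizeBase64("SGVsbG8"));       // SGVsbG8=
console.log(normalizeBase64("-_8"));           // +/8=
console.log(atob(normalizeBase64("SGVsbG8"))); // Hello
```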
The tool decodes the binary payload but displays the result as UTF-8 text. For image/png or application/pdf payloads, the output appears as garbled characters because binary data is not valid UTF-8 text. The extracted MIME type indicates when binary content is detected. For actual file extraction, a dedicated binary-to-file converter would be required.
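Whether a payload is text can also be checked from the bytes themselves, independent of the declared MIME type. The sketch below uses TextDecoder's fatal option, which throws on invalid UTF-8 instead of substituting replacement characters; isValidUtf8 is a hypothetical helper, not something this tool is known to use.

```javascript
// Return true if `bytes` is well-formed UTF-8, false otherwise.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false; // decode throws a TypeError on malformed input
  }
}

// First four bytes of the PNG signature: 0x89 cannot start a UTF-8 sequence.
const pngSignature = new Uint8Array([0x89, 0x50, 0x4e, 0x47]);
console.log(isValidUtf8(pngSignature));                 // false
console.log(isValidUtf8(new Uint8Array([0xc3, 0xa9]))); // true
```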
Some text editors prepend the UTF-8 Byte Order Mark (bytes 0xEF 0xBB 0xBF, displayed as ï»¿ or a single invisible character) when saving files. If this BOM-prefixed content was encoded into a Data URI, the marker persists in the decoded output. This is not an error - it reflects the original source encoding. Remove the first three bytes or character if clean output is required.
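After decoding, the three BOM bytes become the single character U+FEFF, so removal can be done on the string. A minimal sketch, with stripBom as a hypothetical helper name:

```javascript
// Remove a leading U+FEFF (the decoded form of bytes 0xEF 0xBB 0xBF).
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// decodeURIComponent keeps the BOM as an invisible first character.
const decoded = decodeURIComponent("%EF%BB%BFHello");
console.log(decoded.length);           // 6
console.log(stripBom(decoded));        // Hello
console.log(stripBom(decoded).length); // 5
```

Note that in standard implementations a TextDecoder created without the ignoreBOM: true option already drops a leading BOM during decoding, so this step mainly matters for percent-decoded text or decoders configured to keep the marker.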
JavaScript engines cap the length of a single string; the exact ceiling varies by browser and is on the order of 512MB of text data. However, practical limits are much lower due to memory constraints. Data URIs over 2MB may cause sluggish UI response. The tool processes input synchronously without Web Workers, so payloads exceeding 5MB are not recommended. For extremely large URIs, chunk-based processing or server-side tools would be more appropriate.
For Base64-encoded payloads, the charset parameter is informational: decoding always produces raw bytes, and this tool's TextDecoder interprets them as UTF-8 regardless of the declared charset. For percent-encoded payloads, decodeURIComponent always decodes escapes as UTF-8, as defined by the ECMAScript specification. If the original content used a different encoding (ISO-8859-1, Shift_JIS), character mapping errors may occur. This tool explicitly targets UTF-8 content.