About

BSON (Binary JSON) is the binary-encoded serialization format that MongoDB and other systems use to store documents. Its wire format is compact but opaque, so extracting structured data from raw BSON without a proper parser risks silent data corruption. For example, type 0x01 is an IEEE 754 double and type 0x12 is a signed 64-bit integer, both stored little-endian; misreading the byte order or a length prefix produces garbage. This tool parses the full BSON specification and emits well-formed XML with proper character escaping. It handles nested documents, arrays, ObjectIds, UTC datetimes, regex patterns, and binary subtypes. Limitation: JavaScript numbers cannot represent every 64-bit integer exactly, so Int64 values whose magnitude exceeds 2^53 - 1 (Number.MAX_SAFE_INTEGER) are represented as strings to avoid rounding.
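
To see why byte order matters, the sketch below uses Python's standard struct module to decode the same eight bytes once as a little-endian and once as a big-endian signed 64-bit integer. Only the little-endian reading matches what BSON actually stores; the variable names are illustrative.

```python
import struct

# BSON stores Int64 (type 0x12) little-endian: least significant byte first.
raw = bytes([0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00])

as_bson = struct.unpack("<q", raw)[0]  # correct little-endian reading: 1
misread = struct.unpack(">q", raw)[0]  # big-endian misreading: 2**56

# Doubles (type 0x01) use the same little-endian byte order.
double_bytes = struct.pack("<d", 1.5)
round_trip = struct.unpack("<d", double_bytes)[0]

print(as_bson, misread, round_trip)  # prints: 1 72057594037927936 1.5
```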

Input is accepted as a hexadecimal string, a base64-encoded string, or a raw .bson file. The converter validates the BSON document length prefix against actual byte length before parsing. Malformed documents produce descriptive error messages referencing the byte offset of failure. Output XML uses a configurable root element name and indentation depth. Array elements are emitted as repeated <item> nodes with an index attribute. Special BSON types carry a bsonType attribute so no information is lost in the conversion.
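
A minimal sketch of how the text input forms might be normalized to raw bytes before parsing. The function name to_bytes, the whitespace stripping, and the hex-first rule are assumptions for illustration: a string such as "AAAA" is valid both as hex and as base64, so a real converter needs an explicit rule or a format selector to disambiguate. A raw .bson file needs no decoding; in Python it would simply be read with open(path, "rb").read().

```python
import base64
import binascii
import string

HEX_DIGITS = set(string.hexdigits)

def to_bytes(text: str) -> bytes:
    """Hypothetical normalizer: try hex first, then strict base64."""
    compact = "".join(text.split())  # drop spaces and line breaks
    if compact and len(compact) % 2 == 0 and set(compact) <= HEX_DIGITS:
        return bytes.fromhex(compact)
    try:
        # validate=True rejects characters outside the base64 alphabet
        return base64.b64decode(compact, validate=True)
    except binascii.Error as exc:
        raise ValueError("input is neither hex nor base64") from exc
```

With this rule, "05 00 00 00 00" decodes as hex to the 5-byte empty document, while "BQAAAAA=" (the same bytes in base64) falls through to the base64 branch because it contains non-hex characters.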

Formulas

A BSON document begins with a 4-byte little-endian int32 declaring the total document size in bytes, followed by a sequence of typed elements, and terminates with a 0x00 byte.

document ::= int32 e_list 0x00
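
To make the framing concrete, here is a hypothetical helper, encode_int32_doc, that builds a one-field document by hand. It shows that the int32 size prefix counts its own four bytes as well as the elements and the trailing 0x00.

```python
import struct

def encode_int32_doc(key: str, value: int) -> bytes:
    """Hypothetical encoder for a document with a single Int32 field."""
    # element: type byte 0x10 (Int32), e_name cstring, 4-byte little-endian value
    element = b"\x10" + key.encode("utf-8") + b"\x00" + struct.pack("<i", value)
    body = element + b"\x00"    # e_list followed by the 0x00 terminator
    total = 4 + len(body)       # the size prefix includes its own 4 bytes
    return struct.pack("<i", total) + body
```

For {"a": 1} the result is 12 bytes: 0c 00 00 00 (size), 10 61 00 01 00 00 00 (one Int32 element), 00 (terminator).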

Each element in e_list is structured as:

element ::= type (1 byte) e_name (cstring) value (type-dependent)
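
The element layout can be read with a short routine. This is a sketch with a hypothetical name, read_element_header: it reads the type byte, scans for the cstring's null terminator with a bounds check, and returns the offset where the type-dependent value begins.

```python
from typing import Tuple

def read_element_header(buf: bytes, off: int) -> Tuple[int, str, int]:
    """Read the type byte and e_name at off; return (type, name, value_offset)."""
    if off >= len(buf):
        raise ValueError(f"truncated element at offset {off}")
    type_byte = buf[off]
    name_start = off + 1
    end = buf.find(b"\x00", name_start)  # cstring: scan for the null terminator
    if end == -1:
        raise ValueError(f"unterminated e_name starting at offset {name_start}")
    return type_byte, buf[name_start:end].decode("utf-8"), end + 1
```

Applied at offset 4 of the {"a": 1} document (hex 0c0000001061000100000000), it returns type 0x10, name "a", and value offset 7, where the four Int32 bytes 01 00 00 00 begin.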

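The declared size is decoded from the first four bytes (least significant byte first) and validated against the actual byte length of the input:

$$\mathrm{len}_{\mathrm{declared}} = \mathrm{buf}[0] + \mathrm{buf}[1] \times 256 + \mathrm{buf}[2] \times 65536 + \mathrm{buf}[3] \times 16777216$$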
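
The size formula transcribes directly into code. The sketch below (declared_length is a hypothetical name) checks it against Python's built-in little-endian decoding. The formula reads the field as unsigned, but because valid document sizes are positive and well below 2^31, it agrees with a signed int32 read.

```python
import struct

def declared_length(buf: bytes) -> int:
    """Bytes 0-3 of the document, least significant byte first."""
    if len(buf) < 4:
        raise ValueError("need at least 4 bytes for the size prefix")
    return buf[0] + buf[1] * 256 + buf[2] * 65536 + buf[3] * 16777216

doc = bytes.fromhex("0c0000001061000100000000")  # {"a": 1}
assert declared_length(doc) == int.from_bytes(doc[:4], "little") == 12
assert declared_length(doc) == len(doc)  # the validation check
```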

Strings are length-prefixed: a 4-byte int32 byte count (including the trailing 0x00), followed by UTF-8 encoded bytes. XML output escapes five reserved characters:

& → &amp;, < → &lt;, > → &gt;, " → &quot;, ' → &apos;
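
A sketch of the two string steps together, using hypothetical names read_string and escape_xml: decode a length-prefixed BSON string with bounds checks, then escape the five reserved characters for XML text.

```python
import struct
from typing import Tuple

XML_ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;"}

def escape_xml(text: str) -> str:
    # Each input character is mapped exactly once, so the "&" introduced
    # by a replacement such as "&lt;" is never escaped a second time.
    return "".join(XML_ESCAPES.get(ch, ch) for ch in text)

def read_string(buf: bytes, off: int) -> Tuple[str, int]:
    """Decode a BSON string value at off; return (text, offset after the value)."""
    if off + 4 > len(buf):
        raise ValueError(f"truncated string length at offset {off}")
    (n,) = struct.unpack_from("<i", buf, off)  # byte count incl. trailing 0x00
    end = off + 4 + n
    if n < 1 or end > len(buf):
        raise ValueError(f"string length {n} out of bounds at offset {off}")
    if buf[end - 1] != 0:
        raise ValueError(f"missing trailing 0x00 for string at offset {off}")
    return buf[off + 4:end - 1].decode("utf-8"), end
```

Note that the byte count covers UTF-8 bytes, not characters: "hé" is stored with a count of 4 (three UTF-8 bytes plus the terminator).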

In each element, e_name is a cstring (a null-terminated sequence of non-zero bytes) and is used as the XML element tag. Characters that are invalid in XML names are replaced with underscores.
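
A sketch of the name sanitization, following the rules given in the FAQ (start with a letter or underscore; otherwise letters, digits, hyphens, periods, and underscores). Two assumptions are labeled here: only ASCII letters are kept (XML itself allows many Unicode letters), and the originalKey attribute is added only when sanitizing actually changed the key. Both function names are hypothetical.

```python
import re
from typing import Dict, Tuple

# Anything outside ASCII letters, digits, ".", "_" and "-" (assumption: ASCII only).
INVALID_NAME_CHARS = re.compile(r"[^A-Za-z0-9._-]")

def sanitize_name(key: str) -> str:
    name = INVALID_NAME_CHARS.sub("_", key)
    # Names must start with a letter or underscore; prefix anything else.
    if not name or not (name[0].isalpha() or name[0] == "_"):
        name = "_" + name
    return name

def element_name(key: str) -> Tuple[str, Dict[str, str]]:
    """Return the tag plus an originalKey attribute when the key was changed."""
    name = sanitize_name(key)
    return name, ({} if name == key else {"originalKey": key})
```

For example, "first name" becomes first_name, "1st" becomes _1st, and "$set" becomes _set.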

Reference Data

BSON Type | ID (Hex) | Description | XML Representation | Value Size (bytes)
Double | 0x01 | IEEE 754 floating point | Text content | 8
String | 0x02 | UTF-8 string | Text content (escaped) | 4 + len + 1
Document | 0x03 | Embedded BSON document | Nested child elements | Variable
Array | 0x04 | BSON array | Repeated <item> elements | Variable
Binary | 0x05 | Binary data with subtype | Base64 text, subtype attr | 5 + len
Undefined | 0x06 | Deprecated | Empty element, bsonType attr | 0
ObjectId | 0x07 | 12-byte unique ID | Hex string text | 12
Boolean | 0x08 | true / false | "true" or "false" text | 1
UTC Datetime | 0x09 | Milliseconds since epoch | ISO 8601 string | 8
Null | 0x0A | Null value | Empty element, bsonType="null" | 0
Regex | 0x0B | Regular expression | pattern & options attrs | Variable
DBPointer | 0x0C | Deprecated DB reference | Namespace + ObjectId text | 4 + len + 13
JavaScript | 0x0D | JS code string | CDATA text content | 4 + len + 1
Symbol | 0x0E | Deprecated symbol | Text content | 4 + len + 1
Code w/ Scope | 0x0F | JS code + scope doc | Nested code + scope elements | Variable
Int32 | 0x10 | 32-bit signed integer | Integer text | 4
Timestamp | 0x11 | MongoDB internal timestamp | increment + timestamp attrs | 8
Int64 | 0x12 | 64-bit signed integer | String text (precision safe) | 8
Decimal128 | 0x13 | 128-bit decimal float | Hex string text | 16
MinKey | 0xFF | Internal lowest value | Empty, bsonType="minKey" | 0
MaxKey | 0x7F | Internal highest value | Empty, bsonType="maxKey" | 0

Sizes cover the value only, excluding the type byte and e_name; len is the byte length of the UTF-8 string or binary payload.
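
The fixed-size rows of the table can be decoded with a small dispatch. This is a hedged sketch: decode_fixed is a hypothetical helper covering only Double, ObjectId, Boolean, UTC Datetime, Int32 and Int64, and the exact text formats (Python's repr for doubles, ISO 8601 with milliseconds and a Z suffix) are assumptions rather than the tool's confirmed output.

```python
import struct
from datetime import datetime, timedelta, timezone

FIXED_SIZES = {0x01: 8, 0x07: 12, 0x08: 1, 0x09: 8, 0x10: 4, 0x12: 8}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def decode_fixed(type_byte: int, raw: bytes) -> str:
    """Render a fixed-size BSON value as XML text content (hypothetical helper)."""
    size = FIXED_SIZES.get(type_byte)
    if size is None:
        raise ValueError(f"type 0x{type_byte:02x} is not handled by this sketch")
    if len(raw) != size:
        raise ValueError(f"type 0x{type_byte:02x} needs {size} bytes, got {len(raw)}")
    if type_byte == 0x01:  # Double
        return repr(struct.unpack("<d", raw)[0])
    if type_byte == 0x07:  # ObjectId rendered as 24 hex digits
        return raw.hex()
    if type_byte == 0x08:  # Boolean: only 0x00 and 0x01 are valid
        if raw[0] not in (0, 1):
            raise ValueError(f"invalid boolean byte 0x{raw[0]:02x}")
        return "true" if raw[0] == 1 else "false"
    if type_byte == 0x09:  # UTC datetime: int64 milliseconds since the epoch
        ms = struct.unpack("<q", raw)[0]
        # Integer timedelta arithmetic avoids float rounding; raises
        # OverflowError outside datetime's year 1..9999 range.
        dt = EPOCH + timedelta(milliseconds=ms)
        return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    if type_byte == 0x10:  # Int32
        return str(struct.unpack("<i", raw)[0])
    return str(struct.unpack("<q", raw)[0])  # 0x12 Int64 (Python ints are exact)
```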

Frequently Asked Questions

JavaScript numbers are IEEE 754 doubles with a safe integer range of -(2^53 - 1) to 2^53 - 1. This converter reads all 8 bytes of a BSON Int64 and checks whether the value's magnitude exceeds Number.MAX_SAFE_INTEGER. If it does, the value is represented as a string in the XML output with a bsonType="int64" attribute to prevent silent precision loss.
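
Python integers are arbitrary-precision, so the sketch below only models the JavaScript concern: it reads the Int64 exactly, then applies the rule above by comparing the magnitude with 2^53 - 1 and tagging unsafe values with bsonType="int64". The function name int64_to_xml and its (text, attributes) return shape are hypothetical.

```python
import struct
from typing import Dict, Tuple

MAX_SAFE_INTEGER = 2**53 - 1  # same value as JavaScript's Number.MAX_SAFE_INTEGER

def int64_to_xml(raw: bytes) -> Tuple[str, Dict[str, str]]:
    """Return (text, attributes) for an 8-byte little-endian BSON Int64."""
    value = struct.unpack("<q", raw)[0]  # exact in Python, unlike a JS number
    if abs(value) > MAX_SAFE_INTEGER:
        return str(value), {"bsonType": "int64"}
    return str(value), {}

# A double cannot tell 2**53 + 1 apart from 2**53: the rounding this rule avoids.
print(float(2**53 + 1) == float(2**53))  # True
```
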
XML element names must start with a letter or underscore and may contain letters, digits, hyphens, periods, and underscores. The converter sanitizes each BSON field name by replacing invalid characters with underscores. If the name starts with a digit or hyphen, an underscore prefix is added. The original field name is preserved in an originalKey attribute on the element.
Yes. Files produced by mongodump are concatenated BSON documents, each with its own length prefix. The converter walks these prefixes in turn; if the file contains multiple documents, it parses all of them and wraps them under the root element.
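
Because each document in a dump carries its own length prefix, consecutive documents can be located by hopping from prefix to prefix. A minimal sketch, assuming the whole file is already in memory; split_documents is a hypothetical name and it checks only framing (size and terminator), not element contents.

```python
import struct
from typing import Iterator

def split_documents(data: bytes) -> Iterator[bytes]:
    """Yield each framed document from concatenated BSON data."""
    off = 0
    while off < len(data):
        if off + 5 > len(data):  # smallest document is 5 bytes
            raise ValueError(f"trailing bytes at offset {off}")
        (size,) = struct.unpack_from("<i", data, off)
        if size < 5 or off + size > len(data):
            raise ValueError(f"bad document length {size} at offset {off}")
        if data[off + size - 1] != 0:
            raise ValueError(f"document at offset {off} lacks 0x00 terminator")
        yield data[off:off + size]
        off += size  # the next document starts right after this one
```
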
BSON arrays are documents with numeric string keys ("0", "1", "2", ...). The converter emits each array element as an <item> element with an index attribute preserving the ordinal position. This avoids invalid XML element names starting with digits and maintains order semantics.
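
A sketch of the <item> mapping using Python's standard xml.etree.ElementTree. It assumes the array's values have already been converted to text and takes them as (key, text) pairs in stored order. Rejecting keys that are not "0", "1", "2", ... in sequence is an assumption about how strict the converter is; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

def array_to_items(parent: ET.Element, name: str,
                   pairs: List[Tuple[str, str]]) -> ET.Element:
    """Append <name> with one <item index="..."> child per array element."""
    node = ET.SubElement(parent, name)
    for position, (key, text) in enumerate(pairs):
        if key != str(position):  # assumption: reject non-sequential keys
            raise ValueError(f"array key {key!r} at position {position} is not sequential")
        item = ET.SubElement(node, "item", index=key)
        item.text = text
    return node
```

For tags = ["red", "blue"] under a root element, this serializes as <root><tags><item index="0">red</item><item index="1">blue</item></tags></root>.
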
The tool imposes a 10 MB limit on input data. MongoDB itself limits documents to 16 MB. Browser memory is the practical constraint. Documents under 5 MB convert in under a second. Larger documents trigger a progress indicator during parsing.
Validation is strict. The converter checks the declared document length against the actual byte array length. Every element type byte must be a recognized BSON type. String lengths are bounds-checked. Cstrings must have a null terminator within the remaining buffer. If any check fails, a descriptive error is returned citing the byte offset where parsing failed.
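
Two of these checks, the document frame and the type byte, might look like the sketch below. BsonError, check_frame and check_type are hypothetical names; the error carries the byte offset so the message can cite it, and the set of known types is taken from the reference table (0x01 to 0x13, plus 0x7F and 0xFF).

```python
import struct

# Type bytes from the reference table: 0x01-0x13, MaxKey 0x7F, MinKey 0xFF.
KNOWN_TYPES = frozenset(range(0x01, 0x14)) | {0x7F, 0xFF}

class BsonError(ValueError):
    """Parse error that records the byte offset where the problem was found."""
    def __init__(self, message: str, offset: int) -> None:
        super().__init__(f"{message} at byte offset {offset}")
        self.offset = offset

def check_frame(buf: bytes) -> None:
    """Validate the size prefix and terminator of a single document."""
    if len(buf) < 5:  # smallest document: 4-byte size + 0x00
        raise BsonError(f"input is {len(buf)} bytes, minimum is 5", 0)
    (declared,) = struct.unpack_from("<i", buf, 0)
    if declared != len(buf):
        raise BsonError(f"declared length {declared} != actual length {len(buf)}", 0)
    if buf[-1] != 0x00:
        raise BsonError("missing 0x00 document terminator", len(buf) - 1)

def check_type(buf: bytes, off: int) -> int:
    """Return the element type byte at off, or raise if it is not a known type."""
    if off >= len(buf):
        raise BsonError("element type byte past end of buffer", off)
    if buf[off] not in KNOWN_TYPES:
        raise BsonError(f"unknown element type 0x{buf[off]:02x}", off)
    return buf[off]
```
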
BSON Binary (type 0x05) has a subtype byte: 0x00 (generic), 0x03 (legacy UUID), 0x04 (UUID), 0x05 (MD5), and others. The converter emits the binary data as base64 text and includes a binarySubtype attribute with the subtype's hex value. UUID subtypes are also rendered as a standard UUID string in a uuid attribute for convenience.
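
A sketch of the Binary mapping using the standard base64 and uuid modules. The attribute names come from the answer above; the "0x04"-style formatting of the subtype value and the function name binary_to_xml are assumptions. Legacy subtype 0x03 UUIDs were written in driver-specific byte orders, so reading the bytes as stored may not match what the originating driver displays.

```python
import base64
import uuid
from typing import Dict, Tuple

def binary_to_xml(subtype: int, data: bytes) -> Tuple[str, Dict[str, str]]:
    """Return (base64 text, attributes) for a BSON Binary value."""
    attrs = {"binarySubtype": f"0x{subtype:02x}"}  # assumed hex formatting
    if subtype in (0x03, 0x04) and len(data) == 16:
        # Bytes taken in stored order; legacy 0x03 byte order varied by driver.
        attrs["uuid"] = str(uuid.UUID(bytes=data))
    return base64.b64encode(data).decode("ascii"), attrs
```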