About

GChemPaint stores molecular structures in an XML dialect that encodes atoms (element symbol, 2D coordinates x and y, formal charge) and bonds (bond order, begin and end atom references). Downstream toolchains (visualization libraries, cheminformatics pipelines, web renderers) commonly expect JSON input. Manual transcription of even a modest molecule like caffeine (14 heavy atoms, 15 bonds between them) is error-prone: a single transposed atom index silently corrupts the topology graph. This converter parses the full GChemPaint XML DOM, resolves internal atom-ID references to zero-based indices, and emits a clean JSON document ready for programmatic consumption.

The parser handles multi-molecule documents, bond stereochemistry annotations (wedge, dash, hash), isotope mass numbers, and formal charges. It approximates standard GChemPaint export conventions. Note: GChemPaint files do not store 3D coordinates, so only 2D layout positions are extracted. Pro tip: validate your source XML in GChemPaint before converting, so that orphaned bond references (which would produce index −1 in the output) are caught early.


Formulas

The converter performs a deterministic mapping from XML DOM nodes to JSON objects. No mathematical transformation is applied to coordinates; they are extracted verbatim. The core logic resolves bond endpoint references from string atom IDs to integer array indices.

index(atomRef) = i where atoms[i].id = atomRef, i ∈ [0, n − 1]

Where atomRef is the string value of the begin or end attribute on a <bond> element, and n is the total atom count in that molecule. If no match is found, the index defaults to −1, signaling a broken reference.
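
The lookup above can be sketched in a few lines of Python. This is an illustration only, not the converter's actual source; the helper names build_index_map and resolve_index are hypothetical, and atoms are assumed to be plain dictionaries carrying an "id" key.

```python
def build_index_map(atoms):
    """Map each atom's string id to its zero-based position in the atoms list."""
    return {atom["id"]: i for i, atom in enumerate(atoms)}


def resolve_index(atom_ref, index_map):
    """Return the array index for atom_ref, or -1 when the reference is broken."""
    return index_map.get(atom_ref, -1)


atoms = [{"id": "a1"}, {"id": "a2"}, {"id": "a3"}]
index_map = build_index_map(atoms)

print(resolve_index("a2", index_map))  # 1
print(resolve_index("a9", index_map))  # -1 (no atom with that id)
```

Building the map once per molecule keeps each bond lookup constant-time instead of rescanning the atoms array for every bond endpoint.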

order = parseInt(orderAttr) with fallback 1 (single bond)

Bond order defaults to 1 when the attribute is absent or non-numeric, matching GChemPaint's implicit single-bond convention. Element symbols default to "C" (carbon) when the Element attribute is omitted, consistent with organic chemistry skeletal notation.
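
A minimal Python sketch of these two fallbacks follows. The function names are hypothetical. Note one deliberate difference from the formula: Python's int() is stricter than JavaScript's parseInt, which accepts a leading numeric prefix such as "2abc"; here any value that is not a clean integer string falls back to 1.

```python
def parse_order(order_attr):
    """Return the bond order as an int, falling back to 1 (single bond)."""
    try:
        return int(order_attr)
    except (TypeError, ValueError):
        # TypeError: attribute absent (None); ValueError: non-numeric string
        return 1


def element_symbol(element_attr):
    """Return the element symbol, defaulting to carbon when omitted or empty."""
    return element_attr if element_attr else "C"


print(parse_order("2"))         # 2
print(parse_order(None))        # 1
print(parse_order("aromatic"))  # 1
print(element_symbol(None))     # C
```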

Reference Data

| GChemPaint XML Element | Attribute | JSON Output Key | Type | Description |
|---|---|---|---|---|
| <atom> | id | atoms[i].id | string | Internal atom identifier (e.g., "a1") |
| <atom> | Element / element | atoms[i].element | string | Element symbol: C, N, O, S, etc. |
| <atom> | x | atoms[i].x | number | 2D x-coordinate in GChemPaint units |
| <atom> | y | atoms[i].y | number | 2D y-coordinate in GChemPaint units |
| <atom> | Charge / charge | atoms[i].charge | integer | Formal charge (e.g., −1, +1) |
| <atom> | A (mass number) | atoms[i].isotope | integer | Isotope mass number if specified |
| <atom> | Hydrogens | atoms[i].hydrogens | integer | Explicit hydrogen count |
| <bond> | id | bonds[j].id | string | Internal bond identifier (e.g., "b1") |
| <bond> | order | bonds[j].order | integer | 1 = single, 2 = double, 3 = triple |
| <bond> | begin | bonds[j].begin | integer | Zero-based index into atoms array |
| <bond> | end | bonds[j].end | integer | Zero-based index into atoms array |
| <bond> | type | bonds[j].type | string | "normal", "wedge", "dash", "hash" |
| <molecule> | id | molecules[k].id | string | Molecule-level identifier |
| <molecule> | - | molecules[k].atoms | array | Array of atom objects in this molecule |
| <molecule> | - | molecules[k].bonds | array | Array of bond objects in this molecule |
| Document root | - | molecules | array | Top-level array of all molecules |

Frequently Asked Questions

The converter handles the standard GChemPaint document structure: a root element containing one or more <molecule> elements, each with nested <atom> and <bond> child elements. It extracts atom attributes (Element, x, y, Charge, A for isotope mass number, Hydrogens) and bond attributes (order, begin, end, type). Fragments, reactions, arrows, and text annotations are ignored; only molecular topology data is converted.
Each molecule's atoms are collected into an ordered array. A lookup map is built from atom ID strings (e.g., "a1", "a2") to their zero-based position in that array. When processing bonds, the begin and end attributes are resolved through this map. If an atom ID referenced by a bond does not exist in the molecule's atom list, the index is set to -1 and a warning is included in the conversion output.
Missing Element attributes default to "C" (carbon), following the organic chemistry convention where unspecified vertices represent carbon atoms. Missing Charge defaults to 0. Missing isotope mass number (A attribute) is omitted from the output entirely rather than defaulting to a value. Missing bond order defaults to 1 (single bond). The x and y coordinates default to 0 if absent.
Multi-molecule documents are supported. The output JSON contains a top-level "molecules" array. Each <molecule> element in the source XML becomes a separate object in this array, with its own independent atoms and bonds arrays. Bond indices are scoped per molecule: they reference positions within that molecule's atoms array, not a document-wide atom list.
The bond type attribute is preserved as a string in the output JSON. Values such as "normal", "wedge" (solid triangle, indicating a bond coming out of the plane), "dash" (dashed triangle, indicating a bond going behind the plane), and "hash" (hashed wedge) are copied through unchanged. If the type attribute is absent, the bond type defaults to "normal". These annotations are informational; no 3D coordinate inference is performed.
The converter accepts XML files up to 5 MB. GChemPaint files are typically under 100 KB even for complex molecules, so this limit is generous. Files exceeding this threshold are rejected before parsing to prevent browser memory issues. For very large combinatorial libraries, consider splitting into individual molecule files.
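
The behaviour described in the answers above can be approximated with a short Python sketch built on the standard library. It is not the tool's own code. It assumes the attribute names from the reference table sit directly on un-namespaced <atom> and <bond> elements (real GChemPaint files may lay these out differently), it leaves out Hydrogens for brevity, and the "warnings" key, the convert function, and the <chemistry> root in the sample are placeholders chosen for illustration.

```python
import json
import xml.etree.ElementTree as ET


def _attr(node, *names):
    """Return the first attribute found among names, else None."""
    for name in names:
        value = node.get(name)
        if value is not None:
            return value
    return None


def _to_int(value, default):
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def _to_float(value, default=0.0):
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


def convert(xml_text):
    """Convert GChemPaint-style XML text to a dict of molecules plus warnings."""
    root = ET.fromstring(xml_text)
    molecules, warnings = [], []
    for mol in root.iter("molecule"):
        atoms = []
        for node in mol.findall("atom"):
            atom = {
                "id": node.get("id"),
                "element": _attr(node, "Element", "element") or "C",
                "x": _to_float(node.get("x")),
                "y": _to_float(node.get("y")),
                "charge": _to_int(_attr(node, "Charge", "charge"), 0),
            }
            isotope = _to_int(node.get("A"), None)
            if isotope is not None:  # omitted entirely rather than defaulted
                atom["isotope"] = isotope
            atoms.append(atom)

        # Lookup map is rebuilt per molecule, so indices are molecule-scoped.
        index_of = {a["id"]: i for i, a in enumerate(atoms)}
        bonds = []
        for node in mol.findall("bond"):
            bond = {
                "id": node.get("id"),
                "order": _to_int(node.get("order"), 1),
                "type": node.get("type") or "normal",
            }
            for end in ("begin", "end"):
                ref = node.get(end)
                bond[end] = index_of.get(ref, -1)
                if bond[end] == -1:
                    warnings.append(
                        f"bond {bond['id']}: unknown atom reference {ref!r}"
                    )
            bonds.append(bond)
        molecules.append({"id": mol.get("id"), "atoms": atoms, "bonds": bonds})
    return {"molecules": molecules, "warnings": warnings}


# Hypothetical two-molecule input; bond b2 points at a non-existent atom.
sample = """
<chemistry>
  <molecule id="m1">
    <atom id="a1" x="0" y="0"/>
    <atom id="a2" Element="O" x="1.5" y="0" Charge="-1"/>
    <bond id="b1" begin="a1" end="a2"/>
  </molecule>
  <molecule id="m2">
    <atom id="a3" Element="N" x="4" y="0"/>
    <bond id="b2" order="2" begin="a3" end="a9" type="wedge"/>
  </molecule>
</chemistry>
"""
result = convert(sample)
print(json.dumps(result, indent=2))
```

In this sample, atom a1 falls back to element "C", bond b1 falls back to order 1 and type "normal", and bond b2 resolves its end to −1 with one warning recorded, because "a9" does not exist in molecule m2.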