About

Structural mismatch between tabular CSV and hierarchical XML is a persistent data-engineering problem. A naive row-to-element conversion produces flat, unusable XML that fails schema validation in systems expecting nested structures, attributes, and grouped records. This converter implements a column-to-XPath mapping engine: each CSV header maps to a target path such as root/parent/child/text() or root/element/@attr. The parser follows RFC 4180 for CSV, handling quoted fields, embedded delimiters, and escaped double-quotes. Optional primary-key grouping collapses N rows sharing the same key into a single parent element with repeated child nodes. The tool approximates the behavior of XSLT-based pipelines but runs entirely in the browser with zero server round-trips.

Limitations: XML namespace declarations are not auto-generated. If your target schema requires xmlns prefixes, add them manually to the root element name field. Very large files (above 5 MB) may cause browser memory pressure. For production ETL pipelines processing millions of rows, a server-side streaming solution remains appropriate.

Formulas

The conversion pipeline operates in three discrete stages. First, the CSV parser tokenizes input using a finite-state machine with states S ∈ {FIELD_START, UNQUOTED, QUOTED, QUOTE_IN_QUOTED}. Transitions depend on the current character c and configured delimiter d.

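$$
\begin{cases}
S \to \text{QUOTED} & \text{if } c = \texttt{"} \text{ and } S = \text{FIELD\_START} \\
S \to \text{QUOTE\_IN\_QUOTED} & \text{if } c = \texttt{"} \text{ and } S = \text{QUOTED} \\
\text{emit field, advance row} & \text{if } c = \text{LF} \text{ and } S \neq \text{QUOTED} \\
\text{emit field} & \text{if } c = d \text{ and } S \neq \text{QUOTED}
\end{cases}
$$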
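The transitions above can be sketched in a few lines of Python. This is an illustrative sketch, not the converter's own browser code: the names `State` and `parse_csv` are hypothetical, CRLF is normalized to LF before tokenizing (which also rewrites CRLF inside quoted fields), and a stray character after a closing quote is simply appended to the field.

```python
from enum import Enum, auto


class State(Enum):
    FIELD_START = auto()
    UNQUOTED = auto()
    QUOTED = auto()
    QUOTE_IN_QUOTED = auto()


def parse_csv(text: str, delimiter: str = ",") -> list[list[str]]:
    """Tokenize CSV text into rows of fields using a four-state machine."""
    rows: list[list[str]] = []
    row: list[str] = []
    field: list[str] = []
    state = State.FIELD_START

    def end_field() -> None:
        row.append("".join(field))
        field.clear()

    def end_row() -> None:
        end_field()
        rows.append(row.copy())
        row.clear()

    # Assumption: treat CRLF as LF so the machine only has to watch for LF.
    for c in text.replace("\r\n", "\n"):
        if state is State.QUOTED:
            if c == '"':
                state = State.QUOTE_IN_QUOTED  # closing quote or first half of ""
            else:
                field.append(c)  # delimiters and newlines are literal here
        elif state is State.QUOTE_IN_QUOTED and c == '"':
            field.append('"')  # "" inside quotes is one literal quote
            state = State.QUOTED
        elif c == delimiter:
            end_field()
            state = State.FIELD_START
        elif c == "\n":
            end_row()
            state = State.FIELD_START
        elif c == '"' and state is State.FIELD_START:
            state = State.QUOTED
        else:
            field.append(c)
            state = State.UNQUOTED

    # Flush the last record when the input has no trailing newline.
    if field or row or state is not State.FIELD_START:
        end_row()
    return rows


if __name__ == "__main__":
    print(parse_csv('name,quote\n"Smith, John","He said ""hi"""\n'))
```

Keeping the quote handling in two states (QUOTED and QUOTE_IN_QUOTED) means the parser never needs to look ahead: the character after a quote decides whether it was an escape or the end of the field.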

Second, the mapping engine processes each row. For a mapping target path P = p₁/p₂/.../p_n, the algorithm walks from the row's root element, creating intermediate elements as needed. The terminal segment p_n determines the action:

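$$
\begin{cases}
\text{setTextContent}(\text{parent}, \text{value}) & \text{if } p_n = \texttt{text()} \\
\text{setAttribute}(\text{parent}, \text{name}, \text{value}) & \text{if } p_n = \texttt{@name} \\
\text{createChild}(\text{parent}, p_n) & \text{otherwise (element with text)}
\end{cases}
$$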
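A minimal Python sketch of this path walk, using the standard library's `xml.etree.ElementTree`, might look like the following. The function name `apply_mapping` is hypothetical, and the sketch assumes that paths are written relative to the row element and that intermediate elements are reused when they already exist. The real tool may resolve paths differently.

```python
import xml.etree.ElementTree as ET


def apply_mapping(row_el: ET.Element, path: str, value: str) -> None:
    """Apply one column value to row_el along a slash-separated target path."""
    *parents, last = path.split("/")
    node = row_el
    for name in parents:
        child = node.find(name)
        if child is None:  # create intermediate elements as needed
            child = ET.SubElement(node, name)
        node = child
    if last == "text()":
        node.text = value  # text content of the parent element
    elif last.startswith("@"):
        node.set(last[1:], value)  # attribute on the parent element
    else:
        ET.SubElement(node, last).text = value  # new child element with text


if __name__ == "__main__":
    book = ET.Element("book")
    apply_mapping(book, "@isbn", "123")
    apply_mapping(book, "title/text()", "Dune")
    apply_mapping(book, "category/@code", "SF")
    apply_mapping(book, "category/text()", "Fiction")
    print(ET.tostring(book, encoding="unicode"))
```

Because intermediate elements are looked up before being created, two mappings that share a prefix (here `category/@code` and `category/text()`) land on the same element.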

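Third, when primary-key grouping is enabled, consecutive rows sharing key k are merged into one parent element. For input sorted by the primary key, the number of parent elements E relates to input rows R and unique keys K:

E = |K| ≤ |R|

where K = {r_i[primaryKey] | r_i ∈ R}. Each unique key produces one parent element, and repeated-key rows append child elements within that parent. If the input is not sorted, a key that reappears after a gap starts a new parent, so E can exceed |K|.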
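The grouping stage can be illustrated with `itertools.groupby`, which merges only consecutive items with equal keys. The sketch below is an assumption-laden illustration: `group_rows`, the tag names `root`, `record`, and `item`, and the choice to store the key as an attribute on the parent are all hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import groupby


def group_rows(rows: list[dict[str, str]], key: str,
               parent_tag: str = "record", child_tag: str = "item") -> ET.Element:
    """Merge consecutive rows that share a key into one parent element."""
    root = ET.Element("root")
    for k, members in groupby(rows, key=lambda r: r[key]):
        # One parent per run of equal keys; the key becomes an attribute.
        parent = ET.SubElement(root, parent_tag, {key: k})
        for r in members:
            child = ET.SubElement(parent, child_tag)
            for col, val in r.items():
                if col != key:
                    ET.SubElement(child, col).text = val
    return root


if __name__ == "__main__":
    rows = [
        {"id": "P001", "amount": "10"},
        {"id": "P001", "amount": "20"},
        {"id": "P002", "amount": "5"},
    ]
    print(ET.tostring(group_rows(rows, "id"), encoding="unicode"))
```

With the three sorted rows above, the result has two parent elements (one per unique key), and the first parent holds two children.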

Reference Data

XPath-like Token	Meaning	Example Path	Resulting XML
element	Creates/selects child element	book/title	<book><title>...</title></book>
text()	Sets text content of parent element	book/title/text()	<title>Value</title>
@attr	Sets attribute on parent element	book/@isbn	<book isbn="Value">
parent/child	Nested elements via slash separator	a/b/c/text()	<a><b><c>Val</c></b></a>
@code on intermediate	Attribute on any nesting level	item/type/@code	<item><type code="Val"/></item>
CSV Delimiter Reference
Comma	,	Default RFC 4180	Most common format
Semicolon	;	European locale CSVs	Used when decimal is comma
Tab	\t	TSV files	Database exports
Pipe	|	Legacy mainframe	Fixed-width alternatives
XML Special Character Escaping
&	&amp;	Ampersand	Always escaped in text/attrs
<	&lt;	Less than	Escaped in text content
>	&gt;	Greater than	Escaped in text content
"	&quot;	Double quote	Escaped in attribute values
'	&apos;	Apostrophe	Escaped in attribute values
Common XML Encoding Declarations
UTF-8	<?xml version="1.0" encoding="UTF-8"?>	Default	Most web systems
UTF-16	encoding="UTF-16"	Windows legacy	BOM required
ISO-8859-1	encoding="ISO-8859-1"	Latin-1	Legacy European
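
The special-character escaping rules in the table above can be reproduced with Python's standard `xml.sax.saxutils.escape`. This is only a sketch of the rules, not the converter's code. The helper names `escape_text` and `escape_attr` are hypothetical.

```python
from xml.sax.saxutils import escape


def escape_text(value: str) -> str:
    """Escape &, < and > for use in element text content."""
    return escape(value)


def escape_attr(value: str) -> str:
    """Escape &, <, > plus both quote characters for attribute values."""
    return escape(value, {'"': "&quot;", "'": "&apos;"})


if __name__ == "__main__":
    print(escape_text("A & B < C"))
    print(escape_attr("say \"hi\" & 'bye'"))
```

The ampersand is replaced first, so the entities produced for the other characters are not escaped a second time.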

Frequently Asked Questions

The parser implements RFC 4180 quoting rules. Any field wrapped in double quotes can contain the delimiter character, newline characters (CRLF or LF), and literal double quotes (escaped as two consecutive double quotes ""). The finite-state machine transitions to a QUOTED state upon encountering an opening quote at field start, and only exits when it finds a closing quote followed by a delimiter or end-of-line. This means a field value like "Smith, John" is correctly parsed as a single value Smith, John rather than being split into two columns.
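
One way to sanity-check these quoting rules outside the browser is Python's standard `csv` module, which follows the same conventions by default. The sample data here is made up for illustration.

```python
import csv
import io

# A quoted field containing the delimiter, and another containing an escaped
# quote ("") plus an embedded newline.
sample = 'id,name,note\r\n1,"Smith, John","said ""hello""\nthen left"\r\n'

# newline="" stops StringIO from translating line endings before csv sees them.
rows = list(csv.reader(io.StringIO(sample, newline="")))

if __name__ == "__main__":
    for row in rows:
        print(row)
```

The second record comes back as three fields, with `Smith, John` kept whole and the note spanning two lines.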

Attributes can be set on an element at any depth in the path, but the @ segment itself is only valid as the final segment. For example, root/item/category/@code creates elements root, item, and category, then sets attribute code on category. If another mapping targets root/item/category/text(), both the attribute and text content are applied to the same category element, producing output like <category code="X">Y</category>.

When you specify a primary key column, all consecutive rows sharing the same key value are merged into a single parent XML element. Mapped fields that differ across grouped rows produce repeated child elements. For instance, if three rows share key P001 and each has a different transaction_value, the output contains one parent element with three transaction/value child elements. Important: the CSV must be sorted by the primary key column for correct grouping. Unsorted data will produce duplicate parent elements for the same key.
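
Whether a key column is safe for grouping can be checked before conversion. The sketch below counts how many parent elements consecutive-run grouping would produce and compares that with the number of unique keys. The function names are hypothetical.

```python
from itertools import groupby


def count_parents(keys: list[str]) -> int:
    """Number of parent elements produced by merging consecutive equal keys."""
    return sum(1 for _ in groupby(keys))


def is_grouped(keys: list[str]) -> bool:
    """True when every key occupies one contiguous run, so no parent repeats."""
    return count_parents(keys) == len(set(keys))


if __name__ == "__main__":
    unsorted_keys = ["P001", "P002", "P001"]
    print(count_parents(unsorted_keys), is_grouped(unsorted_keys))
    print(count_parents(sorted(unsorted_keys)), is_grouped(sorted(unsorted_keys)))
```

For the unsorted keys the count is three parents for two unique keys. After sorting, it drops to two.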

The converter produces well-formed XML but does not validate it against a schema. The output is parsed through the browser's native DOMParser to verify well-formedness (no unclosed tags, no invalid characters). If your target system requires XSD compliance, validate the downloaded file using an external XSD validator. Common issues include missing required elements (unmapped CSV columns), incorrect nesting depth, and namespace declarations that must be added manually to the root element name.
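
A comparable well-formedness check can be run on a downloaded file with Python's standard library. This sketch uses `xml.etree.ElementTree` as a stand-in for the browser's DOMParser. Like the converter, it checks well-formedness only and says nothing about XSD compliance. The name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False on any parse error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<a><b/></a>"))
    print(is_well_formed("<a><b></a>"))
```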

The practical limit depends on available browser memory. Files up to approximately 5 MB (roughly 50,000 to 100,000 rows) convert reliably in modern browsers. The converter processes data in the main thread using chunked iteration to prevent UI freezing. For files exceeding 5 MB, a warning is displayed. At 10 MB and above, you risk out-of-memory errors or tab crashes, particularly on mobile devices with constrained RAM.

Unmapped columns can be included automatically: enable the "Auto-map unmapped columns" option. Any CSV column header not explicitly listed in the mapping table is converted to a direct child element of the row element, using the column header as the element name (sanitized to remove spaces and special characters). The column value becomes the text content. This is useful for quick conversions where you only need custom paths for a few columns and want the rest included with the default flat structure.
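
Header sanitization could work roughly as sketched below. The exact rules the converter applies are not documented here, so this is an assumption: the sketch drops every character outside letters, digits, underscore, hyphen, and period, and prefixes an underscore when the result would not start with a letter or underscore (XML names cannot begin with a digit, hyphen, or period). The name `sanitize_name` is hypothetical.

```python
import re


def sanitize_name(header: str) -> str:
    """Turn a CSV header into a conservative ASCII XML element name."""
    # Assumed rule: strip spaces and anything that is not a safe name character.
    name = re.sub(r"[^A-Za-z0-9_.-]", "", header)
    # Assumed rule: XML names must not start with a digit, hyphen, or period.
    if not name or not (name[0].isalpha() or name[0] == "_"):
        name = "_" + name
    return name


if __name__ == "__main__":
    for header in ["Unit Price ($)", "2024 total", "order-id"]:
        print(header, "->", sanitize_name(header))
```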