About

DanMARC2 is the Danish national MARC format used in bibliographic cataloguing. It encodes non-ASCII characters through escape sequences of the form @XXXX, where XXXX is a 4-digit hexadecimal Unicode code point. Failure to correctly decode these sequences results in garbled metadata across library systems, broken search indexes, and corrupted inter-library loan records. This tool parses raw danMARC2 strings according to Annex K of the format specification, converting every @XXXX token to its proper Unicode character via String.fromCodePoint(parseInt(hex, 16)). A literal @@ sequence produces a single @ character. The tool preserves all MARC field indicators, subfield markers (*a, *b), and structural whitespace unchanged.

Limitations: this converter handles the @XXXX hexadecimal escape mechanism defined in Annex K. It does not interpret MARC record leader bytes, directory entries, or field terminators. Composed diacritical sequences from ISO 5426 that do not use the @XXXX pattern require separate handling. Pro tip: always verify output against the original catalogue record when processing batch exports, as some legacy systems emit non-standard 2-digit escapes.

Formulas

The danMARC2 escape mechanism encodes Unicode characters as hexadecimal code points prefixed by @. The conversion function applies the following transformation to each escape token found in the input string:

char = fromCodePoint(parseInt(hex, 16))

where hex is the 4-character hexadecimal string immediately following the @ symbol. The parser uses a finite-state scan with the following rules:

@@ → @ (literal at-sign)
@XXXX → fromCodePoint(0xXXXX), if XXXX ∈ [0-9a-fA-F]⁴
@ → @ (passthrough, if fewer than 4 hex digits follow)

where XXXX represents any valid 4-digit hexadecimal value in the range 0000 to FFFF. For code points above FFFF (supplementary planes), the specification does not define a standard escape. All non-escape characters pass through unchanged, preserving MARC field tags, indicators, and subfield delimiters.
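The scan rules above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the tool's actual source: the function name decodeDanmarc2 is hypothetical, and the rule order (literal @@ first, then a 4-digit escape, then passthrough) is assumed from the description.

```javascript
// Minimal sketch of the three scan rules; decodeDanmarc2 is a hypothetical name.
function decodeDanmarc2(input) {
  let out = "";
  let i = 0;
  while (i < input.length) {
    const ch = input[i];
    if (ch !== "@") {
      // Non-escape characters pass through unchanged.
      out += ch;
      i += 1;
      continue;
    }
    // Rule 1: "@@" is a literal at-sign.
    if (input[i + 1] === "@") {
      out += "@";
      i += 2;
      continue;
    }
    // Rule 2: "@" followed by exactly four hex digits is an escape.
    const hex = input.slice(i + 1, i + 5);
    if (/^[0-9a-fA-F]{4}$/.test(hex)) {
      out += String.fromCodePoint(parseInt(hex, 16));
      i += 5;
      continue;
    }
    // Rule 3: anything else leaves the "@" as it is.
    out += "@";
    i += 1;
  }
  return out;
}

console.log(decodeDanmarc2("*aK@00f8benhavn")); // *aKøbenhavn
```

Checking @@ before the 4-digit rule matters only for clarity here, since @ is not a hex digit, but it keeps the code in the same order as the rules.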

Reference Data

DanMARC2 Escape	Hex Code	Unicode Char	Unicode Name	Common Use
@00e9	00E9	é	LATIN SMALL LETTER E WITH ACUTE	French, Spanish loanwords
@00e8	00E8	è	LATIN SMALL LETTER E WITH GRAVE	French, Italian
@00e0	00E0	à	LATIN SMALL LETTER A WITH GRAVE	French, Italian
@00f6	00F6	ö	LATIN SMALL LETTER O WITH DIAERESIS	German, Swedish, Finnish
@00fc	00FC	ü	LATIN SMALL LETTER U WITH DIAERESIS	German, Turkish
@00e6	00E6	æ	LATIN SMALL LETTER AE	Danish, Norwegian
@00f8	00F8	ø	LATIN SMALL LETTER O WITH STROKE	Danish, Norwegian
@00e5	00E5	å	LATIN SMALL LETTER A WITH RING ABOVE	Danish, Norwegian, Swedish
@00c6	00C6	Æ	LATIN CAPITAL LETTER AE	Danish, Norwegian (title case)
@00d8	00D8	Ø	LATIN CAPITAL LETTER O WITH STROKE	Danish, Norwegian (title case)
@00c5	00C5	Å	LATIN CAPITAL LETTER A WITH RING ABOVE	Danish, Norwegian, Swedish (title case)
@00df	00DF	ß	LATIN SMALL LETTER SHARP S	German
@00f1	00F1	ñ	LATIN SMALL LETTER N WITH TILDE	Spanish
@00e7	00E7	ç	LATIN SMALL LETTER C WITH CEDILLA	French, Portuguese, Turkish
@00ee	00EE	î	LATIN SMALL LETTER I WITH CIRCUMFLEX	French, Romanian
@00f4	00F4	ô	LATIN SMALL LETTER O WITH CIRCUMFLEX	French, Portuguese
@00fb	00FB	û	LATIN SMALL LETTER U WITH CIRCUMFLEX	French
@0153	0153	œ	LATIN SMALL LIGATURE OE	French
@0152	0152	Œ	LATIN CAPITAL LIGATURE OE	French (title case)
@00e4	00E4	ä	LATIN SMALL LETTER A WITH DIAERESIS	German, Swedish, Finnish
@00c4	00C4	Ä	LATIN CAPITAL LETTER A WITH DIAERESIS	German, Swedish (title case)
@00d6	00D6	Ö	LATIN CAPITAL LETTER O WITH DIAERESIS	German, Swedish (title case)
@00dc	00DC	Ü	LATIN CAPITAL LETTER U WITH DIAERESIS	German, Turkish (title case)
@017e	017E	ž	LATIN SMALL LETTER Z WITH CARON	Czech, Slovenian, Latvian
@0161	0161	š	LATIN SMALL LETTER S WITH CARON	Czech, Slovenian, Latvian
@010d	010D	č	LATIN SMALL LETTER C WITH CARON	Czech, Slovenian, Croatian
@0159	0159	ř	LATIN SMALL LETTER R WITH CARON	Czech
@0142	0142	ł	LATIN SMALL LETTER L WITH STROKE	Polish
@0144	0144	ń	LATIN SMALL LETTER N WITH ACUTE	Polish
@@	-	@	COMMERCIAL AT (literal)	Escaped @ in data

Frequently Asked Questions

In danMARC2, a literal @ is encoded as @@. The parser detects two consecutive @ characters and emits a single @ in the output. If you see unexpected @ symbols disappearing, check whether the source data properly doubled them.
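For the producing side, doubling can be illustrated with a one-line helper. This is a sketch under the assumption that it runs on raw text before any @XXXX escapes are added; escapeLiteralAt is a hypothetical name, not part of this tool.

```javascript
// Hypothetical helper: double every literal "@" so a decoder emits it once.
// Apply this to raw text only; running it on already-escaped data would
// also double the "@" that introduces each @XXXX escape.
function escapeLiteralAt(text) {
  return text.replace(/@/g, "@@");
}

console.log(escapeLiteralAt("user@example.org")); // user@@example.org
```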

The converter requires exactly 4 valid hex digits ([0-9a-fA-F]) after the @. If fewer valid hex digits are found, the @ is emitted as a literal character and the following characters pass through unchanged. This prevents data corruption from malformed records.
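When checking batch exports, it can help to list the positions where an @ was treated as a literal because no valid escape followed. The sketch below is one possible way to do that; findMalformedEscapes and the shape of its result are assumptions made for illustration.

```javascript
// Hypothetical helper: report every "@" that is neither part of "@@"
// nor followed by exactly four hex digits.
function findMalformedEscapes(input) {
  // Alternatives are tried in order, so valid tokens are consumed first
  // and a bare "@" matches only when nothing else applies.
  const token = /@@|@[0-9a-fA-F]{4}|@/g;
  const problems = [];
  for (const match of input.matchAll(token)) {
    if (match[0] === "@") {
      problems.push({
        index: match.index,
        // A short excerpt to make the report readable.
        context: input.slice(match.index, match.index + 5),
      });
    }
  }
  return problems;
}

console.log(findMalformedEscapes("Caf@e9 og @00e9 og a@@b"));
// [ { index: 3, context: '@e9 o' } ]
```

A non-empty result does not prove the record is wrong, since a lone @ may be intended, but it marks the places worth comparing against the original catalogue record.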

This converter does not decode ISO 5426 composed diacriticals. It specifically handles the @XXXX hexadecimal Unicode escape mechanism defined in Annex K of the danMARC2 specification. In ISO 5426 the diacritic precedes the base character, whereas Unicode places combining marks after the base, so such data needs a separate step: transcode to Unicode, move each combining mark behind its base character, and then compose the result, typically with String.normalize("NFC").
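The normalization step for composed diacriticals can be illustrated as follows. The sketch assumes the ISO 5426 bytes have already been mapped to Unicode code points but still carry the diacritic before the base letter; reorderLeadingMarks is a hypothetical helper, and the byte-level transcoding itself is not shown.

```javascript
// Hypothetical helper: move a run of combining marks that precedes a base
// character to the position after it, as Unicode expects.
// \p{M} matches any combining mark; the "u" flag enables property escapes.
function reorderLeadingMarks(text) {
  return text.replace(/(\p{M}+)(\P{M})/gu, "$2$1");
}

// Mark-before-base input: COMBINING ACUTE ACCENT (U+0301) followed by "e".
const markFirst = "Caf\u0301e";

// After reordering, NFC composes "e" + U+0301 into the single character é.
const composed = reorderLeadingMarks(markFirst).normalize("NFC");
console.log(composed === "Caf\u00e9"); // true
```

NFC on its own only composes sequences that are already in Unicode order, which is why the reordering comes first in this sketch.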

This tool processes the string content of danMARC2 fields. It does not parse the MARC record structure (leader, directory, field terminators 0x1E, record separators 0x1D). Extract individual field values first, then convert each through this tool.
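As a rough illustration of the extraction step, the sketch below splits a raw record string on the two control characters named above. It assumes the record arrives as a JavaScript string in an ISO 2709 style layout, where the first segment holds the leader and directory; extractFieldValues is a hypothetical name, and a real parser would use the directory offsets instead.

```javascript
const FIELD_TERMINATOR = "\x1e";  // ends the directory and each field
const RECORD_SEPARATOR = "\x1d";  // ends the record

// Hypothetical helper: return the field contents of one raw record.
function extractFieldValues(rawRecord) {
  // Drop the trailing record separator if present.
  const body = rawRecord.endsWith(RECORD_SEPARATOR)
    ? rawRecord.slice(0, -1)
    : rawRecord;
  const segments = body.split(FIELD_TERMINATOR);
  // segments[0] is assumed to be leader + directory; the split also leaves
  // an empty string after the final terminator, which is filtered out.
  return segments.slice(1).filter((segment) => segment.length > 0);
}

const raw = "LEADER-AND-DIRECTORY\x1e00*aCaf@00e9\x1e00*bTest\x1e\x1d";
console.log(extractFieldValues(raw)); // [ '00*aCaf@00e9', '00*bTest' ]
```

Each returned string still contains its indicators and subfield markers and can then be passed through the escape conversion.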

Structural characters are preserved. The converter only transforms @XXXX escape sequences (and collapses @@ to @). All other characters, including subfield markers (*a, *b), field indicators, and the non-sorting marker ¤, pass through to the output unchanged.

Hex digits are case-insensitive. @00E9 and @00e9 both produce the same character é (U+00E9). The converter parses the digits with parseInt(hex, 16), which accepts uppercase and lowercase hex digits alike.
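The case behaviour is easy to confirm directly. The snippet also shows why a separate 4-digit check is useful before calling parseInt, which stops at the first invalid character instead of rejecting the string; isFourHexDigits is a hypothetical guard written for this example.

```javascript
// Both spellings of the same escape digits parse to the same code point.
const upper = String.fromCodePoint(parseInt("00E9", 16));
const lower = String.fromCodePoint(parseInt("00e9", 16));
console.log(upper === lower); // true

// parseInt stops at the first non-hex character rather than failing,
// so "00g9" parses as 0 instead of NaN.
console.log(parseInt("00g9", 16)); // 0

// Hypothetical guard: accept only exactly four hex digits, in either case.
const isFourHexDigits = (s) => /^[0-9a-fA-F]{4}$/.test(s);
console.log(isFourHexDigits("00g9")); // false
```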