DanMARC2 to Unicode Converter
Convert danMARC2 encoded strings to Unicode characters. Parses @XXXX escape sequences per Annex K of the danMARC2 format specification.
About
DanMARC2 is the Danish national MARC format used in bibliographic cataloguing. It encodes non-ASCII characters through escape sequences of the form @XXXX, where XXXX is a 4-digit hexadecimal Unicode code point. Failure to correctly decode these sequences results in garbled metadata across library systems, broken search indexes, and corrupted inter-library loan records. This tool parses raw danMARC2 strings according to Annex K of the format specification, converting every @XXXX token to its proper Unicode character via String.fromCodePoint(parseInt(hex, 16)). A literal @@ sequence produces a single @ character. The tool preserves all MARC field indicators, subfield markers (*a, *b), and structural whitespace unchanged.
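As a rough illustration of the conversion described above, the following minimal sketch applies a single regular-expression replacement. The function name `decodeDanmarc2` is hypothetical and is not taken from the tool's source. It also assumes that an @ followed by neither @ nor four hexadecimal digits should be left untouched, which the text above does not specify.

```javascript
// Hypothetical helper: a sketch of the @XXXX / @@ decoding described above,
// not the tool's actual implementation.
function decodeDanmarc2(input) {
  // Match either a literal "@@" or "@" followed by exactly four hex digits.
  // Assumption: any other "@" is not matched and therefore passes through as-is.
  return input.replace(/@(@|[0-9a-fA-F]{4})/g, (match, token) =>
    token === "@" ? "@" : String.fromCodePoint(parseInt(token, 16))
  );
}

// Subfield markers (*a, *b) and whitespace are not touched by the pattern.
console.log(decodeDanmarc2("*a Sm@00f8rrebr@00f8d *b info@@example.dk"));
// → "*a Smørrebrød *b info@example.dk"
```

Because the pattern consumes @@ as one token, an input such as `@@00f8` decodes to the literal text `@00f8` rather than to ø.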
Limitations: this converter handles the @XXXX hexadecimal escape mechanism defined in Annex K. It does not interpret MARC record leader bytes, directory entries, or field terminators. Composed diacritical sequences from ISO 5426 that do not use the @XXXX pattern require separate handling. Pro tip: always verify output against the original catalogue record when processing batch exports, as some legacy systems emit non-standard 2-digit escapes.
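One way to support the verification step for batch exports is to flag every @ that does not start a well-formed escape before converting. The sketch below is an assumption about how such a check could look. `findSuspectEscapes` is a hypothetical name, and the check cannot recognise a legacy 2-digit escape that happens to be followed by two more hexadecimal characters, since that is indistinguishable from a valid @XXXX token.

```javascript
// Hypothetical pre-check: report each "@" that is neither "@@" nor "@XXXX".
function findSuspectEscapes(input) {
  const suspects = [];
  // First alternative: a well-formed token. Second alternative: a bare "@".
  const pattern = /@(@|[0-9a-fA-F]{4})|@/g;
  for (const m of input.matchAll(pattern)) {
    // Capture group 1 is undefined only when the bare "@" alternative matched.
    if (m[1] === undefined) {
      suspects.push({
        index: m.index,
        // Up to five characters of context, starting at the suspect "@".
        context: input.slice(m.index, m.index + 5),
      });
    }
  }
  return suspects;
}

console.log(findSuspectEscapes("caf@e9 and @00e9 and a@@b"));
// → [ { index: 3, context: '@e9 a' } ]
```

An empty result means every @ in the string belongs to a well-formed token. It does not show that the decoded text matches the original catalogue record.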
Formulas
The danMARC2 escape mechanism encodes Unicode characters as hexadecimal code points prefixed by @. The conversion function applies the following transformation to each escape token found in the input string:

$$\text{char} = \texttt{String.fromCodePoint}(\texttt{parseInt}(\mathit{hex},\ 16))$$

where hex is the 4-character hexadecimal string immediately following the @ symbol. The parser uses a finite-state scan with the following rules:

$$
\text{output} =
\begin{cases}
\texttt{@} & \text{if the token is } \texttt{@@} \\
\text{U+}XXXX & \text{if the token is } \texttt{@}XXXX \\
c & \text{for any other character } c
\end{cases}
$$

where XXXX represents any valid 4-digit hexadecimal value in the range 0000 to FFFF. For code points above FFFF (supplementary planes), the specification does not define a standard escape. All non-escape characters pass through unchanged, preserving MARC field tags, indicators, and subfield delimiters.
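The scan rules can also be written as an explicit single-pass loop instead of a regular expression. The following is a sketch under stated assumptions, not the tool's actual parser. `scanDanmarc2` is a hypothetical name, and passing a malformed @ through unchanged is an assumption, since the rules above do not cover that case.

```javascript
// Hypothetical single-pass scanner for the @@ and @XXXX rules.
function scanDanmarc2(input) {
  const fourHex = /^[0-9a-fA-F]{4}$/;
  let out = "";
  let i = 0;
  while (i < input.length) {
    const ch = input[i];
    if (ch !== "@") {
      // Ordinary character: copy through unchanged.
      out += ch;
      i += 1;
    } else if (input[i + 1] === "@") {
      // "@@" yields one literal "@".
      out += "@";
      i += 2;
    } else {
      const hex = input.slice(i + 1, i + 5);
      if (fourHex.test(hex)) {
        // "@XXXX" yields the character at code point U+XXXX.
        out += String.fromCodePoint(parseInt(hex, 16));
        i += 5;
      } else {
        // Assumption: a malformed escape is copied through as-is.
        out += ch;
        i += 1;
      }
    }
  }
  return out;
}

console.log(scanDanmarc2("245 00 *a K@00f8benhavn @@ @0142"));
// → "245 00 *a København @ ł"
```

The explicit loop makes each rule visible as its own branch, which may be easier to extend if a non-standard escape form has to be handled later.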
Reference Data
| DanMARC2 Escape | Hex Code | Unicode Char | Unicode Name | Common Use |
|---|---|---|---|---|
| @00e9 | 00E9 | é | LATIN SMALL LETTER E WITH ACUTE | French, Spanish loanwords |
| @00e8 | 00E8 | è | LATIN SMALL LETTER E WITH GRAVE | French, Italian |
| @00e0 | 00E0 | à | LATIN SMALL LETTER A WITH GRAVE | French, Italian |
| @00f6 | 00F6 | ö | LATIN SMALL LETTER O WITH DIAERESIS | German, Swedish, Finnish |
| @00fc | 00FC | ü | LATIN SMALL LETTER U WITH DIAERESIS | German, Turkish |
| @00e6 | 00E6 | æ | LATIN SMALL LETTER AE | Danish, Norwegian |
| @00f8 | 00F8 | ø | LATIN SMALL LETTER O WITH STROKE | Danish, Norwegian |
| @00e5 | 00E5 | å | LATIN SMALL LETTER A WITH RING ABOVE | Danish, Norwegian, Swedish |
| @00c6 | 00C6 | Æ | LATIN CAPITAL LETTER AE | Danish, Norwegian (uppercase) |
| @00d8 | 00D8 | Ø | LATIN CAPITAL LETTER O WITH STROKE | Danish, Norwegian (uppercase) |
| @00c5 | 00C5 | Å | LATIN CAPITAL LETTER A WITH RING ABOVE | Danish, Norwegian, Swedish (uppercase) |
| @00df | 00DF | ß | LATIN SMALL LETTER SHARP S | German |
| @00f1 | 00F1 | ñ | LATIN SMALL LETTER N WITH TILDE | Spanish |
| @00e7 | 00E7 | ç | LATIN SMALL LETTER C WITH CEDILLA | French, Portuguese, Turkish |
| @00ee | 00EE | î | LATIN SMALL LETTER I WITH CIRCUMFLEX | French, Romanian |
| @00f4 | 00F4 | ô | LATIN SMALL LETTER O WITH CIRCUMFLEX | French, Portuguese |
| @00fb | 00FB | û | LATIN SMALL LETTER U WITH CIRCUMFLEX | French |
| @0153 | 0153 | œ | LATIN SMALL LIGATURE OE | French |
| @0152 | 0152 | Œ | LATIN CAPITAL LIGATURE OE | French (uppercase) |
| @00e4 | 00E4 | ä | LATIN SMALL LETTER A WITH DIAERESIS | German, Swedish, Finnish |
| @00c4 | 00C4 | Ä | LATIN CAPITAL LETTER A WITH DIAERESIS | German, Swedish (uppercase) |
| @00d6 | 00D6 | Ö | LATIN CAPITAL LETTER O WITH DIAERESIS | German, Swedish (uppercase) |
| @00dc | 00DC | Ü | LATIN CAPITAL LETTER U WITH DIAERESIS | German, Turkish (uppercase) |
| @017e | 017E | ž | LATIN SMALL LETTER Z WITH CARON | Czech, Slovenian, Latvian |
| @0161 | 0161 | š | LATIN SMALL LETTER S WITH CARON | Czech, Slovenian, Latvian |
| @010d | 010D | č | LATIN SMALL LETTER C WITH CARON | Czech, Slovenian, Croatian |
| @0159 | 0159 | ř | LATIN SMALL LETTER R WITH CARON | Czech |
| @0142 | 0142 | ł | LATIN SMALL LETTER L WITH STROKE | Polish |
| @0144 | 0144 | ń | LATIN SMALL LETTER N WITH ACUTE | Polish |
| @@ | - | @ | COMMERCIAL AT (literal) | Escaped @ in data |