About

DanMARC2 is the Danish national MARC format used in bibliographic cataloguing. It encodes non-ASCII characters through escape sequences of the form @XXXX, where XXXX is a 4-digit hexadecimal Unicode code point. Failure to correctly decode these sequences results in garbled metadata across library systems, broken search indexes, and corrupted inter-library loan records. This tool parses raw danMARC2 strings according to Annex K of the format specification, converting every @XXXX token to its proper Unicode character via String.fromCodePoint(parseInt(hex, 16)). A literal @@ sequence produces a single @ character. The tool preserves all MARC field indicators, subfield markers (*a, *b), and structural whitespace unchanged.
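The conversion described above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the tool's actual source: the function name decodeDanmarc2 is an assumption, and the regular expression is one possible way to express the @@ and @XXXX rules in a single pass.

```javascript
// Hypothetical helper name; a sketch of the rules described above,
// not the tool's real implementation.
function decodeDanmarc2(input) {
  // Single left-to-right pass. "@@" is tried before "@XXXX", so a doubled
  // at-sign is never misread as the start of a hex escape. A lone "@" that
  // matches neither alternative is left untouched.
  return input.replace(/@@|@([0-9a-fA-F]{4})/g, (match, hex) =>
    hex === undefined ? "@" : String.fromCodePoint(parseInt(hex, 16))
  );
}

console.log(decodeDanmarc2("*aSm@00f8rrebr@00f8d p@00e5 caf@00e9"));
// prints: *aSmørrebrød på café
```

The subfield marker *a is ordinary text to the decoder, so it reaches the output unchanged.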

Limitations: this converter handles the @XXXX hexadecimal escape mechanism defined in Annex K. It does not interpret MARC record leader bytes, directory entries, or field terminators. Composed diacritical sequences from ISO 5426 that do not use the @XXXX pattern require separate handling. Pro tip: always verify output against the original catalogue record when processing batch exports, as some legacy systems emit non-standard 2-digit escapes.
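When checking batch exports for the non-standard short escapes mentioned above, a small scan can list every @ that is neither a doubled @@ nor followed by four hex digits. The sketch below is only an illustration; the function name findSuspectEscapes and the shape of the returned objects are assumptions.

```javascript
// Hypothetical helper: lists "@" characters that are neither part of "@@"
// nor followed by exactly four hex digits.
function findSuspectEscapes(input) {
  const suspects = [];
  // Alternatives are tried in order, so a bare "@" is matched only when
  // neither "@@" nor "@XXXX" applies at that position.
  const pattern = /@@|@[0-9a-fA-F]{4}|@/g;
  for (const m of input.matchAll(pattern)) {
    if (m[0] === "@") {
      suspects.push({
        index: m.index,
        context: input.slice(m.index, m.index + 5), // "@" plus next 4 chars
      });
    }
  }
  return suspects;
}

console.log(findSuspectEscapes("caf@e9 og caf@00e9"));
// reports one suspect at index 3 with context "@e9 o"
```

This check cannot catch a 2-digit escape that happens to be followed by two more hex-like characters (for example @e9ab), since that is indistinguishable from a valid 4-digit escape. Comparing against the original catalogue record remains necessary.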

Formulas

The danMARC2 escape mechanism encodes Unicode characters as hexadecimal code points prefixed by @. The conversion function applies the following transformation to each escape token found in the input string:

char = fromCodePoint(parseInt(hex, 16))

where hex is the 4-character hexadecimal string immediately following the @ symbol. The parser uses a finite-state scan with the following rules:

@@ → @ (literal at-sign)
@XXXX → fromCodePoint(0xXXXX), if XXXX ∈ [0-9a-fA-F]{4}
@ → @ (passthrough, if fewer than 4 hex digits follow)

where XXXX represents any valid 4-digit hexadecimal value in the range 0000 to FFFF. For code points above FFFF (supplementary planes), the specification does not define a standard escape. All non-escape characters pass through unchanged, preserving MARC field tags, indicators, and subfield delimiters.
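The scan rules can also be written as an explicit loop, which makes the order of the checks visible. This is a sketch under the rules stated in this section; the name scanDanmarc2 is hypothetical and the tool's real parser may be structured differently.

```javascript
const FOUR_HEX_DIGITS = /^[0-9a-fA-F]{4}$/;

// Hypothetical name; walks the input once and applies the three rules in order.
function scanDanmarc2(input) {
  let out = "";
  let i = 0;
  while (i < input.length) {
    const ch = input[i];
    if (ch !== "@") {
      // Non-escape characters pass through unchanged.
      out += ch;
      i += 1;
    } else if (input[i + 1] === "@") {
      // Rule 1: "@@" becomes a single literal "@".
      out += "@";
      i += 2;
    } else if (FOUR_HEX_DIGITS.test(input.slice(i + 1, i + 5))) {
      // Rule 2: "@XXXX" becomes the character at code point 0xXXXX.
      out += String.fromCodePoint(parseInt(input.slice(i + 1, i + 5), 16));
      i += 5;
    } else {
      // Rule 3: fewer than 4 hex digits follow, so "@" is emitted as-is.
      out += "@";
      i += 1;
    }
  }
  return out;
}

console.log(scanDanmarc2("@00e9"));   // é
console.log(scanDanmarc2("@@00e9"));  // @00e9
console.log(scanDanmarc2("@e9 x"));   // @e9 x
```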

Reference Data

DanMARC2 Escape | Hex Code | Unicode Char | Unicode Name | Common Use
----------------|----------|--------------|--------------|-----------
@00e9 | 00E9 | é | LATIN SMALL LETTER E WITH ACUTE | French, Spanish loanwords
@00e8 | 00E8 | è | LATIN SMALL LETTER E WITH GRAVE | French, Italian
@00e0 | 00E0 | à | LATIN SMALL LETTER A WITH GRAVE | French, Italian
@00f6 | 00F6 | ö | LATIN SMALL LETTER O WITH DIAERESIS | German, Swedish, Finnish
@00fc | 00FC | ü | LATIN SMALL LETTER U WITH DIAERESIS | German, Turkish
@00e6 | 00E6 | æ | LATIN SMALL LETTER AE | Danish, Norwegian
@00f8 | 00F8 | ø | LATIN SMALL LETTER O WITH STROKE | Danish, Norwegian
@00e5 | 00E5 | å | LATIN SMALL LETTER A WITH RING ABOVE | Danish, Norwegian, Swedish
@00c6 | 00C6 | Æ | LATIN CAPITAL LETTER AE | Danish, Norwegian (title case)
@00d8 | 00D8 | Ø | LATIN CAPITAL LETTER O WITH STROKE | Danish, Norwegian (title case)
@00c5 | 00C5 | Å | LATIN CAPITAL LETTER A WITH RING ABOVE | Danish, Norwegian, Swedish (title case)
@00df | 00DF | ß | LATIN SMALL LETTER SHARP S | German
@00f1 | 00F1 | ñ | LATIN SMALL LETTER N WITH TILDE | Spanish
@00e7 | 00E7 | ç | LATIN SMALL LETTER C WITH CEDILLA | French, Portuguese, Turkish
@00ee | 00EE | î | LATIN SMALL LETTER I WITH CIRCUMFLEX | French, Romanian
@00f4 | 00F4 | ô | LATIN SMALL LETTER O WITH CIRCUMFLEX | French, Portuguese
@00fb | 00FB | û | LATIN SMALL LETTER U WITH CIRCUMFLEX | French
@0153 | 0153 | œ | LATIN SMALL LIGATURE OE | French
@0152 | 0152 | Œ | LATIN CAPITAL LIGATURE OE | French (title case)
@00e4 | 00E4 | ä | LATIN SMALL LETTER A WITH DIAERESIS | German, Swedish, Finnish
@00c4 | 00C4 | Ä | LATIN CAPITAL LETTER A WITH DIAERESIS | German, Swedish (title case)
@00d6 | 00D6 | Ö | LATIN CAPITAL LETTER O WITH DIAERESIS | German, Swedish (title case)
@00dc | 00DC | Ü | LATIN CAPITAL LETTER U WITH DIAERESIS | German, Turkish (title case)
@017e | 017E | ž | LATIN SMALL LETTER Z WITH CARON | Czech, Slovenian, Latvian
@0161 | 0161 | š | LATIN SMALL LETTER S WITH CARON | Czech, Slovenian, Latvian
@010d | 010D | č | LATIN SMALL LETTER C WITH CARON | Czech, Slovenian, Croatian
@0159 | 0159 | ř | LATIN SMALL LETTER R WITH CARON | Czech
@0142 | 0142 | ł | LATIN SMALL LETTER L WITH STROKE | Polish
@0144 | 0144 | ń | LATIN SMALL LETTER N WITH ACUTE | Polish
@@ | - | @ | COMMERCIAL AT (literal) | Escaped @ in data

Frequently Asked Questions

In danMARC2, a literal @ is encoded as @@. The parser detects two consecutive @ characters and emits a single @ in the output. If you see unexpected @ symbols disappearing, check whether the source data properly doubled them.
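The @@ rule only works if it is applied in the same pass as the hex escapes. The sketch below contrasts a single-pass decoder with a naive two-pass variant that unescapes @@ first; both function names are hypothetical and neither is taken from the tool's source.

```javascript
// Single pass: "@@" and "@XXXX" are recognised together, left to right.
function decodeSinglePass(input) {
  return input.replace(/@@|@([0-9a-fA-F]{4})/g, (match, hex) =>
    hex === undefined ? "@" : String.fromCodePoint(parseInt(hex, 16))
  );
}

// Naive two-pass variant, shown only to illustrate the failure mode:
// the "@" produced by the first pass is re-read as an escape prefix.
function decodeTwoPassNaive(input) {
  return input
    .replace(/@@/g, "@")
    .replace(/@([0-9a-fA-F]{4})/g, (match, hex) =>
      String.fromCodePoint(parseInt(hex, 16))
    );
}

// The catalogued text is the literal string "user@00e9.example",
// so the source data doubles the at-sign.
const source = "user@@00e9.example";

console.log(decodeSinglePass(source));   // user@00e9.example
console.log(decodeTwoPassNaive(source)); // useré.example (literal @ lost)
```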
The converter requires exactly 4 valid hex digits ([0-9a-fA-F]) after the @. If fewer valid hex digits are found, the @ is emitted as a literal character and the following characters pass through unchanged. This prevents data corruption from malformed records.
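The explicit 4-digit check matters because parseInt on its own is lenient: it stops at the first character it cannot parse and returns whatever it has read so far. A minimal sketch of the guard, with a hypothetical helper name:

```javascript
const FOUR_HEX = /^[0-9a-fA-F]{4}$/;

// Hypothetical helper: returns the decoded character, or null when the
// candidate is not exactly four hex digits.
function decodeHexToken(candidate) {
  if (!FOUR_HEX.test(candidate)) return null;
  return String.fromCodePoint(parseInt(candidate, 16));
}

console.log(parseInt("00g9", 16));    // 0, because parsing stops at "g"
console.log(decodeHexToken("00g9"));  // null, the token is rejected
console.log(decodeHexToken("00e9"));  // é
```

Without the guard, a malformed token such as @00g9 would silently decode to U+0000 instead of passing through.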
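ISO 5426 composed diacriticals are not handled. This converter specifically handles the @XXXX hexadecimal Unicode escape mechanism defined in Annex K of the danMARC2 specification. In ISO 5426 the diacritic precedes the base character, whereas Unicode combining marks follow it, so these sequences require a separate conversion step: map each diacritic to its Unicode combining mark, move it after the base character, and then, if precomposed characters are wanted, apply NFC normalization via String.prototype.normalize("NFC"). NFC alone does not reorder a mark that precedes its base.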
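The reordering step can be sketched as follows. This assumes the ISO 5426 bytes have already been mapped to Unicode combining marks by some earlier step that is not shown, and that the marks still sit in front of their base letters. The function name reorderAndCompose is hypothetical.

```javascript
// Assumption: `text` holds Unicode combining marks (category Mn) that still
// PRECEDE their base character, as in the ISO 5426 ordering.
function reorderAndCompose(text) {
  // Move each run of nonspacing marks to after the following non-mark
  // character, which is where Unicode expects combining marks.
  const reordered = text.replace(/(\p{Mn}+)(\P{M})/gu, "$2$1");
  // Compose base + mark pairs into precomposed characters where they exist.
  return reordered.normalize("NFC");
}

// U+0301 COMBINING ACUTE ACCENT placed before "e", ISO 5426 style.
console.log(reorderAndCompose("caf\u0301e")); // café
```

Text that is already in Unicode order must not be passed through this function, because its marks would be moved onto the wrong base character.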
This tool processes the string content of danMARC2 fields. It does not parse the MARC record structure (leader, directory, field terminators 0x1E, record separators 0x1D). Extract individual field values first, then convert each through this tool.
Subfield markers and field indicators are preserved. The converter only transforms @XXXX escape sequences and the doubled @@. All other characters, including subfield markers (*a, *b), field indicators, and the non-sorting marker ¤, pass through to the output unchanged.
Hex digits are case-insensitive. @00E9 and @00e9 both produce the same character é (U+00E9), because parseInt(hex, 16) accepts uppercase and lowercase hex digits alike.
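A quick check of the case-insensitivity claim, using only standard JavaScript built-ins (the variable names are illustrative):

```javascript
// parseInt with radix 16 treats "E" and "e" as the same digit.
const fromLower = String.fromCodePoint(parseInt("00e9", 16));
const fromUpper = String.fromCodePoint(parseInt("00E9", 16));
const fromMixed = String.fromCodePoint(parseInt("017E", 16));

console.log(fromLower === fromUpper); // true
console.log(fromMixed === String.fromCodePoint(parseInt("017e", 16))); // true
```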