About

Greek uppercasing is not a trivial toUpperCase call. Modern Greek orthography imposes rules that the default Unicode case mapping does not apply, and that much software therefore gets wrong. When a lowercase accented vowel such as α with tonos (ά, U+03AC) is converted to uppercase, the accent must be removed entirely: in modern monotonic orthography, all-caps text does not carry a tonos. This tool implements those rules. It also handles the critical diphthong case: when a pair like αι carries a tonos on the first vowel, uppercasing must put a dialytika on the second vowel (e.g., άι → ΑΪ) to preserve the pronunciation distinction that the removed accent had signaled. Mozilla Bug #307039 documented this browser-level failure in 2005. Browsers can still produce incorrect results for CSS text-transform: uppercase on Greek text.

This converter processes final sigma (ς → Σ), iota subscript (ypogegrammeni) promotion, and both composed and decomposed Unicode forms. It does not handle polytonic (ancient) Greek with multiple diacritics - that requires a separate normalization pipeline. Input is limited to 100,000 characters. Results match the behavior specified in the Unicode Common Locale Data Repository (CLDR) Greek casing rules.

Formulas

The conversion follows a two-pass algorithm. The first pass scans for diphthong sequences; the second pass converts the remaining characters individually.

Pass 1 - Diphthong scan: For each position i in input string S, check if S[i] + S[i + 1] ∈ D, where D is the set of Greek diphthong pairs (accented vowel + ι/υ). If match found, replace with upper(S[i]) + dialytika(S[i + 1]) and advance i by 2.

Pass 2 - Single character mapping: For each remaining character c, if c ∈ M (accent map), replace with M[c]. Otherwise apply native toUpperCase(c).

Where: S = input string (NFC-normalized), D = diphthong lookup table (14 entries), M = single-character accent map (70+ entries), upper(c) = the plain uppercase letter for c with the tonos removed (e.g., ά → Α), dialytika(c) = function that maps ι or υ to its uppercase form with dialytika (ι → Ϊ, υ → Ϋ).
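The two passes described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the names DIPHTHONGS, ACCENT_MAP, and greekUpper are hypothetical, and both tables are abbreviated to the entries shown in the reference data rather than the full 14 and 70+ entries.

```javascript
// Illustrative, abbreviated tables (assumption: the tool's real D and M are larger).
// D: accented vowel + ι/υ -> plain capital + capital with dialytika
const DIPHTHONGS = new Map([
  ["\u03AC\u03B9", "\u0391\u03AA"], // άι -> ΑΪ
  ["\u03AC\u03C5", "\u0391\u03AB"], // άυ -> ΑΫ
  ["\u03AD\u03B9", "\u0395\u03AA"], // έι -> ΕΪ
  ["\u03CC\u03B9", "\u039F\u03AA"], // όι -> ΟΪ
  ["\u03CC\u03C5", "\u039F\u03AB"], // όυ -> ΟΫ
  ["\u03AE\u03C5", "\u0397\u03AB"], // ήυ -> ΗΫ
  ["\u03CD\u03B9", "\u03A5\u03AA"], // ύι -> ΥΪ
]);

// M: single-character accent map
const ACCENT_MAP = new Map([
  ["\u03AC", "\u0391"], // ά -> Α
  ["\u03AD", "\u0395"], // έ -> Ε
  ["\u03AE", "\u0397"], // ή -> Η
  ["\u03AF", "\u0399"], // ί -> Ι
  ["\u03CC", "\u039F"], // ό -> Ο
  ["\u03CD", "\u03A5"], // ύ -> Υ
  ["\u03CE", "\u03A9"], // ώ -> Ω
  ["\u0390", "\u03AA"], // ΐ -> Ϊ
  ["\u03B0", "\u03AB"], // ΰ -> Ϋ
]);

function greekUpper(input) {
  const chars = Array.from(input.normalize("NFC"));

  // Pass 1: replace diphthong pairs and mark them as finished.
  const tokens = [];
  for (let i = 0; i < chars.length; i++) {
    const pair = chars[i] + (chars[i + 1] ?? "");
    if (DIPHTHONGS.has(pair)) {
      tokens.push({ text: DIPHTHONGS.get(pair), done: true });
      i += 1; // together with the loop increment, advances i by 2
    } else {
      tokens.push({ text: chars[i], done: false });
    }
  }

  // Pass 2: map each remaining character, falling back to native toUpperCase.
  return tokens
    .map((t) => (t.done ? t.text : (ACCENT_MAP.get(t.text) ?? t.text.toUpperCase())))
    .join("");
}

console.log(greekUpper("\u03BC\u03AC\u03B9\u03BD\u03B1")); // μάινα -> ΜΑΪΝΑ
console.log(greekUpper("\u03B4\u03C1\u03CC\u03BC\u03BF\u03C2")); // δρόμος -> ΔΡΟΜΟΣ
```

Marking pair tokens as done keeps the second pass from touching characters the first pass has already produced.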

Reference Data

Lowercase	Code Point(s)	Correct Uppercase	Code Point(s)	Rule Applied
α (with tonos: ά)	U+03AC	Α	U+0391	Tonos removal
ε (with tonos: έ)	U+03AD	Ε	U+0395	Tonos removal
η (with tonos: ή)	U+03AE	Η	U+0397	Tonos removal
ι (with tonos: ί)	U+03AF	Ι	U+0399	Tonos removal
ο (with tonos: ό)	U+03CC	Ο	U+039F	Tonos removal
υ (with tonos: ύ)	U+03CD	Υ	U+03A5	Tonos removal
ω (with tonos: ώ)	U+03CE	Ω	U+03A9	Tonos removal
άι (diphthong)	U+03AC U+03B9	ΑΪ	U+0391 U+03AA	Tonos removal + dialytika on ι
άυ (diphthong)	U+03AC U+03C5	ΑΫ	U+0391 U+03AB	Tonos removal + dialytika on υ
έι (diphthong)	U+03AD U+03B9	ΕΪ	U+0395 U+03AA	Tonos removal + dialytika on ι
όι (diphthong)	U+03CC U+03B9	ΟΪ	U+039F U+03AA	Tonos removal + dialytika on ι
όυ (diphthong)	U+03CC U+03C5	ΟΫ	U+039F U+03AB	Tonos removal + dialytika on υ
ήυ (diphthong)	U+03AE U+03C5	ΗΫ	U+0397 U+03AB	Tonos removal + dialytika on υ
ύι (diphthong)	U+03CD U+03B9	ΥΪ	U+03A5 U+03AA	Tonos removal + dialytika on ι
ς (final sigma)	U+03C2	Σ	U+03A3	Final sigma → capital sigma
ι (with dialytika: ϊ)	U+03CA	Ϊ	U+03AA	Dialytika preserved
υ (with dialytika: ϋ)	U+03CB	Ϋ	U+03AB	Dialytika preserved
ΐ (ι dialytika + tonos)	U+0390	Ϊ	U+03AA	Tonos removed, dialytika kept
ΰ (υ dialytika + tonos)	U+03B0	Ϋ	U+03AB	Tonos removed, dialytika kept
α (with ypogegrammeni: ᾳ)	U+1FB3	ΑΙ	U+0391 U+0399	Iota subscript promoted
η (with ypogegrammeni: ῃ)	U+1FC3	ΗΙ	U+0397 U+0399	Iota subscript promoted
ω (with ypogegrammeni: ῳ)	U+1FF3	ΩΙ	U+03A9 U+0399	Iota subscript promoted

Frequently Asked Questions

JavaScript's String.prototype.toUpperCase follows the Unicode Default Case Algorithm, which is locale-independent. Greek uppercasing requires locale-specific rules defined in the Unicode CLDR. Specifically, the default algorithm uppercases ά (U+03AC) to Ά (U+0386) - preserving the tonos - when correct modern Greek typography demands the tonos be removed entirely, producing plain Α (U+0391). The toLocaleUpperCase('el') method should handle this, but browser support is inconsistent. This tool implements the rules directly to guarantee correctness.
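The difference can be observed directly in a JavaScript console. The snippet below is a small sketch: the sample word and the helper name hex are chosen for illustration, and the result of toLocaleUpperCase("el") is reported rather than assumed, because it depends on the engine and its locale data.

```javascript
// Format the first code point of a string as U+XXXX.
function hex(text) {
  return "U+" + text.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
}

const word = "\u03AC\u03BB\u03C6\u03B1"; // άλφα

// Default, locale-independent mapping: the tonos is preserved.
const defaultUpper = word.toUpperCase();
console.log(defaultUpper, hex(defaultUpper)); // first letter is U+0386 (Ά)

// Locale-aware mapping: engine-dependent, so inspect the result.
const localeUpper = word.toLocaleUpperCase("el");
const tonosDropped = localeUpper.codePointAt(0) === 0x0391;
console.log(
  localeUpper,
  tonosDropped ? "tonos removed by this engine" : "tonos kept by this engine"
);
```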

In Greek, certain vowel pairs (αι, αυ, ει, οι, ου, ηυ, υι) are normally read together as a single sound. In lowercase, a tonos on the first vowel of such a pair signals the opposite: each vowel keeps its own sound instead of merging with its neighbor. When the text is uppercased, the tonos is removed and that signal disappears with it. To keep the two vowels from being read as one unit, a dialytika (diaeresis, ¨) is placed on the second vowel. Example: άι → ΑΪ, not ΑΙ. Without the dialytika, a reader would take the pair for the ordinary combined sound and mispronounce the word.
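To see what is lost without this rule, compare a naive accent-stripping uppercase with one that marks the second vowel first. This is only a sketch: PAIRS, stripTonosUpper, and upperWithDialytika are hypothetical names, the pair table covers just the seven pairs from the reference data, and stripping U+0301 after NFD is one common shortcut, not necessarily what the tool does.

```javascript
// Accented vowel + ι/υ -> plain capital + capital with dialytika (illustrative subset).
const PAIRS = {
  "\u03AC\u03B9": "\u0391\u03AA", // άι -> ΑΪ
  "\u03AC\u03C5": "\u0391\u03AB", // άυ -> ΑΫ
  "\u03AD\u03B9": "\u0395\u03AA", // έι -> ΕΪ
  "\u03CC\u03B9": "\u039F\u03AA", // όι -> ΟΪ
  "\u03CC\u03C5": "\u039F\u03AB", // όυ -> ΟΫ
  "\u03AE\u03C5": "\u0397\u03AB", // ήυ -> ΗΫ
  "\u03CD\u03B9": "\u03A5\u03AA", // ύι -> ΥΪ
};
const PAIR_PATTERN = new RegExp(Object.keys(PAIRS).join("|"), "g");

// Naive approach: decompose, drop the combining acute (tonos), recompose, uppercase.
function stripTonosUpper(text) {
  return text
    .normalize("NFD")
    .replace(/\u0301/g, "")
    .normalize("NFC")
    .toUpperCase();
}

// With the rule: rewrite the pairs first, then strip any remaining tonos.
function upperWithDialytika(text) {
  const marked = text.normalize("NFC").replace(PAIR_PATTERN, (pair) => PAIRS[pair]);
  return stripTonosUpper(marked);
}

const word = "\u03BC\u03AC\u03B9\u03BD\u03B1"; // μάινα
console.log(stripTonosUpper(word));    // ΜΑΙΝΑ (the separation cue is gone)
console.log(upperWithDialytika(word)); // ΜΑΪΝΑ (dialytika keeps the cue)
```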

This tool is designed for modern monotonic Greek, which uses only the tonos (acute accent) and dialytika. Polytonic Greek includes additional diacritics: spiritus asper (rough breathing ἁ), spiritus lenis (smooth breathing ἀ), circumflex/perispomeni (ᾶ), and iota subscript (ᾳ). The tool does handle iota subscript (ypogegrammeni) by promoting it to a full iota on uppercase (e.g., ᾳ → ΑΙ). For full polytonic support with all breathing marks, a more comprehensive normalization pipeline is required.
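The iota subscript promotion is part of Unicode's special casing data, so engines that implement it already expand these characters in the native toUpperCase. The short check below illustrates that, and also shows that a breathing mark survives native uppercasing, which is why full polytonic support needs more work. The helper name codePoints is hypothetical, and whether the tool relies on the native mapping or on its own table is not stated here.

```javascript
// Render every code point of a string as U+XXXX.
function codePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  ).join(" ");
}

// ᾳ, ῃ, ῳ: one code point in, two code points out.
const iotaSubscript = ["\u1FB3", "\u1FC3", "\u1FF3"];
for (const ch of iotaSubscript) {
  console.log(codePoints(ch), "->", codePoints(ch.toUpperCase()));
}

// ἀ (alpha with smooth breathing): the breathing mark is kept on the capital.
const smoothAlpha = "\u1F00";
console.log(codePoints(smoothAlpha), "->", codePoints(smoothAlpha.toUpperCase()));
```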

Greek has two forms of lowercase sigma: medial σ (U+03C3, used within words) and final ς (U+03C2, used at word end). Both map to the single uppercase Σ (U+03A3). This tool correctly maps both forms. Note that when converting back to lowercase, context-dependent logic would be needed to restore the correct sigma form - but this tool only performs the uppercase direction.
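A quick check of the sigma mapping, using only the native toUpperCase (the sample word is chosen for illustration). The native call keeps the tonos on ό, which is irrelevant to the sigma count shown here.

```javascript
const medialSigma = "\u03C3"; // σ
const finalSigma = "\u03C2";  // ς

// Both lowercase forms collapse to the same capital, U+03A3.
console.log(medialSigma.toUpperCase() === finalSigma.toUpperCase()); // true

// σεισμός contains two medial sigmas and one final sigma.
const word = "\u03C3\u03B5\u03B9\u03C3\u03BC\u03CC\u03C2";
const capitalSigmas = Array.from(word.toUpperCase()).filter((ch) => ch === "\u03A3").length;
console.log(capitalSigmas); // 3: the medial/final distinction is gone after uppercasing
```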

Unicode allows the same visual character to be encoded in multiple ways. For example, ά can be a single composed codepoint (U+03AC, NFC form) or a base α (U+03B1) followed by a combining acute accent (U+0301, NFD form). The tool calls String.prototype.normalize("NFC") to collapse decomposed sequences into their composed equivalents before applying the lookup table. Without this step, decomposed characters would slip past the lookup table: the base letter would be uppercased by the native fallback while the combining accent stayed attached, leaving an accented capital (Α + U+0301) in the output.
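The effect of normalization on a lookup can be shown in a few lines. This is a sketch with a hypothetical one-entry map (accentMap) standing in for the tool's full table.

```javascript
const composed = "\u03AC";         // ά as one code point (NFC)
const decomposed = "\u03B1\u0301"; // α + combining acute accent (NFD)

// The two strings look identical but are not equal until normalized.
console.log(composed === decomposed);                  // false
console.log(decomposed.normalize("NFC") === composed); // true

// Hypothetical one-entry accent map keyed on the composed form.
const accentMap = new Map([[composed, "\u0391"]]);     // ά -> Α

// Without normalization the lookup misses and the native fallback keeps the accent.
const withoutNfc = accentMap.get(decomposed) ?? decomposed.toUpperCase();
console.log(withoutNfc.length); // 2: capital alpha followed by U+0301

// With normalization the lookup hits and the accent is removed.
const normalized = decomposed.normalize("NFC");
const withNfc = accentMap.get(normalized) ?? normalized.toUpperCase();
console.log(withNfc.length);    // 1: plain capital alpha
```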

Characters that have no case (CJK, punctuation, numbers, emoji) pass through unmodified. The tool's lookup map contains only Greek codepoints; cased non-Greek letters such as Latin and Cyrillic fall through to the native toUpperCase, which is correct for Latin and most other scripts. This means you can safely paste mixed-language text, and only the Greek portions will receive the specialized uppercasing rules.
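The fallback behavior can be illustrated with a stripped-down mapper. The names GREEK_MAP, upperChar, and upperMixed are hypothetical, and the map holds a single entry purely for demonstration.

```javascript
// Hypothetical one-entry Greek map; the real table is much larger.
const GREEK_MAP = new Map([["\u03AC", "\u0391"]]); // ά -> Α

// Greek entries use the map; everything else falls back to native toUpperCase.
function upperChar(ch) {
  return GREEK_MAP.get(ch) ?? ch.toUpperCase();
}

// Array.from iterates by code point, so emoji and other astral characters stay intact.
function upperMixed(text) {
  return Array.from(text, upperChar).join("");
}

// Latin is uppercased natively, ά goes through the map, digits and CJK are unchanged.
console.log(upperMixed("caf\u00E9 \u03AC 123 \u65E5\u672C")); // CAFÉ Α 123 日本
```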