About

Romanized Japanese (romaji) introduces ambiguity that compounds with text length. A single syllable like shi maps unambiguously to し or シ, but ki maps to over 40 distinct kanji, including 気, 木, 機, and 記, each with a radically different meaning. Misselecting a kanji in formal writing or signage is not a typo; it is a semantic error that can change contracts, addresses, or medical instructions. This converter implements a greedy left-to-right tokenizer against the complete Modified Hepburn romanization table (107 mora entries) covering gojūon, dakuten (゛), handakuten (゜), yōon combinations, and sokuon (っ) doubling rules. The kanji dictionary returns all common kanji for each reading so you can cross-reference, not guess.

Limitations: kanji selection is reading-based, not context-based. Natural language disambiguation (e.g., distinguishing 橋 from 箸 for hashi) requires sentence-level NLP beyond this tool's scope. For particle-aware conversion or compound-word kanji, consult a full IME. This tool approximates dictionary lookup assuming isolated mora input.


Formulas

The converter uses a greedy left-to-right tokenization algorithm over the input romaji string. At each position i, the algorithm attempts to match the longest possible romaji token.

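tokenize(s): start at i = 0; at each position i, try the 3-character substring s[i..i+2], then the 2-character substring s[i..i+1], then s[i] alone against lookup table T, and advance i past the first match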
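
The longest-match scan can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual source: the table `T` below is a small hand-picked subset of the 107 entries, and the function name and return shape are assumptions made for the example.

```python
# Illustrative subset of the mora table T (the full table has 107 entries).
T = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す",
    "ta": "た", "chi": "ち", "tsu": "つ",
    "kya": "きゃ", "kyu": "きゅ", "kyo": "きょ",
    "sha": "しゃ", "shu": "しゅ", "sho": "しょ",
}

def tokenize(s):
    """Return (romaji, kana) pairs using greedy longest-match scanning."""
    out = []
    i = 0
    while i < len(s):
        for width in (3, 2, 1):              # longest candidate first
            chunk = s[i:i + width]
            if len(chunk) == width and chunk in T:
                out.append((chunk, T[chunk]))
                i += width
                break
        else:
            # No table entry matched: pass the character through unchanged.
            out.append((s[i], s[i]))
            i += 1
    return out

print(tokenize("kyaku"))  # [('kya', 'きゃ'), ('ku', 'く')]
```

Trying the 3-character window first is what keeps "kya" from being split into "k" + "ya" or "ky" + "a".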

Sokuon detection: if s[i] = s[i+1] and both are consonants other than n, emit っ (hiragana) or ッ (katakana) and advance i by 1. Syllabic n detection: emit ん/ン when n appears before a consonant (any character other than a, i, u, e, o, y) or at string end.
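
The two special cases can be expressed as small predicates. The sketch below assumes lowercase input; the helper names `is_sokuon` and `is_syllabic_n` are hypothetical and chosen only for illustration.

```python
VOWELS = set("aiueo")

def is_consonant(c):
    """ASCII letter that is not one of the five vowels."""
    return c.isascii() and c.isalpha() and c not in VOWELS

def is_sokuon(s, i):
    """True when s[i] begins a doubled consonant other than n."""
    return (
        i + 1 < len(s)
        and s[i] == s[i + 1]
        and is_consonant(s[i])
        and s[i] != "n"
    )

def is_syllabic_n(s, i):
    """True when s[i] is an n that stands alone as a mora."""
    if s[i] != "n":
        return False
    if i + 1 == len(s):                      # n at end of string
        return True
    return s[i + 1] not in VOWELS and s[i + 1] != "y"

print(is_sokuon("gakkou", 2))      # True: "kk"
print(is_syllabic_n("konna", 2))   # True: n before another consonant
print(is_syllabic_n("kana", 2))    # False: n starts the mora "na"
```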

Where s = input romaji string, i = current scan position, T = mora lookup dictionary containing 107 entries mapping Modified Hepburn romaji to Unicode kana codepoints. Kanji mode queries a secondary dictionary K keyed by romaji reading, returning an array of all kanji sharing that on'yomi or kun'yomi reading.
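
A reading-keyed kanji lookup might look like the following sketch. The dictionary `K` here holds only the kanji named on this page; the real dictionary's contents, ordering, and the function name `kanji_candidates` are assumptions.

```python
# Illustrative subset of the secondary dictionary K, keyed by romaji reading.
K = {
    "ki": ["木", "気", "機", "記"],
    "hashi": ["橋", "箸"],
}

def kanji_candidates(reading):
    """Return every kanji stored under a reading, or an empty list."""
    return list(K.get(reading.lower(), []))

print(kanji_candidates("hashi"))  # ['橋', '箸']
```

Because the lookup is keyed by reading alone, it cannot rank 橋 above 箸; it only lists both.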

Reference Data

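| Romaji | Hiragana | Katakana | Type | Notes |
|---|---|---|---|---|
| a | あ | ア | Gojūon | Vowel |
| ka | か | カ | Gojūon | K-row |
| shi | し | シ | Gojūon | Hepburn: shi (し), not si |
| chi | ち | チ | Gojūon | Hepburn: chi (ち), not ti |
| tsu | つ | ツ | Gojūon | Hepburn: tsu (つ), not tu |
| fu | ふ | フ | Gojūon | Hepburn: fu (ふ), not hu |
| n | ん | ン | Gojūon | Syllabic nasal; standalone before consonant |
| ga | が | ガ | Dakuten | Voiced K-row |
| za | ざ | ザ | Dakuten | Voiced S-row |
| da | だ | ダ | Dakuten | Voiced T-row |
| ba | ば | バ | Dakuten | Voiced H-row |
| pa | ぱ | パ | Handakuten | Semi-voiced H-row |
| kya | きゃ | キャ | Yōon | K-row combo |
| sha | しゃ | シャ | Yōon | S-row combo (Hepburn) |
| cha | ちゃ | チャ | Yōon | T-row combo (Hepburn) |
| nya | にゃ | ニャ | Yōon | N-row combo |
| hya | ひゃ | ヒャ | Yōon | H-row combo |
| mya | みゃ | ミャ | Yōon | M-row combo |
| rya | りゃ | リャ | Yōon | R-row combo |
| gya | ぎゃ | ギャ | Yōon | Voiced K-row combo |
| ja | じゃ | ジャ | Yōon | Voiced S-row combo |
| bya | びゃ | ビャ | Yōon | Voiced H-row combo |
| pya | ぴゃ | ピャ | Yōon | Semi-voiced H-row combo |
| kk* | っk* | ッk* | Sokuon | Double consonant → っ/ッ prefix |
| ss* | っs* | ッs* | Sokuon | Double consonant → っ/ッ prefix |
| tt* | っt* | ッt* | Sokuon | Double consonant → っ/ッ prefix |
| pp* | っp* | ッp* | Sokuon | Double consonant → っ/ッ prefix |
| wo | を | ヲ | Gojūon | Particle を |
| wi | ゐ | ヰ | Archaic | Historical kana |
| we | ゑ | ヱ | Archaic | Historical kana |
| di | ぢ | ヂ | Dakuten | Voiced T-row alternate |
| du | づ | ヅ | Dakuten | Voiced T-row alternate |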
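
The hiragana and katakana forms of each mora sit a fixed distance apart in Unicode (U+3041 to U+3096 versus U+30A1 to U+30F6), so one column can be derived from the other. Whether this tool stores both columns or derives one is not stated; the sketch below simply shows the derivation, and the function name is hypothetical.

```python
KATAKANA_OFFSET = 0x60  # U+30A1 (ァ) minus U+3041 (ぁ)

def hiragana_to_katakana(text):
    """Shift hiragana codepoints into the katakana block; leave the rest."""
    return "".join(
        chr(ord(ch) + KATAKANA_OFFSET) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )

print(hiragana_to_katakana("きゃ"))      # キャ
print(hiragana_to_katakana("がっこう"))  # ガッコウ
```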

Frequently Asked Questions

The tokenizer uses lookahead. When it encounters "n", it checks the next character. If the next character is a vowel (a, i, u, e, o) or "y", it treats "n" as the start of a multi-character mora (e.g., "na" → な). If the next character is a consonant, a space, or the string ends, it emits ん (or ン). To force syllabic n before a vowel, use "n'" with an apostrophe - standard Modified Hepburn notation (e.g., "shin'ichi" → しんいち, not しにち).
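
As a rough sketch of that lookahead, the following Python handles "n" before falling back to the longest-match table scan. It assumes lowercase input and uses a tiny illustrative table; `to_hiragana` is a hypothetical name, not the tool's API.

```python
VOWELS = set("aiueo")
# Tiny illustrative table (the real one has 107 entries).
T = {"shi": "し", "ni": "に", "chi": "ち", "i": "い", "na": "な", "ka": "か"}

def to_hiragana(s):
    out, i = [], 0
    while i < len(s):
        if s[i] == "n":
            nxt = s[i + 1] if i + 1 < len(s) else ""
            if nxt == "'":                       # n' forces syllabic n
                out.append("ん")
                i += 2                           # consume n and apostrophe
                continue
            if nxt not in VOWELS and nxt != "y":  # consonant, space, or end
                out.append("ん")
                i += 1
                continue
        for width in (3, 2, 1):                  # longest match first
            chunk = s[i:i + width]
            if len(chunk) == width and chunk in T:
                out.append(T[chunk])
                i += width
                break
        else:
            out.append(s[i])                     # unknown character: pass through
            i += 1
    return "".join(out)

print(to_hiragana("shin'ichi"))  # しんいち
print(to_hiragana("shinichi"))   # しにち
```
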
Japanese kanji are logographic. Multiple kanji can share identical readings. The mora "ki" maps to 木 (tree), 気 (spirit), 機 (machine), 記 (record), and over 35 others. This tool returns all common kanji for that reading. Selecting the correct kanji requires sentence context (semantic disambiguation), which is the domain of full Input Method Editors (IME), not a reading-based lookup tool.
The dictionary accepts both Hepburn and Kunrei-shiki input. Hepburn "shi" and Kunrei-shiki "si" both map to し/シ. Similarly, "chi"/"ti" → ち, "tsu"/"tu" → つ, "fu"/"hu" → ふ, and "ja"/"zya" → じゃ. Hepburn forms are prioritized in the reference table because they are more widely used internationally, even though ISO 3602 standardizes Kunrei-shiki rather than Hepburn.
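
One way to accept both spellings is to normalize Kunrei-shiki tokens to their Hepburn equivalents before the table lookup. This is only a sketch of that idea; the tool may instead store both spellings as separate table keys, and the names below are assumptions.

```python
# Kunrei-shiki spellings mentioned above, mapped to their Hepburn forms.
KUNREI_TO_HEPBURN = {
    "si": "shi", "ti": "chi", "tu": "tsu", "hu": "fu", "zya": "ja",
}
# Illustrative Hepburn-keyed subset of the mora table.
T = {"shi": "し", "chi": "ち", "tsu": "つ", "fu": "ふ", "ja": "じゃ"}

def lookup(token):
    """Resolve a romaji token in either system; None if unknown."""
    canonical = KUNREI_TO_HEPBURN.get(token, token)
    return T.get(canonical)

print(lookup("si"), lookup("shi"))  # し し
```
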
When the tokenizer detects two identical consecutive consonants (excluding 'n'), it emits the sokuon character っ (hiragana) or ッ (katakana) for the first consonant, then processes the second consonant normally as part of the next mora. So "gakkou" becomes が・っ・こ・う (gakkō). The sokuon represents a geminate consonant - a brief pause or glottal stop before the following consonant sound.
Non-alphabetic characters pass through unchanged. Numbers, spaces, punctuation marks, and existing Japanese characters (hiragana, katakana, kanji) are preserved in their original position. Only sequences of Latin letters a-z are tokenized and converted. This allows mixed-script input like "toukyou 2024" to produce "とうきょう 2024" without data loss. Note that long vowels must be typed out: "tokyo" alone tokenizes as to + kyo and yields ときょ.
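
A simple way to get this pass-through behavior is to convert only the runs matched by `[a-z]+` and leave everything else in place. In the sketch below, `convert_run` stands in for the real romaji converter and is supplied by the caller; both names are hypothetical.

```python
import re

def convert_text(text, convert_run):
    """Apply convert_run to each run of lowercase Latin letters only."""
    return re.sub(r"[a-z]+", lambda m: convert_run(m.group(0)), text)

# Toy stand-in converter for demonstration purposes.
toy = {"neko": "ねこ"}
print(convert_text("neko 2024 ねこ!", lambda run: toy.get(run, run)))
# ねこ 2024 ねこ!
```
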
The converter handles long vowels through standard romaji doubling: "ou" → おう, "uu" → うう, "oo" → おお. Macron characters (ō, ū, ā) are not natively supported because they are display conventions, not input standards. If you need おう, type "ou". For katakana long vowels written with the chōon mark ー, type a double vowel: for example, "raamen" → ラーメン, which relies on an explicit mapping for common patterns.
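
The katakana case can be approximated by replacing a bare vowel token with ー whenever it repeats the vowel that ended the previous token. This is a sketch under that assumption, working on an already tokenized list; the tool's own "explicit mapping" may behave differently, and the names here are hypothetical.

```python
VOWELS = set("aiueo")
# Illustrative katakana subset.
KATAKANA = {"ra": "ラ", "me": "メ", "n": "ン", "ko": "コ", "hi": "ヒ",
            "a": "ア", "i": "イ", "o": "オ"}

def katakana_with_choon(tokens):
    """Join romaji tokens as katakana, writing repeated vowels as ー."""
    out, prev = [], ""
    for tok in tokens:
        if tok in VOWELS and prev.endswith(tok):
            out.append("ー")                  # long-vowel mark
        else:
            out.append(KATAKANA.get(tok, tok))
        prev = tok
    return "".join(out)

print(katakana_with_choon(["ra", "a", "me", "n"]))  # ラーメン
```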