A Japanese Kana Converter With Hepburn, Kunrei, and Nihon-shiki Romanization
Hiragana ↔ Katakana is one Unicode offset (+0x60). Kana → Romaji is a lookup table, but which table? Japan has three standard romanization systems: Hepburn (what most foreigners see), Kunrei-shiki (taught in Japanese schools), and Nihon-shiki (historical, strictest). "shi" vs "si", "tsu" vs "tu", "chi" vs "ti": each is correct, depending on which system you mean.
Japanese text conversion sounds trivial but opens a surprising set of questions about romanization standards, half-width katakana, and the one-to-many mapping problem of converting back from romaji.
🔗 Live demo: https://sen.ltd/portfolio/kana-converter/
📦 GitHub: https://github.com/sen-ltd/kana-converter
Features:
- Hiragana ↔ Katakana
- Hiragana / Katakana → Romaji (3 systems)
- Half-width ↔ Full-width katakana
- Live conversion
- Swap direction button
- Japanese / English UI
- Zero dependencies, 73 tests
Hiragana to Katakana: one offset
Hiragana range: U+3041-U+3096. Katakana range: U+30A1-U+30F6. The difference: exactly 0x60.
export function hiraganaToKatakana(text) {
  return [...text].map(c => {
    const code = c.charCodeAt(0);
    if (code >= 0x3041 && code <= 0x3096) {
      return String.fromCharCode(code + 0x60);
    }
    return c;
  }).join('');
}
The reverse conversion subtracts 0x60 instead. あ (0x3042) + 0x60 = ア (0x30A2). Unicode lays out the two kana scripts in the same order, which is what makes this conversion trivial.
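The reverse direction can be sketched the same way. The function name `katakanaToHiragana` is an assumption for illustration and is not taken from the repository; the only change is the range check and the sign of the offset.

```javascript
// Hypothetical counterpart to hiraganaToKatakana: subtract the same 0x60 offset.
function katakanaToHiragana(text) {
  return [...text].map(c => {
    const code = c.codePointAt(0);
    // Katakana U+30A1 through U+30F6 mirror hiragana U+3041 through U+3096
    if (code >= 0x30A1 && code <= 0x30F6) {
      return String.fromCodePoint(code - 0x60);
    }
    return c; // the long vowel mark ー, punctuation, and Latin text pass through
  }).join('');
}

console.log(katakanaToHiragana('カタカナ')); // かたかな
```

Characters outside the range, such as the long vowel mark ー (U+30FC), have no hiragana counterpart at that offset, so this sketch leaves them unchanged.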
Three romanization systems
Kana-to-romaji needs a lookup table. The three major systems disagree on several characters:
| Kana | Hepburn | Kunrei | Nihon |
|---|---|---|---|
| し | shi | si | si |
| ち | chi | ti | ti |
| つ | tsu | tu | tu |
| ふ | fu | hu | hu |
| じ | ji | zi | zi |
| ぢ | ji | zi | di |
| づ | zu | zu | du |
| しゃ | sha | sya | sya |
Hepburn is what Japanese train stations use and what most English speakers see. Kunrei-shiki is what Japanese elementary schools teach: it follows the kana grid more consistently but is less intuitive for English speakers. Nihon-shiki is the strictest, distinguishing homophones like じ and ぢ (both pronounced "ji") by the kana column they come from.
For the converter, each system gets its own lookup table:
const HEPBURN = { 'し': 'shi', 'ち': 'chi', 'つ': 'tsu', ... };
const KUNREI = { 'し': 'si', 'ち': 'ti', 'つ': 'tu', ... };
const NIHON = { 'し': 'si', 'ち': 'ti', 'つ': 'tu', 'ぢ': 'di', 'づ': 'du', ... };
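To show how a per-system table plugs into a converter, here is a minimal sketch. The names `SAMPLE_HEPBURN`, `SAMPLE_KUNREI`, and `kanaToRomaji` are hypothetical, and the tables hold only a handful of entries; the actual converter may structure its lookup differently.

```javascript
// Tiny illustrative tables (assumed subset; a full table covers every kana).
const SAMPLE_HEPBURN = { 'し': 'shi', 'ち': 'chi', 'つ': 'tsu', 'す': 'su', 'しゃ': 'sha' };
const SAMPLE_KUNREI = { 'し': 'si', 'ち': 'ti', 'つ': 'tu', 'す': 'su', 'しゃ': 'sya' };

// Hypothetical helper: try a two-character digraph (しゃ) before a single kana.
function kanaToRomaji(text, table) {
  let result = '';
  let i = 0;
  while (i < text.length) {
    const pair = text.slice(i, i + 2);
    if (pair.length === 2 && table[pair] !== undefined) {
      result += table[pair];
      i += 2;
    } else {
      const single = text[i];
      result += table[single] ?? single; // unknown characters pass through
      i += 1;
    }
  }
  return result;
}

console.log(kanaToRomaji('すし', SAMPLE_HEPBURN));   // sushi
console.log(kanaToRomaji('すし', SAMPLE_KUNREI));    // susi
console.log(kanaToRomaji('しゃち', SAMPLE_HEPBURN)); // shachi
```

Passing the table as a parameter keeps the conversion loop identical across all three systems; only the data changes.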
ん before vowels
A subtle Hepburn rule: ん before a vowel or y is written as "n'" with an apostrophe to prevent ambiguity. 禁煙 (きんえん) is "kin'en", not "kinen" (which would read as きねん), and 本屋 (ほんや) is "hon'ya", not "honya". ん before any other consonant needs no apostrophe: 案内 is simply "annai" and 反応 is "hannō".
// Detect ん followed by あいうえお or やゆよ and insert an apostrophe
result = result.replace(/ん([あいうえおやゆよ])/g, "n'$1");
The regex runs on the hiragana source before conversion. That way it sees the actual ん character and can check the character that follows.
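A small end-to-end sketch of the apostrophe rule follows. `markSyllabicN`, `toRomaji`, and the four-entry `SAMPLE` table are assumptions made for this example, not the converter's real API.

```javascript
// Assumed miniature table, just enough for the two example words.
const SAMPLE = { 'き': 'ki', 'ん': 'n', 'え': 'e', 'ね': 'ne' };

// Hypothetical pre-pass: rewrite ん as n' when a vowel or y-row kana follows.
function markSyllabicN(hiragana) {
  return hiragana.replace(/ん([あいうえおやゆよ])/g, "n'$1");
}

// Run the pre-pass, then look up each remaining character;
// the inserted n and ' are not in the table, so they pass through unchanged.
function toRomaji(hiragana) {
  return [...markSyllabicN(hiragana)].map(c => SAMPLE[c] ?? c).join('');
}

console.log(toRomaji('きんえん')); // kin'en (禁煙)
console.log(toRomaji('きねん'));   // kinen  (記念)
```

Without the pre-pass both words would romanize to "kinen", which is exactly the ambiguity the apostrophe exists to prevent.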
Half-width katakana
Half-width katakana (ｱｲｳｴｵ) lives in a different Unicode block, Halfwidth and Fullwidth Forms, at U+FF66-U+FF9F. The forms date back to single-byte encodings such as JIS X 0201 (1969) and are still used in some legacy systems (ATMs, older printers).
The quirk: dakuten and handakuten are separate characters in half-width. ガ is one char in full-width (U+30AC) but two chars in half-width: ｶ (U+FF76) + ﾞ (U+FF9E).
const FULL_TO_HALF = {
  'ア': 'ｱ', 'ガ': 'ｶﾞ', 'ザ': 'ｻﾞ', 'パ': 'ﾊﾟ', 'ヴ': 'ｳﾞ', ...
};
So converting ガガ (2 characters) produces ｶﾞｶﾞ (4 characters). The string length doubles, so the conversion is inherently not length-preserving.
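The length change is easy to observe with a minimal sketch. `SAMPLE_FULL_TO_HALF` and `toHalfWidth` are hypothetical names with an assumed four-entry table. For the way back, the example uses the standard `String.prototype.normalize('NFKC')`, which is a swapped-in standard-library technique and not necessarily how the converter does it.

```javascript
// Assumed subset of a full-width to half-width table.
const SAMPLE_FULL_TO_HALF = { 'ア': 'ｱ', 'カ': 'ｶ', 'ガ': 'ｶﾞ', 'パ': 'ﾊﾟ' };

// Hypothetical helper: map each full-width katakana, pass everything else through.
function toHalfWidth(text) {
  return [...text].map(c => SAMPLE_FULL_TO_HALF[c] ?? c).join('');
}

const half = toHalfWidth('ガガ');
console.log(half, half.length);      // ｶﾞｶﾞ 4

// NFKC normalization widens the base character and recomposes the voiced mark.
console.log(half.normalize('NFKC')); // ガガ
```

Note that NFKC also rewrites other compatibility characters, such as full-width Latin letters, so a dedicated reverse table is the safer choice when only katakana should change.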
Romaji to Hiragana: greedy matching
Going back from romaji requires greedy longest-match:
const TABLE = [
  ['shi', 'し'], ['chi', 'ち'], ['tsu', 'つ'],
  ['sha', 'しゃ'], ['shu', 'しゅ'], ['sho', 'しょ'],
  ['ka', 'か'], ['ki', 'き'], ...
];
export function romajiToHiragana(text) {
  let result = '';
  let i = 0;
  while (i < text.length) {
    let matched = false;
    // TABLE is ordered longest key first, so the first hit is the longest match
    for (const [rom, kana] of TABLE) {
      if (text.slice(i, i + rom.length) === rom) {
        result += kana;
        i += rom.length;
        matched = true;
        break;
      }
    }
    // No key matches here: copy the character through unchanged
    if (!matched) { result += text[i]; i++; }
  }
  return result;
}
TABLE is sorted so that longer keys come first. This ensures that shi is matched before any shorter key that overlaps with it, and that sha is matched as a single digraph instead of s + ha.
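Why the ordering matters can be shown with a small sketch. The `ENTRIES` list, the `SORTED` copy, and the `convert` helper are assumptions for illustration; the repository may build or order its table differently.

```javascript
// Illustrative entries in arbitrary order; 'n' is a prefix of 'na'.
const ENTRIES = [
  ['n', 'ん'], ['a', 'あ'], ['ka', 'か'], ['na', 'な'], ['sha', 'しゃ'], ['shi', 'し'],
];

// Copy, then sort by descending key length so longer keys are tried first.
const SORTED = [...ENTRIES].sort((a, b) => b[0].length - a[0].length);

// Same first-match loop as above, parameterized by table for comparison.
function convert(text, table) {
  let result = '';
  let i = 0;
  while (i < text.length) {
    const hit = table.find(([rom]) => text.startsWith(rom, i));
    if (hit) {
      result += hit[1];
      i += hit[0].length;
    } else {
      result += text[i];
      i++;
    }
  }
  return result;
}

console.log(convert('kana', ENTRIES)); // かんあ (the short key 'n' wins too early)
console.log(convert('kana', SORTED));  // かな
```

Sorting once at module load keeps the matching loop simple: the first hit is always the longest one.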
Series
This is entry #84 in my 100+ public portfolio series.
- 📦 Repo: https://github.com/sen-ltd/kana-converter
- 🔗 Live: https://sen.ltd/portfolio/kana-converter/
- 🏢 Company: https://sen.ltd/












