Learn Japanese 🇯🇵 with Python
Takanori Suzuki
EuroPython 2025 / 2025 Jul 18
PyCon JP 2025
Date: 2025 Sep 26(Fri)-27(Sat)
Place: Hiroshima, Japan
There are English talks

Questions 
Have you learned Japanese? 
Are you interested in Japanese? 
Japanese is difficult 
3 Types of Characters (Hiragana, Katakana, Kanji)
No Spaces between Words
Multiple Readings of Kanji
3 Types of Characters
Emoji | 🐍 | 🍺
---|---|---
Hiragana | へび | びーる
Katakana | ヘビ | ビール
Kanji | 蛇 | 麦酒
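To see the three scripts from Python's side, here is a minimal sketch that labels each character by the prefix of its Unicode name. The helper name `script_of` and the label strings are illustrative assumptions, and the check only covers the common kana and CJK blocks.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Classify one character by the prefix of its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("KATAKANA-HIRAGANA"):
        return "mark"  # e.g. the long-vowel mark, shared by both kana sets
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("KATAKANA"):
        return "katakana"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "kanji"
    return "other"


# The three spellings of "snake" from the table above
for word in ["へび", "ヘビ", "蛇"]:
    print(word, [script_of(ch) for ch in word])
```

The long-vowel mark in びーる and ビール is reported as "mark" because Unicode names it as belonging to both kana sets.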
No Spaces between Words
すもももももももものうち
すもも/も/もも/も/もも/の/うち
Both plums and peaches are kinds of peach
Multiple Readings of Kanji
日: day, sun
Chinese-style reading (on'yomi): にち(nichi)、じつ(jitsu)
Japanese-style reading (kun'yomi): ひ(hi)、か(ka)
日曜日 (nichi you bi): Sunday
前日 (zen jitsu): previous day
Multiple Readings of Kanji
Same combination but different readings
一日: first day, one day
一日目 (ichi nichi me): Day 1
一月一日 (ichi gatsu tsuitachi): Jan 1st
Multiple Readings of Kanji
Special readings of Kanji idioms
今日 (kyou): today
昨日 (kinou): yesterday
明日 (asu): tomorrow
Learn Japanese with Python
No Spaces between Words
すもももももももものうち
すもも/も/もも/も/もも/の/うち
Japanese morphological analyzer
SudachiPy: pypi.org/project/SudachiPy
SudachiDict: pypi.org/project/SudachiDict-core
$ pip install sudachipy sudachidict_core
Word Segmentation
from sudachipy import Dictionary
tokenizer = Dictionary().create()
text = "すもももももももものうち"
words = [token.surface() for token in tokenizer.tokenize(text)]
print(words)
# -> ['すもも', 'も', 'もも', 'も', 'もも', 'の', 'うち']
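For contrast, here is a sketch of what happens without a morphological analyzer: a naive greedy longest-match splitter over a tiny hand-made word list. The function `greedy_segment` and the `VOCAB` set are hypothetical names made up for this illustration, and this is not how SudachiPy works internally.

```python
def greedy_segment(text: str, vocab: set[str]) -> list[str]:
    """Split text by always taking the longest vocabulary word at each position."""
    max_len = max(len(w) for w in vocab)
    words, pos = [], 0
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if candidate in vocab:
                break
        else:
            candidate = text[pos]  # unknown character: emit it on its own
        words.append(candidate)
        pos += len(candidate)
    return words


VOCAB = {"すもも", "もも", "も", "の", "うち"}  # hypothetical toy word list
print("/".join(greedy_segment("すもももももももものうち", VOCAB)))
```

With this word list the greedy rule returns すもも/もも/もも/もも/の/うち, swallowing both particles も. A dictionary alone is not enough; the analyzer also has to weigh which word sequences are plausible.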
Multiple Readings of Kanji
今日は一月一日で日曜日
Today is January 1st, Sunday
Morphological Analysis
from sudachipy import Dictionary
tokenizer = Dictionary().create()
text = "今日" # today
tokens = tokenizer.tokenize(text)
print(tokens[0].surface()) # -> 今日
print(tokens[0].reading_form()) # -> キョウ(kyou)
print(tokens[0].part_of_speech()[0]) # -> 名詞(noun)
Get Readings
from sudachipy import Dictionary
tokenizer = Dictionary().create()
text = "今日は一月一日で日曜日"
readings = []
for token in tokenizer.tokenize(text):
    readings.append(token.reading_form())
print(readings)
# -> ['キョウ', 'ハ', 'イチ', 'ガツ', 'ツイタチ', 'デ', 'ニチヨウビ']
Can’t read Katakana?
Convert Japanese Character
jaconv: pypi.org/project/jaconv
import jaconv

# readings is the Katakana list built in the previous example:
# ['キョウ', 'ハ', 'イチ', 'ガツ', 'ツイタチ', 'デ', 'ニチヨウビ']
print([jaconv.kata2hira(r) for r in readings])
# -> ['きょう', 'は', 'いち', 'がつ', 'ついたち', 'で', 'にちようび']
print([jaconv.kata2alphabet(r) for r in readings])
# -> ['kyou', 'ha', 'ichi', 'gatsu', 'tsuitachi', 'de', 'nichiyoubi']
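If you are curious why Katakana to Hiragana conversion is cheap, here is a standard-library sketch. It relies on the fact that the Katakana letters U+30A1 to U+30F6 sit exactly 0x60 code points above their Hiragana counterparts. The names `KATA_TO_HIRA` and `kata_to_hira` are made up for this example, and this is not a claim about how jaconv is implemented.

```python
# Katakana letters U+30A1..U+30F6 sit exactly 0x60 above their Hiragana
# counterparts U+3041..U+3096, so a translation table is enough.
KATA_TO_HIRA = {code: code - 0x60 for code in range(0x30A1, 0x30F7)}


def kata_to_hira(text: str) -> str:
    """Convert Katakana letters to Hiragana; other characters pass through."""
    return text.translate(KATA_TO_HIRA)


print([kata_to_hira(r) for r in ["キョウ", "ツイタチ", "ニチヨウビ"]])
```

Characters outside that range, such as the long-vowel mark ー or Latin letters, are left unchanged. Romanization is not a fixed offset like this, so a library such as jaconv is the practical choice there.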
Want to hear audio? 
Text to Speech
from contextlib import closing
from pathlib import Path
import boto3
# Amazon Polly client (needs AWS credentials and a region configured for boto3)
polly = boto3.client("polly")
# I want to drink good beer today and tomorrow.
text = "今日も明日もおいしいビールを飲みたい"
result = polly.synthesize_speech(
    Text=text, OutputFormat="mp3", VoiceId="Mizuki")
# Save the returned audio stream as an MP3 file
with closing(result["AudioStream"]) as stream:
    Path("japanese.mp3").write_bytes(stream.read())
Sample app

Full talk at PyCon US 2025
Thank you 
@takanory