Learn Japanese 🇯🇵 with Python

Takanori Suzuki

EuroPython 2025 / 2025 Jul 18

PyCon JP 2025

2025.pycon.jp
Date: 2025 Sep 26 (Fri) - 27 (Sat)
Place: Hiroshima, Japan
Some talks are in English

Questions

Have you learned Japanese?

Are you interested in Japanese?

Japanese is difficult

3 Types of Characters (Hiragana, Katakana, Kanji)
No Spaces between Words
Multiple Readings of Kanji

3 Types of Characters

Emoji	🐍	🍺
Hiragana	へび	びーる
Katakana	ヘビ	ビール
Kanji	蛇	麦酒
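
The three scripts can be told apart programmatically. The sketch below is one possible approach using only the standard library: it classifies each character by its Unicode name. The helper name `script_of` is hypothetical and is not part of the talk's sample code.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Classify one character by the prefix of its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("HIRAGANA"):
        return "hiragana"
    # Note: the long-vowel mark "ー" is named KATAKANA-HIRAGANA PROLONGED
    # SOUND MARK, so it is reported as katakana even inside hiragana words.
    if name.startswith("KATAKANA"):
        return "katakana"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "kanji"
    return "other"


# The "snake" column from the table above: hiragana, katakana, kanji, emoji
for word in ["へび", "ヘビ", "蛇", "🐍"]:
    print(word, [script_of(ch) for ch in word])
```

This prefix check is a simplification: rare kanji outside the CJK Unified Ideographs blocks would fall into "other".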

No Spaces between Words

すもももももももものうち
すもも/も/もも/も/もも/の/うち
Both plums and peaches are kinds of peach

Multiple Readings of Kanji

日: day, sun
- Chinese-style reading (on'yomi): にち(nichi)、じつ(jitsu)
- Japanese-style reading (kun'yomi): ひ(hi)、か(ka)
日曜日 (nichi you bi): Sunday
前日 (zen jitsu): previous day

Multiple Readings of Kanji

Same combination but different readings
一日: first day, one day
- 一日目: Day 1
- 一月一日: Jan 1st

Multiple Readings of Kanji

Same combination but different readings
一日: first day, one day
- 一日目 (ichi nichi me): Day 1
- 一月一日 (ichi gatsu tsuitachi): Jan 1st

Multiple Readings of Kanji

Special readings of Kanji idioms
今日: today
昨日: yesterday
明日: tomorrow

Multiple Readings of Kanji

Special readings of Kanji idioms
今日 (kyou): today
昨日 (kinou): yesterday
明日 (asu): tomorrow

Learn Japanese with Python

No Spaces between Words

すもももももももものうち
すもも/も/もも/も/もも/の/うち

Japanese morphological analyzer

SudachiPy: pypi.org/project/SudachiPy
SudachiDict: pypi.org/project/SudachiDict-core

$ pip install sudachipy sudachidict_core

Word Segmentation

from sudachipy import Dictionary

tokenizer = Dictionary().create()

text = "すもももももももものうち"
words = [token.surface() for token in tokenizer.tokenize(text)]
print(words)
# -> ['すもも', 'も', 'もも', 'も', 'もも', 'の', 'うち']
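
For contrast, here is a toy greedy longest-match segmenter, written only to illustrate why a real morphological analyzer is needed. It is not how SudachiPy works internally; the function name `greedy_segment` and the tiny vocabulary are assumptions made for this sketch.

```python
def greedy_segment(text: str, vocab: set[str]) -> list[str]:
    """Split text by always taking the longest vocabulary word at each position."""
    max_len = max(len(word) for word in vocab)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in vocab:
                break
        else:
            candidate = text[i]  # unknown character: emit it as-is
        words.append(candidate)
        i += len(candidate)
    return words


# Vocabulary and text are built from the correct segmentation on the slide
expected = ["すもも", "も", "もも", "も", "もも", "の", "うち"]
text = "".join(expected)
vocab = set(expected)

print(greedy_segment(text, vocab))
# -> ['すもも', 'もも', 'もも', 'もも', 'の', 'うち']
```

Even with a perfect vocabulary, the greedy result drops both particles も. Picking the right split needs grammatical context, which a dictionary-based analyzer supplies.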

Multiple Readings of Kanji

今日は一月一日で日曜日
Today is January 1st, Sunday

Morphological Analysis

from sudachipy import Dictionary

tokenizer = Dictionary().create()

text = "今日"  # today
tokens = tokenizer.tokenize(text)

print(tokens[0].surface())  # -> 今日
print(tokens[0].reading_form())  # -> キョウ(kyou)
print(tokens[0].part_of_speech()[0])  # -> 名詞(noun)

Get Readings

from sudachipy import Dictionary

tokenizer = Dictionary().create()

text = "今日は一月一日で日曜日"
readings = []
for token in tokenizer.tokenize(text):
    readings.append(token.reading_form())
print(readings)
# -> ['キョウ', 'ハ', 'イチ', 'ガツ', 'ツイタチ', 'デ', 'ニチヨウビ']
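
To study the sentence, it helps to show each reading next to its word. The sketch below pairs surfaces with readings and annotates only the tokens that contain kanji. The surface split in `pairs` is inferred from the readings above and is an assumption; with SudachiPy the pairs would come from `token.surface()` and `token.reading_form()`. The helper names are hypothetical.

```python
import unicodedata


def has_kanji(text: str) -> bool:
    """Return True if any character is a CJK unified ideograph."""
    return any(
        unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")
        for ch in text
    )


def annotate(pairs: list[tuple[str, str]]) -> str:
    """Append the reading in parentheses to tokens that contain kanji."""
    parts = []
    for surface, reading in pairs:
        parts.append(f"{surface}({reading})" if has_kanji(surface) else surface)
    return " ".join(parts)


# (surface, reading) pairs; the surfaces are an assumed split of the sentence
pairs = [
    ("今日", "キョウ"), ("は", "ハ"), ("一", "イチ"), ("月", "ガツ"),
    ("一日", "ツイタチ"), ("で", "デ"), ("日曜日", "ニチヨウビ"),
]
print(annotate(pairs))
# -> 今日(キョウ) は 一(イチ) 月(ガツ) 一日(ツイタチ) で 日曜日(ニチヨウビ)
```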

Can’t read Katakana?

Convert Japanese Characters

jaconv: pypi.org/project/jaconv

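import jaconv
from sudachipy import Dictionary

tokenizer = Dictionary().create()
text = "今日は一月一日で日曜日"

# Collect the katakana reading of each token
readings = []
for token in tokenizer.tokenize(text):
    readings.append(token.reading_form())

print(readings)
# -> ['キョウ', 'ハ', 'イチ', 'ガツ', 'ツイタチ', 'デ', 'ニチヨウビ']
print([jaconv.kata2hira(r) for r in readings])  # katakana -> hiragana
# -> ['きょう', 'は', 'いち', 'がつ', 'ついたち', 'で', 'にちようび']
print([jaconv.kata2alphabet(r) for r in readings])  # katakana -> alphabet
# -> ['kyou', 'ha', 'ichi', 'gatsu', 'tsuitachi', 'de', 'nichiyoubi']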
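
The katakana-to-hiragana step can also be approximated without a library, because the two blocks sit at a fixed offset of 0x60 in Unicode. This is a minimal standard-library sketch, not jaconv's implementation; `kata_to_hira` is a hypothetical name.

```python
# Map katakana ァ (U+30A1) .. ヶ (U+30F6) to the hiragana 0x60 code points lower
KATA_TO_HIRA = {code: code - 0x60 for code in range(0x30A1, 0x30F7)}


def kata_to_hira(text: str) -> str:
    """Convert katakana to hiragana; other characters pass through unchanged."""
    return text.translate(KATA_TO_HIRA)


readings = ["キョウ", "ハ", "イチ", "ガツ", "ツイタチ", "デ", "ニチヨウビ"]
print([kata_to_hira(r) for r in readings])
# -> ['きょう', 'は', 'いち', 'がつ', 'ついたち', 'で', 'にちようび']
```

Romanization has no such fixed offset, so a table-driven library such as jaconv remains the practical choice for that step.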

Want to hear audio?

Text to Speech

from contextlib import closing
from pathlib import Path
import boto3

polly = boto3.client("polly")
# I want to drink good beer today and tomorrow.
text = "今日も明日もおいしいビールを飲みたい"

result = polly.synthesize_speech(
    Text=text, OutputFormat="mp3", VoiceId="Mizuki")

with closing(result["AudioStream"]) as stream:
    Path("japanese.mp3").write_bytes(stream.read())
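
Learners may want slower audio. Amazon Polly accepts SSML when `TextType="ssml"` is passed to `synthesize_speech`, and the `<prosody>` tag can lower the speaking rate. The helper below only builds the SSML string; `to_slow_ssml` is a hypothetical name, and whether the slower rate suits a given voice is left to experiment.

```python
from xml.sax.saxutils import escape


def to_slow_ssml(text: str, rate: str = "slow") -> str:
    """Wrap text in SSML that asks the synthesizer for a slower speaking rate."""
    # escape() protects characters such as & and < that are special in XML
    return f'<speak><prosody rate="{rate}">{escape(text)}</prosody></speak>'


ssml = to_slow_ssml("今日も明日もおいしいビールを飲みたい")
print(ssml)

# Assumed usage with the client from the slide above:
# polly.synthesize_speech(
#     Text=ssml, TextType="ssml", OutputFormat="mp3", VoiceId="Mizuki")
```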

Sample app

github.com/takanory/learn-jp-with-python

Full talk at PyCon US 2025

Thank you

slides.takanory.net

@takanory