Learn Japanese 🇯🇵 with Python

Takanori Suzuki

EuroPython 2025 / 2025 Jul 18

PyCon JP 2025

  • 2025.pycon.jp

  • Date: 2025 Sep 26(Fri)-27(Sat)

  • Place: Hiroshima, Japan

  • There are English talks

PyCon JP 2025 in Hiroshima

Questions

Have you learned Japanese?

Are you interested in Japanese?

Japanese is difficult

  • 3 Types of Characters (Hiragana, Katakana, Kanji)

  • No Spaces between Words

  • Multiple Readings of Kanji

3 Types of Characters

Emoji

🐍

🍺

Hiragana

へび

びーる

Katakana

ヘビ

ビール

Kanji

蛇

麦酒
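The three scripts live in separate Unicode blocks, so a character's type can be read off its code point. A minimal stdlib sketch, assuming simplified ranges (Hiragana U+3040 to U+309F, Katakana U+30A0 to U+30FF, and only the main CJK Unified Ideographs block U+4E00 to U+9FFF for Kanji); `char_type` is a hypothetical helper, not part of any library:

```python
def char_type(ch: str) -> str:
    """Classify one character by Unicode block (simplified ranges only)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:  # Hiragana block
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:  # Katakana block
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:  # main CJK Unified Ideographs block only
        return "kanji"
    return "other"  # emoji, Latin letters, rarer Kanji blocks, ...


for word in ["へび", "ヘビ", "麦酒", "びーる"]:
    print(word, [char_type(ch) for ch in word])
# へび ['hiragana', 'hiragana']
# ヘビ ['katakana', 'katakana']
# 麦酒 ['kanji', 'kanji']
# びーる ['hiragana', 'katakana', 'hiragana']
```

Note that the long-vowel mark ー belongs to the Katakana block, so a block-based check labels it katakana even inside hiragana text like びーる.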

No Spaces between Words

  • すもももももももものうち

  • すもも/も/もも/も/もも/の/うち

  • Plums and peaches are part of peaches
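Why not just look the words up in a dictionary? A toy sketch of greedy longest-match segmentation (a deliberately naive technique, not how SudachiPy works), using a tiny hypothetical vocabulary, shows that every piece can be a real word and the split can still be wrong:

```python
def greedy_segment(text: str, vocab: set[str]) -> list[str]:
    """Toy greedy longest-match segmentation (illustration only)."""
    longest = max(len(w) for w in vocab)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a vocab word matches.
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character if nothing in vocab matches.
            if piece in vocab or size == 1:
                words.append(piece)
                i += size
                break
    return words


text = "すもももももももものうち"
vocab = {"すもも", "もも", "も", "の", "うち"}  # hypothetical mini dictionary
print(greedy_segment(text, vocab))
# -> ['すもも', 'もも', 'もも', 'もも', 'の', 'うち']
```

Greedy matching yields すもも/もも/もも/もも/の/うち instead of すもも/も/もも/も/もも/の/うち. Picking the right split needs knowledge of grammar and word connections, which is what a morphological analyzer provides.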

Multiple Readings of Kanji

  • 日: day, sun

    • Japanese-style reading (kun'yomi): ひ (hi), か (ka)

    • Chinese-style reading (on'yomi): にち (nichi), じつ (jitsu)

  • 日曜日 (nichi you bi): Sunday

  • 前日 (zen jitsu): previous day

Multiple Readings of Kanji

  • Same combination but different readings

  • 一日: first day, one day

    • 一日目 (ichi nichi me): Day 1

    • 一月一日 (ichi gatsu tsuitachi): Jan 1st

Multiple Readings of Kanji

  • Special readings of Kanji compounds

  • 今日 (kyou): today

  • 昨日 (kinou): yesterday

  • 明日 (asu): tomorrow
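Special readings like these cannot be built from the characters one at a time. A small sketch, assuming a hand-picked and incomplete table of per-character readings (`KANJI_READINGS` and `naive_readings` are hypothetical names), combines every reading of 今 with every reading of 日 and never produces kyou:

```python
from itertools import product

# Hand-picked subset of each character's readings (not exhaustive).
KANJI_READINGS = {
    "今": ["kon", "kin", "ima"],
    "日": ["nichi", "jitsu", "hi", "ka"],
}


def naive_readings(word: str) -> list[str]:
    """Join every combination of per-character readings.

    Ignores word-level special readings and sound changes (e.g. hi -> bi).
    """
    options = [KANJI_READINGS[ch] for ch in word]
    return ["".join(parts) for parts in product(*options)]


candidates = naive_readings("今日")
print(len(candidates))       # -> 12
print("kyou" in candidates)  # -> False
```

One combination, konnichi, is a genuine reading (as in こんにちは), which is exactly why readings have to be looked up per word rather than per character.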

Learn Japanese with Python

No Spaces between Words

  • すもももももももものうち

  • すもも/も/もも/も/もも/の/うち

SudachiPy: Japanese morphological analyzer

$ pip install sudachipy sudachidict_core

Word Segmentation

from sudachipy import Dictionary

tokenizer = Dictionary().create()

text = "すもももももももものうち"
words = [token.surface() for token in tokenizer.tokenize(text)]
print(words)
# -> ['すもも', 'も', 'もも', 'も', 'もも', 'の', 'うち']

Multiple Readings of Kanji

  • 今日は一月一日で日曜日

  • Today is January 1st, Sunday

Morphological Analysis

from sudachipy import Dictionary

tokenizer = Dictionary().create()

text = "今日"  # today
tokens = tokenizer.tokenize(text)

print(tokens[0].surface())  # -> 今日
print(tokens[0].reading_form())  # -> キョウ(kyou)
print(tokens[0].part_of_speech()[0])  # -> 名詞(noun)

Get Readings

from sudachipy import Dictionary

tokenizer = Dictionary().create()

text = "今日は一月一日で日曜日"
readings = []
for token in tokenizer.tokenize(text):
    readings.append(token.reading_form())
print(readings)
# -> ['キョウ', 'ハ', 'イチ', 'ガツ', 'ツイタチ', 'デ', 'ニチヨウビ']
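A common use of readings is furigana-style output that shows a reading only next to words containing Kanji. A sketch, assuming the surfaces below line up with the readings printed above (the slide prints only readings, so these token boundaries are an illustrative assumption); `has_kanji` and `annotate` are hypothetical helpers:

```python
def has_kanji(text: str) -> bool:
    """True if text contains a character from the main CJK Unified Ideographs block."""
    return any(0x4E00 <= ord(ch) <= 0x9FFF for ch in text)


def annotate(pairs: list[tuple[str, str]]) -> str:
    """Render (surface, reading) pairs as 'surface(reading)' for Kanji tokens only."""
    return "".join(
        f"{surface}({reading})" if has_kanji(surface) else surface
        for surface, reading in pairs
    )


# Illustrative pairs; with SudachiPy they could come from
# [(t.surface(), t.reading_form()) for t in tokenizer.tokenize(text)]
pairs = [
    ("今日", "キョウ"), ("は", "ハ"), ("一", "イチ"), ("月", "ガツ"),
    ("一日", "ツイタチ"), ("で", "デ"), ("日曜日", "ニチヨウビ"),
]
print(annotate(pairs))
# -> 今日(キョウ)は一(イチ)月(ガツ)一日(ツイタチ)で日曜日(ニチヨウビ)
```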

Can’t read Katakana?

Convert Japanese Characters

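import jaconv  # pip install jaconv
from sudachipy import Dictionary

tokenizer = Dictionary().create()
text = "今日は一月一日で日曜日"

readings = []
for token in tokenizer.tokenize(text):
    readings.append(token.reading_form())

print(readings)
# -> ['キョウ', 'ハ', 'イチ', 'ガツ', 'ツイタチ', 'デ', 'ニチヨウビ']
# Katakana -> Hiragana
print([jaconv.kata2hira(r) for r in readings])
# -> ['きょう', 'は', 'いち', 'がつ', 'ついたち', 'で', 'にちようび']
# Katakana -> Latin alphabet (romaji)
print([jaconv.kata2alphabet(r) for r in readings])
# -> ['kyou', 'ha', 'ichi', 'gatsu', 'tsuitachi', 'de', 'nichiyoubi']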
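jaconv does this conversion for you. To show why it is possible at all, here is a stdlib-only sketch (not jaconv's implementation): the Hiragana and Katakana blocks are laid out in parallel, so the basic katakana ァ (U+30A1) to ヶ (U+30F6) map to hiragana by subtracting 0x60 from the code point. `kata_to_hira` is a hypothetical helper; romanization has no such shortcut and is not covered here.

```python
def kata_to_hira(text: str) -> str:
    """Shift katakana U+30A1..U+30F6 down by 0x60; leave other characters (e.g. ー) alone."""
    return "".join(
        chr(ord(ch) - 0x60) if 0x30A1 <= ord(ch) <= 0x30F6 else ch
        for ch in text
    )


readings = ['キョウ', 'ハ', 'イチ', 'ガツ', 'ツイタチ', 'デ', 'ニチヨウビ']
print([kata_to_hira(r) for r in readings])
# -> ['きょう', 'は', 'いち', 'がつ', 'ついたち', 'で', 'にちようび']
print(kata_to_hira("ビール"))
# -> びーる
```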

Want to hear audio?

Text to Speech

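from contextlib import closing
from pathlib import Path

import boto3

# Needs boto3 plus AWS credentials/region with access to Amazon Polly
polly = boto3.client("polly")
# I want to drink good beer today and tomorrow.
text = "今日も明日もおいしいビールを飲みたい"

# "Mizuki" is a Japanese (ja-JP) Polly voice
result = polly.synthesize_speech(
    Text=text, OutputFormat="mp3", VoiceId="Mizuki")

# AudioStream is a streaming body; close it after saving the MP3
with closing(result["AudioStream"]) as stream:
    Path("japanese.mp3").write_bytes(stream.read())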

Sample app

Full talk at PyCon US 2025

Thank you

slides.takanory.net

@takanory

takanory profile kuro-chan and kuri-chan