Learn Japanese 🇯🇵 with Python
Takanori Suzuki
PyCon US 2024 / 2024 May 17
PyCon JP 2024 CfP is Open
Proposal Deadline: May 31 (English is welcome!!)
Date: Sep 27-29
Place: Tokyo, Japan
Questions
Have you learned Japanese?
Are you interested in Japanese?
Japanese is difficult 🤔
3 Types of Characters (Hiragana, Katakana, Kanji)
No Spaces between Words
Multiple Readings of Kanji
3 Types of Characters
Emoji | 🐍 | 🍺 |
---|---|---|
Hiragana | へび | びーる |
Katakana | ヘビ | ビール |
Kanji | 蛇 | 麦酒 |
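The three scripts can also be told apart programmatically. The following is a minimal sketch using only the standard library, assuming that the Unicode character name is a good enough hint; `script_of` is a hypothetical helper name, not part of any library used in this talk.

```python
import unicodedata


def script_of(char: str) -> str:
    """Return a rough script label for a single character (illustrative helper)."""
    name = unicodedata.name(char, "")
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("KATAKANA"):
        return "katakana"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "kanji"
    return "other"


# The words from the table above: snake and beer in each script.
for word in ["へび", "ヘビ", "蛇", "びーる", "ビール", "麦酒"]:
    print(word, [script_of(c) for c in word])
```

Note that the long-vowel mark ー is named "KATAKANA-HIRAGANA PROLONGED SOUND MARK" in Unicode, so this sketch labels it as katakana even inside a hiragana word.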
No Spaces between Words
すもももももももものうち
すもも/も/もも/も/もも/の/うち
Plums and peaches are both kinds of peaches
Multiple Readings of Kanji
日: day, sun
Chinese-style reading (on'yomi): にち(nichi)、じつ(jitsu)
Japanese-style reading (kun'yomi): ひ(hi)、か(ka)
日曜日 (nichi you bi): Sunday
前日 (zen jitsu): previous day
😨
Multiple Readings of Kanji
Same combination but different readings
一日: first day, one day
一日目: Day 1
一月一日: Jan 1st
一日目 (ichi nichi me): Day 1
一月一日 (ichi gatsu tsuitachi): Jan 1st
😱 😱
Multiple Readings of Kanji
Special readings of Kanji idioms
今日: today
昨日: yesterday
明日: tomorrow
今日 (kyou): today
昨日 (kinou): yesterday
明日 (asu): tomorrow
🤯 🤯 🤯
Learn Japanese with Python
No Spaces between Words
すもももももももものうち
すもも/も/もも/も/もも/の/うち
Japanese morphological analyzer
SudachiPy: pypi.org/project/SudachiPy
SudachiDict: pypi.org/project/SudachiDict-core
$ pip install sudachipy sudachidict_core
Word Segmentation
from sudachipy import Dictionary
tokenizer = Dictionary().create()
text = "すもももももももものうち"
words = [token.surface() for token in tokenizer.tokenize(text)]
print(words)
# -> ['すもも', 'も', 'もも', 'も', 'もも', 'の', 'うち']
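To see why a morphological analyzer is needed here, it helps to try the naive alternative. The sketch below is a toy greedy longest-match splitter; it is not how SudachiPy works, and `greedy_segment` and `VOCABULARY` are hypothetical names with a hand-picked word list just large enough for this sentence.

```python
def greedy_segment(text: str, vocabulary: set[str]) -> list[str]:
    """Split text by always taking the longest vocabulary entry at the current position."""
    max_len = max(len(word) for word in vocabulary)
    words = []
    pos = 0
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            # Fall back to a single character when nothing in the vocabulary matches.
            if candidate in vocabulary or size == 1:
                words.append(candidate)
                pos += size
                break
    return words


# Hypothetical mini-vocabulary (assumption, not a real dictionary).
VOCABULARY = {"すもも", "もも", "も", "の", "うち"}

# The segmentation shown on the slide, joined back into one unspaced sentence.
EXPECTED = ["すもも", "も", "もも", "も", "もも", "の", "うち"]
text = "".join(EXPECTED)

result = greedy_segment(text, VOCABULARY)
print(result)
# -> ['すもも', 'もも', 'もも', 'もも', 'の', 'うち']
print(result == EXPECTED)
# -> False
```

Greedy matching swallows the particle も into もも every time, so the result reads "plum peach peach peach" instead of the intended sentence. A dictionary-based analyzer with part-of-speech information avoids this.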
Multiple Readings of Kanji
今日は一月一日で日曜日
Today is January 1st, Sunday
Morphological Analysis
from sudachipy import Dictionary
tokenizer = Dictionary().create()
text = "今日"  # today
tokens = tokenizer.tokenize(text)
print(tokens[0].surface())  # -> 今日
print(tokens[0].reading_form())  # -> キョウ (kyou)
print(tokens[0].part_of_speech()[0])  # -> 名詞 (noun)
Get Readings
from sudachipy import Dictionary
tokenizer = Dictionary().create()
text = "今日は一月一日で日曜日"  # Today is January 1st, Sunday
readings = []
for token in tokenizer.tokenize(text):
    readings.append(token.reading_form())
print(readings)
# -> ['キョウ', 'ハ', 'イチ', 'ガツ', 'ツイタチ', 'デ', 'ニチヨウビ']
Can't read Katakana?
Convert Japanese Character
jaconv: pypi.org/project/jaconv
import jaconv
from sudachipy import Dictionary
tokenizer = Dictionary().create()
text = "今日は一月一日で日曜日"
readings = []
for token in tokenizer.tokenize(text):
    readings.append(token.reading_form())
print(readings)
# -> ['キョウ', 'ハ', 'イチ', 'ガツ', 'ツイタチ', 'デ', 'ニチヨウビ']
print([jaconv.kata2hira(r) for r in readings])
# -> ['きょう', 'は', 'いち', 'がつ', 'ついたち', 'で', 'にちようび']
print([jaconv.kata2alphabet(r) for r in readings])
# -> ['kyou', 'ha', 'ichi', 'gatsu', 'tsuitachi', 'de', 'nichiyoubi']
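The Katakana to Hiragana step is simple enough to sketch by hand. In Unicode, each Katakana letter from ァ (U+30A1) to ヶ (U+30F6) sits exactly 0x60 code points above its Hiragana counterpart. The following is an illustration of that idea only, not jaconv's actual implementation; `katakana_to_hiragana` is a hypothetical helper name.

```python
KATAKANA_START = 0x30A1  # ァ
KATAKANA_END = 0x30F6    # ヶ
OFFSET = 0x60            # distance between the Katakana and Hiragana blocks


def katakana_to_hiragana(text: str) -> str:
    """Shift each Katakana letter down by 0x60; leave every other character unchanged."""
    return "".join(
        chr(ord(ch) - OFFSET) if KATAKANA_START <= ord(ch) <= KATAKANA_END else ch
        for ch in text
    )


# Readings as returned by reading_form() in the example above.
readings = ["キョウ", "ハ", "イチ", "ガツ", "ツイタチ", "デ", "ニチヨウビ"]
print([katakana_to_hiragana(r) for r in readings])
# -> ['きょう', 'は', 'いち', 'がつ', 'ついたち', 'で', 'にちようび']
```

The long-vowel mark ー (U+30FC) is outside the shifted range, so ビール becomes びーる with the mark left as-is.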
Want to hear audio? 🗣️
Text to Speech
from contextlib import closing
from pathlib import Path
import boto3
polly = boto3.client("polly")
# I want to drink good beer today and tomorrow.
text = "今日も明日もおいしいビールを飲みたい"
result = polly.synthesize_speech(
    Text=text, OutputFormat="mp3", VoiceId="Mizuki")
with closing(result["AudioStream"]) as stream:
    Path("japanese.mp3").write_bytes(stream.read())
Sample app
Thank you
@takanory