I’m working on a website for intermediate learners to practice by reading and listening, Japanese included, and since Japanese is my strongest second language I can offer some perspective on why it’s so uncommon.
Japanese is just really, really hard for computers to deal with. The only reason I got parsing and word segmentation to a pretty good state is that I was familiar enough with the language to write a 3,000-line post-processing function over the tokens to get reasonable results. We have a few similar post-processing steps, like one to better handle separable verbs in German, but they’re nothing compared to what Japanese needed.
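To give a flavor of what that kind of token post-processing can look like (a minimal sketch, not the actual Polyglatte code -- the part-of-speech labels, the auxiliary list, and the merge rule here are all hypothetical):

```python
# Hypothetical sketch of one Japanese token post-processing rule:
# off-the-shelf segmenters often split a conjugated verb like 食べている
# ("is eating") into 食べ / て / いる, which is too fine-grained to show
# a learner as separate "words". This merges a verb stem with the
# auxiliary tokens that follow it.

AUXILIARIES = {"て", "で", "いる", "いた", "ます", "ました", "ない", "た"}

def merge_verb_tokens(tokens):
    """Merge a verb stem with trailing auxiliary tokens.

    `tokens` is a list of (surface, part_of_speech) pairs, the rough
    shape many morphological analyzers produce.
    """
    merged = []
    for surface, pos in tokens:
        # Fold an auxiliary into the preceding verb (which may itself
        # already be a merged stem + auxiliary chain).
        if merged and pos == "aux" and surface in AUXILIARIES and merged[-1][1] == "verb":
            prev_surface, prev_pos = merged[-1]
            merged[-1] = (prev_surface + surface, prev_pos)
        else:
            merged.append((surface, pos))
    return merged

tokens = [("私", "pron"), ("は", "particle"),
          ("食べ", "verb"), ("て", "aux"), ("いる", "aux")]
print(merge_verb_tokens(tokens))
# → [('私', 'pron'), ('は', 'particle'), ('食べている', 'verb')]
```

The real rules are far messier (thousands of lines, per the comment above); this just shows the general pattern of walking the token stream and re-joining pieces the segmenter split too aggressively.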
Additionally, Japanese kind of breaks our word model, despite our being aware of this and planning for it from the start, and every part of the app needs special logic to support Japanese properly.
It’s a lovely language, honestly my favorite language I’ve spent time with, but it’s non-trivial to handle in code.
Happy to answer any questions and also, self-promo: https://polyglatte.com for my project. Happy to make improvements to better support you / the intermediate reader use case, just let me know.
I don’t want to sound too rude or harsh, but your tutorial appears to be very non-native Japanese: grammar mistakes (like ...you + verb instead of ...you ni + verb), plus odd word choices (tango o suru -- seriously, what does that mean?).
Thanks for the feedback! I did originally write it myself in a rush, using the English as the basis, but I have since had it rewritten by a native Japanese speaker -- I’ll have that done again.
> (tango o suru, seriously what does that mean?)
Thanks for pointing this out, there's actually a mismatch between the JSON fixture used to load this article into the DB and the raw text from which that JSON should have been generated: there's a missing token there. Frankly it's just good (or perhaps bad) fortune that it broke in such a way that it kind of made sense without that word there.
I’ll get that fixed, along with a general rephrasing and another look-over by a native speaker.
It would be great if you could align it with the JLPT. Including the listening, which I think is the hardest.
I'm useless at learning languages, basically foreign language dyslexic. I've found that targeting the tests and applying for them in advance was the only thing that allowed me to progress.
They're a huge motivator. It's not just the looming-deadline effect: passing also makes you feel good, since it's a genuine asset as much as it is proof of progress.
Yeah, we have JLPT word lists available; you can set one as a focus and learn its words from clips/snippets of videos and articles. Perhaps there's deeper integration we could do there too.
I agree that the JLPT is a useful tool; I personally used its word lists as a core vocab builder when I was studying a lot of Japanese.
> I'm useless at learning languages, basically foreign language dyslexic.
Don't be too hard on yourself! If you were able to pass the JLPT at any level, I think you're doing great.
One of the motivating ideas of Polyglatte is to make language learning more fun. I'm a big believer in mass exposure and quality time spent with the language. I think for most people, most of the time, just being able to have fun with the language will give you great results.
If you're looking for new ways to improve your listening, consider watching some YouTube videos, not with the intention of understanding everything but with the intention of having a good time. For me, listening skill largely came out of nowhere: I gave my brain a lot of stimulus (Japanese YouTube videos I'd watch for hours every day, even when I didn't really understand them), and then one day I woke up and understood most of it, or at least it felt like my brain could suddenly keep pace.
Anyway, the most important part of that system is finding videos you want to watch even if you don't fully understand them -- videos you'd watch even if they weren't helping you develop a skill.