Well yeah, English is kind of weird, but Finnish isn’t a Germanic language at all? It’s not even Indo-European, so even Hindi is ostensibly closer to English than Finnish is. As I understand it, Standard German (along with Icelandic) is itself a bit atypical in that it has kept its cases when most other Germanic languages have lost theirs.
Re compounds, I expected them to be more or less manageable with relatively dumb splitting, similar to the greedy solutions to the “no spaces” problem in Chinese and Japanese, and your link seems to bear that out. But yeah, cheers to more language-specific stuff in your indexing. /s
https://e.humanities.uva.nl/publications/2004/kamp_lang04.pd...
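
To be concrete about what I mean by "relatively dumb splitting", here is a minimal sketch of a greedy longest-match splitter. It is my own illustration, not the method from either link: `split_compound`, `LEXICON` and `min_len` are made-up names, the word list is a toy, and it ignores linking elements like the German "s" entirely.

```python
# Toy lexicon; a real one would come from a dictionary or corpus word list.
LEXICON = {"donau", "dampf", "schiff", "fahrt"}


def split_compound(word, lexicon, min_len=3):
    """Greedily split `word` into known parts, longest match first.

    Returns the list of parts, or [word] unchanged if at any position
    no lexicon entry of at least `min_len` characters matches.
    """
    word = word.lower()
    parts = []
    pos = 0
    while pos < len(word):
        # Try the longest candidate starting at `pos`, then shorter ones.
        for end in range(len(word), pos + min_len - 1, -1):
            if word[pos:end] in lexicon:
                parts.append(word[pos:end])
                pos = end
                break
        else:
            # Nothing matched here: give up and keep the word whole.
            return [word]
    return parts


print(split_compound("Donaudampfschifffahrt", LEXICON))
# ['donau', 'dampf', 'schiff', 'fahrt']
```

There is no backtracking, so a greedy pick that is too long can make the rest unsplittable, in which case the word is just indexed whole. That is the same trade-off the greedy Chinese/Japanese segmenters make.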