Hacker News
Language detection in the Bash command line using gzip (2011) (ileriseviye.wordpress.com)
11 points by optimalsolver on Nov 11, 2020 | 3 comments


What an interesting thing!

This relies on taking corpora of a fixed length, one per candidate language, appending the unknown string to each and compressing the lot. The corpus that compresses best with the string appended is likely to be the one in the same language.

I have an intuitive understanding of _why_ this is likely to be (words/phrases in the new string are likely to appear in the corpus of the same language and so will compress more easily) but I feel like I'm glimpsing something just out of reach.
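
For anyone who wants to poke at the idea outside Bash, here is a rough Python sketch of how I understand it. It is not the article's script: `detect_language`, `compressed_size` and the two toy corpora are names and data I made up for illustration, and I use `zlib` (the same DEFLATE algorithm gzip uses) instead of shelling out to `gzip`. I also compare how many extra bytes the string adds to each corpus rather than the total size, which is my own tweak so the corpora don't have to be exactly the same length.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the DEFLATE-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def detect_language(sample: str, corpora: dict) -> str:
    """Return the key of the corpus that the sample compresses best against.

    corpora maps a language name to reference text in that language.
    Assumption: each corpus fits in DEFLATE's 32 KiB window, otherwise
    early parts of the corpus cannot be referenced when the sample is reached.
    """
    def extra_bytes(corpus: str) -> int:
        # Bytes the sample adds on top of the corpus alone. Fewer extra
        # bytes means more of the sample was already "seen" in the corpus.
        return compressed_size(corpus + " " + sample) - compressed_size(corpus)

    return min(corpora, key=lambda lang: extra_bytes(corpora[lang]))


# Toy corpora, far too small for real use; purely illustrative.
TOY_CORPORA = {
    "english": (
        "the quick brown fox jumps over the lazy dog and the cat sat on "
        "the mat while the rain fell on the hills"
    ),
    "german": (
        "der schnelle braune fuchs springt ueber den faulen hund und die "
        "katze sass auf der matte waehrend der regen fiel"
    ),
}

if __name__ == "__main__":
    print(detect_language("the cat sat on the mat", TOY_CORPORA))
    print(detect_language("die katze sass auf der matte", TOY_CORPORA))
```

With corpora this tiny it only works when the sample repeats phrases from the corpus, so a real test would need a few kilobytes of text per language.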

Unfortunately I can't find the original lectures, but I'll keep looking



Ooh, thank you!



