This relies on taking a corpus of fixed length for each candidate language, appending the unknown string to each one, and compressing the lot. The corpus that compresses best with the string attached is likely to be the one in the same language.
I have an intuitive understanding of _why_ this works (words and phrases in the new string are likely to already appear in the corpus of the same language, so the compressor can encode them cheaply as repeats of text it has already seen), but I feel like I'm glimpsing something just out of reach.
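For what it's worth, here is a minimal sketch of the idea in Python. It is my own reconstruction, not the code from the lectures: the function names are made up, zlib is just a convenient stand-in for whatever compressor was used, and instead of comparing total compressed sizes I compare how many extra bytes the unknown string adds to each corpus, which is a small variation so that corpora that happen to compress differently on their own don't skew the result.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Number of bytes zlib needs for `data` at maximum compression."""
    return len(zlib.compress(data, 9))


def extra_bytes(corpus: bytes, sample: bytes) -> int:
    """How much the compressed corpus grows once the sample is appended."""
    return compressed_size(corpus + sample) - compressed_size(corpus)


def guess_language(unknown: str, corpora: dict[str, str],
                   corpus_len: int = 10_000) -> str:
    """Return the key of the corpus that absorbs `unknown` most cheaply.

    `corpora` maps a language label to sample text (hypothetical input
    format). Each corpus is cut to `corpus_len` bytes so all candidates
    have comparable length and stay inside zlib's 32 KiB window.
    """
    sample = unknown.encode("utf-8")
    costs = {
        lang: extra_bytes(text.encode("utf-8")[:corpus_len], sample)
        for lang, text in corpora.items()
    }
    # Fewest extra bytes means the sample shared the most with that corpus.
    return min(costs, key=costs.get)
```

You would call it as `guess_language(some_text, {"en": english_text, "es": spanish_text})`. With very short corpora or very short unknown strings the differences are only a handful of bytes, so I wouldn't trust it without a reasonable amount of text on both sides.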
Unfortunately I can't find the original lectures, but I'll keep looking.