It's not just a matter of being encountered first, but of being encountered more often ("encountered first" is a special case of "more often"). But you really need to delve into the proof to figure out why that's enough for optimality. At the very least, _I personally_ don't know of a simple, intuitive way to explain it.
It is important to remember what each compression algorithm is optimal for.
LZ77 and LZ78 are optimal for observable Markov sources -- that is, sources in which, given the x most recent consecutive outputs (where x is the Markov order), the distribution of the next symbol is fixed. While this is rarely exactly the case, it is reasonable to assume that e.g. English text is a 10th-order Markov model (with respect to characters) or a 3rd-order one (with respect to words).

The source you described is NOT Markov overall, although it might be Markov in the tail (past the random blob). Asymptotically, of course, it is Markov :)
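To make "observable Markov source" concrete, here is a toy order-1 sketch in Python (the two-symbol alphabet and the transition probabilities are made up purely for illustration): given only the single most recent output, the distribution of the next symbol is fixed.

```python
import random

# Hypothetical order-1 transition table: the next-symbol distribution
# depends only on the one most recent symbol (the Markov order is 1).
TRANSITIONS = {
    "a": {"a": 0.1, "b": 0.9},
    "b": {"a": 0.6, "b": 0.4},
}

def sample(n, state="a", seed=0):
    """Emit n symbols from the toy Markov source, starting in `state`."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        symbols, weights = zip(*TRANSITIONS[state].items())
        state = rng.choices(symbols, weights=weights)[0]
        out.append(state)
    return "".join(out)
```

An order-x source is the same idea with the table keyed on the last x symbols instead of one.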
Hmm... I thought that in LZW, once a sequence has been assigned a code, it can never be assigned a shorter one; codes are assigned in order, and they only get longer over time (until a reset is triggered). So the frequency of a particular sequence cannot influence the length of the code assigned to it. I guess I'll read up on it a bit more before commenting further. I do see that it could turn out to be optimal in the asymptotic case.
You are, of course, correct. However, consider how LZW compresses at all: you cannot reassign the codes for the letters A-Z, yet you still save space by assigning a single (longer) code to a whole sequence, because that one code is still shorter than the individual codes needed to represent the sequence. In the same way, even though a code can never be reassigned, a longer sequence that includes the shorter one will get a new code of its own; it represents more letters and, overall, is more likely to occur than the shorter sequence followed by something else.

Assuming no reset, the shorter codes fall out of use over time, because the longer ones eventually cover everything they represent, extended by all possibilities (with the more probable extensions getting shorter codes by virtue of appearing more often / earlier).
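A minimal LZW encoder sketch makes the dictionary-growth behavior visible (byte-oriented, no reset, codes emitted as plain integers rather than packed bits, so this is illustrative, not a full implementation):

```python
def lzw_encode(data: bytes) -> list[int]:
    # Dictionary starts with all 256 single bytes; new entries are
    # appended in order and never reassigned, so a sequence's code is
    # fixed once granted, but longer sequences keep getting new codes.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    out = []
    seq = b""
    for byte in data:
        candidate = seq + bytes([byte])
        if candidate in table:
            seq = candidate                # keep extending the current match
        else:
            out.append(table[seq])         # emit code for the longest known match
            table[candidate] = next_code   # the longer sequence gets the next code
            next_code += 1
            seq = bytes([byte])
    if seq:
        out.append(table[seq])
    return out
```

Running it on repetitive input like `b"abababab"` yields `[97, 98, 256, 258, 98]`: after the first few symbols, the output uses the newly minted codes for `b"ab"` (256) and `b"aba"` (258) instead of the single-byte codes, which is exactly the sense in which the shorter codes fall out of use.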