another contender is zstd: https://github.com/facebook/zstd. It typically offers...

moonchild · on Jan 27, 2021

> It typically offers better compression ratios than LZ4 at a slight (depending on your data) cost in speed

Per the table at [0], zstd provides only a slight improvement in compression ratio, and in exchange is about half the speed of lz4.

They both have their place.

0. https://facebook.github.io/zstd/

hyperpape · on Jan 27, 2021

That table shows zstd comparing poorly at its faster settings, but at slower settings, it offers a significantly better compression ratio, albeit 3x slower decompression.

baybal2 · on Jan 27, 2021

LZ4 has branchless decompression and a lower cache footprint, so it works equally well on low-end and non-desktop CPUs.

zstd, brotli, snappy were seemingly all made with high end x86 capabilities in mind.

_vvhw · on Jan 27, 2021

I also appreciate LZ4's simplicity and tiny code footprint.

zstd is brilliant as well, but in terms of code base it's a whole other beast.

makapuf · on Jan 27, 2021

Yes, the decompressor on a bare-metal Cortex-M4 is a mere few hundred bytes of code, and you can decompress from flash directly to the output buffer.

Teknoman117 · on Jan 27, 2021

I've used it in bootloaders that have slow transfer mechanisms (uart, i2c) to get whatever speedup I can for a few hundred bytes of binary.

wolf550e · on Jan 27, 2021

Google's snappy is in the same class as lzo and lz4, not in the same class as brotli and zstd.

ignoramous · on Jan 27, 2021

Also see Daniel Reiter Horn's DivANS built at Dropbox: https://dropbox.tech/infrastructure/building-better-compress...

unsigner · on Jan 27, 2021

Zstd is very different - it includes an entropy coder. LZ4 only finds repeated matches, but then doesn't encode them very efficiently.

To put it simplistically, if you have a file which is a (good) random mix of an equal number of A and B characters, LZ4 won't be able to compress it significantly, while Zstd will compress it 8:1, converging to an encoding where a '1' bit is A, and a '0' bit is B.
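The 8:1 figure above follows from the symbol statistics alone, and can be sketched in a few lines. This is not how zstd is implemented (it uses Huffman and FSE coding); it is only an illustration of the limit case, with the helper names `shannon_entropy_bits_per_symbol`, `pack_ab` and `unpack_ab` made up for the example.

```python
import math
import random
from collections import Counter

def shannon_entropy_bits_per_symbol(data: bytes) -> float:
    """Order-0 entropy: average bits needed per byte, given only byte frequencies."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def pack_ab(data: bytes) -> bytes:
    """Pack a string of A/B bytes into bits (A -> 1, B -> 0), eight symbols per byte."""
    out = bytearray()
    for i in range(0, len(data), 8):
        chunk = data[i:i + 8]
        byte = 0
        for ch in chunk:
            byte = (byte << 1) | (1 if ch == ord("A") else 0)
        byte <<= 8 - len(chunk)  # left-align a short final chunk
        out.append(byte)
    return bytes(out)

def unpack_ab(packed: bytes, n: int) -> bytes:
    """Inverse of pack_ab; n is the original symbol count."""
    out = bytearray()
    for byte in packed:
        for bit in range(7, -1, -1):
            out.append(ord("A") if (byte >> bit) & 1 else ord("B"))
    return bytes(out[:n])

# An equal number of A and B characters in random order.
rng = random.Random(0)
symbols = [ord("A")] * 4096 + [ord("B")] * 4096
rng.shuffle(symbols)
data = bytes(symbols)

entropy = shannon_entropy_bits_per_symbol(data)  # bits per 8-bit input byte
packed = pack_ab(data)
print(entropy, len(data), len(packed))
```

With one bit of information per stored byte, an ideal order-0 coder approaches one eighth of the input size, which is what the packing step achieves by construction.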

YetAnotherNick · on Jan 27, 2021

> To put it simplistically, if you have a file which is a (good) random mix of an equal number of A and B characters, LZ4 won't be able to compress it significantly

I checked it. LZ4 still reduces the size by half; no idea why half. A 10 MB file compresses to 5 MB.

Edit: checked with the highest compression level and it compresses a 1 MB file to 185 KB. So what the parent wrote is false.

not2b · on Jan 27, 2021

Yes, if I take the 8 combinations aaa, aab, aba etc. and assign each of them a 3-bit codeword, I replace each 24-bit sequence with a 3-bit sequence. So arithmetic coders have no problem with cases like this.

unsigner · on Feb 1, 2021

But LZ4 doesn't have an arithmetic coder, or any other statistical encoding; it's just matches and literals. Puzzling...
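One plausible reading of the result, sketched below: with only two distinct symbols, every short substring has already occurred earlier in the file, so a pure match finder keeps finding matches even though the data is random. This is a toy greedy parser with a rough cost model (one token byte plus a two-byte offset per match, one byte per literal), not the LZ4 implementation; `longest_match` and `estimate_compressed_size` are hypothetical helpers, and the brute-force search stands in for LZ4's hash table.

```python
import random

MIN_MATCH = 4      # LZ4's minimum match length
SEQUENCE_COST = 3  # rough model: 1 token byte + 2 offset bytes per match

def longest_match(data: bytes, pos: int) -> int:
    """Length of the longest earlier occurrence of the bytes starting at pos (brute force)."""
    best = 0
    for start in range(pos):
        length = 0
        while pos + length < len(data) and data[start + length] == data[pos + length]:
            length += 1
        best = max(best, length)
    return best

def estimate_compressed_size(data: bytes):
    """Greedy parse into matches and literals; returns (estimated bytes, match count)."""
    pos = 0
    cost = 0
    matches = 0
    while pos < len(data):
        length = longest_match(data, pos)
        if length >= MIN_MATCH:
            cost += SEQUENCE_COST  # one match replaces `length` input bytes
            matches += 1
            pos += length
        else:
            cost += 1              # emit a single literal byte
            pos += 1
    return cost, matches

rng = random.Random(1)
data = bytes(rng.choice(b"AB") for _ in range(2048))
cost, matches = estimate_compressed_size(data)
print(len(data), cost, matches)
```

The model ignores length-extension bytes and the way literals share a token with the following match, so the numbers are indicative only. A more exhaustive match search finds longer matches, which would be consistent with the higher compression level doing better than the default.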

julian37 · on Jan 27, 2021

Yep, Zstd is the spiritual successor to LZ4 and written by the same person (Yann Collet) after they got hired by Facebook.

smueller1234 · on Jan 27, 2021

Actually, I seem to recall that he was working on it before getting hired by Facebook (unless there was a massive delay in the hiring to become known). I was following his excellent blog posts on the matter at the time.

tmd83 · on Jan 27, 2021

Yes, it was a fully working thing before Facebook. There has been a lot of improvement in both the core and the CLI, but the core innovations of zstd were well established before Facebook. I was probably following his blog (even though I wasn't a compression expert) for months before I saw the post about his joining Facebook.

thechao · on Jan 27, 2021

Yann wrote LZ4 and Zstd well before joining FB. I have to applaud FB for supporting Yann's work, though.

zrav · on Jan 27, 2021

I've spent an afternoon testing zstd's custom dictionaries. They really only provide benefits on small data blocks. According to my tests, the largest block size at which custom dictionaries still provided a benefit was 8K; above that, the compression ratio advantage compared to the default is definitely gone.
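The small-block effect can be reproduced in spirit with Python's standard library. Note the substitution: this sketch uses zlib's preset dictionary (`zdict`), not zstd's trained dictionaries, and the records and dictionary contents are invented for illustration, so it only shows the general shape of the effect rather than the 8K threshold.

```python
import zlib

# Hypothetical "dictionary": a couple of typical records concatenated together.
dictionary = (
    b'{"user_id": 0, "status": "active", "country": "US", "plan": "free"}'
    b'{"user_id": 1, "status": "inactive", "country": "GB", "plan": "pro"}'
)

def compress(data: bytes, zdict: bytes = b"") -> bytes:
    """Deflate data, optionally priming the compressor with a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=zdict) if zdict else zlib.compressobj(level=9)
    return comp.compress(data) + comp.flush()

def decompress(blob: bytes, zdict: bytes = b"") -> bytes:
    """Inverse of compress; the same dictionary must be supplied."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()

def saving(plain: bytes, with_dict: bytes) -> float:
    """Fraction of the plain compressed size that the dictionary saved."""
    return (len(plain) - len(with_dict)) / len(plain)

# A single small record: almost all of its structure is already in the dictionary.
small = b'{"user_id": 48213, "status": "active", "country": "DE", "plan": "free"}'
plain_small = compress(small)
dict_small = compress(small, dictionary)

# A large block of similar records: the block supplies its own context after the first record.
large = b"".join(
    b'{"user_id": %d, "status": "active", "country": "DE", "plan": "free"}' % i
    for i in range(2000)
)
plain_large = compress(large)
dict_large = compress(large, dictionary)

print(len(small), len(plain_small), len(dict_small), saving(plain_small, dict_small))
print(len(large), len(plain_large), len(dict_large), saving(plain_large, dict_large))
```

Once a block is large enough to contain its own repeated structure, the dictionary only helps with the first few hundred bytes, so its relative contribution shrinks as the block grows.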

StreamBright · on Jan 27, 2021

> Additionally it offers a training mode to tune the algorithm to increase compression ratio on specific types of data

Yes, however there is usually no facility to train your compression algo with most tools using ZSTD.

pmarreck · on Jan 27, 2021

There should be a way to pool standard dictionaries somewhere, such as a "standard english text corpus data" dictionary, that you can then download on demand for encoding, say, BLOB text fields in a database with little to no overhead.

The way this would probably work without this facility though, say, in a database, is that the dictionary is maintained internally and constructed on the fly from the field data and not exposed to users. Although, I don't know if you'd have to keep every version of the dictionary in order to successfully decompress old data? If so, then perhaps this is a niche feature.
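On the versioning question: data compressed with a dictionary can only be decompressed with that same dictionary, so one straightforward design is to tag each blob with the dictionary version that produced it and never discard old versions. The sketch below is a hypothetical in-memory store, again using zlib's `zdict` as a stand-in for zstd dictionaries; `DictCompressedStore` and its methods are invented names, not any database's API.

```python
import zlib

class DictCompressedStore:
    """Hypothetical store: every blob remembers which dictionary version compressed it."""

    def __init__(self):
        self.dictionaries = {}       # version -> dictionary bytes; old versions are kept
        self.current_version = None
        self.blobs = {}              # key -> (dictionary version, compressed bytes)

    def install_dictionary(self, dictionary: bytes) -> int:
        """Register a new dictionary; new writes use it, old blobs keep their own version."""
        version = len(self.dictionaries) + 1
        self.dictionaries[version] = dictionary
        self.current_version = version
        return version

    def put(self, key: str, value: bytes) -> None:
        comp = zlib.compressobj(zdict=self.dictionaries[self.current_version])
        self.blobs[key] = (self.current_version, comp.compress(value) + comp.flush())

    def get(self, key: str) -> bytes:
        version, blob = self.blobs[key]
        decomp = zlib.decompressobj(zdict=self.dictionaries[version])
        return decomp.decompress(blob) + decomp.flush()

store = DictCompressedStore()
store.install_dictionary(b"the quick brown fox jumps over the lazy dog")
store.put("a", b"the quick brown fox sleeps")

# Later, the dictionary is rebuilt from newer field data.
store.install_dictionary(b"pack my box with five dozen liquor jugs")
store.put("b", b"pack my box with five dozen eggs")

print(store.get("a"), store.get("b"))
```

The alternative to keeping every version is to recompress old blobs whenever the dictionary changes, which trades storage of old dictionaries for rewrite cost.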

felixhandte · on Jan 27, 2021

W.r.t. standard dictionaries, it's something we're interested in, but the fundamental reality of dictionaries is that their effectiveness is strongly tied to their specificity. Put another way, a universal dictionary is a self-contradiction.

And yes, totally, I know at least RocksDB supports exactly that behavior [0].

[0] https://github.com/facebook/rocksdb/blob/12f11373554af219c51...