
I know it's not easy, but what's preventing PostgreSQL from having compressed pages?


This post glosses over it, but larger fields are stored via the TOAST mechanism [1], which does support compression.

[1]: https://www.postgresql.org/docs/current/storage-toast.html
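
Since PostgreSQL 14 you can even pick the compression method per column; a minimal sketch, where the table and column names are just for illustration:

    -- Use LZ4 instead of the default pglz for a hypothetical jsonb column
    ALTER TABLE events ALTER COLUMN payload SET COMPRESSION lz4;

    -- Or change the default for all newly compressed TOAST values
    SET default_toast_compression = 'lz4';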


As you said, it's not easy. This is the long and the short of it, especially if you want to compress multiple pages together and somehow reasonably handle compression of toasted values across multiple rows and pages.

That said, there has been some progress in the general direction of allowing this: support for new storage implementations, e.g. zheap (https://wiki.postgresql.org/wiki/Zheap). But it's a large effort. Consider that it has to implement new crash recovery, for starters.
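
For reference, the pluggable table AM surface (PostgreSQL 12+) looks like this; an engine like zheap would be selected the same way once installed (it never shipped in core):

    -- "heap" is the built-in row-store access method; an alternative
    -- table AM would be named here instead
    CREATE TABLE t (id int, payload text) USING heap;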


Under some conditions they do [0]. How useful it ends up being depends on your data - pretty effective in my experience.

[0] https://www.postgresql.org/docs/current/storage-toast.html
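
On PostgreSQL 14+ you can check whether a given value actually got compressed; a throwaway example:

    CREATE TABLE toast_demo (doc text);
    -- 80 kB of repetitive text, well past the ~2 kB TOAST threshold
    INSERT INTO toast_demo SELECT repeat('abcdefgh', 10000);
    -- Returns the method used (e.g. pglz or lz4), or NULL if the
    -- value was stored uncompressed
    SELECT pg_column_compression(doc) FROM toast_demo;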


You can, if you wish, run PostgreSQL on a compressed filesystem.

But you'd better be sure that its storage guarantees match what Postgres needs, else you'll risk database corruption.


There are extensions that do this, e.g. TimescaleDB with its column-oriented storage (delta-of-delta + RLE for integers, Gorilla for floats, dictionary encoding for other types).
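
Roughly like this, with made-up table and column names (create_hypertable, the timescaledb.compress options and add_compression_policy are the actual TimescaleDB API):

    CREATE TABLE metrics (ts timestamptz NOT NULL, device_id int, value float8);
    SELECT create_hypertable('metrics', 'ts');

    -- Enable columnar compression, grouping rows by device
    ALTER TABLE metrics SET (
      timescaledb.compress,
      timescaledb.compress_segmentby = 'device_id'
    );

    -- Automatically compress chunks older than a week
    SELECT add_compression_policy('metrics', INTERVAL '7 days');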


Postgres performs compression on variable-length datatypes like arrays. Compression adds overhead, and I'm sure they have a reason not to compress everything - presumably it comes down to prioritizing CPU over IOPS.


Compressed storage would bring huge performance and size-on-disk wins. I suspect the issue is the difficulty and resources needed to add compression on top of the existing design, rather than a decision that the tradeoffs aren't worth it.


Well, the simple truth is that adding efficient compression to row storage (which is what the heap is) is not really possible. Or more precisely - you can do it outside the database by using a filesystem with compression (like ZFS), and doing it within the database won't give you much advantage. Which is probably why no one has really proposed implementing it: the cost/gain ratio just isn't worth it.

The problem with heap and compression is that the heap mixes a lot of different data (i.e. data from different columns), which is not great for compression ratio. To address that, it's necessary to use some variant of a columnar format, and enabling that is one of the goals of the table AM API, the zedstore AM, etc.
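
You can see that pluggable AM layer from SQL; on a stock install only the built-in heap shows up, and a columnar AM like zedstore would appear here once installed:

    -- List available table access methods ('t' = table, 'i' = index)
    SELECT amname FROM pg_am WHERE amtype = 't';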



