SendGrid, so billions of incoming requests, each of which fans out to at least 8 other log events, plus all errors, etc. We have to run our own Splunk instances. We used to retain data much longer, but as our scale keeps going up, so do the costs. We've had to reduce retention to a 7-day lookback for higher-volume services. For lower-volume services (millions of requests rather than billions), lookback ranges from 30 days to a year, depending on the service.
As for blog posts, I'm not aware of any. I've actually wanted to show off what we have but have never prioritized writing the blog post.
With compressed logs, the storage cost doesn't have to be significant. If you can get structured logs into a TSDB, you only need to keep the raw logs in cold storage, or for as long as more detailed views or correlation might be necessary.
Do you have any resources/blog posts/keywords for learning more?