> You're missing my main point: logs should not be your primary source of inform...

KaiserPro · on July 11, 2024

> Truthfully, you're probably just doing it wrong if you can't derive actionable metrics from logs

I have ~200 services, each composed of many sub-services, each made up of a number of processes. Something like 150k processes.

Now, we are going to ship all those logs, where every transaction emits something like 500-2000 bytes of data. Storing that is easy, even storing it in a structured way is easy. Making sure we don't leak PII is a lot harder, so we have to have fairly strict ACLs.

Now, I want to process them to generate metrics and then display them. But that takes a lot of horsepower. Moreover, when I want metrics covering more than a week or so, the amount of data I have to process grows linearly with the time range. I also need to back up that data, and the derived metrics. We are looking at a large cluster just for processing.

Now, if we make sure that our services emit metrics for all useful things, the infra for recording, processing and displaying that is much smaller, maybe two or three instances. Not only that, but custom queries are way quicker, and much more resistant to PII leaking. Just like structured logging, it does require some dev effort.
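A rough sketch of what "services emit metrics" can look like, in Python. The `MetricRegistry` class and the `checkout.latency_ms` metric name are made up for illustration and are not from any real library. The idea is that each process keeps a count and a running sum in memory and ships one small snapshot per interval, instead of one 500-2000 byte log line per transaction:

```python
import threading
from collections import defaultdict


class MetricRegistry:
    """Hypothetical in-process aggregator (illustrative, not a real library)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = defaultdict(int)
        self._sums = defaultdict(float)

    def observe(self, name, value):
        # Called once per transaction: updates two numbers, writes no log line.
        with self._lock:
            self._counts[name] += 1
            self._sums[name] += value

    def flush(self):
        # Called once per reporting interval: returns a small snapshot and
        # resets the in-memory state. Shipping the snapshot is left out here.
        with self._lock:
            snapshot = {
                name: {"count": self._counts[name], "sum": self._sums[name]}
                for name in self._counts
            }
            self._counts.clear()
            self._sums.clear()
        return snapshot


registry = MetricRegistry()
for latency_ms in (12.0, 15.0, 9.0):
    registry.observe("checkout.latency_ms", latency_ms)
snapshot = registry.flush()
```

The snapshot size depends on the number of metric names, not the number of transactions, and it only carries names and numbers, so there is no free-text field for PII to sneak into.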

At no point is it _impossible_ to use logs as the data store/transport, it's just either fucking expensive, fragile, or dogshit slow.

Or to put it another way:

old system == >£1million in licenses and servers (yearly)

metric system == £100k in licenses and servers + £12k for the metrics servers (yearly)
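Spelling out the arithmetic on those two figures, treating the ">£1million" as a lower bound (so the real saving is at least this much):

```python
# Yearly costs in GBP, taken from the figures above.
old_system_gbp = 1_000_000            # lower bound: the old setup cost more than this
metric_system_gbp = 100_000 + 12_000  # licenses and servers + metrics servers

saving_gbp = old_system_gbp - metric_system_gbp
ratio = old_system_gbp / metric_system_gbp

print(f"metric system: £{metric_system_gbp:,} per year")
print(f"saving: at least £{saving_gbp:,} per year ({ratio:.1f}x cheaper)")
```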