
I was at a large financial news site. They were a total Splunk shop. We had lots of real-steel machines shipping and chunking _loads_ of logs. Every team had a large screen showing off key metrics. Most of the time the dashboards were badly maintained and broken, so only the _really_ key metrics worked. Great for finding out what went wrong, terrible at alerting while it was going wrong.

However, over the space of about three years we shifted organically over to graphite+grafana. There wasn't a top-down push, but once people realised how easy it was to make a dashboard, do templating and generally keep things working, they moved in droves. It also helped that someone built metrics emission into the underlying hosting app library.
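
To give a flavour of how low the bar was, here's a sketch of that kind of library hook, assuming the stock Python `statsd` client; the hostname, prefix and handler are made up for illustration, not our actual code:

    import time
    import statsd

    # Fire-and-forget UDP to a statsd/graphite relay; hostname is illustrative.
    stats = statsd.StatsClient("graphite-relay.internal", 8125, prefix="myservice")

    def do_work(request):
        return "ok"                           # stand-in for real app logic

    def handle_request(request):
        stats.incr("requests")                # request volume
        start = time.time()
        try:
            result = do_work(request)
            stats.incr("responses.ok")
            return result
        except Exception:
            stats.incr("responses.error")     # error rate comes for free
            raise
        finally:
            stats.timing("latency_ms", (time.time() - start) * 1000)

Wrap every request in that once, in the shared library, and every service gets volume, errors and latency dashboards without anyone parsing a log line.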

What really sealed the deal was the non-tech business owners making or updating dashboards. They managed to take pure tech metrics and turn them into service/business metrics.



This.

I was an engineer at Splunk for many years. I knew it cold.

I then joined a startup where they mostly used metrics and the logs TTLed out after just a week. Logs were only used for short-term debugging.
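
(For the curious, week-long local retention can be as simple as this stdlib sketch; the filename is illustrative and this isn't that startup's actual setup:)

    import logging
    from logging.handlers import TimedRotatingFileHandler

    # Rotate daily, keep 7 old files: logs effectively "TTL out" after a week.
    handler = TimedRotatingFileHandler("app.log", when="D", interval=1,
                                       backupCount=7)
    logging.getLogger().addHandler(handler)
    logging.getLogger().setLevel(logging.DEBUG)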

The metrics were easier to put in, keep organized, and make dashboards from; they were lighter, cheaper, better. I had been doing it wrong this whole time.


> the logs TTLed out after just a week

"expired" is the word you're looking for.


It's fair that you had a different experience than I had. However, your experience seems to be very close to what I was describing. Cost got prohibitive (Splunk), and you chose a different avenue. It's totally acceptable to do that, but your experience doesn't reflect mine, and I don't think I'm the exception.

I've used both grafana+metrics and logs to different degrees. I've enjoyed using both, but any system I work on starts with logs and gradually adds metrics as needed; it feels like a natural evolution to me, and I've worked at different scales, like you.


I feel like I shouldn't need to mention this, but a news site and a financial exchange with money at stake are not comparable. If there is a glitch you need to be able to trace it back, and you can't do that with some abstracted metrics.


Yeah, on a news site, the metrics are important. If suddenly you start seeing errors accrue above background noise and it's affecting a number of people, you can act on it. If it's affecting one user, you probably don't give a shit.

In finance, if someone puts in an entry for 1,000,000,000 and it changes to 1,000,000, the SEC, fraud investigators, lawyers, banks, and some number of other FLAs are shining a flashlight up your butt as to what happened.


Right, and if the SEC sees that you're mixing verbose k8s logging with financial records, you're going to get a bollocking.


You are misreading me.

I'm not saying that you can't log; I'm saying that logging _everything_ at debug in an unstructured way and then hoping to divine a signal from it is madness. You will need logs, as they eventually tell you what went wrong. But they are very bad at telling you that something is going wrong now.

They're also exceptionally bad at letting you quickly pinpoint _when_ something changed.

Even in a logging-only environment, you get an alert, you look at the graphs, then dive into the logs. The big issue is that those log-derived metrics are out of date, hard to derive, and prone to breaking when you make changes.
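
The fix is to emit the number at the source instead of deriving it from log lines after the fact. A minimal sketch, assuming the `prometheus_client` library; the metric name, label and order logic are invented for illustration:

    import logging
    from prometheus_client import Counter

    class ValidationError(Exception):
        pass

    # Incremented the instant something goes wrong; alerting fires on this,
    # and the log line below is what you dig into afterwards.
    ORDER_ERRORS = Counter("order_errors_total",
                           "Failed order submissions", ["reason"])

    def submit_order(order_id, qty):
        try:
            if qty <= 0:
                raise ValidationError("quantity must be positive")
            # ... hand off to the matching engine here ...
        except ValidationError:
            ORDER_ERRORS.labels(reason="validation").inc()
            logging.exception("order %s rejected", order_id)
            raise

The counter keeps working no matter how many times the log message is reworded, which is exactly what the grep-the-logs approach can't promise.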

Verbose logging is not a protection in a financial market, because if something goes wrong you'll need to process those logs for consumption by a third party. You'll then have to explain why the format changed three times in the two weeks leading up to that event.

Moreover, you will need to separate the money audit trail from the verbose application logs, ideally at the source. As it's "high value data", you can't be mixing those streams at all.
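
In stdlib-Python terms the split can be as blunt as two loggers that never share a handler (a sketch; the names and filenames are made up):

    import logging

    app_log = logging.getLogger("svc.app")
    app_log.setLevel(logging.DEBUG)
    app_log.addHandler(logging.FileHandler("app.log"))
    app_log.propagate = False

    audit_log = logging.getLogger("svc.audit")
    audit_log.setLevel(logging.INFO)
    audit_log.addHandler(logging.FileHandler("audit.log"))
    audit_log.propagate = False   # audit records never leak into app logs

    audit_log.info("order=o-123 amount=1000000 account=a-9")
    app_log.debug("k8s pod rescheduled, retrying connection")

Once the streams are separate at the source, they can get separate transport, retention and access controls too, which is the actual point.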



