szundi · 2025-12-26T21:53:05 1766785985

It would have helped if you had told us why you don’t like this approach.

zsoltkacsandi · 2025-12-26T22:22:19 1766787739

It's right there:

> the source of most of the problems I've seen with infrastructures using Kubernetes came from exactly this kind of approach

But some more concrete stories:

Once, while I was on call, I got paged because a Kubernetes node was running out of disk space. The root cause was the logging pipeline. Normally, debugging a "no space left on device" issue in a logging pipeline is fairly straightforward, if the tools are used as intended. This time, they weren't.

The entire pipeline was managed by a custom-built logging operator, designed to let teams describe logging pipelines declaratively. The problem? The resource definitions alone were around 20,000 lines of YAML. In the middle of the night, I had to reverse-engineer how the operator translated that declarative configuration into an actual pipeline. It took three days and multiple SREs to fully understand and fix the issue. Without this kind of declarative magic, an issue like this usually takes about an hour to solve.

Another example: external-dns. It's commonly used to manage DNS declaratively in Kubernetes. We had multiple clusters using Route 53 in the same AWS account. Route 53 has a global API request limit per account. When two or more clusters tried to reconcile DNS records at the same time, one would hit the quota. The others would partially fail, drift out of sync, and trigger retries - creating one of the messiest cross-cluster race conditions I've ever dealt with.
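To make the failure mode concrete, here is a toy model of it. This is not external-dns or Route 53 code: `SharedQuota`, `reconcile`, the record names, and the limit of 5 are all made up for illustration. It only shows how two reconcilers drawing from one account-wide budget leave the later one half-applied.

```python
from dataclasses import dataclass


@dataclass
class SharedQuota:
    """Toy account-wide request budget for a single time window (hypothetical)."""
    limit: int
    used: int = 0

    def try_request(self) -> bool:
        # Every caller draws from the same counter, whichever cluster it is.
        if self.used < self.limit:
            self.used += 1
            return True
        return False  # models a throttled API call


def reconcile(records: list[str], quota: SharedQuota) -> tuple[list[str], list[str]]:
    """Try to apply each record; return (applied, throttled)."""
    applied, throttled = [], []
    for record in records:
        (applied if quota.try_request() else throttled).append(record)
    return applied, throttled


quota = SharedQuota(limit=5)  # hypothetical budget, not a real Route 53 figure
cluster_a = reconcile(["a1", "a2", "a3", "a4"], quota)
cluster_b = reconcile(["b1", "b2", "b3", "b4"], quota)

print(cluster_a)  # (['a1', 'a2', 'a3', 'a4'], [])
print(cluster_b)  # (['b1'], ['b2', 'b3', 'b4'])
```

In this sketch the second cluster ends up with one record applied and three throttled. If both clusters retry in the next window, they compete for the same budget again, which is roughly the drift-and-retry loop described above.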

And I have plenty more stories like these.

antonvs · 2025-12-27T00:10:49 1766794249

You mention a questionably designed custom operator and an add-on from a SIG. This is like blaming Linux for the UI in Gimp.

jbaiter · 2025-12-27T00:28:01 1766795281

Also, it's not like logging setups outside of k8s can't be a horror show too. Like, have you ever had to troubleshoot an rsyslog-based ELK setup? I'll forever have nightmares from debugging RainerScript mixed with the declarative config and having to read the source code to find out why all of our logs were getting dropped in the middle of the night.

NewJazz · 2025-12-27T02:45:36 1766803536

I'd also argue the whole external DNS thing could have happened with any dynamic DNS automation... And yes it is a completely optional add-on!

zsoltkacsandi · 2025-12-27T06:41:31 1766817691

> a questionably designed custom operator

This is the logging operator, the most used logging operator in the cloud native ecosystem (we built it).

> This is like blaming Linux for the UI in Gimp.

I never blamed anything, read my comment again. I only pointed out that problems arise when you use something to do a job it was not built for. Like a container orchestrator managing infrastructure (DNS, logging pipelines). That is why I wrote that "it is super important to treat a container orchestrator a container orchestrator". Not a logging pipeline orchestrator, or a control plane for Route 53 DNS.

This has nothing to do with Kubernetes, but with the people who choose to do everything with it (managing the whole infrastructure).

szundi · 2025-12-15T10:25:02 1765794302

If you go mainstream with your requirements, you don’t step on these though

szundi · 2025-12-14T22:29:47 1765751387

Those good ones are not even close though - or are they?

szundi · 2025-12-14T18:03:06 1765735386

It is actually quite hard to copywrite if you’re doing a good job.

Also, firing people for a minimal bonus is always something a lot of people are going to go for.

szundi · 2025-12-14T17:58:08 1765735088

History failed on this one badly

szundi · 2025-12-13T16:34:45 1765643685

What really matters though is the quality of the human review.

szundi · 2025-12-13T10:18:10 1765621090

In this case, buy the gift card from some shady retailer with a one-time-use virtual card, and give this shady code to your friend. Or buy a physical card from AliExpress, the cheapest one with bad reviews.

szundi · 2025-12-12T15:13:41 1765552421

Diversion for your whole next week: just generate machine code binary for this

szundi · 2025-12-08T20:20:46 1765225246

Like it means AI cannot be even worse

szundi · 2025-12-05T09:42:54 1764927774

If only it were that simple.