AMP is an internal proprietary fork of Cortex; they're not upstreaming their changes, in large part due to the scalability limits of Cortex's design. It has the same scalability limitation I described earlier: the lack of a Kafka-like component to soak up ingest floods.
> Multi-petabyte
Sheer storage size is a meaningless point here, since longer retention simply requires more storage. There may or may not be compaction components that speed up queries over larger windows, but that's irrelevant to the point that the queries will still succeed. I have no doubt that any of the solutions on the table can store that much data.
The real scaling question is how many active timeseries the system can handle, and at what resolution (per 15 seconds? per 60 seconds? worse?). And no, "we scale horizontally" doesn't mean much without serious benchmarks.
>The real scaling question is how many active timeseries the system can handle
Handle? What does that mean? Being able to ingest data? Being able to query it?
ingest data - Kafka helps only during ingestion, for handling spikes.
query data - Kafka has no role to play here. Querying performantly at scale is a hard problem. I don't doubt Mimir's capability to query high volumes of data, but other systems can do it too, and OpenObserve's internal benchmarks show that its querying is much faster at scale than Mimir's. We will publish them at the right time (we don't publish benchmarks just to satisfy the curiosity of people on the internet). But this isn't about OpenObserve, so let's set it aside for a while.
About "how many active timeseries":
We've built OpenObserve on a fundamentally different architecture, so we don't have the "active timeseries" constraint that Prometheus-based systems do; high cardinality isn't an issue by design. It's a topic for another day, though.
The primary function of a message broker is to decouple producers and consumers so that writes happen efficiently (consumers don't get bogged down by high incoming volume). Kafka does this very, very well; it is one of the best systems ever designed for it, and it absorbs massive ingestion volumes reliably without dropping data. It's a beast to operate on its own, though.
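The decoupling idea can be sketched with an in-process bounded queue; this only illustrates the pattern (Kafka additionally gives you durability, partitioning, and replay, none of which a simple queue provides). All names here are illustrative.

```python
import queue
import threading

# A bounded buffer decouples a bursty producer from a slower consumer:
# the producer's write latency stays flat while the consumer drains at
# its own pace. A broker like Kafka plays this role durably, at scale.
buf = queue.Queue(maxsize=10_000)

def producer():
    for i in range(1000):           # simulate an ingest burst
        buf.put(f"sample-{i}")      # fast: just enqueue, no downstream wait
    buf.put(None)                   # sentinel: end of stream

def consumer(out):
    while True:
        item = buf.get()
        if item is None:
            break
        out.append(item)            # slow path: parse, index, persist, etc.

results = []
t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer, args=(results,))
t1.start(); t2.start()
t1.join(); t2.join()
print(len(results))  # 1000 — the full burst survives
```

The key property is that the producer never waits on the consumer's slow path; it only blocks if the buffer itself fills, which is the backpressure signal.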
Kafka was also built in an era when autoscaling wasn't available (it's still very relevant, and will be for a very long time). Autoscaling can, to a great degree, let you handle write spikes (it's not the same thing, but it attacks the same problem from a different angle); extreme spikes will still require a message broker. Horizontal scaling does cut it to a great degree, though.
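The autoscaling angle amounts to a proportional rule like the one the Kubernetes HPA uses: scale replicas with observed load. A minimal sketch, with hypothetical rates and function names:

```python
import math

def desired_replicas(current_replicas, current_rate, target_rate_per_replica):
    """HPA-style proportional scaling rule (numbers are hypothetical).

    current_rate: observed ingest rate (samples/sec) across the fleet.
    target_rate_per_replica: the rate one replica handles comfortably.
    """
    per_replica = current_rate / current_replicas
    return max(1, math.ceil(current_replicas * per_replica / target_rate_per_replica))

# Steady state: 4 replicas at 400k samples/s, target 100k per replica.
print(desired_replicas(4, 400_000, 100_000))    # 4 (no change)

# A 2.5x write spike arrives.
print(desired_replicas(4, 1_000_000, 100_000))  # 10
```

The catch is the lag: new replicas take time to start, and that window is exactly what a broker buffers. An extreme spike can outpace the scale-up entirely, which is why the two approaches are complementary rather than interchangeable.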
Having architected massive systems for multiple large companies, I could argue about technology for a long time, but the only point I want to drive home is to avoid words like "period". Mimir's architecture makes sense, but it's not the only solution that works at scale, and its operational complexity has real costs. There are no absolutes in tech, as in life.
I love it when people take a hard stand like this, using words like "period".
BTW, Cortex is what AWS runs as Amazon Managed Prometheus, probably at a much larger scale than Mimir.
OpenObserve, too, is already being used at multi-petabyte scale.