We also started with the typical kube-prometheus-stack, but we don't like Prometheus/PromQL. Moreover, it only solves the "metrics" part - to handle logs and traces, several more fairly heavy and complex components have to be added to the observability stack.
This didn't feel right, so we looked around and found greptimedb https://github.com/GreptimeTeam/greptimedb, which simplifies the whole stack. It's designed to handle metrics, logs, and traces. We collect metrics and logs via OpenTelemetry and visualize them with Grafana. It provides Postgres, MySQL, and PromQL endpoints; we're happy to be able to build dashboards using SQL, as that's where we have the most knowledge.
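To give a flavor, a typical Grafana panel for us is plain SQL against the Postgres endpoint. A minimal sketch, with made-up table and column names rather than our actual schema:

    -- Hypothetical example: average CPU usage per host in 1-minute buckets
    -- over the last hour, queried via the Postgres-compatible endpoint.
    SELECT
        date_trunc('minute', ts) AS minute,
        host,
        avg(cpu_usage) AS avg_cpu
    FROM cpu_metrics
    WHERE ts >= now() - INTERVAL '1 hour'
    GROUP BY minute, host
    ORDER BY minute;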
The benchmarks look promising, but our k8s clusters aren't huge anyway. As platform engineers, we appreciate the simplicity of our observability stack.
Any other happy greptimedb users around here? Together with OTel, we think we can handle all future obs needs.
Thank you for giving GreptimeDB a shout-out—it means a lot to us. We created GreptimeDB to simplify the observability data stack with an all-in-one database, and we’re glad to hear it’s been helpful.
OpenTelemetry-native is a requirement, not an option, for the new observability data stack. I believe otel-arrow (https://github.com/open-telemetry/otel-arrow) has strong future potential, and we are committed to supporting and improving it.
FYI: I think SQL is great for building everything—dashboards, alerting rules, and complex analytics—but PromQL still has unique value in the Prometheus ecosystem. To be transparent, GreptimeDB still has some performance issues with PromQL, which we’ll address before the 1.0 GA.
Are you saying that you prefer SQL over PromQL for metrics queries? I haven't tried querying metrics via SQL yet, but generally speaking I've found PromQL to be one of the easier query languages to learn - more straightforward and concise IME. What advantages does SQL offer here?
I didn’t mean SQL over PromQL — they’re designed for different layers of problems.
SQL has a broader theoretical scope: it’s a general-purpose language that can describe almost any kind of data processing or analytics workflow, given the right schema and functions.
PromQL, on the other hand, is purpose-built for observability: it's optimized for time-series data, streaming calculations, and real-time aggregation. It's definitely easier to learn and more straightforward when your goal is to reason about metrics and alerting.
SQL's strengths are relational joins, a richer operator set, and higher-level abstraction, which make it more powerful for analytical use cases beyond monitoring. PromQL trades that flexibility for simplicity and immediacy, which is exactly what makes it great for monitoring.
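To make the trade-off concrete with a hypothetical example: the classic PromQL rate query is a one-liner, while a rough SQL equivalent over a made-up table of raw counter samples has to spell out the per-series delta and the regrouping by hand (counter resets are ignored for brevity):

    -- PromQL: sum by (path) (rate(http_requests_total[5m]))
    -- Rough SQL equivalent (illustrative schema, no counter-reset handling):
    SELECT path, sum(delta) / 300.0 AS req_per_sec
    FROM (
        SELECT path, instance, max(value) - min(value) AS delta
        FROM http_requests_total
        WHERE ts >= now() - INTERVAL '5 minutes'
        GROUP BY path, instance
    ) AS per_series
    GROUP BY path;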
I recommend using your preferred flavor of configuration management tool. It is tricky, especially when you want to provision users across multiple Grafana organizations along with their data sources and dashboards, but it can be done (I prefer Puppet because of its flexible language, but Ansible should also work).
You could take a look at Postgres + the TimescaleDB extension, which offers a nice time_bucket() function on its hypertables[1]. You can also materialize using continuous aggregates ("self-updating" materialized views).
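For anyone curious, the two together look roughly like this; the table and columns are made up, while time_bucket() and the continuous-aggregate syntax are TimescaleDB's documented API:

    -- time_bucket() rolls raw hypertable rows into fixed intervals:
    SELECT time_bucket('15 minutes', time) AS bucket,
           device_id,
           avg(temperature) AS avg_temp
    FROM conditions
    GROUP BY bucket, device_id
    ORDER BY bucket;

    -- A continuous aggregate keeps the same rollup materialized and
    -- incrementally refreshed:
    CREATE MATERIALIZED VIEW conditions_15m
    WITH (timescaledb.continuous) AS
    SELECT time_bucket('15 minutes', time) AS bucket,
           device_id,
           avg(temperature) AS avg_temp
    FROM conditions
    GROUP BY bucket, device_id;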
Thanks, this looks like exactly what I want. Sensible interval origins[1] too (January 1, 2000 for months and years, and January 3, 2000, a Monday, for weeks), and they're configurable as well.
I also started with that stack, but swapped out InfluxDB for Postgres + the TimescaleDB extension, which adds time-series workflows (transparent partitioning, compression, data retention, continuous aggregates, …; see the sketch below).
I found InfluxDB lacking in permissions management, query flexibility (SQL, joins), data retention, and the ability to debug problems. In Postgres, for example, I can inspect the execution plan of a statement, log long-running queries, and so on.
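For reference, wiring up the features mentioned above takes only a few SQL calls; this sketch uses a made-up table name with TimescaleDB's documented policy functions:

    -- Hypothetical table: metrics(time timestamptz, host text, value double precision)
    SELECT create_hypertable('metrics', 'time');                  -- transparent partitioning
    ALTER TABLE metrics SET (timescaledb.compress);               -- enable native compression
    SELECT add_compression_policy('metrics', INTERVAL '7 days');  -- compress older chunks
    SELECT add_retention_policy('metrics', INTERVAL '90 days');   -- drop expired data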
Telegraf as an agent is very flexible; it has input plugins for every task I could want, and besides its default "pull workflow" (checks on a defined interval) I also like to push new metrics directly to the Telegraf inputs.socket_listener plugin from my scripts (backup stats, …).
How do you get data from Telegraf into Postgres/TimescaleDB?
I was interested in swapping out InfluxDB, but it turned out to be somewhat difficult to send data from Telegraf to Postgres. It's not as simple as making an HTTP POST, as you can with InfluxDB.