- Nagios: if it is already set up and covering all requirements in a legacy system.
- Prometheus + Grafana (+ Loki + ...): modern, extensible ecosystem built along the Unix philosophy; my default choice these days.
- VictoriaMetrics: up-and-coming player that combines several features of the Prometheus stack into one, but can also be used as one component within that stack. Fewer moving parts and supposedly more performant, but also less accumulated knowledge in forums and the workforce.
- Datadog: if you're in a team of mainly developers who want good insights out of the box (that being the main feature), and if money is no concern. Complete vendor lock-in, and their sales reps will call your private number at 3 am on a Sunday if they have it.
- Monit: if I'm a one-man army (RoR-on-Heroku type) who needs some basic monitoring and, more importantly, supervisor capabilities (e.g. restart a service on failure).
Q2: That depends on your requirements (Scale? Logs? UI? Tracing? Environment, e.g. containers?). The less you need, the simpler you can go. Some bash, systemd unit files and htop could be all you need.
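To give a feel for how small the "simple" end can be, here is a rough sketch of restart-on-failure supervision using only the Python standard library. The function name `supervise` and its parameters are made up for illustration; on a real box you would more likely let systemd handle this with `Restart=on-failure` in the unit file.

```python
import subprocess
import time


def supervise(cmd, max_restarts=3, backoff_seconds=1.0, sleep=time.sleep):
    """Run cmd and restart it after a non-zero exit, up to max_restarts times.

    Hypothetical helper for illustration only.
    Returns the list of exit codes observed, one per run.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        result = subprocess.run(cmd)  # blocks until the process exits
        exit_codes.append(result.returncode)
        if result.returncode == 0:
            break  # clean exit, nothing to restart
        if attempt < max_restarts:
            sleep(backoff_seconds)  # crude fixed delay before the next start
    return exit_codes
```

Calling something like `supervise(["my-service", "--port", "8080"])` (a placeholder command) would start the process and bring it back up to three times if it keeps dying. That is about the level of "monitoring" a tiny setup needs before reaching for a full stack.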
Q3: I'd go with Prometheus and Grafana: easy enough to get going with, to extend as needed (features and scale), and to hire for.
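As an illustration of "easy enough to get going", this is roughly what a scrape target looks like: an HTTP endpoint that returns plain text in the Prometheus exposition format. It is a minimal sketch using only the standard library; the metric name `demo_load1`, the port and the helper names are assumptions of mine, and in practice you would probably use the official client library for your language or an existing exporter instead.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_gauges(gauges):
    """Render {name: (help_text, value)} in the Prometheus text format."""
    lines = []
    for name, (help_text, value) in sorted(gauges.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect():
    """Collect one example gauge (hypothetical name; getloadavg is Unix-only)."""
    load1, _, _ = os.getloadavg()
    return {"demo_load1": ("1-minute load average", load1)}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus scrapes /metrics by default.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_gauges(collect()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve metrics until interrupted; the port is an arbitrary choice."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Run `serve()`, point a Prometheus scrape job at that host and port, and the gauge becomes available to graph in Grafana.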