Proposal
Hello,
I have set up a Prometheus/Thanos cluster for monitoring servers (~500 instances).
The cluster contains 2 instances of each component.
I also have 2 Alertmanager instances running as a cluster.
The problem is that I have a lot of alerts and there are always some of them firing, so I am fixing them one by one.
When there is a connectivity problem and Prometheus does not receive data for an alert that is firing, it sends a RESOLVED notification, which is not true.
I understand that this is the default behaviour of Prometheus, but it is a false positive and could lead to wrong decisions.
It would be nice to have a configuration option, when writing an alert, to set up something like No Data -> Keep Last State (a feature in Grafana).
This feature would allow us to tell Prometheus to keep an alert firing if no data is received, because we can set up separate alerts for target status, missing data, etc. There is no reason to receive a resolved notification when the alert is not actually resolved.
My logic is:
mem_used_percent > 80
value 90 -> alert firing
no value -> resolved
As you can see, we receive a resolved notification because of missing data, not because of a real value.
The alert should be resolved only if the value drops back below the threshold. No data is no data.
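To make the difference concrete, here is a minimal Python sketch that models only the two behaviours described above: the default one, where a missing sample makes a firing alert resolve, and the requested "keep last state" option. It is a toy model, not Prometheus's rule evaluator: `firing_states`, `notifications` and the `keep_last_state` flag are hypothetical names for illustration, `None` stands for "no data", and the `for` clause, staleness handling and Alertmanager grouping are ignored.

```python
from typing import List, Optional, Sequence, Tuple

THRESHOLD = 80.0  # from the example rule: mem_used_percent > 80


def firing_states(samples: Sequence[Optional[float]], keep_last_state: bool) -> List[bool]:
    """Return whether the alert is firing after each evaluation.

    A sample of None means "no data" (e.g. a scrape gap caused by a
    connectivity problem). keep_last_state is a hypothetical option.
    """
    firing = False
    states: List[bool] = []
    for value in samples:
        if value is None:
            if not keep_last_state:
                # Default behaviour: no data means the expression returns
                # nothing, so a firing alert is treated as resolved.
                firing = False
            # With keep_last_state, the previous state is simply kept.
        else:
            firing = value > THRESHOLD
        states.append(firing)
    return states


def notifications(states: Sequence[bool]) -> List[Tuple[int, str]]:
    """Emit a FIRING or RESOLVED event at every state change."""
    events: List[Tuple[int, str]] = []
    previous = False
    for i, current in enumerate(states):
        if current and not previous:
            events.append((i, "FIRING"))
        elif previous and not current:
            events.append((i, "RESOLVED"))
        previous = current
    return events


if __name__ == "__main__":
    # Memory stays high, but two evaluations have no data in between.
    samples = [90.0, 92.0, None, None, 91.0, 70.0]
    print(notifications(firing_states(samples, keep_last_state=False)))
    print(notifications(firing_states(samples, keep_last_state=True)))
```

With the default behaviour the gap produces a spurious RESOLVED followed by a new FIRING; with "keep last state" there is a single FIRING and a single RESOLVED, sent only when the value really drops to 70.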
This can be really hard to manage: you need to find workarounds with Alertmanager inhibitions to silence resolved alerts while another alert for missing data is active, or something similar. Imagine someone who is just starting out with these services...
Best regards!