Description
Describe the feature request
Summary
Add a --label flag to the alert subcommand that filters alerts by label key-value pairs. This enables per-node alert distribution in monitoring systems like Icinga, where each host should only see alerts relevant to itself.
Or, more specifically:
Add a --node flag to filter alerts to only those matching a specific node name. This enables per-host alert distribution in Icinga where each monitored host should only see its own alerts.
Problem Statement
When using check_prometheus with Prometheus/vmalert, alerts often include labels identifying the affected host (e.g., node, hostname, instance). However, there is currently no way to filter alerts by these labels.
Current behavior:
$ check_prometheus alert --name "ConsulServiceCritical"
CRITICAL - 50 Alerts: 50 Firing - 0 Pending - 0 Inactive
\_[CRITICAL] [ConsulServiceCritical] node=server1 ...
\_[CRITICAL] [ConsulServiceCritical] node=server2 ...
\_[CRITICAL] [ConsulServiceCritical] node=server3 ...
... (all 50 nodes)
Problem:
- Every Icinga host running this check sees ALL alerts, not just its own
- Alert counts are inflated on each host
- No way to map alerts to their corresponding monitored hosts
- Operators can't quickly identify which specific node has issues
Proposed Solution
Add a --label flag that accepts key=value pairs to filter alerts:
$ check_prometheus alert --name "ConsulServiceCritical" --label "node=server1"
CRITICAL - 1 Alerts: 1 Firing - 0 Pending - 0 Inactive
\_[CRITICAL] [ConsulServiceCritical] node=server1 is firing - value: 1.00
A more specific implementation (for Icinga in our case)
We run check_prometheus against vmalert to surface Consul service alerts in Icinga. Each alert includes a node label identifying the affected host:
{
"alertname": "ConsulServiceCritical",
"node": "app01.example.com",
"service_name": "api-gateway",
"severity": "critical"
}
Current behavior: Every Icinga host sees ALL alerts for ALL nodes:
$ check_prometheus alert --name "ConsulServiceCritical"
CRITICAL - 47 Alerts: 47 Firing - 0 Pending - 0 Inactive
\_[CRITICAL] [ConsulServiceCritical] node=app01.example.com ...
\_[CRITICAL] [ConsulServiceCritical] node=app02.example.com ...
\_[CRITICAL] [ConsulServiceCritical] node=app03.example.com ...
... (44 more nodes)
Expected behavior: Each host should only see its own alerts:
$ check_prometheus alert --name "ConsulServiceCritical" --node "app01.example.com"
CRITICAL - 1 Alerts: 1 Firing - 0 Pending - 0 Inactive
\_[CRITICAL] [ConsulServiceCritical] node=app01.example.com is firing
Use Case
In our Icinga configuration, we want to assign node-specific alerts to their corresponding hosts:
apply Service "ConsulServiceCritical" {
import "generic-service"
check_command = "check_prometheus_alert"
vars.alertname = "ConsulServiceCritical"
vars.node = host.name # <-- filter by this host
vars.no_alerts_state = "OK"
assign where match(host.name, "*")
}
Without node filtering:
- app01 shows 47 critical alerts (all nodes)
- app02 shows 47 critical alerts (all nodes)
- Operators can't tell which host actually has the problem
With node filtering:
- app01 shows 1 alert (only its own)
- app02 shows 0 alerts (OK state)
- Clear visibility into which specific host is affected
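To make the intended semantics concrete, here is a minimal sketch of the filtering step in Python. It is illustrative only and not check_prometheus code: the function names parse_label_filters and filter_alerts, the shape of the alert dicts, and the sample data are all assumptions made for this example. It assumes every requested key=value pair must match exactly (logical AND), which is one reasonable reading of the proposal.

```python
from typing import Dict, Iterable, List


def parse_label_filters(pairs: Iterable[str]) -> Dict[str, str]:
    """Turn ["node=server1", ...] into {"node": "server1", ...}.

    Hypothetical helper: mirrors what repeated --label arguments might carry.
    """
    filters: Dict[str, str] = {}
    for pair in pairs:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        filters[key] = value
    return filters


def filter_alerts(alerts: List[dict], filters: Dict[str, str]) -> List[dict]:
    """Keep only alerts whose labels match every requested key=value pair."""
    return [
        alert
        for alert in alerts
        if all(alert.get("labels", {}).get(k) == v for k, v in filters.items())
    ]


# Assumed sample data, modeled on the label set shown in this issue.
alerts = [
    {"labels": {"alertname": "ConsulServiceCritical", "node": "app01.example.com"}, "state": "firing"},
    {"labels": {"alertname": "ConsulServiceCritical", "node": "app02.example.com"}, "state": "firing"},
]

# Equivalent of the proposed --label "node=app01.example.com".
own = filter_alerts(alerts, parse_label_filters(["node=app01.example.com"]))
for alert in own:
    print(alert["labels"]["node"], alert["state"])
```

Under this reading, --node "app01.example.com" would simply be shorthand for --label "node=app01.example.com", and an empty filter set leaves the current behavior unchanged.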
Let me know your thoughts on this.
Thanks and Regards