Alerting
Architecture
```mermaid
flowchart LR
    VM["VictoriaMetrics<br/>(metrics)"]
    VMA["vmalert<br/>(rule evaluation)"]
    AM["Alertmanager<br/>(routing)"]
    Ntfy["ntfy.sh<br/>(push notifications)"]
    Phone["Phone<br/>(ntfy app)"]

    VM -->|"PromQL queries"| VMA
    VMA -->|"firing alerts"| AM
    AM -->|"webhook"| Ntfy
    Ntfy -->|"push"| Phone

    style VMA fill:#7b241c,stroke:#c0392b,color:#fff
    style Ntfy fill:#1e8449,stroke:#27ae60,color:#fff
```
vmalert
vmalert evaluates alerting rules against VictoriaMetrics and sends firing alerts to Alertmanager.
| Parameter | Value |
|---|---|
| Evaluation interval | 30s |
| Data source | VictoriaMetrics (`http://victoriametrics:8428`) |
| Alert destination | Alertmanager (`http://alertmanager:9093`) |
| Rule files | `/etc/vmalert/rules/*.yml` |
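The four parameters in the table correspond to the vmalert command-line flags `-evaluationInterval`, `-datasource.url`, `-notifier.url` and `-rule`. A minimal sketch of passing them, assuming vmalert runs as a docker-compose service; the service layout, image tag and host-side rules directory are hypothetical, and only the flag values come from the table.

```yaml
# Hypothetical docker-compose service for vmalert.
# Deployment method, image tag and host paths are assumptions.
services:
  vmalert:
    image: victoriametrics/vmalert        # pin a release tag in practice
    command:
      - -datasource.url=http://victoriametrics:8428   # rules are evaluated here
      - -notifier.url=http://alertmanager:9093        # firing alerts go here
      - -rule=/etc/vmalert/rules/*.yml                # glob over rule files
      - -evaluationInterval=30s                       # vmalert's default is 1m
    volumes:
      - ./rules:/etc/vmalert/rules:ro                 # hypothetical host directory
```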
Alert Rules
Infrastructure Alerts
| Alert | Condition | For | Priority |
|---|---|---|---|
| NodeDown | `up{job="node-exporter"} == 0` | 2m | Critical |
| NodeHighCPU | `(1 - avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 0.90` | 10m | Warning |
| NodeHighMemory | `(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.90` | 5m | Warning |
| NodeDiskFull | `(node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) < 0.10` | 5m | Critical |
| NodeDiskPrediction | `predict_linear(node_filesystem_avail_bytes{mountpoint="/"}[6h], 24*3600) < 0` | 30m | Warning |
| SystemdUnitFailed | `node_systemd_unit_state{state="failed"} == 1` | 1m | Warning |
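Each row of the table becomes one rule in a vmalert rule file. A minimal sketch of three of them, assuming Prometheus-compatible rule-group files under `/etc/vmalert/rules/` and a lower-case `severity` label on every rule, since that label is what the Alertmanager routes further down match on (`severity: critical`, `severity: info`). The file name, group name and annotation wording are hypothetical.

```yaml
# Sketch of /etc/vmalert/rules/infrastructure.yml (file name is hypothetical).
# The "For" column maps to `for`, the "Priority" column to a severity label.
groups:
  - name: infrastructure
    rules:
      - alert: NodeDown
        expr: up{job="node-exporter"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.instance }} is down"

      - alert: NodeDiskFull
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) < 0.10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Root filesystem on {{ $labels.instance }} has less than 10% free"

      # Extrapolates the last 6h of free space 24h ahead; fires if it would hit zero.
      - alert: NodeDiskPrediction
        expr: predict_linear(node_filesystem_avail_bytes{mountpoint="/"}[6h], 24*3600) < 0
        for: 30m
        labels:
          severity: warning
```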
Kubernetes Alerts
| Alert | Condition | For | Priority |
|---|---|---|---|
| PodCrashLooping | `rate(kube_pod_container_status_restarts_total[15m]) > 0` | 5m | Warning |
| PodNotReady | `kube_pod_status_phase{phase=~"Pending\|Unknown"} == 1` | 10m | Warning |
| PodOOMKilled | `kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1` | 0m | Critical |
| DeploymentReplicasMismatch | `kube_deployment_spec_replicas != kube_deployment_status_ready_replicas` | 10m | Warning |
| K3sAPIDown | `up{job="k3s"} == 0` | 1m | Critical |
| K3sAPILatencyHigh | `histogram_quantile(0.99, rate(apiserver_request_duration_seconds_bucket[5m])) > 1` | 5m | Warning |
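Two details of this table change when it is written as a rule file: the `\|` in PodNotReady is only Markdown table escaping, and a "For" of 0m means the rule fires on its first evaluation, which is the same as omitting `for`. A minimal sketch under the same assumptions as the other rule groups (Prometheus-compatible format, lower-case `severity` label); group name and annotations are hypothetical.

```yaml
# Sketch of a Kubernetes rule group (group name and annotations are hypothetical).
groups:
  - name: kubernetes
    rules:
      - alert: PodNotReady
        # Plain regex alternation here; no Markdown escaping of "|".
        expr: kube_pod_status_phase{phase=~"Pending|Unknown"} == 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.namespace }}/{{ $labels.pod }} is {{ $labels.phase }}"

      - alert: PodOOMKilled
        # No `for`: equivalent to "0m", fires as soon as the expression matches.
        expr: kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.container }} in {{ $labels.namespace }}/{{ $labels.pod }} was OOM-killed"
```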
Security Alerts
| Alert | Condition | For | Priority |
|---|---|---|---|
| CrowdSecBanSpike | `delta(cs_active_decisions[1h]) > 20` | 0m | Warning |
| SSHAuthFailure | `increase(node_logind_sessions_total{type="failed"}[5m]) > 5` | 0m | Warning |
| CertExpiringSoon | `(x509_cert_not_after - time()) / 86400 < 14` | 1h | Warning |
| CertExpired | `(x509_cert_not_after - time()) < 0` | 0m | Critical |
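Because a comparison without `bool` keeps the left-hand value, `$value` in CertExpiringSoon is the number of days left, which makes it usable in the notification text. A minimal sketch, assuming the same rule-file conventions as the other groups; the group name and annotation wording are hypothetical.

```yaml
# Sketch of the certificate rules (group name and annotations are hypothetical).
groups:
  - name: security
    rules:
      - alert: CertExpiringSoon
        expr: (x509_cert_not_after - time()) / 86400 < 14
        for: 1h
        labels:
          severity: warning
        annotations:
          # $value is the left-hand side of the comparison: days until expiry.
          summary: 'Certificate expires in {{ printf "%.1f" $value }} days'

      - alert: CertExpired
        expr: (x509_cert_not_after - time()) < 0
        labels:
          severity: critical
```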
Beast-Specific Alerts
| Alert | Condition | For | Priority |
|---|---|---|---|
| BeastLongSession | Custom: session > 8h | 0m | Warning |
| BeastOrphan | `absent(up{instance=~".*10.0.1.3.*"})` and no drain event | 30m | Warning |
| BeastBudgetAlert | Custom: monthly hours > 50 | 0m | Info |
Monitoring Stack Alerts

| Alert | Condition | For | Priority |
|---|---|---|---|
| VictoriaMetricsDown | `up{job="victoriametrics"} == 0` | 1m | Critical |
| LokiDown | `up{job="loki"} == 0` | 2m | Critical |
| GrafanaDown | `up{job="grafana"} == 0` | 2m | Warning |
| AlertmanagerDown | `up{job="alertmanager"} == 0` | 1m | Critical |
| ScrapeTargetDown | `up == 0`, excluding Beast-related targets | 5m | Warning |
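ScrapeTargetDown is described only as "not Beast-related". One hypothetical way to express that is to reuse the Beast instance pattern from the BeastOrphan row as a negative matcher, so an intentionally stopped Beast does not page twice. A minimal sketch under that assumption; the group name is hypothetical.

```yaml
# Sketch of part of the monitoring-stack group (group name is hypothetical).
groups:
  - name: monitoring-stack
    rules:
      - alert: AlertmanagerDown
        expr: up{job="alertmanager"} == 0
        for: 1m
        labels:
          severity: critical

      - alert: ScrapeTargetDown
        # Assumption: "not Beast-related" = instance does not match the Beast IP.
        # "\\." inside the double-quoted PromQL string yields a literal-dot regex.
        expr: up{instance!~".*10\\.0\\.1\\.3.*"} == 0
        for: 5m
        labels:
          severity: warning
```

AlertmanagerDown is itself routed through Alertmanager, so it can only reach the phone once Alertmanager is reachable again.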
ntfy.sh Webhook Relay
Alertmanager sends notifications to ntfy.sh via webhook. ntfy.sh delivers push notifications to the phone app.
Alertmanager Configuration
```yaml
route:
  receiver: ntfy                  # default: warnings and anything unmatched
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # `matchers` replaces the deprecated `match` block; behavior is the same
    - matchers:
        - severity="critical"
      receiver: ntfy-critical
      repeat_interval: 1h
    - matchers:
        - severity="info"
      receiver: ntfy-info
      repeat_interval: 24h

receivers:
  - name: ntfy
    webhook_configs:
      - url: https://ntfy.sh/lron-alerts
        send_resolved: true
  - name: ntfy-critical
    webhook_configs:
      - url: https://ntfy.sh/lron-alerts-critical
        send_resolved: true
  - name: ntfy-info
    webhook_configs:
      - url: https://ntfy.sh/lron-alerts-info
        send_resolved: false      # no "resolved" messages for info alerts
```
Priority Mapping
| Alert Priority | ntfy Priority | ntfy Topic | Phone Behavior |
|---|---|---|---|
| Critical | urgent (5) | `lron-alerts-critical` | Sound + vibrate + persistent notification |
| Warning | high (4) | `lron-alerts` | Sound + notification |
| Info | low (2) | `lron-alerts-info` | Silent notification (no sound or vibration) |
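The Alertmanager configuration above does not itself set the ntfy priorities in this table. A minimal sketch of one way to apply them per receiver, assuming ntfy accepts `priority` as a query parameter on the topic URL, mirroring its `X-Priority` publish header (verify against the ntfy publishing docs for your version). The info receiver would follow the same pattern with the priority chosen for it in the mapping. Note also that Alertmanager's webhook sends its JSON alert payload as the request body, so without a relay or ntfy-side templating the phone shows that raw JSON as the message text.

```yaml
# Sketch: priority passed as a query parameter (assumption: ntfy reads
# `priority` from the query string the same way as the X-Priority header).
receivers:
  - name: ntfy-critical
    webhook_configs:
      - url: https://ntfy.sh/lron-alerts-critical?priority=urgent   # 5
        send_resolved: true
  - name: ntfy
    webhook_configs:
      - url: https://ntfy.sh/lron-alerts?priority=high              # 4
        send_resolved: true
```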
ntfy.sh is free
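ntfy.sh is a free, open-source push notification service, and no account is required. Topics are public: anyone who knows a topic name can subscribe to it and publish to it, so a topic is only as private as its name is hard to guess. Use long random suffixes on topic names in production. For real privacy, self-host ntfy on the Hub node.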
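A self-hosted ntfy server is configured through its `server.yml`. A minimal sketch for the Hub node, assuming access control is enabled so topics are no longer public; the host name, port and file paths are hypothetical placeholders.

```yaml
# Hypothetical /etc/ntfy/server.yml on the Hub node (host, port, paths are placeholders).
base-url: "https://ntfy.hub.example"
listen-http: ":2586"
cache-file: "/var/cache/ntfy/cache.db"
# Enable access control; nothing is readable or writable without a user.
auth-file: "/var/lib/ntfy/user.db"
auth-default-access: "deny-all"
```

With `deny-all`, Alertmanager has to authenticate: create an ntfy user with write access to the alert topics and put its credentials in each webhook's `http_config.basic_auth`, then point the receiver URLs at the Hub instead of `ntfy.sh`.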
Alert fatigue
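The `repeat_interval` settings are tuned to avoid alert fatigue. While an alert keeps firing, critical alerts are re-sent every hour, warnings (which fall through to the default `ntfy` route) every 4 hours, and info alerts every 24 hours. Adjust these values as operational experience accumulates.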