Uptime Monitoring¶
Two-Layer Approach¶
Uptime monitoring uses two layers: an internal one (Uptime Kuma, self-hosted) and an external one (UptimeRobot, third-party). This keeps visibility even when the cluster itself is down.
```mermaid
flowchart TB
    subgraph Internal["Internal (Uptime Kuma on Hub)"]
        UK["Uptime Kuma"]
        UK -->|"probe"| T1["Caddy (DMZ)"]
        UK -->|"probe"| T2["Rancher (Hub)"]
        UK -->|"probe"| T3["Grafana (Hub)"]
        UK -->|"probe"| T4["VictoriaMetrics"]
        UK -->|"probe"| T5["Loki"]
        UK -->|"probe"| T6["K3s API"]
        UK -->|"probe"| T7["WireGuard"]
    end
    subgraph External["External (UptimeRobot)"]
        UR["UptimeRobot"]
        UR -->|"HTTP check"| E1["ttyd.vdhome.be"]
        UR -->|"HTTP check"| E2["auth.vdhome.be"]
        UR -->|"ping"| E3["Hub IP"]
        UR -->|"ping"| E4["DMZ IP"]
    end
    UK -->|"ntfy webhook"| Ntfy["ntfy.sh"]
    UR -->|"email"| Email["dev@vdhome.be"]
    style UK fill:#1a5276,stroke:#2980b9,color:#fff
    style UR fill:#1e8449,stroke:#27ae60,color:#fff
```
Uptime Kuma (Self-Hosted)¶
Uptime Kuma runs on the Hub node as a Kubernetes deployment, monitoring all internal services.
| Parameter | Value |
|---|---|
| Deployment | Hub node (lron/role=hub) |
| Namespace | monitoring |
| Port | 3001 |
| Check interval | 60s |
| Retry count | 3 |
| Notification | ntfy.sh webhook |
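Under those parameters, the Deployment might look like the following sketch. The image tag, PVC name, and volume path are assumptions; the namespace, port, and node selector come from the table above:

```yaml
# Sketch of the Uptime Kuma Deployment (image tag and PVC name assumed)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: uptime-kuma
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: uptime-kuma
  template:
    metadata:
      labels:
        app: uptime-kuma
    spec:
      nodeSelector:
        lron/role: hub          # pin to the Hub node
      containers:
        - name: uptime-kuma
          image: louislam/uptime-kuma:1   # assumed tag
          ports:
            - containerPort: 3001
          volumeMounts:
            - name: data
              mountPath: /app/data        # Uptime Kuma's state directory
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: uptime-kuma-data   # assumed PVC name
```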
Monitors¶
| Monitor | Type | Target | Interval | Timeout |
|---|---|---|---|---|
| Caddy HTTPS | HTTP(S) | https://ttyd.vdhome.be (expect 401/302) | 60s | 10s |
| Authelia | HTTP(S) | https://auth.vdhome.be/api/health | 60s | 10s |
| Rancher UI | HTTP(S) | https://rancher.vdhome.be/ping | 60s | 10s |
| Grafana | HTTP | http://grafana.monitoring:3000/api/health | 60s | 5s |
| VictoriaMetrics | HTTP | http://victoriametrics.monitoring:8428/-/healthy | 60s | 5s |
| Loki | HTTP | http://loki.monitoring:3100/ready | 60s | 5s |
| K3s API | TCP | 10.0.1.1:6443 | 60s | 5s |
| WireGuard | Ping | 10.0.2.2 (home peer) | 120s | 10s |
| DMZ SSH | TCP | 10.0.1.2:2222 | 60s | 5s |
| Beast SSH | TCP | 10.0.1.3:2222 | 60s | 5s |
| Hub disk | Push | Agent script checks disk > 90% | 300s | -- |
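The "Hub disk" push monitor implies a small agent on the Hub. A minimal cron-driven sketch, assuming the per-monitor push URL (with its token, taken from the monitor's settings page in Uptime Kuma) is exported as `KUMA_PUSH_URL`:

```python
#!/usr/bin/env python3
"""Heartbeat agent for the "Hub disk" push monitor (sketch).

Run from cron (e.g. */5 * * * *). A heartbeat is sent only while
root-disk usage is below the threshold; once usage crosses 90%, no
heartbeat arrives and Uptime Kuma marks the monitor down after the
300s window expires.
"""
import os
import shutil
import urllib.parse
import urllib.request

THRESHOLD_PCT = 90
# Assumption: push URL provided via the environment, e.g.
# http://uptime.monitoring:3001/api/push/<token>
PUSH_URL = os.environ.get("KUMA_PUSH_URL")


def disk_used_pct(path: str = "/") -> float:
    """Percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def send_heartbeat(url: str, used: float) -> None:
    """Report 'up' to Uptime Kuma's push endpoint."""
    msg = urllib.parse.quote(f"disk {used:.0f}% used")
    urllib.request.urlopen(f"{url}?status=up&msg={msg}", timeout=10)


if __name__ == "__main__":
    used = disk_used_pct()
    if PUSH_URL and used < THRESHOLD_PCT:
        send_heartbeat(PUSH_URL, used)
```

Silence-on-failure (rather than pushing `status=down`) keeps the agent dead-simple and also covers the case where the Hub is too wedged to run the script at all.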
**Beast monitoring**

The Beast SSH monitor will be in "down" state whenever Beast is not running. This is expected and does not trigger alerts: the monitor is paused via the Beast lifecycle scripts.
Status Page¶
Uptime Kuma provides a status page at http://uptime.monitoring:3001/status/lron (accessible via WireGuard). It shows:
- Current status of all monitors
- 30-day uptime percentage
- Average response time
- Incident history
UptimeRobot (External)¶
UptimeRobot provides monitoring from outside the Hetzner infrastructure. This catches scenarios where the entire Hetzner project is unreachable.
| Parameter | Value |
|---|---|
| Plan | Free (50 monitors, 5-min interval) |
| Check interval | 5 minutes |
| Alert contacts | dev@vdhome.be (email) |
External Monitors¶
| Monitor | Type | Target | Expected |
|---|---|---|---|
| ttyd.vdhome.be | HTTPS | https://ttyd.vdhome.be | 302 (redirect to auth) |
| auth.vdhome.be | HTTPS | https://auth.vdhome.be | 200 |
| Hub ping | Ping | Hub public IP | ICMP reply |
| DMZ ping | Ping | DMZ public IP | ICMP reply |
Why Both Internal and External?¶
| Scenario | Uptime Kuma | UptimeRobot |
|---|---|---|
| Single service crash | Detects (1 min) | Detects (5 min) |
| Hub VM down | Down itself | Detects (5 min) |
| Hetzner network issue | May not detect | Detects (5 min) |
| Internet routing issue (path-specific) | Detects (1 min) | May not detect |
| DNS failure | Detects (uses IPs internally) | Detects (uses domains) |
**Single point of failure**

Uptime Kuma runs on the Hub. If the Hub dies, internal monitoring dies with it. UptimeRobot is the safety net for this scenario.