
Uptime Monitoring

Two-Layer Approach

Uptime monitoring uses two layers: an internal layer (Uptime Kuma, self-hosted) and an external layer (UptimeRobot, third-party). The external layer keeps reporting even when the cluster itself is down.

```mermaid
flowchart TB
    subgraph Internal["Internal (Uptime Kuma on Hub)"]
        UK["Uptime Kuma"]
        UK -->|"probe"| T1["Caddy (DMZ)"]
        UK -->|"probe"| T2["Rancher (Hub)"]
        UK -->|"probe"| T3["Grafana (Hub)"]
        UK -->|"probe"| T4["VictoriaMetrics"]
        UK -->|"probe"| T5["Loki"]
        UK -->|"probe"| T6["K3s API"]
        UK -->|"probe"| T7["WireGuard"]
    end

    subgraph External["External (UptimeRobot)"]
        UR["UptimeRobot"]
        UR -->|"HTTP check"| E1["ttyd.vdhome.be"]
        UR -->|"HTTP check"| E2["auth.vdhome.be"]
        UR -->|"ping"| E3["Hub IP"]
        UR -->|"ping"| E4["DMZ IP"]
    end

    UK -->|"ntfy webhook"| Ntfy["ntfy.sh"]
    UR -->|"email"| Email["dev@vdhome.be"]

    style UK fill:#1a5276,stroke:#2980b9,color:#fff
    style UR fill:#1e8449,stroke:#27ae60,color:#fff
```
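The "ntfy webhook" edge in the diagram boils down to an HTTP POST against a topic URL. The sketch below builds such a request with Python's standard library. The topic name `lron-alerts-example`, the title, and the message text are hypothetical placeholders, and Uptime Kuma's built-in ntfy integration may format its notifications differently.

```python
import urllib.request


def build_ntfy_request(topic: str, title: str, message: str,
                       priority: str = "high",
                       server: str = "https://ntfy.sh") -> urllib.request.Request:
    """Build (but do not send) an ntfy publish request for one alert."""
    return urllib.request.Request(
        url=f"{server}/{topic}",
        data=message.encode("utf-8"),      # ntfy uses the request body as the message
        headers={"Title": title, "Priority": priority},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Hypothetical alert; the topic name is an assumption, not taken from this page.
alert = build_ntfy_request("lron-alerts-example", "Loki is down",
                           "Probe to loki.monitoring:3100/ready failed")
```

Calling `send(alert)` would publish the notification. Keeping construction and sending separate makes the payload easy to inspect without touching the network.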

Uptime Kuma (Self-Hosted)

Uptime Kuma runs on the Hub node as a Kubernetes deployment, monitoring all internal services.

| Parameter | Value |
|---|---|
| Deployment | Hub node (lron/role=hub) |
| Namespace | monitoring |
| Port | 3001 |
| Check interval | 60s |
| Retry count | 3 |
| Notification | ntfy.sh webhook |
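The check interval and retry count together determine how quickly a failure turns into a notification. The sketch below models that interaction under two assumptions that this page does not state: the retry count is the number of additional failed attempts tolerated before a monitor is marked down, and retries run at the same 60s spacing as regular checks. The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class MonitorState:
    """Tracks consecutive failures for one monitor (illustrative model)."""
    max_retries: int = 3
    failed_retries: int = 0
    status: str = "up"

    def record(self, check_ok: bool) -> str:
        """Feed one check result and return the resulting status."""
        if check_ok:
            self.failed_retries = 0
            self.status = "up"
        elif self.failed_retries < self.max_retries:
            # Still within the retry budget: hold off on alerting.
            self.failed_retries += 1
            self.status = "pending"
        else:
            # Retry budget exhausted: this is where a notification would fire.
            self.status = "down"
        return self.status


def worst_case_seconds_to_down(interval_s: int, retry_interval_s: int,
                               max_retries: int) -> int:
    """Upper bound from outage start to 'down', ignoring probe timeouts."""
    return interval_s + max_retries * retry_interval_s


monitor = MonitorState(max_retries=3)
history = [monitor.record(False) for _ in range(4)]
```

Under these assumptions, the first failed probe is seen within one interval, but the monitor only flips to "down" on the fourth consecutive failure, so the notification can trail the outage by up to `worst_case_seconds_to_down(60, 60, 3)` seconds.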

Monitors

| Monitor | Type | Target | Interval | Timeout |
|---|---|---|---|---|
| Caddy HTTPS | HTTP(S) | https://ttyd.vdhome.be (expect 401/302) | 60s | 10s |
| Authelia | HTTP(S) | https://auth.vdhome.be/api/health | 60s | 10s |
| Rancher UI | HTTP(S) | https://rancher.vdhome.be/ping | 60s | 10s |
| Grafana | HTTP | http://grafana.monitoring:3000/api/health | 60s | 5s |
| VictoriaMetrics | HTTP | http://victoriametrics.monitoring:8428/-/healthy | 60s | 5s |
| Loki | HTTP | http://loki.monitoring:3100/ready | 60s | 5s |
| K3s API | TCP | 10.0.1.1:6443 | 60s | 5s |
| WireGuard | Ping | 10.0.2.2 (home peer) | 120s | 10s |
| DMZ SSH | TCP | 10.0.1.2:2222 | 60s | 5s |
| Beast SSH | TCP | 10.0.1.3:2222 | 60s | 5s |
| Hub disk | Push | Agent script checks disk > 90% | 300s | -- |
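The "Hub disk" row is a push monitor: nothing probes the disk from outside. Instead, an agent on the Hub reports in, and a missing or "down" report marks the monitor as failed. The actual agent script is not shown on this page, so the following is only a sketch of what it might look like. The push token `EXAMPLE_TOKEN` is a placeholder, and the host name is borrowed from the status page URL.

```python
import shutil
import urllib.parse
import urllib.request

# Placeholder push URL: the token is hypothetical and comes from the monitor's settings.
PUSH_URL = "http://uptime.monitoring:3001/api/push/EXAMPLE_TOKEN"
THRESHOLD_PERCENT = 90.0


def disk_used_percent(path: str = "/") -> float:
    """Return the used share of the filesystem containing `path`, in percent."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100.0


def build_push_url(base_url: str, used_percent: float,
                   threshold: float = THRESHOLD_PERCENT) -> str:
    """Encode the check result as query parameters on the push URL."""
    status = "down" if used_percent > threshold else "up"
    query = urllib.parse.urlencode(
        {"status": status, "msg": f"disk {used_percent:.1f}% used"}
    )
    return f"{base_url}?{query}"


def push(path: str = "/", timeout: float = 10.0) -> int:
    """Measure disk usage and report it; returns the HTTP status code."""
    url = build_push_url(PUSH_URL, disk_used_percent(path))
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status
```

Running `push()` from a scheduler at least once every 300s would match the interval in the table. If the agent itself stops running, the missing heartbeat should also surface as a failure.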

Beast monitoring

The Beast SSH monitor would report "down" whenever Beast is not running. This is expected and does not trigger alerts, because the Beast lifecycle scripts pause the monitor while Beast is stopped.

Status Page

Uptime Kuma provides a status page at http://uptime.monitoring:3001/status/lron (accessible via WireGuard). It shows:

  • Current status of all monitors
  • 30-day uptime percentage
  • Average response time
  • Incident history
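As a rough illustration of the 30-day uptime figure, the sketch below computes the share of successful checks in a window. It assumes every check counts equally; Uptime Kuma's own calculation may weight heartbeats by duration or handle paused periods differently.

```python
def uptime_percent(heartbeats: list[bool]) -> float:
    """Share of successful checks in the window, in percent."""
    if not heartbeats:
        raise ValueError("no heartbeats in window")
    return 100.0 * sum(heartbeats) / len(heartbeats)


# At a 60s interval, 30 days hold 30 * 24 * 60 = 43200 checks.
CHECKS_PER_30_DAYS = 30 * 24 * 60

# Hypothetical window: 432 failed checks (7.2 hours of downtime in total).
window = [True] * (CHECKS_PER_30_DAYS - 432) + [False] * 432
print(uptime_percent(window))  # 99.0
```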

UptimeRobot (External)

UptimeRobot provides monitoring from outside the Hetzner infrastructure. This catches scenarios where the entire Hetzner project is unreachable.

| Parameter | Value |
|---|---|
| Plan | Free (50 monitors, 5-min interval) |
| Check interval | 5 minutes |
| Alert contacts | dev@vdhome.be (email) |

External Monitors

| Monitor | Type | Target | Expected |
|---|---|---|---|
| ttyd.vdhome.be | HTTPS | https://ttyd.vdhome.be | 302 (redirect to auth) |
| auth.vdhome.be | HTTPS | https://auth.vdhome.be | 200 |
| Hub ping | Ping | Hub public IP | ICMP reply |
| DMZ ping | Ping | DMZ public IP | ICMP reply |

Why Both Internal and External?

| Scenario | Uptime Kuma | UptimeRobot |
|---|---|---|
| Single service crash | Detects (1 min) | Detects (5 min) |
| Hub VM down | Down itself | Detects (5 min) |
| Hetzner network issue | May not detect | Detects (5 min) |
| Internet routing issue (path-specific) | Detects (1 min) | May not detect |
| DNS failure | Detects (uses IPs internally) | Detects (uses domains) |
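One way to read the comparison is as a coverage matrix: for each scenario, at least one layer should have a guaranteed detection time. The sketch below encodes the table as data and checks for blind spots. Treating "May not detect" and "Down itself" as no guaranteed detection is an interpretation, not something the table states outright.

```python
from typing import Optional

# (Uptime Kuma, UptimeRobot) detection time in minutes; None = not guaranteed.
COVERAGE: dict[str, tuple[Optional[int], Optional[int]]] = {
    "Single service crash": (1, 5),
    "Hub VM down": (None, 5),
    "Hetzner network issue": (None, 5),
    "Internet routing issue (path-specific)": (1, None),
    "DNS failure": (1, 5),
}


def best_detection_minutes(scenario: str) -> Optional[int]:
    """Fastest guaranteed detection across both layers, or None if neither."""
    times = [t for t in COVERAGE[scenario] if t is not None]
    return min(times) if times else None


def blind_spots() -> list[str]:
    """Scenarios that neither layer is guaranteed to detect."""
    return [s for s in COVERAGE if best_detection_minutes(s) is None]
```

With the values from the table, `blind_spots()` returns an empty list: every listed scenario is covered by at least one layer, which is the argument for running both.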

Single point of failure

Uptime Kuma runs on the Hub. If the Hub dies, internal monitoring dies with it. UptimeRobot is the safety net for this scenario.