Risk Register

Resolved Risks

Risks that were identified during design and have been mitigated.

| ID | Risk | Mitigation | Status |
|---|---|---|---|
| R-01 | SSH brute-force on public IPs | endlessh tarpit on port 22, real SSH on 2222, CrowdSec IDS, key-only auth | Resolved |
| R-02 | Unauthorized web access | Authelia TOTP on all public endpoints, Caddy forward_auth | Resolved |
| R-03 | Secrets committed in plaintext | SOPS+age encryption for all secrets in repo | Resolved |
| R-04 | Beast VM forgotten running (cost overrun) | 8h warning, 12h auto-destroy, monthly hour tracking, budget hard stop | Resolved |
| R-05 | Configuration drift on long-lived VMs | Ansible idempotent playbooks, periodic re-runs, K3s + Fleet GitOps | Resolved |
| R-06 | No visibility into cluster health | Full observability stack: VictoriaMetrics, Loki, Grafana, Uptime Kuma | Resolved |
| R-07 | Alert fatigue from noisy alerts | Tuned repeat intervals, priority-based routing, Beast-aware suppression | Resolved |
| R-08 | DMZ compromise exposes management plane | Dedicated VM + Cilium NetworkPolicy isolation, no direct DMZ-to-monitoring path | Resolved |
| R-09 | TLS certificate expiration | Caddy auto-renewal via Let's Encrypt DNS-01, cert expiry alert rule | Resolved |
| R-10 | Single point of monitoring failure | UptimeRobot external monitoring as backup for Uptime Kuma | Resolved |
| R-11 | Lateral movement after node compromise | Cilium NetworkPolicy default-deny per namespace, UFW egress restrictions | Resolved |
| R-12 | Kubernetes secrets readable at rest | K3s --secrets-encryption flag (AES-CBC), encryption config backed up | Resolved |
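
The R-04 thresholds can be expressed as a small decision function. The following is only a sketch: the 8h and 12h values come from the table above, while the function name `beast_action`, the timestamps, and the idea of driving the check from the VM creation time are assumptions for illustration, not the lab's actual implementation.

```python
from datetime import datetime, timedelta, timezone

# Thresholds taken from R-04; everything else here is hypothetical.
WARN_AFTER = timedelta(hours=8)
DESTROY_AFTER = timedelta(hours=12)


def beast_action(created_at: datetime, now: datetime) -> str:
    """Return 'ok', 'warn', or 'destroy' based on how long the VM has been up."""
    uptime = now - created_at
    if uptime >= DESTROY_AFTER:
        return "destroy"
    if uptime >= WARN_AFTER:
        return "warn"
    return "ok"


# Example: evaluate a VM created at 08:00 UTC at three later points in time.
started = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
for hours in (2, 9, 13):
    print(hours, beast_action(started, started + timedelta(hours=hours)))
```

A scheduled job could call such a function and then send the warning or trigger the destroy step; how that is wired up is outside what this register describes.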

Remaining Risks

Risks that are accepted or partially mitigated.

| ID | Risk | Severity | Likelihood | Impact | Current Mitigation | Residual |
|---|---|---|---|---|---|---|
| R-20 | Hub VM total loss (disk failure, Hetzner issue) | High | Low | Full cluster down, 12h RPO | etcd snapshots every 12h, full rebuild procedure documented | 2-3h RTO, possible 12h data loss |
| R-21 | age key loss (locked out of all SOPS secrets) | Critical | Very Low | Cannot decrypt any secrets, full rebuild impossible without key | Bitwarden backup of age key | If Bitwarden also lost, unrecoverable |
| R-22 | Hetzner account compromise | Critical | Very Low | Attacker controls all infrastructure | Strong password + 2FA on Hetzner, API token scoped, CrowdSec on VMs | Attacker could destroy all VMs; rebuild from code + Bitwarden |
| R-23 | Supply chain attack via container images | Medium | Low | Compromised workloads | Use official images, pin versions, no third-party registries | No image scanning (Trivy not deployed) |
| R-24 | Hetzner region outage (FSN1) | Medium | Very Low | All VMs unavailable | No multi-region redundancy | Accept downtime, rebuild in NBG1/HEL1 if prolonged |
| R-25 | WireGuard key compromise | High | Very Low | Attacker gains tunnel access to private network | Key rotation procedure documented, WireGuard listens on non-standard port | Manual detection only (no WG auth logging) |
| R-26 | ntfy.sh service outage | Low | Low | Alert notifications not delivered | Alerts still visible in Grafana/vmalert UI | No backup notification channel |
| R-27 | K3s zero-day vulnerability | High | Low | Cluster compromise | Timely upgrades, CrowdSec, network segmentation | Manual patching, no auto-update |
| R-28 | OpenTofu state corruption | Medium | Very Low | Cannot manage infrastructure via IaC | State committed to Git (version history), tofu import as fallback | Manual state reconstruction possible but tedious |
| R-29 | Caddy/Authelia bypass vulnerability | High | Low | Unauthenticated access to internal services | Keep up to date, CrowdSec HTTP scenarios, minimal attack surface | Single-user lab, limited blast radius |
| R-30 | Google Drive sync exposes encrypted secrets | Low | Very Low | SOPS files visible on GDrive | Secrets are SOPS-encrypted (ciphertext only), GDrive access requires Google auth | Encrypted at rest, no cleartext exposure |
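
The 12h RPO stated for R-20 only holds if a recent etcd snapshot actually exists. As a hedged illustration, the check below tests whether the newest snapshot is within the RPO window. The function name `rpo_met` and the sample timestamps are hypothetical; how snapshot times are collected (for example from file modification times in the snapshot directory) is left open because the register does not specify it.

```python
from datetime import datetime, timedelta, timezone

RPO = timedelta(hours=12)  # recovery point objective stated for R-20


def rpo_met(snapshot_times, now, rpo=RPO):
    """Return True if the newest snapshot is no older than the RPO."""
    if not snapshot_times:
        return False  # no snapshot at all means the RPO cannot be met
    return now - max(snapshot_times) <= rpo


# Hypothetical snapshot timestamps, 12 hours apart.
now = datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)
snapshots = [
    datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc),
]
print(rpo_met(snapshots, now))
```

A failing check would be a natural candidate for an alert rule, since a silently broken snapshot job turns the "possible 12h data loss" residual into something much larger.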

Risk Matrix

Likelihood →    Very Low    Low             Medium      High
Severity ↓
────────────────────────────────────────────────────────────
Critical        R-21,R-22
High            R-25        R-20,R-27,R-29
Medium          R-24,R-28   R-23
Low             R-30        R-26
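
A matrix like this is easy to let drift from the register it summarises. One way to avoid that is to generate it from the Severity and Likelihood columns of the Remaining Risks table. The sketch below does this with the values copied from that table; the names `RISKS`, `build_matrix`, and `render` are illustrative and not part of any existing tooling.

```python
from collections import defaultdict

SEVERITIES = ["Critical", "High", "Medium", "Low"]
LIKELIHOODS = ["Very Low", "Low", "Medium", "High"]

# (ID, severity, likelihood) copied from the Remaining Risks table.
RISKS = [
    ("R-20", "High", "Low"),
    ("R-21", "Critical", "Very Low"),
    ("R-22", "Critical", "Very Low"),
    ("R-23", "Medium", "Low"),
    ("R-24", "Medium", "Very Low"),
    ("R-25", "High", "Very Low"),
    ("R-26", "Low", "Low"),
    ("R-27", "High", "Low"),
    ("R-28", "Medium", "Very Low"),
    ("R-29", "High", "Low"),
    ("R-30", "Low", "Very Low"),
]


def build_matrix(risks):
    """Group risk IDs by their (severity, likelihood) cell."""
    cells = defaultdict(list)
    for risk_id, severity, likelihood in risks:
        cells[(severity, likelihood)].append(risk_id)
    return cells


def render(cells, width=16):
    """Render the matrix as fixed-width text, one row per severity."""
    header = ["Severity"] + LIKELIHOODS
    lines = ["".join(h.ljust(width) for h in header).rstrip()]
    for sev in SEVERITIES:
        row = [sev] + [",".join(sorted(cells.get((sev, lik), []))) for lik in LIKELIHOODS]
        lines.append("".join(c.ljust(width) for c in row).rstrip())
    return "\n".join(lines)


print(render(build_matrix(RISKS)))
```

Regenerating the matrix whenever a risk entry changes keeps the two views consistent without manual editing.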

Accepted Risks

R-20 (Hub total loss) and R-21 (age key loss) are the two highest-impact risks. Both are mitigated but not eliminated. The residual risk is accepted for a personal R&D lab. In a production Defence environment, these would require additional controls (HA cluster, HSM-backed keys, multi-region deployment).

Risk Review Schedule

| Review | Frequency | Actions |
|---|---|---|
| Monthly | After each Beast budget review | Check R-04 cost controls, review alert noise (R-07) |
| Quarterly | Every 3 months | Review remaining risks, check for new CVEs (R-27, R-29), rotate keys (R-25) |
| After incident | As needed | Add new risk entries, update mitigations |
| After architecture change | As needed | Re-evaluate all risks against new topology |