Logging¶
Stack Overview¶
flowchart LR
    subgraph Sources["Log Sources"]
        S1["System journals"]
        S2["K3s pods"]
        S3["Caddy access logs"]
        S4["CrowdSec"]
        S5["K3s audit log"]
    end
    subgraph Collection["Collection"]
        Alloy1["Alloy<br/>(Hub)"]
        Alloy2["Alloy<br/>(DMZ)"]
        Alloy3["Alloy<br/>(Beast)"]
    end
    subgraph Storage["Storage (Hub)"]
        Loki["Loki<br/>(monolithic)"]
    end
    S1 --> Alloy1 & Alloy2 & Alloy3
    S2 --> Alloy1 & Alloy2 & Alloy3
    S3 --> Alloy2
    S4 --> Alloy1 & Alloy2
    S5 --> Alloy1
    Alloy1 -->|"push"| Loki
    Alloy2 -->|"push"| Loki
    Alloy3 -->|"push"| Loki
    Loki --> Grafana["Grafana<br/>(query)"]
    style Loki fill:#1a5276,stroke:#2980b9,color:#fff
    style Grafana fill:#7d6608,stroke:#f1c40f,color:#fff
Loki (Monolithic Mode)¶
Loki runs in single-binary (monolithic) mode on the Hub node, where it ingests logs, stores them, and serves queries.
| Parameter | Value |
|---|---|
| Mode | Monolithic (single binary) |
| Storage | Local filesystem (/var/lib/loki/) |
| Retention | 14 days |
| Listen port | 3100 |
| Max label names | 30 |
| Max label value length | 1024 |
| Ingestion rate limit | 10 MB/s |
Loki Configuration¶
auth_enabled: false
server:
  http_listen_port: 3100
common:
  path_prefix: /var/lib/loki
  storage:
    filesystem:
      chunks_directory: /var/lib/loki/chunks
      rules_directory: /var/lib/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory
schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
limits_config:
  retention_period: 336h  # 14 days
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20
  max_label_names_per_series: 30  # matches the parameter table
  max_label_value_length: 1024    # matches the parameter table
compactor:
  working_directory: /var/lib/loki/compactor
  retention_enabled: true
  delete_request_store: filesystem  # required on Loki 3.x when retention is enabled
Why monolithic?
A 3-VM personal lab does not need Loki's microservices mode. Monolithic mode runs a single binary against local filesystem storage, which keeps resource usage and operational complexity low.
Grafana Alloy (Log Collector)¶
Grafana Alloy (the successor to Grafana Agent) runs as a DaemonSet, collecting logs on each node and pushing them to Loki.
Alloy Configuration¶
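// Journal log source
loki.source.journal "system" {
  forward_to = [loki.write.default.receiver]
  labels     = {
    job  = "systemd-journal",
    host = env("HOSTNAME"),
  }
}
// Pod discovery (provides the targets referenced below).
// As a DaemonSet, each replica should be limited to pods on its own node
// (for example with a spec.nodeName field selector); otherwise every
// replica tails every pod.
discovery.kubernetes "pods" {
  role = "pod"
}
// Kubernetes pod logs
loki.source.kubernetes "pods" {
  forward_to = [loki.process.default.receiver]
  targets    = discovery.kubernetes.pods.targets
}
// Processing pipeline: drop the filename label before writing
loki.process "default" {
  forward_to = [loki.write.default.receiver]
  stage.label_drop {
    values = ["filename"]
  }
}
// Push to Loki
loki.write "default" {
  endpoint {
    url = "http://loki.monitoring:3100/loki/api/v1/push"
  }
}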
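To check the pipeline end to end, it can help to push a single test line to the same endpoint that Alloy writes to. The following is a minimal sketch, assuming Loki is reachable at the URL from the `loki.write` block and that no tenant header is needed because `auth_enabled` is `false`. The `job="smoke-test"` label, the `host="hub"` value, and the function names are hypothetical.

```python
import json
import time
import urllib.request

# Same push endpoint as the loki.write block above.
LOKI_PUSH_URL = "http://loki.monitoring:3100/loki/api/v1/push"


def build_push_payload(labels, line, ts_ns=None):
    """Build the JSON body for Loki's push API: one stream with one entry."""
    if ts_ns is None:
        ts_ns = time.time_ns()
    # Loki expects the timestamp as a string of nanoseconds since the epoch.
    return {"streams": [{"stream": dict(labels), "values": [[str(ts_ns), line]]}]}


def push(payload, url=LOKI_PUSH_URL, timeout=5):
    """POST the payload to Loki and return the HTTP status (204 on success)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


# Hypothetical labels for a one-off test entry.
payload = build_push_payload({"job": "smoke-test", "host": "hub"}, "hello from the lab")
print(json.dumps(payload, indent=2))
# To actually send it (requires network access to Loki): push(payload)
```

If the entry then shows up in Grafana under `{job="smoke-test"}`, Loki's ingestion path works independently of Alloy, which narrows down where a missing log stream is getting lost.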
Log Sources per VM¶
Hub (CX32)¶
| Source | Type | Label | Contents |
|---|---|---|---|
| systemd journal | Journal | job=systemd-journal | OS, K3s server, WireGuard, CrowdSec |
| K3s pods | Kubernetes | namespace=* | All Hub-scheduled pod logs |
| K3s audit log | File | job=k3s-audit | API server audit events |
| CrowdSec decisions | Journal | unit=crowdsec | Ban/unban events |
| VictoriaMetrics | Kubernetes | app=victoriametrics | Scrape errors, ingestion stats |
DMZ (CX22)¶
| Source | Type | Label | Contents |
|---|---|---|---|
| systemd journal | Journal | job=systemd-journal | OS, K3s agent, CrowdSec |
| K3s pods | Kubernetes | namespace=ingress | Caddy, Authelia, ttyd logs |
| Caddy access log | Kubernetes | app=caddy | HTTP access logs (structured JSON) |
| Authelia auth events | Kubernetes | app=authelia | Login attempts, TOTP validations |
| CrowdSec decisions | Journal | unit=crowdsec | Ban/unban events |
Beast (CAX31)¶
| Source | Type | Label | Contents |
|---|---|---|---|
| systemd journal | Journal | job=systemd-journal | OS, K3s agent |
| K3s pods | Kubernetes | namespace=dev | Dev workload logs |
Beast log retention
Beast logs are pushed to Loki on Hub. When Beast is destroyed, the Alloy instance stops pushing, but historical logs remain in Loki until the 14-day retention expires.
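The labels in the tables above map directly onto LogQL stream selectors. The sketch below builds `query_range` URLs for a few of them. It assumes the labels exist in Loki exactly as listed, that Caddy's JSON access log exposes a top-level `status` field, and that the Hub's `host` label value is `hub`; that value, the query names, and the helper function are hypothetical.

```python
import time
from urllib.parse import urlencode

# Host and port taken from the Alloy push URL.
LOKI_BASE_URL = "http://loki.monitoring:3100"

# Example selectors derived from the per-VM label tables.
QUERIES = {
    "hub_journal": '{job="systemd-journal", host="hub"}',
    "caddy_5xx": '{app="caddy"} | json | status >= 500',
    "authelia": '{app="authelia"}',
    "dev_pods": '{namespace="dev"}',
}


def query_range_url(logql, hours=1, limit=100, now_ns=None):
    """Return a /loki/api/v1/query_range URL covering the last `hours` hours."""
    if now_ns is None:
        now_ns = time.time_ns()
    start_ns = now_ns - hours * 3600 * 10**9  # timestamps are Unix nanoseconds
    params = {"query": logql, "start": start_ns, "end": now_ns, "limit": limit}
    return f"{LOKI_BASE_URL}/loki/api/v1/query_range?{urlencode(params)}"


for name, logql in QUERIES.items():
    print(name, query_range_url(logql))
```

The same selectors can be pasted into Grafana's Explore view; the URL form is mainly useful for scripted checks.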
Retention Configuration¶
| Data Type | Retention | Rationale |
|---|---|---|
| System logs | 14 days | Sufficient for troubleshooting |
| Pod logs | 14 days | Matches Loki retention |
| Caddy access logs | 14 days | Security review window |
| K3s audit logs | 30 days (on-disk file) | Longer retention for security audits; applies to the audit file on the node, not to Loki, where ingested copies follow the 14-day retention |
| CrowdSec decisions | 14 days in Loki, permanent in CrowdSec DB | CrowdSec maintains its own decision history |
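As a quick check on the Loki side of this table, `retention_period: 336h` works out to exactly 14 days. The sketch below converts the value and computes the cutoff before which entries become eligible for deletion. The helper names are hypothetical, and the parser handles only hours-only values such as `336h`, not the full duration syntax. The compactor applies retention in the background, so actual deletion can lag the cutoff.

```python
from datetime import datetime, timedelta, timezone

RETENTION_PERIOD = "336h"  # limits_config.retention_period


def parse_hours(duration):
    """Parse an hours-only duration such as '336h' into a timedelta."""
    if not duration.endswith("h"):
        raise ValueError(f"only hour durations are supported, got {duration!r}")
    return timedelta(hours=int(duration[:-1]))


def retention_cutoff(now, retention=RETENTION_PERIOD):
    """Entries older than the returned timestamp are past retention."""
    return now - parse_hours(retention)


print(parse_hours(RETENTION_PERIOD).days)  # 14

now = datetime(2024, 6, 15, tzinfo=timezone.utc)
print(retention_cutoff(now).isoformat())  # 2024-06-01T00:00:00+00:00
```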
Disk Usage Estimate¶
| Log source | Daily Ingestion | 14-Day Storage |
|---|---|---|
| System journals (3 nodes) | ~50 MB | ~700 MB |
| Pod logs | ~30 MB | ~420 MB |
| Caddy access logs | ~10 MB | ~140 MB |
| K3s audit log | ~20 MB | ~280 MB in Loki (~600 MB for the 30-day file on disk) |
| Total | ~110 MB/day | ~1.5 GB in Loki |
Disk is not a concern
At ~1.5 GB for 14 days of logs, Loki uses less than 2% of the Hub's 80 GB disk. Chunk compression (gzip by default; the configuration above does not set chunk_encoding) further reduces the actual on-disk size.
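The estimate can be reproduced from the daily figures in the table. This is a minimal sketch of the arithmetic, assuming decimal units (1 GB = 1000 MB) and ignoring compression and index overhead; the variable names are illustrative.

```python
# Approximate daily ingestion per source, in MB (from the table above).
DAILY_MB = {
    "System journals (3 nodes)": 50,
    "Pod logs": 30,
    "Caddy access logs": 10,
    "K3s audit log": 20,
}
RETENTION_DAYS = 14
HUB_DISK_GB = 80

daily_total_mb = sum(DAILY_MB.values())
stored_mb = daily_total_mb * RETENTION_DAYS
stored_gb = stored_mb / 1000  # decimal GB
share_pct = 100 * stored_gb / HUB_DISK_GB

print(
    f"{daily_total_mb} MB/day, {stored_gb:.2f} GB over {RETENTION_DAYS} days, "
    f"{share_pct:.1f}% of disk"
)
# 110 MB/day, 1.54 GB over 14 days, 1.9% of disk
```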