This commit is contained in:
counterweight 2025-11-14 23:36:00 +01:00
parent c8754e1bdc
commit fbbeb59c0e
Signed by: counterweight
GPG key ID: 883EDBAA726BD96C
28 changed files with 907 additions and 995 deletions


@@ -89,20 +89,19 @@ Note that, by applying these playbooks, both the root user and the `counterweigh
* Verify the changes are working correctly
* After running this playbook, clear your browser cache or perform a hard reload (Ctrl+Shift+R) before using the Proxmox VE Web UI to avoid UI display issues.
### Deploy CPU Temperature Monitoring
### Deploy Infra Monitoring (Disk, Health, CPU Temp)
* The nodito server can be configured with CPU temperature monitoring that sends alerts to Uptime Kuma when temperatures exceed a threshold.
* Before running the CPU temperature monitoring playbook, you need to create a secrets file with your Uptime Kuma push URL:
* Create `ansible/infra/nodito/nodito_secrets.yml` with:
```yaml
uptime_kuma_url: "https://your-uptime-kuma.com/api/push/your-push-key"
```
* Run the CPU temperature monitoring setup with: `ansible-playbook -i inventory.ini infra/nodito/40_cpu_temp_alerts.yml`
* This will:
* Install required packages (lm-sensors, curl, jq, bc)
* Create a monitoring script that checks CPU temperature every minute
* Set up a systemd service and timer for automated monitoring
* Send alerts to Uptime Kuma when temperature exceeds the threshold (default: 80°C)
* Nodito can run the same monitoring stack used elsewhere: disk usage, heartbeat healthcheck, and CPU temperature alerts feeding Uptime Kuma.
* Playbooks to run (in any order):
* `ansible-playbook -i inventory.ini infra/410_disk_usage_alerts.yml`
* `ansible-playbook -i inventory.ini infra/420_system_healthcheck.yml`
* `ansible-playbook -i inventory.ini infra/430_cpu_temp_alerts.yml`
* Each playbook automatically:
* Creates/updates the corresponding monitor in Uptime Kuma (including ntfy notification wiring)
* Installs any required packages (curl, lm-sensors, jq, bc, etc.)
* Creates the monitoring script(s) and log files
* Sets up systemd services and timers for automated runs
* Sends alerts to Uptime Kuma when thresholds are exceeded or heartbeats stop
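* For reference, the push these scripts perform is just an HTTP GET against the host's Uptime Kuma push URL. Below is a minimal sketch of the CPU temperature case (the scripts the playbooks actually install are more elaborate; `UPTIME_KUMA_URL` stands in for the host's push URL, and the exact status/upside-down handling depends on how the monitor is configured):
```bash
#!/usr/bin/env bash
# Sketch only: check the hottest reported CPU temperature and push to Uptime Kuma
# when it exceeds the threshold. Assumes lm-sensors, jq, bc and curl are installed.
THRESHOLD=80
TEMP=$(sensors -j | jq '[.. | objects | to_entries[] | select(.key | test("temp[0-9]*_input")) | .value] | max')
if (( $(echo "$TEMP > $THRESHOLD" | bc -l) )); then
  curl -fsS "${UPTIME_KUMA_URL}?status=down&msg=CPU+at+${TEMP}C" > /dev/null
fi
```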
### Setup ZFS Storage Pool
@@ -131,6 +130,26 @@ Note that, by applying these playbooks, both the root user and the `counterweigh
* Enable ZFS services for automatic pool import on boot
* **Warning**: This will destroy all data on the specified disks. Make sure you're using the correct disk IDs and that the disks don't contain important data.
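* Before running it, double-check which physical disks those IDs resolve to, for example:
```bash
# List stable disk IDs and the devices they point to, then cross-check
# size/serial/model before passing the IDs to the playbook.
ls -l /dev/disk/by-id/ | grep -v part
lsblk -o NAME,SIZE,SERIAL,MODEL
```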
### Build Debian Cloud Template for Proxmox
* After storage is ready, create a reusable Debian cloud template so future Proxmox VMs can be cloned in seconds.
* Run: `ansible-playbook -i inventory.ini infra/nodito/33_proxmox_debian_cloud_template.yml`
* This playbook:
* Downloads the latest Debian generic cloud qcow2 image (override via `debian_cloud_image_url`/`debian_cloud_image_filename`)
* Imports it into your Proxmox storage (defaults to the configured ZFS pool) and builds VMID `9001` as a template
* Injects your SSH keys, enables qemu-guest-agent, configures DHCP networking, and sizes the disk (default 10GB)
* Drops a cloud-init snippet so clones automatically install qemu-guest-agent and can run upgrades on first boot
* Once it finishes, provision new machines with `qm clone 9001 <vmid> --name <vmname>` plus your usual cloud-init overrides.
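* For illustration, a full clone-and-customize pass looks roughly like this (VMID `101`, the name, and the address are placeholders; the disk is assumed to be attached as `scsi0`):
```bash
# Clone the template and adjust cloud-init settings before first boot.
qm clone 9001 101 --name web-01 --full
qm set 101 --ipconfig0 ip=192.168.1.50/24,gw=192.168.1.1   # or keep the template's DHCP default
qm resize 101 scsi0 +10G                                   # grow beyond the template's 10GB default disk
qm start 101
```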
### Provision VMs with OpenTofu
* Prefer a declarative workflow? The `tofu/nodito` project takes declarative VM definitions and clones the corresponding VMs from the template automatically.
* Quick start (see `tofu/nodito/README.md` for full details):
1. Install OpenTofu, copy `terraform.tfvars.example` to `terraform.tfvars`, and fill in the Proxmox API URL/token plus your SSH public key.
2. Define VMs in the `vms` map (name, cores, memory, disk size, `ipconfig0`, optional `vlan_tag`). Disks default to the `proxmox-tank-1` ZFS pool.
3. Run `tofu init`, `tofu plan -var-file=terraform.tfvars`, and `tofu apply -var-file=terraform.tfvars`.
* Each VM is cloned from the `debian-13-cloud-init` template (VMID 9001), attaches to `vmbr0`, and boots with qemu-guest-agent + your keys injected via cloud-init. Updates to the tfvars map let you grow/shrink the fleet with a single `tofu apply`.
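* As a consolidated quick start (paths and variable names per `tofu/nodito/README.md`):
```bash
cd tofu/nodito
cp terraform.tfvars.example terraform.tfvars   # fill in the Proxmox API URL/token, SSH key, and vms map
tofu init
tofu plan -var-file=terraform.tfvars
tofu apply -var-file=terraform.tfvars
```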
## General prep for all machines
### Set up Infrastructure Secrets
@@ -146,32 +165,6 @@ Note that, by applying these playbooks, both the root user and the `counterweigh
```
* **Important**: Never commit this file to version control (it's in `.gitignore`)
### Deploy Disk Usage Monitoring
* Any machine can be configured with disk usage monitoring that sends alerts to Uptime Kuma when disk usage exceeds a threshold.
* This playbook automatically creates an Uptime Kuma push monitor for each host (idempotent - won't create duplicates).
* Prerequisites:
* Install the Uptime Kuma Ansible collection: `ansible-galaxy collection install -r ansible/requirements.yml`
* Install Python dependencies: `pip install -r requirements.txt` (includes uptime-kuma-api)
* Set up `ansible/infra_secrets.yml` with your Uptime Kuma API token (see above)
* Uptime Kuma must be deployed (the playbook automatically uses the URL from `uptime_kuma_vars.yml`)
* Run the disk monitoring setup with:
```bash
ansible-playbook -i inventory.ini infra/410_disk_usage_alerts.yml
```
* This will:
* Create an Uptime Kuma monitor group per host named "{hostname} - infra" (idempotent)
* Create a push monitor in Uptime Kuma with "upside down" mode (no news is good news)
* Assign the monitor to the host's group for better organization
* Install required packages (curl, bc)
* Create a monitoring script that checks disk usage at configured intervals (default: 15 minutes)
* Set up a systemd service and timer for automated monitoring
* Send alerts to Uptime Kuma only when usage exceeds threshold (default: 80%)
* Optional configuration:
* Change threshold: `-e "disk_usage_threshold_percent=85"`
* Change check interval: `-e "disk_check_interval_minutes=10"`
* Monitor different mount point: `-e "monitored_mount_point=/home"`
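* For example, to combine those overrides in a single run:
```bash
ansible-playbook -i inventory.ini infra/410_disk_usage_alerts.yml \
  -e "disk_usage_threshold_percent=85" \
  -e "disk_check_interval_minutes=10" \
  -e "monitored_mount_point=/home"
```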
## GPG Keys
Some of the backups are stored encrypted for security. To allow this, fill in the gpg variables listed in `example.inventory.ini` under the `lapy` block.
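If you still need a key, the standard commands below generate one and list the long key ID/fingerprint to copy into the inventory (the exact variable names are listed in `example.inventory.ini`):
```bash
# Generate a key interactively, then list it to copy the long key ID.
gpg --full-generate-key
gpg --list-secret-keys --keyid-format=long
```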