stuff

Commit `fbbeb59c0e`, parent `c8754e1bdc`. 28 changed files with 907 additions and 995 deletions.
* Verify the changes are working correctly
* After running this playbook, clear your browser cache or perform a hard reload (Ctrl+Shift+R) before using the Proxmox VE Web UI to avoid UI display issues.

### Deploy CPU Temperature Monitoring
* The nodito server can be configured with CPU temperature monitoring that sends alerts to Uptime Kuma when temperatures exceed a threshold.
* Before running the CPU temperature monitoring playbook, create a secrets file with your Uptime Kuma push URL:
* Create `ansible/infra/nodito/nodito_secrets.yml` with:

```yaml
uptime_kuma_url: "https://your-uptime-kuma.com/api/push/your-push-key"
```

* Run the CPU temperature monitoring setup with: `ansible-playbook -i inventory.ini infra/nodito/40_cpu_temp_alerts.yml`
* This will (see the sketch after this list):
    * Install required packages (lm-sensors, curl, jq, bc)
    * Create a monitoring script that checks CPU temperature every minute
    * Set up a systemd service and timer for automated monitoring
    * Send alerts to Uptime Kuma when temperature exceeds the threshold (default: 80°C)
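As a rough sketch of what the generated check amounts to (illustrative only; the variable names and push parameters here are assumptions, not the playbook's actual script):

```bash
#!/usr/bin/env bash
# Illustrative only: read the hottest sensor and push an alert above the threshold.
set -euo pipefail

THRESHOLD=80   # °C, the playbook's default
PUSH_URL="https://your-uptime-kuma.com/api/push/your-push-key"   # from nodito_secrets.yml

# Highest temperature reported by lm-sensors, as an integer.
temp=$(sensors -u | awk '/_input:/ { if ($2 > max) max = $2 } END { printf "%d", max }')

if (( temp >= THRESHOLD )); then
  # Uptime Kuma push endpoints accept status/msg query parameters.
  curl -fsS "${PUSH_URL}?status=down&msg=CPU%20at%20${temp}C" >/dev/null
fi
```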
### Deploy Infra Monitoring (Disk, Health, CPU Temp)

* Nodito can run the same monitoring stack used elsewhere: disk usage, heartbeat healthcheck, and CPU temperature alerts feeding Uptime Kuma.
* Playbooks to run (in any order):
    * `ansible-playbook -i inventory.ini infra/410_disk_usage_alerts.yml`
    * `ansible-playbook -i inventory.ini infra/420_system_healthcheck.yml`
    * `ansible-playbook -i inventory.ini infra/430_cpu_temp_alerts.yml`
* Each playbook automatically (a push sketch follows this list):
    * Creates/updates the corresponding monitor in Uptime Kuma (including ntfy notification wiring)
    * Installs any required packages (curl, lm-sensors, jq, bc, etc.)
    * Creates the monitoring script(s) and log files
    * Sets up systemd services and timers for automated runs
    * Sends alerts to Uptime Kuma when thresholds are exceeded or heartbeats stop
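All three playbooks report through Uptime Kuma's push API. As a minimal sketch (assuming a push URL of the shape shown in the secrets example above), the heartbeat case is simply a periodic request from a systemd timer; when the pushes stop, the monitor goes down and ntfy fires:

```bash
# Illustrative heartbeat: silence, not the push itself, is what raises the alert.
PUSH_URL="https://your-uptime-kuma.com/api/push/your-push-key"
curl -fsS --retry 3 "${PUSH_URL}?status=up&msg=OK" >/dev/null
```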
### Set up ZFS Storage Pool
* Enable ZFS services for automatic pool import on boot
* **Warning**: This will destroy all data on the specified disks. Make sure you're using the correct disk IDs and that the disks don't contain important data.
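For reference, a hedged sketch of the equivalent manual steps (the disk IDs are placeholders, and the unit names are the stock ZFS-on-Linux services, assumed rather than taken from the playbook):

```bash
# Confirm disk identities before anything destructive.
ls -l /dev/disk/by-id/

# Example layout only: a mirrored pool named after the repo's default pool.
# This wipes both disks.
zpool create proxmox-tank-1 mirror \
  /dev/disk/by-id/ata-DISK_SERIAL_A /dev/disk/by-id/ata-DISK_SERIAL_B

# Import and mount the pool automatically on boot.
systemctl enable zfs-import-cache.service zfs-mount.service zfs.target
```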
### Build Debian Cloud Template for Proxmox
* After storage is ready, create a reusable Debian cloud template so future Proxmox VMs can be cloned in seconds.
* Run: `ansible-playbook -i inventory.ini infra/nodito/33_proxmox_debian_cloud_template.yml`
* This playbook:
    * Downloads the latest Debian generic cloud qcow2 image (override via `debian_cloud_image_url`/`debian_cloud_image_filename`)
    * Imports it into your Proxmox storage (defaults to the configured ZFS pool) and builds VMID `9001` as a template
    * Injects your SSH keys, enables qemu-guest-agent, configures DHCP networking, and sizes the disk (default 10 GB)
    * Drops a cloud-init snippet so clones automatically install qemu-guest-agent and can run upgrades on first boot
* Once it finishes, provision new machines with `qm clone 9001 <vmid> --name <vmname>` plus your usual cloud-init overrides, as in the example below.
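For instance, a full clone with a couple of common cloud-init overrides might look like this (the VMID `120`, the name, and the addresses are placeholders):

```bash
qm clone 9001 120 --name web-1 --full                        # full clone of the template
qm set 120 --ipconfig0 ip=192.168.1.120/24,gw=192.168.1.1    # static address instead of DHCP
qm set 120 --sshkeys ~/.ssh/id_ed25519.pub                   # swap in different SSH keys if needed
qm start 120
```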
### Provision VMs with OpenTofu
* Prefer a declarative workflow? The `tofu/nodito` project clones VM definitions from the template automatically.
* Quick start (see `tofu/nodito/README.md` for full details):
    1. Install OpenTofu, copy `terraform.tfvars.example` to `terraform.tfvars`, and fill in the Proxmox API URL/token plus your SSH public key.
    2. Define VMs in the `vms` map (name, cores, memory, disk size, `ipconfig0`, optional `vlan_tag`). Disks default to the `proxmox-tank-1` ZFS pool.
    3. Run `tofu init`, `tofu plan -var-file=terraform.tfvars`, and `tofu apply -var-file=terraform.tfvars`.
* Each VM is cloned from the `debian-13-cloud-init` template (VMID 9001), attaches to `vmbr0`, and boots with qemu-guest-agent plus your keys injected via cloud-init. Updating the `vms` map in the tfvars file lets you grow or shrink the fleet with a single `tofu apply`.
## General prep for all machines
### Set up Infrastructure Secrets
* Create `ansible/infra_secrets.yml` with your Uptime Kuma API token; the monitoring playbooks below read it from there.
* **Important**: Never commit this file to version control (it's in `.gitignore`)
### Deploy Disk Usage Monitoring
* Any machine can be configured with disk usage monitoring that sends alerts to Uptime Kuma when disk usage exceeds a threshold.
* This playbook automatically creates an Uptime Kuma push monitor for each host (idempotent; it won't create duplicates).
* Prerequisites:
    * Install the Uptime Kuma Ansible collection: `ansible-galaxy collection install -r ansible/requirements.yml`
    * Install Python dependencies: `pip install -r requirements.txt` (includes `uptime-kuma-api`)
    * Set up `ansible/infra_secrets.yml` with your Uptime Kuma API token (see above)
    * Uptime Kuma must be deployed (the playbook automatically uses the URL from `uptime_kuma_vars.yml`)
* Run the disk monitoring setup with:

```bash
ansible-playbook -i inventory.ini infra/410_disk_usage_alerts.yml
```
* This will (a sketch of the resulting check follows this list):
    * Create an Uptime Kuma monitor group per host named "{hostname} - infra" (idempotent)
    * Create a push monitor in Uptime Kuma with "upside down" mode (no news is good news)
    * Assign the monitor to the host's group for better organization
    * Install required packages (curl, bc)
    * Create a monitoring script that checks disk usage at configured intervals (default: every 15 minutes)
    * Set up a systemd service and timer for automated monitoring
    * Send alerts to Uptime Kuma only when usage exceeds the threshold (default: 80%)
* Optional configuration:
    * Change the threshold: `-e "disk_usage_threshold_percent=85"`
    * Change the check interval: `-e "disk_check_interval_minutes=10"`
    * Monitor a different mount point: `-e "monitored_mount_point=/home"`
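A hedged sketch of what the generated check boils down to (variable names and the exact push parameters are illustrative, not the playbook's actual script):

```bash
#!/usr/bin/env bash
# Illustrative only: push solely when usage crosses the threshold; the
# "upside down" push monitor treats any incoming push as bad news.
set -euo pipefail

THRESHOLD=80        # percent; maps to disk_usage_threshold_percent
MOUNT_POINT="/"     # maps to monitored_mount_point
PUSH_URL="https://your-uptime-kuma.com/api/push/your-push-key"

# Usage of the monitored mount point as a bare integer, e.g. "42".
usage=$(df --output=pcent "$MOUNT_POINT" | tail -1 | tr -dc '0-9')

if (( usage >= THRESHOLD )); then
  curl -fsS "${PUSH_URL}?status=up&msg=${MOUNT_POINT}%20at%20${usage}%25" >/dev/null
fi
```

The `-e` overrides can be combined in a single run, e.g. `ansible-playbook -i inventory.ini infra/410_disk_usage_alerts.yml -e "disk_usage_threshold_percent=85" -e "monitored_mount_point=/home"`.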
## GPG Keys
Some of the backups are stored encrypted for security. To enable this, fill in the GPG variables listed in `example.inventory.ini` under the `lapy` block.
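As a minimal illustration of the encryption step those variables enable (the recipient key ID is a placeholder):

```bash
# Encrypt a backup archive for the configured GPG recipient.
gpg --encrypt --recipient 0xABCD1234 --output backup.tar.gz.gpg backup.tar.gz
```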