# 01 Infra Setup
This describes how to prepare each machine before deploying services on them.
## First steps
* Create an SSH key or pick an existing one. We'll refer to it as the `personal_ssh_key`.
* Deploy ansible on the laptop (Lapy), which will act as the ansible control node. To do so:
* Create a `venv`: `python3 -m venv venv`
* Activate it: `source venv/bin/activate`
* Install the listed ansible requirements with `pip install -r requirements.txt`
* Keep in mind you should activate this `venv` from now on when running `ansible` commands.
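* Put together, the control-node setup looks roughly like this (a minimal sketch, assuming you run it from the repository root and that `requirements.txt` lives there):
```bash
# Minimal sketch of the control-node setup on Lapy.
# Assumes you are in the repository root and it contains requirements.txt.
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Sanity check that ansible is available inside the venv.
ansible --version
```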
## Domain
* Some services are designed to be accessible from the WAN through a friendly URL.
* You'll need a domain where you can set DNS records and create subdomains, as the guide assumes each service will get its own subdomain.
* Getting and configuring the domain is outside the scope of this repo. Whenever a service needs you to set up a subdomain, it will be mentioned explicitly.
* You should add the domain to the var `root_domain` in `ansible/infra_vars.yml`.
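* As a quick sanity check (a sketch only; `example.com` is a placeholder and `infra_vars.yml` may contain other variables besides `root_domain`):
```bash
# Confirm root_domain is set in the infra vars file.
# example.com below is a placeholder for your own domain.
grep 'root_domain' ansible/infra_vars.yml
# Expected output, roughly:
#   root_domain: "example.com"
```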
## Prepare the VPSs (vipy, watchtower and spacey)
### Source the VPSs
* The guide is agnostic to which provider you pick, but has been tested with VMs from https://99stack.com and contains some operations that are specifically relevant to their VPSs.
* The expectations are that the VPS ticks the following boxes:
+ Runs Debian 12 (bookworm) or 13 (trixie).
+ Has a public IPv4 address and starts out with SSH listening on port 22.
+ Boots with one of your SSH keys already authorized. If this is not the case, you'll have to manually drop the pubkey there before using the playbooks.
* You will need three VPSs:
+ One to host most services,
+ Another tiny one for uptime monitoring. We use a separate machine so the monitoring service doesn't go down together with the main machine.
+ A final one to run the headscale server, since the main VPS needs to be part of the mesh network and can't do so while also running the coordination server.
* Move on once your VPSs are running and satisfy the prerequisites.
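* A quick way to confirm a VPS ticks those boxes before continuing (a sketch; the IP is a placeholder and the key path assumes `personal_ssh_key` lives under `~/.ssh/`):
```bash
# Prerequisite spot-check for one VPS. 203.0.113.10 is a placeholder IP;
# the key path assumes your personal_ssh_key is stored under ~/.ssh/.
ssh -i ~/.ssh/personal_ssh_key -p 22 root@203.0.113.10 \
  'cat /etc/debian_version && hostname -I'
```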
### Prepare Ansible vars
* You have an example `ansible/example.inventory.ini`. Copy it with `cp ansible/example.inventory.ini ansible/inventory.ini` and fill in the `[vps]` group with host entries for each machine (`vipy` for services, `watchtower` for uptime monitoring, `spacey` for headscale).
* A few notes:
* The guides assume a single `vipy` host entry. Things will break if you have more than one, so avoid that.
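* For orientation, a rough sketch of what a filled-in `[vps]` group could look like (the repo's `example.inventory.ini` is authoritative and may expect additional host variables; the IPs are placeholders):
```bash
# Illustration only -- edit ansible/inventory.ini by hand after copying the
# example; your real file may need more host variables than shown here.
cat <<'EOF'
[vps]
vipy       ansible_host=203.0.113.10
watchtower ansible_host=203.0.113.11
spacey     ansible_host=203.0.113.12
EOF
```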
### Create user and secure VPS access
* Ansible will create a user in the first playbook (`infra/01_user_and_access_setup_playbook.yml`, run below). This is the user that will be used regularly. But since this user doesn't exist yet, you obviously need to run this first playbook as some other user. We assume your VPS provider has given you a root user, which is what you pass as `ansible_user` in the next command.
* cd into `ansible`
* Run `ansible-playbook -i inventory.ini infra/01_user_and_access_setup_playbook.yml -e 'ansible_user="your root user here"'`
* Then, configure firewall access, fail2ban and auditd with `ansible-playbook -i inventory.ini infra/02_firewall_and_fail2ban_playbook.yml`. Since the user we will use is now present, there is no need to specify the user anymore.
Note that, by applying these playbooks, both the root user and the `counterweight` user will use the same SSH pubkey for auth.
Checklist:
- [ ] All 3 VPSs are accessible with the `counterweight` user
- [ ] All 3 VPSs have UFW up and running
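A quick way to tick both boxes from Lapy (a sketch; replace the placeholder addresses with your hosts, and note that `sudo` may prompt for a password, hence `-t`):
```bash
# Verify counterweight access and UFW status on each VPS.
# The addresses are placeholders; use your inventory's hosts or IPs.
for host in 203.0.113.10 203.0.113.11 203.0.113.12; do
  ssh -t counterweight@"$host" 'sudo ufw status verbose'
done
```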
## Prepare Nodito Server
### Source the Nodito Server
* This setup is designed for a local Nodito server running in your home environment.
* The expectations are that the Nodito server:
+ Runs Proxmox VE (based on Debian).
+ Has a predictable local IP address.
+ Has a root user with password authentication enabled (the default Proxmox state).
+ Has SSH accessible on port 22.
### Prepare Ansible vars for Nodito
* Ensure your inventory contains a `[nodito_host]` group with a `nodito` host entry (copy the example inventory if needed) and fill it in with your values.
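* A rough sketch of what that group might look like (the example inventory is authoritative; the IP is a placeholder for Nodito's predictable local address):
```bash
# Illustration only -- the repo's example.inventory.ini is authoritative.
# 192.168.1.50 is a placeholder for Nodito's predictable local IP.
cat <<'EOF'
[nodito_host]
nodito ansible_host=192.168.1.50
EOF
```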
### Bootstrap SSH Key Access and Create User
* Nodito starts with password authentication enabled and no SSH keys configured. We need to bootstrap SSH key access first.
* Run the complete setup with: `ansible-playbook -i inventory.ini infra/nodito/30_proxmox_bootstrap_playbook.yml -e 'ansible_user=root'`
* This single playbook will:
* Set up SSH key access for root
* Create the counterweight user with SSH keys
* Update and secure the system
* Disable root login and password authentication
* Test the final configuration
* For all future playbooks targeting nodito, use the default configuration (no overrides needed).
Note that, by applying these playbooks, both the root user and the `counterweight` user will use the same SSH pubkey for auth, but root login will be disabled.
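To confirm the bootstrap worked as described (a minimal sketch; the address is a placeholder for Nodito's local IP):
```bash
# Key-based access as counterweight should work...
ssh counterweight@192.168.1.50 'whoami'
# ...while root logins should now be refused.
ssh -o BatchMode=yes root@192.168.1.50 'true' \
  || echo "root login refused, as expected"
```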
### Switch to Community Repositories
* Proxmox VE installations typically come with enterprise repositories enabled, which require a subscription. To avoid subscription warnings and use the community repositories instead:
* Run the repository switch with: `ansible-playbook -i inventory.ini infra/nodito/32_proxmox_community_repos_playbook.yml`
* This playbook will:
* Detect whether your Proxmox installation uses modern deb822 format (Proxmox VE 9) or legacy format (Proxmox VE 8)
* Remove enterprise repository files and create community repository files
* Disable subscription nag messages in both web and mobile interfaces
* Update Proxmox packages from the community repository
* Verify the changes are working correctly
* After running this playbook, clear your browser cache or perform a hard reload (Ctrl+Shift+R) before using the Proxmox VE Web UI to avoid UI display issues.
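* To spot-check the switch without opening the Web UI (a sketch; the address is a placeholder):
```bash
# Spot-check the repository switch on Nodito (placeholder address).
# The enterprise list files should have been replaced by community ones,
# and pve-manager should resolve from the community repository.
ssh counterweight@192.168.1.50 \
  'ls /etc/apt/sources.list.d/ && apt policy pve-manager'
```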
### Deploy Infra Monitoring (Disk, Health, CPU Temp)
* Nodito can run the same monitoring stack used elsewhere: disk usage, heartbeat healthcheck, and CPU temperature alerts feeding Uptime Kuma.
* Playbooks to run (in any order):
* `ansible-playbook -i inventory.ini infra/410_disk_usage_alerts.yml`
* `ansible-playbook -i inventory.ini infra/420_system_healthcheck.yml`
* `ansible-playbook -i inventory.ini infra/430_cpu_temp_alerts.yml`
* Each playbook automatically:
* Creates/updates the corresponding monitor in Uptime Kuma (including ntfy notification wiring)
* Installs any required packages (curl, lm-sensors, jq, bc, etc.)
* Creates the monitoring script(s) and log files
* Sets up systemd services and timers for automated runs
* Sends alerts to Uptime Kuma when thresholds are exceeded or heartbeats stop
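* A quick way to confirm the units landed on Nodito (a sketch; the address is a placeholder and the exact unit names depend on the playbooks):
```bash
# List systemd timers on Nodito and look for the monitoring units
# installed by the playbooks above.
ssh counterweight@192.168.1.50 'systemctl list-timers --all'
```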
### Setup ZFS Storage Pool
* The nodito server can be configured with a ZFS RAID 1 storage pool for Proxmox VM storage, providing redundancy and data integrity.
* Before running the ZFS pool setup playbook, you need to identify your disk IDs and configure them in the variables file:
* SSH into your nodito server and run: `ls -la /dev/disk/by-id/ | grep -E "(ata-|scsi-|nvme-)"`
* This will show you the persistent disk identifiers for all your disks. Look for the two disks you want to use for the ZFS pool.
* Example output:
```
lrwxrwxrwx 1 root root 9 Dec 15 10:30 ata-WDC_WD40EFRX-68N32N0_WD-WCC7K1234567 -> ../../sdb
lrwxrwxrwx 1 root root 9 Dec 15 10:30 ata-WDC_WD40EFRX-68N32N0_WD-WCC7K7654321 -> ../../sdc
```
* Update `ansible/infra/nodito/nodito_vars.yml` with your actual disk IDs:
```yaml
zfs_disk_1: "/dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K1234567"
zfs_disk_2: "/dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K7654321"
```
* Run the ZFS pool setup with: `ansible-playbook -i inventory.ini infra/nodito/32_zfs_pool_setup_playbook.yml`
* This will:
* Validate Proxmox VE and ZFS installation
* Install ZFS utilities and kernel modules
* Create a RAID 1 (mirror) ZFS pool named `proxmox-storage` with optimized settings
* Configure ZFS pool properties (ashift=12, compression=lz4, atime=off, etc.)
* Export and re-import the pool for Proxmox compatibility
* Configure Proxmox to use the ZFS pool storage (zfspool type)
* Enable ZFS services for automatic pool import on boot
* **Warning**: This will destroy all data on the specified disks. Make sure you're using the correct disk IDs and that the disks don't contain important data.
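* Once the playbook has finished, you can verify the result on Nodito itself (run these on the Proxmox host as root; `zpool`, `zfs`, and `pvesm` are the standard ZFS/Proxmox tools):
```bash
# Run on the Proxmox host (as root) after the playbook completes.
zpool status proxmox-storage          # the mirror should be ONLINE with both disks
zfs get compression,atime proxmox-storage
pvesm status                          # the new zfspool storage should be listed
```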
### Build Debian Cloud Template for Proxmox
* After storage is ready, create a reusable Debian cloud template so future Proxmox VMs can be cloned in seconds.
* Run: `ansible-playbook -i inventory.ini infra/nodito/33_proxmox_debian_cloud_template.yml`
* This playbook:
* Downloads the latest Debian generic cloud qcow2 image (override via `debian_cloud_image_url`/`debian_cloud_image_filename`)
* Imports it into your Proxmox storage (defaults to the configured ZFS pool) and builds VMID `9001` as a template
* Injects your SSH keys, enables qemu-guest-agent, configures DHCP networking, and sizes the disk (default 10GB)
* Drops a cloud-init snippet so clones automatically install qemu-guest-agent and can run upgrades on first boot
* Once it finishes, provision new machines with `qm clone 9001 <vmid> --name <vmname>` plus your usual cloud-init overrides.
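* For example, a manual clone might look like this (a sketch; the VMID, name, and overrides are hypothetical, and the disk name passed to `qm resize` must match the template's disk):
```bash
# Hypothetical manual clone from the template, run on the Proxmox host.
qm clone 9001 101 --name test-vm --full
qm set 101 --ipconfig0 ip=dhcp --sshkeys ~/.ssh/personal_ssh_key.pub
qm resize 101 scsi0 20G   # optional: grow past the 10GB default; adjust the disk name
qm start 101
```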
### Provision VMs with OpenTofu
* Prefer a declarative workflow? The `tofu/nodito` project clones VM definitions from the template automatically.
* Quick start (see `tofu/nodito/README.md` for full details):
1. Install OpenTofu, copy `terraform.tfvars.example` to `terraform.tfvars`, and fill in the Proxmox API URL/token plus your SSH public key.
2. Define VMs in the `vms` map (name, cores, memory, disk size, `ipconfig0`, optional `vlan_tag`). Disks default to the `proxmox-tank-1` ZFS pool.
3. Run `tofu init`, `tofu plan -var-file=terraform.tfvars`, and `tofu apply -var-file=terraform.tfvars`.
* Each VM is cloned from the `debian-13-cloud-init` template (VMID 9001), attaches to `vmbr0`, and boots with qemu-guest-agent + your keys injected via cloud-init. Updates to the tfvars map let you grow/shrink the fleet with a single `tofu apply`.
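* As a sketch only, a `vms` map entry might look roughly like the following; the actual key names and structure are defined by the module, so treat `tofu/nodito/README.md` and `terraform.tfvars.example` as authoritative:
```bash
# Assumed shape only -- key names are guesses based on the attributes listed
# above; copy terraform.tfvars.example rather than this sketch.
cat <<'EOF'
vms = {
  "testvm" = {
    cores     = 2
    memory    = 2048
    disk_size = 20
    ipconfig0 = "ip=192.168.1.60/24,gw=192.168.1.1"
    vlan_tag  = null
  }
}
EOF
```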
## General prep for all machines
### Set up Infrastructure Secrets
* Create `ansible/infra_secrets.yml` based on the example file:
```bash
cp ansible/infra_secrets.yml.example ansible/infra_secrets.yml
```
* Edit `ansible/infra_secrets.yml` and add your Uptime Kuma credentials:
```yaml
uptime_kuma_username: "admin"
uptime_kuma_password: "your_password"
```
* **Important**: Never commit this file to version control (it's in `.gitignore`)
## GPG Keys
Some of the backups are stored encrypted for security. To enable this, fill in the GPG variables listed in `example.inventory.ini` under the `lapy` block.