diff --git a/ups/README.md b/ups/README.md new file mode 100644 index 0000000..ea5986d --- /dev/null +++ b/ups/README.md @@ -0,0 +1,9 @@ +# UPS Setup + +CyberPower CP900EPFCLCD (900VA/540W) powering Nodito and router. + +## Contents + +- `ups.md` — Main documentation: setup checklist, drills, Q&A +- `nut-setup.yml` — Ansible playbook for NUT configuration +- `ups-shutdown-drill*.log` — Logs from shutdown drill runs (Jan 2026) diff --git a/ups/nut-setup.yml b/ups/nut-setup.yml new file mode 100644 index 0000000..048a874 --- /dev/null +++ b/ups/nut-setup.yml @@ -0,0 +1,304 @@ +--- +# NUT (Network UPS Tools) Setup Playbook +# Run from laptop: ansible-playbook -i inventory nut-setup.yml +# +# Prerequisites: +# - UPS physically connected to server via USB +# - SSH access to target server +# +# Variables to customize: +# - ups_name: logical name for the UPS in NUT +# - ups_password: password for upsmon user +# - uptime_kuma_push_url: your Uptime Kuma push monitor URL + +- name: Setup NUT for CyberPower UPS + hosts: nodito + become: true + vars: + ups_name: cyberpower + ups_desc: "CyberPower CP900EPFCLCD" + ups_driver: usbhid-ups + ups_port: auto + ups_user: counterweight + ups_password: "changeme" # TODO: use ansible-vault in production + ups_offdelay: 120 # Seconds after shutdown command before UPS cuts outlet power + ups_ondelay: 30 # Seconds after mains returns before UPS restores outlet power + # Note: Shutdown threshold is controlled by UPS's battery.runtime.low (default 300s = 5 min) + uptime_kuma_push_url: "https://uptime.example.com/api/push/xxxxx" + + tasks: + # ------------------------------------------------------------------ + # Installation + # ------------------------------------------------------------------ + - name: Install NUT packages + ansible.builtin.apt: + name: + - nut + - nut-client + - nut-server + state: present + update_cache: true + + # ------------------------------------------------------------------ + # Verify UPS is detected 
(informational) + # ------------------------------------------------------------------ + - name: Check if UPS is detected via USB + ansible.builtin.shell: lsusb | grep -i cyber + register: lsusb_output + changed_when: false + failed_when: false + + - name: Display USB detection result + ansible.builtin.debug: + # The second argument (true) makes default() also apply to an empty string, + # which is what stdout holds when grep finds no match. + msg: "{{ lsusb_output.stdout | default('UPS not detected via USB - ensure it is plugged in', true) }}" + + - name: Reload udev rules (NUT installs rules but they need triggering for already-plugged devices) + ansible.builtin.shell: | + udevadm control --reload-rules + udevadm trigger --subsystem-match=usb --action=add + changed_when: true + + - name: Verify USB device has nut group permissions + ansible.builtin.shell: | + # Find the UPS device and check its permissions + BUS_DEV=$(lsusb | grep -i cyber | grep -oP 'Bus \K\d+|Device \K\d+' | tr '\n' '/' | sed 's/\/$//') + if [ -n "$BUS_DEV" ]; then + BUS=$(echo "$BUS_DEV" | cut -d'/' -f1) + DEV=$(echo "$BUS_DEV" | cut -d'/' -f2) + ls -la "/dev/bus/usb/$BUS/$DEV" + else + echo "UPS device not found" + exit 1 + fi + register: usb_permissions + changed_when: false + + - name: Display USB permissions + ansible.builtin.debug: + msg: "{{ usb_permissions.stdout }} - should show 'root nut', not 'root root'" + + - name: Scan for UPS with nut-scanner + ansible.builtin.command: nut-scanner -U + register: nut_scanner_output + changed_when: false + failed_when: false + + - name: Display nut-scanner result + ansible.builtin.debug: + msg: "{{ nut_scanner_output.stdout_lines }}" + + # ------------------------------------------------------------------ + # Configuration files + # ------------------------------------------------------------------ + - name: Configure NUT mode (standalone) + ansible.builtin.copy: + dest: /etc/nut/nut.conf + content: | + # Managed by Ansible + MODE=standalone + owner: root + group: nut + mode: "0640" + notify: Restart NUT services + + - name: Configure UPS device + ansible.builtin.copy: + dest: 
/etc/nut/ups.conf + content: | + # Managed by Ansible + [{{ ups_name }}] + driver = {{ ups_driver }} + port = {{ ups_port }} + desc = "{{ ups_desc }}" + offdelay = {{ ups_offdelay }} + ondelay = {{ ups_ondelay }} + owner: root + group: nut + mode: "0640" + notify: Restart NUT services + + - name: Configure upsd to listen on localhost + ansible.builtin.copy: + dest: /etc/nut/upsd.conf + content: | + # Managed by Ansible + LISTEN 127.0.0.1 3493 + owner: root + group: nut + mode: "0640" + notify: Restart NUT services + + - name: Configure upsd users + ansible.builtin.copy: + dest: /etc/nut/upsd.users + content: | + # Managed by Ansible + [{{ ups_user }}] + password = {{ ups_password }} + upsmon master + owner: root + group: nut + mode: "0640" + notify: Restart NUT services + + - name: Configure upsmon + ansible.builtin.copy: + dest: /etc/nut/upsmon.conf + content: | + # Managed by Ansible + MONITOR {{ ups_name }}@localhost 1 {{ ups_user }} {{ ups_password }} master + + MINSUPPLIES 1 + SHUTDOWNCMD "/sbin/shutdown -h +0" + POLLFREQ 5 + POLLFREQALERT 5 + HOSTSYNC 15 + DEADTIME 15 + POWERDOWNFLAG /etc/killpower + + # Notifications + NOTIFYMSG ONLINE "UPS %s on line power" + NOTIFYMSG ONBATT "UPS %s on battery" + NOTIFYMSG LOWBATT "UPS %s battery is low" + NOTIFYMSG FSD "UPS %s: forced shutdown in progress" + NOTIFYMSG COMMOK "Communications with UPS %s established" + NOTIFYMSG COMMBAD "Communications with UPS %s lost" + NOTIFYMSG SHUTDOWN "Auto logout and shutdown proceeding" + NOTIFYMSG REPLBATT "UPS %s battery needs replacing" + + # Log all events to syslog (upsmon handles LB shutdown automatically) + NOTIFYFLAG ONLINE SYSLOG + NOTIFYFLAG ONBATT SYSLOG + NOTIFYFLAG LOWBATT SYSLOG + NOTIFYFLAG FSD SYSLOG + NOTIFYFLAG COMMOK SYSLOG + NOTIFYFLAG COMMBAD SYSLOG + NOTIFYFLAG SHUTDOWN SYSLOG + NOTIFYFLAG REPLBATT SYSLOG + owner: root + group: nut + mode: "0640" + notify: Restart NUT services + + # NOTE: No upssched configuration needed. 
The UPS sets LB (Low Battery) flag + # when runtime < battery.runtime.low (default 300s) or charge < battery.charge.low (default 10%). + # upsmon handles LB automatically and triggers shutdown—no custom scripts required. + + # NOTE: /lib/systemd/system-shutdown/nutshutdown is provided by the NUT package. + # It already checks for the killpower flag and runs `upsdrvctl shutdown` to tell + # the UPS to cut outlet power. No need to create or modify it. + + - name: Verify late-stage shutdown script exists + ansible.builtin.stat: + path: /lib/systemd/system-shutdown/nutshutdown + register: nutshutdown_script + + - name: Warn if nutshutdown script is missing + ansible.builtin.debug: + msg: "WARNING: /lib/systemd/system-shutdown/nutshutdown not found. UPS may not cut power after shutdown, breaking auto-restart." + when: not nutshutdown_script.stat.exists + + # ------------------------------------------------------------------ + # Services + # Note: nut-driver-enumerator reads ups.conf and starts drivers via nut-driver@.service + # ------------------------------------------------------------------ + - name: Enable and start NUT driver enumerator + ansible.builtin.systemd: + name: nut-driver-enumerator + enabled: true + state: started + + - name: Enable and start NUT server + ansible.builtin.systemd: + name: nut-server + enabled: true + state: started + + - name: Enable and start NUT monitor + ansible.builtin.systemd: + name: nut-monitor + enabled: true + state: started + + # ------------------------------------------------------------------ + # Uptime Kuma heartbeat monitoring + # ------------------------------------------------------------------ + - name: Create UPS heartbeat script + ansible.builtin.copy: + dest: /usr/local/bin/ups-heartbeat.sh + content: | + #!/bin/bash + # UPS heartbeat for Uptime Kuma - Managed by Ansible + STATUS=$(upsc {{ ups_name }}@localhost ups.status 2>/dev/null) + + if [[ -z "$STATUS" ]]; then + # Cannot communicate with UPS + curl -fsS "{{ 
uptime_kuma_push_url }}?status=down&msg=UPS%20communication%20lost" > /dev/null 2>&1 + elif [[ "$STATUS" == *"OL"* ]]; then + # On line power + curl -fsS "{{ uptime_kuma_push_url }}?status=up&msg=UPS%20on%20mains" > /dev/null 2>&1 + else + # On battery or other state. ups.status can hold several space-separated + # flags (e.g. "OB DISCHRG"), so encode the spaces to keep the URL valid. + curl -fsS "{{ uptime_kuma_push_url }}?status=down&msg=UPS%20on%20battery%20(${STATUS// /%20})" > /dev/null 2>&1 + fi + owner: root + group: root + mode: "0755" + + - name: Setup cron job for UPS heartbeat + ansible.builtin.cron: + name: "UPS heartbeat to Uptime Kuma" + minute: "*" + job: "/usr/local/bin/ups-heartbeat.sh" + user: root + + # ------------------------------------------------------------------ + # Verification + # ------------------------------------------------------------------ + - name: Verify NUT can communicate with UPS + ansible.builtin.command: upsc {{ ups_name }}@localhost + register: upsc_output + changed_when: false + failed_when: false + + - name: Display UPS status + ansible.builtin.debug: + msg: "{{ upsc_output.stdout_lines }}" + + - name: Get UPS status summary + ansible.builtin.shell: | + echo "Status: $(upsc {{ ups_name }}@localhost ups.status 2>/dev/null)" + echo "Battery: $(upsc {{ ups_name }}@localhost battery.charge 2>/dev/null)%" + echo "Runtime: $(upsc {{ ups_name }}@localhost battery.runtime 2>/dev/null)s" + echo "Load: $(upsc {{ ups_name }}@localhost ups.load 2>/dev/null)%" + register: ups_summary + changed_when: false + + - name: Display UPS summary + ansible.builtin.debug: + msg: "{{ ups_summary.stdout_lines }}" + + - name: Verify low battery thresholds + ansible.builtin.shell: | + echo "Runtime threshold: $(upsc {{ ups_name }}@localhost battery.runtime.low 2>/dev/null)s" + echo "Charge threshold: $(upsc {{ ups_name }}@localhost battery.charge.low 2>/dev/null)%" + echo "LB triggers when runtime < threshold OR charge < threshold" + register: thresholds + changed_when: false + + - name: Display low battery thresholds + ansible.builtin.debug: + msg: "{{ 
thresholds.stdout_lines }}" + + # ------------------------------------------------------------------ + # Handlers + # ------------------------------------------------------------------ + handlers: + - name: Restart NUT services + ansible.builtin.systemd: + name: "{{ item }}" + state: restarted + loop: + - nut-driver-enumerator + - nut-server + - nut-monitor diff --git a/ups/ups-shutdown-drill-2.log b/ups/ups-shutdown-drill-2.log new file mode 100644 index 0000000..7f36953 --- /dev/null +++ b/ups/ups-shutdown-drill-2.log @@ -0,0 +1,7 @@ +Sun Jan 11 09:02:54 AM CET 2026: status=OL CHRG charge=40% runtime=730s +Sun Jan 11 09:03:04 AM CET 2026: status=OL CHRG charge=40% runtime=640s +Sun Jan 11 09:03:14 AM CET 2026: status=OL CHRG charge=40% runtime=780s +Sun Jan 11 09:03:24 AM CET 2026: status=OB DISCHRG charge=21% runtime=435s +Sun Jan 11 09:03:34 AM CET 2026: status=OB DISCHRG charge=15% runtime=262s +Sun Jan 11 09:03:44 AM CET 2026: status=FSD OB DISCHRG LB charge=14% runtime=227s +Connection to nodito closed by remote host. 
diff --git a/ups/ups-shutdown-drill.log b/ups/ups-shutdown-drill.log new file mode 100644 index 0000000..9ba4d04 --- /dev/null +++ b/ups/ups-shutdown-drill.log @@ -0,0 +1,185 @@ +Sun Jan 11 08:23:31 AM CET 2026: status=OL charge=100% runtime=2640s +Sun Jan 11 08:23:41 AM CET 2026: status=OL charge=100% runtime=2610s +Sun Jan 11 08:23:51 AM CET 2026: status=OL charge=100% runtime=2610s +Sun Jan 11 08:24:01 AM CET 2026: status=OL charge=100% runtime=2610s +Sun Jan 11 08:24:12 AM CET 2026: status=OL charge=100% runtime=2520s +Sun Jan 11 08:24:22 AM CET 2026: status=OL charge=100% runtime=2520s +Sun Jan 11 08:24:32 AM CET 2026: status=OL charge=100% runtime=2520s +Sun Jan 11 08:24:42 AM CET 2026: status=OL charge=100% runtime=2610s +Sun Jan 11 08:24:52 AM CET 2026: status=OB DISCHRG charge=100% runtime=2550s +Sun Jan 11 08:25:02 AM CET 2026: status=OB DISCHRG charge=100% runtime=2550s +Sun Jan 11 08:25:12 AM CET 2026: status=OB DISCHRG charge=100% runtime=2677s +Sun Jan 11 08:25:22 AM CET 2026: status=OB DISCHRG charge=100% runtime=2677s +Sun Jan 11 08:25:32 AM CET 2026: status=OB DISCHRG charge=100% runtime=2677s +Sun Jan 11 08:25:42 AM CET 2026: status=OB DISCHRG charge=100% runtime=2677s +Sun Jan 11 08:25:52 AM CET 2026: status=OB DISCHRG charge=100% runtime=2618s +Sun Jan 11 08:26:02 AM CET 2026: status=OB DISCHRG charge=100% runtime=2618s +Sun Jan 11 08:26:12 AM CET 2026: status=OB DISCHRG charge=100% runtime=2618s +Sun Jan 11 08:26:22 AM CET 2026: status=OB DISCHRG charge=100% runtime=2647s +Sun Jan 11 08:26:32 AM CET 2026: status=OB DISCHRG charge=100% runtime=2647s +Sun Jan 11 08:26:42 AM CET 2026: status=OB DISCHRG charge=100% runtime=2647s +Sun Jan 11 08:26:52 AM CET 2026: status=OB DISCHRG charge=100% runtime=2507s +Sun Jan 11 08:27:02 AM CET 2026: status=OB DISCHRG charge=100% runtime=2507s +Sun Jan 11 08:27:12 AM CET 2026: status=OB DISCHRG charge=100% runtime=2507s +Sun Jan 11 08:27:22 AM CET 2026: status=OB DISCHRG charge=100% runtime=2291s +Sun Jan 11 
08:27:32 AM CET 2026: status=OB DISCHRG charge=100% runtime=2291s +Sun Jan 11 08:27:42 AM CET 2026: status=OB DISCHRG charge=100% runtime=2291s +Sun Jan 11 08:27:52 AM CET 2026: status=OB DISCHRG charge=100% runtime=2625s +Sun Jan 11 08:28:02 AM CET 2026: status=OB DISCHRG charge=100% runtime=2625s +Sun Jan 11 08:28:12 AM CET 2026: status=OB DISCHRG charge=100% runtime=2574s +Sun Jan 11 08:28:22 AM CET 2026: status=OB DISCHRG charge=100% runtime=2552s +Sun Jan 11 08:28:32 AM CET 2026: status=OB DISCHRG charge=100% runtime=2581s +Sun Jan 11 08:28:42 AM CET 2026: status=OB DISCHRG charge=100% runtime=2472s +Sun Jan 11 08:28:52 AM CET 2026: status=OB DISCHRG charge=100% runtime=2260s +Sun Jan 11 08:29:02 AM CET 2026: status=OB DISCHRG charge=100% runtime=2536s +Sun Jan 11 08:29:12 AM CET 2026: status=OB DISCHRG charge=100% runtime=2514s +Sun Jan 11 08:29:22 AM CET 2026: status=OB DISCHRG charge=100% runtime=2464s +Sun Jan 11 08:29:32 AM CET 2026: status=OB DISCHRG charge=100% runtime=2192s +Sun Jan 11 08:29:42 AM CET 2026: status=OB DISCHRG charge=100% runtime=2358s +Sun Jan 11 08:29:52 AM CET 2026: status=OB DISCHRG charge=100% runtime=2414s +Sun Jan 11 08:30:02 AM CET 2026: status=OB DISCHRG charge=100% runtime=2207s +Sun Jan 11 08:30:12 AM CET 2026: status=OB DISCHRG charge=100% runtime=2452s +Sun Jan 11 08:30:22 AM CET 2026: status=OB DISCHRG charge=100% runtime=2430s +Sun Jan 11 08:30:32 AM CET 2026: status=OB DISCHRG charge=100% runtime=2187s +Sun Jan 11 08:30:42 AM CET 2026: status=OB DISCHRG charge=100% runtime=2160s +Sun Jan 11 08:30:52 AM CET 2026: status=OB DISCHRG charge=100% runtime=2407s +Sun Jan 11 08:31:02 AM CET 2026: status=OB DISCHRG charge=100% runtime=2252s +Sun Jan 11 08:31:12 AM CET 2026: status=OB DISCHRG charge=100% runtime=2199s +Sun Jan 11 08:31:22 AM CET 2026: status=OB DISCHRG charge=100% runtime=2336s +Sun Jan 11 08:31:32 AM CET 2026: status=OB DISCHRG charge=100% runtime=2336s +Sun Jan 11 08:31:42 AM CET 2026: status=OB DISCHRG 
charge=100% runtime=2266s +Sun Jan 11 08:31:52 AM CET 2026: status=OB DISCHRG charge=100% runtime=2266s +Sun Jan 11 08:32:02 AM CET 2026: status=OB DISCHRG charge=100% runtime=2014s +Sun Jan 11 08:32:12 AM CET 2026: status=OB DISCHRG charge=100% runtime=2247s +Sun Jan 11 08:32:22 AM CET 2026: status=OB DISCHRG charge=100% runtime=2200s +Sun Jan 11 08:32:32 AM CET 2026: status=OB DISCHRG charge=100% runtime=2200s +Sun Jan 11 08:32:42 AM CET 2026: status=OB DISCHRG charge=99% runtime=2128s +Sun Jan 11 08:32:52 AM CET 2026: status=OB DISCHRG charge=98% runtime=2180s +Sun Jan 11 08:33:02 AM CET 2026: status=OB DISCHRG charge=98% runtime=1935s +Sun Jan 11 08:33:12 AM CET 2026: status=OB DISCHRG charge=97% runtime=2109s +Sun Jan 11 08:33:22 AM CET 2026: status=OB DISCHRG charge=97% runtime=2012s +Sun Jan 11 08:33:32 AM CET 2026: status=OB DISCHRG charge=96% runtime=2136s +Sun Jan 11 08:33:42 AM CET 2026: status=OB DISCHRG charge=96% runtime=2136s +Sun Jan 11 08:33:52 AM CET 2026: status=OB DISCHRG charge=95% runtime=2090s +Sun Jan 11 08:34:02 AM CET 2026: status=OB DISCHRG charge=94% runtime=2068s +Sun Jan 11 08:34:12 AM CET 2026: status=OB DISCHRG charge=94% runtime=2044s +Sun Jan 11 08:34:22 AM CET 2026: status=OB DISCHRG charge=92% runtime=2070s +Sun Jan 11 08:34:32 AM CET 2026: status=OB DISCHRG charge=92% runtime=2070s +Sun Jan 11 08:34:42 AM CET 2026: status=OB DISCHRG charge=92% runtime=1748s +Sun Jan 11 08:34:52 AM CET 2026: status=OB DISCHRG charge=90% runtime=2047s +Sun Jan 11 08:35:02 AM CET 2026: status=OB DISCHRG charge=90% runtime=1755s +Sun Jan 11 08:35:12 AM CET 2026: status=OB DISCHRG charge=88% runtime=1914s +Sun Jan 11 08:35:22 AM CET 2026: status=OB DISCHRG charge=88% runtime=1892s +Sun Jan 11 08:35:32 AM CET 2026: status=OB DISCHRG charge=87% runtime=1979s +Sun Jan 11 08:35:42 AM CET 2026: status=OB DISCHRG charge=87% runtime=1935s +Sun Jan 11 08:35:52 AM CET 2026: status=OB DISCHRG charge=86% runtime=1870s +Sun Jan 11 08:36:02 AM CET 2026: status=OB 
DISCHRG charge=84% runtime=1827s +Sun Jan 11 08:36:12 AM CET 2026: status=OB DISCHRG charge=85% runtime=1891s +Sun Jan 11 08:36:22 AM CET 2026: status=OB DISCHRG charge=83% runtime=1846s +Sun Jan 11 08:36:32 AM CET 2026: status=OB DISCHRG charge=83% runtime=1846s +Sun Jan 11 08:36:42 AM CET 2026: status=OB DISCHRG charge=82% runtime=1722s +Sun Jan 11 08:36:52 AM CET 2026: status=OB DISCHRG charge=81% runtime=1761s +Sun Jan 11 08:37:02 AM CET 2026: status=OB DISCHRG charge=81% runtime=1802s +Sun Jan 11 08:37:12 AM CET 2026: status=OB DISCHRG charge=80% runtime=1800s +Sun Jan 11 08:37:22 AM CET 2026: status=OB DISCHRG charge=79% runtime=1738s +Sun Jan 11 08:37:32 AM CET 2026: status=OB DISCHRG charge=79% runtime=1738s +Sun Jan 11 08:37:42 AM CET 2026: status=OB DISCHRG charge=78% runtime=1716s +Sun Jan 11 08:37:52 AM CET 2026: status=OB DISCHRG charge=77% runtime=1674s +Sun Jan 11 08:38:02 AM CET 2026: status=OB DISCHRG charge=76% runtime=1482s +Sun Jan 11 08:38:12 AM CET 2026: status=OB DISCHRG charge=76% runtime=1482s +Sun Jan 11 08:38:22 AM CET 2026: status=OB DISCHRG charge=74% runtime=1609s +Sun Jan 11 08:38:32 AM CET 2026: status=OB DISCHRG charge=75% runtime=1481s +Sun Jan 11 08:38:42 AM CET 2026: status=OB DISCHRG charge=74% runtime=1628s +Sun Jan 11 08:38:52 AM CET 2026: status=OB DISCHRG charge=71% runtime=1579s +Sun Jan 11 08:39:02 AM CET 2026: status=OB DISCHRG charge=71% runtime=1508s +Sun Jan 11 08:39:12 AM CET 2026: status=OB DISCHRG charge=71% runtime=1544s +Sun Jan 11 08:39:22 AM CET 2026: status=OB DISCHRG charge=70% runtime=1452s +Sun Jan 11 08:39:32 AM CET 2026: status=OB DISCHRG charge=68% runtime=1479s +Sun Jan 11 08:39:42 AM CET 2026: status=OB DISCHRG charge=69% runtime=1535s +Sun Jan 11 08:39:52 AM CET 2026: status=OB DISCHRG charge=67% runtime=1490s +Sun Jan 11 08:40:02 AM CET 2026: status=OB DISCHRG charge=67% runtime=1407s +Sun Jan 11 08:40:12 AM CET 2026: status=OB DISCHRG charge=66% runtime=1435s +Sun Jan 11 08:40:22 AM CET 2026: 
status=OB DISCHRG charge=65% runtime=1283s +Sun Jan 11 08:40:32 AM CET 2026: status=OB DISCHRG charge=64% runtime=1280s +Sun Jan 11 08:40:42 AM CET 2026: status=OB DISCHRG charge=63% runtime=1244s +Sun Jan 11 08:40:52 AM CET 2026: status=OB DISCHRG charge=63% runtime=1228s +Sun Jan 11 08:41:02 AM CET 2026: status=OB DISCHRG charge=62% runtime=1209s +Sun Jan 11 08:41:12 AM CET 2026: status=OB DISCHRG charge=60% runtime=1230s +Sun Jan 11 08:41:22 AM CET 2026: status=OB DISCHRG charge=61% runtime=1204s +Sun Jan 11 08:41:32 AM CET 2026: status=OB DISCHRG charge=58% runtime=1189s +Sun Jan 11 08:41:42 AM CET 2026: status=OB DISCHRG charge=58% runtime=1189s +Sun Jan 11 08:41:52 AM CET 2026: status=OB DISCHRG charge=58% runtime=1218s +Sun Jan 11 08:42:02 AM CET 2026: status=OB DISCHRG charge=57% runtime=1211s +Sun Jan 11 08:42:12 AM CET 2026: status=OB DISCHRG charge=57% runtime=1197s +Sun Jan 11 08:42:22 AM CET 2026: status=OB DISCHRG charge=57% runtime=1282s +Sun Jan 11 08:42:32 AM CET 2026: status=OB DISCHRG charge=57% runtime=1282s +Sun Jan 11 08:42:42 AM CET 2026: status=OB DISCHRG charge=57% runtime=1282s +Sun Jan 11 08:42:52 AM CET 2026: status=OB DISCHRG charge=54% runtime=1215s +Sun Jan 11 08:43:02 AM CET 2026: status=OB DISCHRG charge=54% runtime=1215s +Sun Jan 11 08:43:12 AM CET 2026: status=OB DISCHRG charge=54% runtime=1215s +Sun Jan 11 08:43:22 AM CET 2026: status=OB DISCHRG charge=53% runtime=1152s +Sun Jan 11 08:43:32 AM CET 2026: status=OB DISCHRG charge=53% runtime=1152s +Sun Jan 11 08:43:42 AM CET 2026: status=OB DISCHRG charge=53% runtime=1152s +Sun Jan 11 08:43:53 AM CET 2026: status=OB DISCHRG charge=50% runtime=1012s +Sun Jan 11 08:44:03 AM CET 2026: status=OB DISCHRG charge=50% runtime=1012s +Sun Jan 11 08:44:13 AM CET 2026: status=OB DISCHRG charge=50% runtime=1012s +Sun Jan 11 08:44:23 AM CET 2026: status=OB DISCHRG charge=50% runtime=1012s +Sun Jan 11 08:44:33 AM CET 2026: status=OB DISCHRG charge=49% runtime=1016s +Sun Jan 11 08:44:43 AM CET 
2026: status=OB DISCHRG charge=49% runtime=1016s +Sun Jan 11 08:44:53 AM CET 2026: status=OB DISCHRG charge=49% runtime=1016s +Sun Jan 11 08:45:03 AM CET 2026: status=OB DISCHRG charge=46% runtime=862s +Sun Jan 11 08:45:13 AM CET 2026: status=OB DISCHRG charge=46% runtime=862s +Sun Jan 11 08:45:23 AM CET 2026: status=OB DISCHRG charge=46% runtime=862s +Sun Jan 11 08:45:33 AM CET 2026: status=OB DISCHRG charge=44% runtime=913s +Sun Jan 11 08:45:43 AM CET 2026: status=OB DISCHRG charge=44% runtime=913s +Sun Jan 11 08:45:53 AM CET 2026: status=OB DISCHRG charge=44% runtime=913s +Sun Jan 11 08:46:03 AM CET 2026: status=OB DISCHRG charge=42% runtime=808s +Sun Jan 11 08:46:13 AM CET 2026: status=OB DISCHRG charge=41% runtime=799s +Sun Jan 11 08:46:23 AM CET 2026: status=OB DISCHRG charge=41% runtime=902s +Sun Jan 11 08:46:33 AM CET 2026: status=OB DISCHRG charge=40% runtime=890s +Sun Jan 11 08:46:43 AM CET 2026: status=OB DISCHRG charge=39% runtime=867s +Sun Jan 11 08:46:53 AM CET 2026: status=OB DISCHRG charge=39% runtime=838s +Sun Jan 11 08:47:03 AM CET 2026: status=OB DISCHRG charge=38% runtime=855s +Sun Jan 11 08:47:13 AM CET 2026: status=OB DISCHRG charge=37% runtime=730s +Sun Jan 11 08:47:23 AM CET 2026: status=OB DISCHRG charge=37% runtime=795s +Sun Jan 11 08:47:33 AM CET 2026: status=OB DISCHRG charge=37% runtime=786s +Sun Jan 11 08:47:43 AM CET 2026: status=OB DISCHRG charge=36% runtime=774s +Sun Jan 11 08:47:53 AM CET 2026: status=OB DISCHRG charge=36% runtime=792s +Sun Jan 11 08:48:03 AM CET 2026: status=OB DISCHRG charge=35% runtime=770s +Sun Jan 11 08:48:13 AM CET 2026: status=OB DISCHRG charge=34% runtime=756s +Sun Jan 11 08:48:23 AM CET 2026: status=OB DISCHRG charge=34% runtime=739s +Sun Jan 11 08:48:33 AM CET 2026: status=OB DISCHRG charge=34% runtime=756s +Sun Jan 11 08:48:43 AM CET 2026: status=OB DISCHRG charge=33% runtime=726s +Sun Jan 11 08:48:53 AM CET 2026: status=OB DISCHRG charge=32% runtime=696s +Sun Jan 11 08:49:03 AM CET 2026: status=OB 
DISCHRG charge=31% runtime=682s +Sun Jan 11 08:49:13 AM CET 2026: status=OB DISCHRG charge=31% runtime=682s +Sun Jan 11 08:49:23 AM CET 2026: status=OB DISCHRG charge=31% runtime=689s +Sun Jan 11 08:49:33 AM CET 2026: status=OB DISCHRG charge=30% runtime=630s +Sun Jan 11 08:49:43 AM CET 2026: status=OB DISCHRG charge=29% runtime=616s +Sun Jan 11 08:49:53 AM CET 2026: status=OB DISCHRG charge=28% runtime=609s +Sun Jan 11 08:50:03 AM CET 2026: status=OB DISCHRG charge=28% runtime=609s +Sun Jan 11 08:50:13 AM CET 2026: status=OB DISCHRG charge=27% runtime=594s +Sun Jan 11 08:50:23 AM CET 2026: status=OB DISCHRG charge=27% runtime=553s +Sun Jan 11 08:50:33 AM CET 2026: status=OB DISCHRG charge=25% runtime=500s +Sun Jan 11 08:50:43 AM CET 2026: status=OB DISCHRG charge=25% runtime=562s +Sun Jan 11 08:50:53 AM CET 2026: status=OB DISCHRG charge=24% runtime=534s +Sun Jan 11 08:51:03 AM CET 2026: status=OB DISCHRG charge=24% runtime=534s +Sun Jan 11 08:51:13 AM CET 2026: status=OB DISCHRG charge=24% runtime=528s +Sun Jan 11 08:51:23 AM CET 2026: status=OB DISCHRG charge=22% runtime=484s +Sun Jan 11 08:51:33 AM CET 2026: status=OB DISCHRG charge=23% runtime=500s +Sun Jan 11 08:51:43 AM CET 2026: status=OB DISCHRG charge=22% runtime=484s +Sun Jan 11 08:51:53 AM CET 2026: status=OB DISCHRG charge=22% runtime=489s +Sun Jan 11 08:52:03 AM CET 2026: status=OB DISCHRG charge=21% runtime=462s +Sun Jan 11 08:52:13 AM CET 2026: status=OB DISCHRG charge=19% runtime=384s +Sun Jan 11 08:52:23 AM CET 2026: status=OB DISCHRG charge=19% runtime=375s +Sun Jan 11 08:52:33 AM CET 2026: status=OB DISCHRG charge=19% runtime=422s +Sun Jan 11 08:52:43 AM CET 2026: status=OB DISCHRG charge=18% runtime=396s +Sun Jan 11 08:52:53 AM CET 2026: status=OB DISCHRG charge=18% runtime=391s +Sun Jan 11 08:53:03 AM CET 2026: status=OB DISCHRG charge=18% runtime=405s +Sun Jan 11 08:53:13 AM CET 2026: status=OB DISCHRG charge=16% runtime=360s +Sun Jan 11 08:53:23 AM CET 2026: status=OB DISCHRG charge=15% 
runtime=330s +Sun Jan 11 08:53:33 AM CET 2026: status=OB DISCHRG charge=15% runtime=281s +Sun Jan 11 08:53:43 AM CET 2026: status=OB DISCHRG charge=15% runtime=326s +Sun Jan 11 08:53:53 AM CET 2026: status=OB DISCHRG charge=13% runtime=253s +Sun Jan 11 08:54:03 AM CET 2026: status=FSD OB DISCHRG LB charge=13% runtime=286s +Connection to nodito closed by remote host. diff --git a/ups/ups-shutdown-journal-2.log b/ups/ups-shutdown-journal-2.log new file mode 100644 index 0000000..cf1dd0d --- /dev/null +++ b/ups/ups-shutdown-journal-2.log @@ -0,0 +1,18 @@ +Jan 11 08:59:48 nodito systemd[1]: Starting nut-monitor.service - Network UPS Tools - power device monitor and shutdown controller... +Jan 11 08:59:48 nodito systemd[1]: Started nut-monitor.service - Network UPS Tools - power device monitor and shutdown controller. +Jan 11 08:59:48 nodito nut-monitor[1624]: fopen /run/nut/upsmon.pid: No such file or directory +Jan 11 08:59:48 nodito nut-monitor[1624]: Could not find PID file to see if previous upsmon instance is already running! 
+Jan 11 08:59:48 nodito nut-monitor[1624]: UPS: cyberpower@localhost (primary) (power value 1) +Jan 11 08:59:48 nodito nut-monitor[1624]: Using power down flag file /etc/killpower +Jan 11 08:59:48 nodito nut-monitor[1646]: Init SSL without certificate database +Jan 11 08:59:48 nodito nut-monitor[1646]: upsnotify: notify about state 2 with libsystemd: was requested, but not running as a service unit now, will not spam more about it +Jan 11 08:59:48 nodito nut-monitor[1646]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it +Jan 11 08:59:48 nodito nut-monitor[1646]: upsnotify: logged the systemd watchdog situation once, will not spam more about it +Jan 11 09:03:18 nodito nut-monitor[1646]: UPS cyberpower@localhost on battery +Jan 11 09:03:18 nodito nut-monitor[3398]: Network UPS Tools upsmon 2.8.1 +Jan 11 09:03:43 nodito nut-monitor[1646]: UPS cyberpower@localhost battery is low +Jan 11 09:03:43 nodito nut-monitor[3525]: Network UPS Tools upsmon 2.8.1 +Jan 11 09:03:43 nodito nut-monitor[1646]: Executing automatic power-fail shutdown +Jan 11 09:03:43 nodito nut-monitor[1646]: Auto logout and shutdown proceeding +Jan 11 09:03:43 nodito nut-monitor[3531]: Network UPS Tools upsmon 2.8.1 +Connection to nodito closed by remote host. 
diff --git a/ups/ups-shutdown-journal.log b/ups/ups-shutdown-journal.log new file mode 100644 index 0000000..0df9cd9 --- /dev/null +++ b/ups/ups-shutdown-journal.log @@ -0,0 +1,18 @@ +Jan 10 23:22:36 nodito nut-monitor[254631]: UPS: cyberpower@localhost (primary) (power value 1) +Jan 10 23:22:36 nodito nut-monitor[254631]: Using power down flag file /etc/killpower +Jan 10 23:22:36 nodito nut-monitor[254637]: Init SSL without certificate database +Jan 10 23:22:36 nodito nut-monitor[254637]: upsnotify: notify about state 2 with libsystemd: was requested, but not running as a service unit now, will not spam more about it +Jan 10 23:22:36 nodito nut-monitor[254637]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it +Jan 10 23:22:36 nodito nut-monitor[254637]: upsnotify: logged the systemd watchdog situation once, will not spam more about it +Jan 11 00:10:36 nodito nut-monitor[254637]: UPS cyberpower@localhost on battery +Jan 11 00:10:36 nodito nut-monitor[270013]: Network UPS Tools upsmon 2.8.1 +Jan 11 00:35:21 nodito nut-monitor[254637]: UPS cyberpower@localhost on line power +Jan 11 00:35:21 nodito nut-monitor[278254]: Network UPS Tools upsmon 2.8.1 +Jan 11 08:24:47 nodito nut-monitor[254637]: UPS cyberpower@localhost on battery +Jan 11 08:24:47 nodito nut-monitor[425050]: Network UPS Tools upsmon 2.8.1 +Jan 11 08:54:02 nodito nut-monitor[254637]: UPS cyberpower@localhost battery is low +Jan 11 08:54:02 nodito nut-monitor[435551]: Network UPS Tools upsmon 2.8.1 +Jan 11 08:54:02 nodito nut-monitor[254637]: Executing automatic power-fail shutdown +Jan 11 08:54:02 nodito nut-monitor[254637]: Auto logout and shutdown proceeding +Jan 11 08:54:02 nodito nut-monitor[435556]: Network UPS Tools upsmon 2.8.1 +Connection to nodito closed by remote host. diff --git a/ups/ups.md b/ups/ups.md new file mode 100644 index 0000000..ff5ec7d --- /dev/null +++ b/ups/ups.md @@ -0,0 +1,392 @@ +# UPS + +On 2025-01-06, I received my UPS. 
It's a CyberPower CP900EPFCLCD SAI 900VA 540W. + +## Arrangement + +My plan is to use the UPS just for two devices: +- Nodito itself. +- The router, a ZTE F680. + +The UPS has Ethernet surge protection pass-through, but I won't use it. My WAN connection is FTTH (Fiber To The Home): the cable from the wall to the router is fiber optic (SC/APC connector), not copper. Fiber is inherently immune to electrical surges since it carries light, not electricity. The LAN cable between router and Nodito is internal and both devices share the same UPS, so there's no meaningful surge protection benefit from passing it through the UPS. + +My two main goals are: +- To allow Nodito and the WAN connection to survive brief (under 5 min) power cuts. +- To allow Nodito to shut down gracefully in case of sustained outages so that the hardware (especially the HDDs) doesn't go to shit. + +## Shutdown logic for Nodito + +I'll use NUT (Network UPS Tools) to manage the UPS from Nodito. Management will consist of: +- Monitoring constantly to ensure the UPS is in good health and connected. Upon starting to rely on battery, this monitor should notify. This will be a push-type monitor towards my Uptime Kuma instance. +- The UPS itself tracks low battery thresholds (`battery.runtime.low` = 300s, `battery.charge.low` = 10%). When either threshold is crossed, the UPS sets the LB (Low Battery) flag. NUT's upsmon detects LB and automatically triggers shutdown, so no custom scripts are needed. +- After shutdown, the UPS cuts outlet power (after `offdelay` seconds), enabling automatic restart when mains returns (BIOS "restore on AC loss"). + +## Checklist and drills + +- Physical setup + - Shut down Nodito + - Shut down router + - Plug the UPS into mains power + - Plug Nodito and router into the UPS power outlets + - Plug USB cord between UPS and Nodito + - Start Nodito and router. Verify they run properly. 
+- NUT setup (run these on Nodito, the machine connected to the UPS via USB) + - Install NUT + ```bash + sudo apt update && sudo apt install nut + ``` + Expected: Package installs successfully. NUT services won't start yet (no config). + - Verify that USB detects the UPS + ```bash + lsusb | grep -i cyber + ``` + Expected output: + ``` + Bus 00X Device 00Y: ID 0764:0501 Cyber Power System, Inc. CP1500 AVR UPS + ``` + (Vendor ID 0764 is CyberPower. Product ID may vary by model.) + - Reload udev rules and verify USB permissions (NUT installs rules but they need triggering for already-plugged devices) + ```bash + sudo udevadm control --reload-rules + sudo udevadm trigger --subsystem-match=usb --action=add + ``` + Verify the UPS device has `nut` group (replace bus/device numbers from lsusb output above): + ```bash + ls -la /dev/bus/usb/00X/00Y + ``` + Expected: `crw-rw-r-- 1 root nut ...` — if it shows `root root` instead of `root nut`, the driver won't be able to access the UPS. + - Scan for UPS devices with nut-scanner + ```bash + sudo nut-scanner -U + ``` + Expected output: + ``` + [nutdev1] + driver = "usbhid-ups" + port = "auto" + vendorid = "0764" + productid = "0501" + product = "CP900EPFCLCD" + vendor = "CPS" + bus = "001" + ``` + - Configure NUT mode in `/etc/nut/nut.conf` (standalone = UPS is directly connected to this machine; other modes are for network setups where multiple machines share one UPS) + ```bash + cat /etc/nut/nut.conf + ``` + Should contain: + ``` + MODE=standalone + ``` + - Configure UPS device in `/etc/nut/ups.conf` (declares the UPS to NUT—without this, NUT won't know the UPS exists even if USB sees it) + ```bash + cat /etc/nut/ups.conf + ``` + Should contain something like: + ``` + [cyberpower] + driver = usbhid-ups + port = auto + desc = "CyberPower CP900EPFCLCD" + offdelay = 120 + ondelay = 30 + ``` + - `offdelay = 120` — seconds after `upsdrvctl shutdown` before UPS cuts outlet power (2 min to ensure system is fully halted) + - `ondelay 
= 30` — seconds after mains returns before UPS restores outlet power + - Configure upsd users in `/etc/nut/upsd.users` (the upsmon daemon authenticates to upsd to get UPS data; "master" means this machine is directly connected and can command the shutdown sequence) + ```bash + sudo cat /etc/nut/upsd.users + ``` + Should contain: + ``` + [counterweight] + password = yourpassword + upsmon master + ``` + - Configure upsmon in `/etc/nut/upsmon.conf` (tells upsmon which UPS to monitor and how to handle events) + + Edit the default file—only add or modify the lines shown below, keep the rest (MINSUPPLIES, POLLFREQ, DEADTIME, etc. have sensible defaults). + ```bash + sudo grep -E "^MONITOR|^SHUTDOWNCMD|^POWERDOWNFLAG" /etc/nut/upsmon.conf + ``` + Should contain: + ``` + MONITOR cyberpower@localhost 1 counterweight yourpassword master + SHUTDOWNCMD "/sbin/shutdown -h +0" + POWERDOWNFLAG /etc/killpower + ``` + That's it. When the UPS sets the LB (Low Battery) flag, upsmon automatically triggers shutdown—no custom scripts needed. + - Verify low battery thresholds + ```bash + upsc cyberpower@localhost battery.runtime.low + upsc cyberpower@localhost battery.charge.low + ``` + Expected: + ``` + 300 + 10 + ``` + The UPS sets LB flag when runtime < 300s (5 min) OR charge < 10%. If you want different thresholds, check if they're writable: + ```bash + upsrw cyberpower@localhost 2>&1 | grep -E "battery.runtime.low|battery.charge.low" + ``` + If writable, adjust with: `upsrw -s battery.runtime.low=300 cyberpower@localhost` + - Verify late-stage shutdown script exists (no changes needed) + ```bash + ls -l /lib/systemd/system-shutdown/nutshutdown + ``` + Expected: File exists and is executable. This script is provided by the NUT package and already does what we need—it checks for the killpower flag (via `upsmon -K`) and runs `upsdrvctl shutdown` to tell the UPS to cut outlet power. 
Without this, the server would shut down but the UPS would keep feeding power, so BIOS "restore on AC loss" would never trigger. + - Start and enable NUT services + ```bash + sudo systemctl restart nut-driver-enumerator nut-server nut-monitor + sudo systemctl enable nut-driver-enumerator nut-server nut-monitor + ``` + Expected: All services start without errors. + Note: `nut-driver-enumerator` reads `ups.conf` and starts the appropriate driver(s) via `nut-driver@.service`. + - Verify services are running + ```bash + systemctl status nut-driver-enumerator nut-server nut-monitor --no-pager + ``` + Expected: All three show `active` (enumerator may show `inactive` after completing its job—that's OK, check the driver instance instead): + ```bash + systemctl status nut-driver@cyberpower.service --no-pager + ``` + - Verify upsc receives data from UPS + ```bash + upsc cyberpower@localhost + ``` + Expected output (partial): + ``` + battery.charge: 100 + battery.runtime: 1800 + device.model: CP900EPFCLCD + input.voltage: 230.0 + output.voltage: 230.0 + ups.load: 15 + ups.status: OL + ``` + - Setup monitoring (Uptime Kuma push monitor) + - Create a push monitor in Uptime Kuma, note the push URL + - Create a script `/usr/local/bin/ups-heartbeat.sh`: + ```bash + #!/bin/bash + STATUS=$(upsc cyberpower@localhost ups.status 2>/dev/null) + if [[ "$STATUS" == *"OL"* ]]; then + curl -s "https://uptime.example.com/api/push/xxxxx?status=up&msg=UPS%20on%20mains" > /dev/null + fi + # No push when on battery → Uptime Kuma times out → shows DOWN + ``` + - Add cron job: + ```bash + sudo crontab -e + # Add: * * * * * /usr/local/bin/ups-heartbeat.sh + ``` + - Verify heartbeat is working: + ```bash + /usr/local/bin/ups-heartbeat.sh && echo "Heartbeat sent" + ``` +- Drills + - Rely on battery drill + - Start with everything running and plugged + - Verify initial status is OL (online) + ```bash + upsc cyberpower@localhost ups.status + ``` + Expected: `OL` + - Start continuous monitoring in a 
terminal (keep this running throughout the drill) + ```bash + while true; do + echo "$(date): status=$(upsc cyberpower@localhost ups.status 2>/dev/null) charge=$(upsc cyberpower@localhost battery.charge 2>/dev/null)% runtime=$(upsc cyberpower@localhost battery.runtime 2>/dev/null)s" + sleep 10 + done + ``` + - In another terminal, start continuous ping (verifies network stays up throughout) + ```bash + ping 8.8.8.8 + ``` + - **Unplug UPS from power line** + - Watch the monitoring output—status should change from OL to OB DISCHRG + ``` + Sat Jan 11 10:00:00 CET 2025: status=OL charge=100% runtime=1800s + Sat Jan 11 10:00:10 CET 2025: status=OB DISCHRG charge=99% runtime=1750s + Sat Jan 11 10:00:20 CET 2025: status=OB DISCHRG charge=98% runtime=1700s + ... + ``` + - Uptime Kuma monitor should go DOWN (wait up to 1 minute for next heartbeat) + - Keep watching the drain. When remaining runtime reaches ~6 minutes (360s), **plug UPS back to main power** (before the 300s threshold triggers LB) + - Watch monitoring output—status should change to OL CHRG + ``` + Sat Jan 11 10:15:00 CET 2025: status=OB DISCHRG charge=45% runtime=380s + Sat Jan 11 10:15:10 CET 2025: status=OL CHRG charge=45% runtime=390s + ... 
+ ``` + - Uptime Kuma monitor should go back to UP (wait up to 1 minute for next heartbeat) + - Verify ping ran continuously without packet loss throughout the drill + - Stop both monitoring loops (Ctrl+C in each terminal) + - Power out completely drill + - Start with everything running and plugged + - From your laptop, verify initial state via SSH + ```bash + ssh nodito 'upsc cyberpower@localhost ups.status' + ``` + Expected: `OL` + - From your laptop, start continuous monitoring via SSH (logs to local file) + ```bash + ssh nodito 'while true; do echo "$(date): status=$(upsc cyberpower@localhost ups.status 2>/dev/null) charge=$(upsc cyberpower@localhost battery.charge 2>/dev/null)% runtime=$(upsc cyberpower@localhost battery.runtime 2>/dev/null)s"; sleep 10; done' 2>&1 | tee ~/ups-shutdown-drill.log + ``` + - In another laptop terminal, watch system logs via SSH + ```bash + ssh nodito 'journalctl -u nut-monitor -f' 2>&1 | tee ~/ups-shutdown-journal.log + ``` + - **Unplug UPS from power line** + - Watch the monitoring output as battery drains + - When runtime drops below 300s (or charge below 10%), the LB flag should appear + ``` + status=OB LB DISCHRG + ``` + - Watch for shutdown sequence in journal output + ``` + upsmon[1234]: UPS cyberpower@localhost on battery + upsmon[1234]: UPS cyberpower@localhost battery is low + upsmon[1234]: Executing automatic power-fail shutdown + ``` + - SSH sessions will die when server shuts down—that's expected. Your logs are saved locally in `~/ups-shutdown-drill.log` and `~/ups-shutdown-journal.log` + - After server shuts down: plug in a lamp in the same UPS outlet the server was connected to. Verify the outlet goes dead (lamp turns off) even though UPS still has battery—this confirms `upsdrvctl shutdown` command was sent. 
+ - Plug back server, plug back UPS to power line + - Verify that server boots automatically (BIOS "restore on AC loss" triggers) + - After boot, verify NUT is running and UPS is detected + ```bash + systemctl status nut-driver-enumerator nut-server nut-monitor --no-pager + upsc cyberpower@localhost ups.status + ``` + Expected: Services running, status shows `OL CHRG`. + - Lose data connection drill + - Start with everything running and plugged + - Verify initial connection + ```bash + upsc cyberpower@localhost ups.status + ``` + Expected: `OL` + - Disconnect the USB cable between server and UPS + - Validate that NUT detects the communication loss + ```bash + upsc cyberpower@localhost + ``` + Expected output: + ``` + Error: Data stale + ``` + Or after a few seconds: + ``` + Error: Driver not connected + ``` + - Check driver status + ```bash + systemctl status nut-driver@cyberpower.service --no-pager + ``` + Expected: Service may show errors or have restarted. + - Check system logs for communication loss + ```bash + journalctl -u nut-monitor --since "5 minutes ago" | grep -i comm + ``` + Expected output: + ``` + upsmon[1234]: Communications with UPS cyberpower@localhost lost + ``` + - Validate that Uptime Kuma notifies the issue (the heartbeat script will fail to get status, or you can configure NUT's NOTIFYCMD for COMMBAD events) + - Reconnect USB cable + - Verify communication restored + ```bash + upsc cyberpower@localhost ups.status + ``` + Expected: `OL` (may take a few seconds for driver to reconnect) + - Check logs for restoration + ```bash + journalctl -u nut-monitor --since "5 minutes ago" | grep -i comm + ``` + Expected: + ``` + upsmon[1234]: Communications with UPS cyberpower@localhost established + ``` + +## Notes from drill execution + +### Running on battery + +- Runtime is really unstable, can flip 5min up or down on the spot. Battery charge falls linearly. +- The UPS stays on battery at 100% for quite some time, then starts falling fast. 
It's misreporting, as lead acid batteries do. +- Notifications worked fine. +- From the test, I conclude that total runtime until shutdown, with medium server load, will probably be of around 20min. +- Find below actual log lines from monitoring the UPS status once per minute during the drill. +``` +Sun Jan 11 12:09:44 AM CET 2026: status=OL charge=100% runtime=2610s +Sun Jan 11 12:10:44 AM CET 2026: status=OB DISCHRG charge=100% runtime=2558s +Sun Jan 11 12:11:44 AM CET 2026: status=OB DISCHRG charge=100% runtime=2647s +Sun Jan 11 12:12:44 AM CET 2026: status=OB DISCHRG charge=100% runtime=2360s +Sun Jan 11 12:13:44 AM CET 2026: status=OB DISCHRG charge=100% runtime=2479s +Sun Jan 11 12:14:44 AM CET 2026: status=OB DISCHRG charge=100% runtime=2133s +Sun Jan 11 12:15:44 AM CET 2026: status=OB DISCHRG charge=100% runtime=2214s +Sun Jan 11 12:16:44 AM CET 2026: status=OB DISCHRG charge=100% runtime=2193s +Sun Jan 11 12:17:44 AM CET 2026: status=OB DISCHRG charge=100% runtime=2146s +Sun Jan 11 12:18:44 AM CET 2026: status=OB DISCHRG charge=100% runtime=2054s +Sun Jan 11 12:19:44 AM CET 2026: status=OB DISCHRG charge=100% runtime=2091s +Sun Jan 11 12:20:44 AM CET 2026: status=OB DISCHRG charge=100% runtime=1868s +Sun Jan 11 12:21:44 AM CET 2026: status=OB DISCHRG charge=98% runtime=2107s +Sun Jan 11 12:22:44 AM CET 2026: status=OB DISCHRG charge=96% runtime=2160s +Sun Jan 11 12:23:44 AM CET 2026: status=OB DISCHRG charge=93% runtime=2092s +Sun Jan 11 12:24:44 AM CET 2026: status=OB DISCHRG charge=91% runtime=1592s +Sun Jan 11 12:25:44 AM CET 2026: status=OB DISCHRG charge=87% runtime=1522s +Sun Jan 11 12:26:44 AM CET 2026: status=OB DISCHRG charge=83% runtime=1660s +Sun Jan 11 12:27:44 AM CET 2026: status=OB DISCHRG charge=79% runtime=1540s +Sun Jan 11 12:28:44 AM CET 2026: status=OB DISCHRG charge=75% runtime=1368s +Sun Jan 11 12:29:44 AM CET 2026: status=OB DISCHRG charge=71% runtime=1384s +Sun Jan 11 12:30:44 AM CET 2026: status=OB DISCHRG charge=66% runtime=1254s 
+Sun Jan 11 12:31:44 AM CET 2026: status=OB DISCHRG charge=61% runtime=1204s +Sun Jan 11 12:32:44 AM CET 2026: status=OB DISCHRG charge=58% runtime=1102s +Sun Jan 11 12:33:44 AM CET 2026: status=OB DISCHRG charge=54% runtime=1053s +Sun Jan 11 12:34:44 AM CET 2026: status=OB DISCHRG charge=49% runtime=943s +Sun Jan 11 12:35:44 AM CET 2026: status=OL CHRG charge=47% runtime=916s +``` + +### Controlled shutdown and boot again + +- We ran the drill as planned, plugging a lamp into the UPS to also see visually how the UPS shuts down. +- Lesson learned: the UPS doesn't just cut power to the Schuko outlet where the server was plugged in. It shuts down the entire UPS device. Once you plug it back into mains power, the UPS starts again (and eventually so does the server, once it picks up power). +- Total runtime until shutdown was 29 minutes. +- Wake on power worked fine. +- If the UPS audible alarm is active, the UPS shutdown is extremely noisy. Once it has one minute left before shutting itself down, it beeps every second. +- Runtime readings remain quite unstable, but the variance decreases as the battery drains. The UPS finally triggered the server shutdown at 13% charge and 286s of runtime left. +- Charge readings change drastically when you plug/unplug the UPS from mains. After the UPS shutdown (at 13% charge), I plugged mains back in and it was reading 40% within one minute. I unplugged from mains again a couple of minutes later and it read 21% charge, and 20 seconds later it read 14%. + + +## Side quests + +- What is the story of NUT? Who maintains it? Where's the code hosted? + - NUT (Network UPS Tools) started in the late 1990s. It's open source, community-maintained, and hosted at https://github.com/networkupstools/nut. It's the de-facto standard for UPS management on Linux/Unix. +- About using port = auto: how does Linux find out which device is the UPS? + - Linux identifies USB devices by vendor ID and product ID via the USB subsystem. When you plug in the UPS, it registers as a USB HID device. 
NUT's `usbhid-ups` driver scans connected USB devices looking for known UPS vendor/product ID combinations. "auto" tells it to scan and find the match automatically. +- About the "low battery" status: how does the CyberPower UPS decide? What are the criteria to be in that status? What are other statuses? + - The UPS itself determines "low battery" based on internal logic: it sets the flag when remaining runtime or battery charge drops below a threshold (varies by model, sometimes configurable on the UPS). On this unit the thresholds are the ones checked during setup with `upsc`: `battery.runtime.low` = 300 s and `battery.charge.low` = 10%. Other statuses include: OL (online/on mains), OB (on battery), LB (low battery), RB (replace battery), CHRG (charging), DISCHRG (discharging), BYPASS, CAL (calibrating), OFF, OVER (overloaded), TRIM, BOOST. +- Does NUT allow querying the state of the UPS more granularly? How can that be done? What info is shared? + - Yes. Use `upsc cyberpower@localhost` to see all variables the UPS reports: battery charge %, estimated runtime, input/output voltage, load %, temperature, etc. Use `upscmd -l cyberpower@localhost` to list available commands (like beeper toggle, battery test). What's available depends on what your specific UPS exposes over USB. +- How can I monitor that the UPS is properly plugged in? + - Run `upsc cyberpower@localhost`; if it returns data, the connection is good. Check service status with `systemctl status nut-driver@cyberpower.service nut-server`. NUT can also send notifications via NOTIFYCMD when communication is lost (COMMBAD) or restored (COMMOK). For dashboards, you can use nut_exporter for Prometheus/Grafana integration. +- What is the difference between battery charge % and load % metrics provided by `upsc`? + - Battery charge % (`battery.charge`): How full the battery is. 100% means fully charged, 0% means empty. Load % (`ups.load`): How much of the UPS's output capacity is currently being used. If your 540W UPS is powering 270W of devices, load is ~50%. They're independent: you can have 100% charge with 80% load, or 20% charge with 10% load. 
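To make the load figure concrete, here is a minimal sketch of the arithmetic. It assumes `ups.load` is a percentage of the 540 W rating stated in the README, as the answer above does; the function names `load_to_watts` and `watts_to_load` are hypothetical helpers, not part of NUT, and the results use integer division.

```bash
#!/usr/bin/env bash
# Rated output of the CP900EPFCLCD as stated in the README (900VA/540W).
# Assumption: ups.load is expressed relative to this wattage.
UPS_RATED_WATTS=540

# Convert a ups.load percentage into approximate watts drawn.
load_to_watts() {
  local load_pct=$1
  echo $(( UPS_RATED_WATTS * load_pct / 100 ))
}

# Convert watts drawn into the load percentage the UPS would report.
watts_to_load() {
  local watts=$1
  echo $(( watts * 100 / UPS_RATED_WATTS ))
}

load_to_watts 15    # prints 81
watts_to_load 270   # prints 50
```

With the `ups.load: 15` reading shown in the setup section, this puts the draw at roughly 81 W.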
+- What's better, that I signal Nodito to shut down on a certain battery level, or on a certain remaining uptime? I would rather ensure graceful shutdown than extend uptime another minute. + - Remaining runtime is better for your goal because it accounts for actual load: 50% battery at high load might mean 2 minutes, while 50% at low load might mean 10 minutes. However, runtime estimates can be inaccurate on consumer UPS units. Safest approach: just trust the UPS's built-in LB (low battery) flag, which is exactly what NUT's default `upsmon` does. It's designed to leave enough time for graceful shutdown. If you want extra margin, raise the threshold that sets LB (`battery.runtime.low`, 300 s on this UPS, if `upsrw` reports it as writable) rather than lowering it: a lower value such as 180 s would leave less time, not more. +- How can I set things up so that, after a low battery and server shutdown, once the UPS starts getting power again, the server also starts again automatically? The server BIOS is set to boot on power coming back. + - Your BIOS "restore on AC loss" setting handles the server side. For the UPS side: NUT's default behavior just shuts down the OS, not the UPS itself, so the UPS keeps running and will stay on when mains returns. Your CyberPower will automatically restore output when power comes back. The only gotcha: if you configure NUT to send a shutdown command to the UPS itself (via POWERDOWNFLAG/upsdrvctl, as this setup does), make sure "auto-restart on AC restore" is enabled on the UPS (usually the default). With your BIOS set correctly, the chain is: power returns → UPS restores output → server sees power → BIOS boots. +- Who controls whom? Does the UPS tell the server to shut down, or does the server decide? + - The server monitors the UPS and decides; the UPS is passive. The UPS just continuously reports its state over USB (battery %, on-battery vs on-mains, low battery flag, etc.). NUT polls this data, and `upsmon` watches for conditions (like the LB flag) and decides to run the shutdown command. The UPS doesn't "tell" the server anything; it just answers status queries. 
The only command that goes TO the UPS is optional: after shutdown, NUT can tell the UPS "cut your outlet power." +- But wait: if NUT runs on the server, how can it send a command to the UPS after the server shuts down? + - It can't send anything after shutdown—the trick is timing. The command is sent during shutdown, but the UPS delays acting on it. The sequence: (1) `upsmon` detects low battery and initiates system shutdown. (2) Late in the shutdown process (but before fully off), a shutdown script calls `upsdrvctl shutdown`. (3) This tells the UPS "cut power in X seconds"—the UPS has an internal delay timer. (4) Server finishes shutting down. (5) UPS waits out its timer, then cuts outlet power. This is what `POWERDOWNFLAG` in `upsmon.conf` is for—it creates a flag file that late-stage shutdown scripts check, and if present, they call `upsdrvctl shutdown` before the system halts. +- Why do we need the UPS to cut outlet power at all? + - To enable auto-restart when mains returns. Consider: power goes out → UPS uses battery → battery gets low → server shuts down → but power comes back before UPS is completely drained. The server's power supply never actually lost power (the UPS kept feeding it throughout), so "restore on AC loss" in BIOS never triggers—the server stays off. By having the UPS cut outlet power after server shutdown, the server's PSU sees a power loss event. Then when mains returns and UPS restores outlets, the server sees "power restored" and BIOS boots it. +- Does the UPS automatically restore outlet power when mains returns after a commanded power cut? + - Yes. Most UPS units (including CyberPower) have "auto-restart" enabled by default. When mains returns: (1) UPS detects mains power. (2) UPS restores outlet power (either immediately, or after battery reaches a minimum charge—depends on UPS settings). (3) Server sees power → BIOS "restore on AC loss" kicks in → server boots. 
Some UPS units let you configure this behavior (e.g., "wait until battery is 20% charged before restoring outlets"), but out of the box it should just work. +- Do I need to write a custom script to handle shutdown when the UPS battery is low? + - No. The `upsmon` daemon handles this automatically. It runs constantly in the background, polling the UPS status every few seconds (configured via POLLFREQ). By default, it watches for the LB (Low Battery) flag—when the UPS decides its battery is critically low, it sets this flag, and upsmon sees it and runs SHUTDOWNCMD. The whole point of NUT is that this logic is built-in. Your custom heartbeat script is only for Uptime Kuma notifications; it has nothing to do with shutdown orchestration. +- The UPS has ethernet surge protection pass-through. Why am I not using it? + - My WAN connection is FTTH (Fiber To The Home). The cable from the wall to my router is fiber optic with an SC/APC connector (round, smaller than RJ45, with a ceramic ferrule inside)—not copper ethernet. Fiber carries light, not electricity, so it's inherently immune to electrical surges from lightning or power line disturbances. The ethernet surge protection on a UPS is designed for copper cables that run outside the building or between different electrical zones. My only ethernet cable is the LAN connection between the router and Nodito, which is entirely internal, and both devices are plugged into the same UPS anyway. If a surge hit my home's electrical system, both devices would experience it through their power supplies—the ethernet path between them isn't a meaningful risk vector. So the pass-through provides no practical benefit in my setup. +
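As a recap of the status flags discussed in the side quests above, the sketch below classifies an `ups.status` string the way this document describes the states: on mains, on battery, or on battery with the LB flag set (the condition under which `upsmon` runs SHUTDOWNCMD). The function `classify_ups_status` is a hypothetical helper for illustration only; it is not part of NUT and is not needed for shutdown, which `upsmon` already handles.

```bash
#!/usr/bin/env bash
# classify_ups_status: hypothetical helper, not part of NUT.
# Takes the value of ups.status (e.g. "OB LB DISCHRG") and prints one label.
classify_ups_status() {
  local status=" $1 "   # pad with spaces so each flag matches as a whole word
  if [[ $status == *" OB "* && $status == *" LB "* ]]; then
    echo "shutdown"     # on battery and low battery: upsmon would run SHUTDOWNCMD
  elif [[ $status == *" OB "* ]]; then
    echo "on-battery"
  elif [[ $status == *" OL "* ]]; then
    echo "on-mains"
  else
    echo "unknown"      # empty or unexpected value, e.g. USB cable unplugged
  fi
}

classify_ups_status "OL CHRG"         # prints on-mains
classify_ups_status "OB DISCHRG"      # prints on-battery
classify_ups_status "OB LB DISCHRG"   # prints shutdown
classify_ups_status ""                # prints unknown
```

On Nodito it could be fed live data with `classify_ups_status "$(upsc cyberpower@localhost ups.status 2>/dev/null)"`, for example to enrich the message sent to Uptime Kuma.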