homelab/ups/ups.md
2026-01-11 22:17:05 +01:00

UPS

On 2025-01-06, I received my UPS. It's a CyberPower CP900EPFCLCD SAI 900VA 540W.

Arrangement

My plan is to use the UPS just for two devices:

  • Nodito itself.
  • The router, a ZTE F680.

The UPS has ethernet surge protection pass-through, but I won't use it. My WAN connection is FTTH (Fiber To The Home)—the cable from the wall to the router is fiber optic (SC/APC connector), not copper. Fiber is inherently immune to electrical surges since it carries light, not electricity. The LAN cable between router and Nodito is internal and both devices share the same UPS, so there's no meaningful surge protection benefit from passing it through the UPS.

My two main goals are:

  • To allow Nodito and the WAN connection to survive brief (under 5 min) power cuts.
  • To allow Nodito to shut down gracefully in case of sustained outages so that the hardware (especially the HDDs) doesn't go to shit.

Shutdown logic for Nodito

I'll use NUT (Network UPS Tools) to manage the UPS from Nodito. Management will consist of:

  • Constant monitoring to ensure the UPS is healthy and connected. When the UPS starts running on battery, the monitor should notify me. This will be a push-type monitor reporting to my Uptime Kuma instance.
  • The UPS itself tracks low battery thresholds (battery.runtime.low = 300s, battery.charge.low = 10%). When either threshold is crossed, the UPS sets the LB (Low Battery) flag. NUT's upsmon detects LB and automatically triggers shutdown—no custom scripts needed.
  • After shutdown, the UPS cuts outlet power (after offdelay seconds), enabling automatic restart when mains returns (BIOS "restore on AC loss").
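
To make the logic above concrete, here is a minimal sketch of how a ups.status string maps to what happens next. It is only an illustration: upsmon does this internally, and the action names (ok, notify, shutdown, unknown) are made up for this example. It assumes upsmon treats the UPS as critical only when it is both on battery (OB) and low (LB), and it ignores other flags such as FSD.

```shell
#!/bin/sh
# Sketch: map a NUT ups.status string to the action described above.
# The action names are illustrative, not NUT terminology.
classify_status() {
    status=" $1 "    # pad so every flag is delimited by spaces
    case "$status" in
        *" OB "*)
            case "$status" in
                *" LB "*) echo "shutdown" ;;  # on battery AND low: upsmon runs SHUTDOWNCMD
                *)        echo "notify" ;;    # on battery, thresholds not crossed yet
            esac
            ;;
        *" OL "*) echo "ok" ;;                # on mains
        *)        echo "unknown" ;;           # e.g. empty output when the driver is down
    esac
}
```

For example, `classify_status "$(upsc cyberpower@localhost ups.status 2>/dev/null)"` prints one of the four action names for the live UPS.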

Checklist and drills

  • Physical setup
    • Shut down Nodito
    • Shut down router
    • Plug UPS to power
    • Plug Nodito and router to UPS power outlets
    • Plug USB cord between UPS and Nodito
    • Start Nodito and router. Verify they run properly.
  • NUT setup (run these on Nodito, the machine connected to the UPS via USB)
    • Install NUT

      sudo apt update && sudo apt install nut
      

      Expected: Package installs successfully. NUT services won't start yet (no config).

    • Verify that USB detects the UPS

      lsusb | grep -i cyber
      

      Expected output:

      Bus 00X Device 00Y: ID 0764:0501 Cyber Power System, Inc. CP1500 AVR UPS
      

      (Vendor ID 0764 is CyberPower. Product ID may vary by model.)

    • Reload udev rules and verify USB permissions (NUT installs rules but they need triggering for already-plugged devices)

      sudo udevadm control --reload-rules
      sudo udevadm trigger --subsystem-match=usb --action=add
      

      Verify the UPS device has nut group (replace bus/device numbers from lsusb output above):

      ls -la /dev/bus/usb/00X/00Y
      

      Expected: crw-rw-r-- 1 root nut ... — if it shows root root instead of root nut, the driver won't be able to access the UPS.
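
Instead of copying the bus and device numbers by hand, the device path can be derived from the lsusb line. This is a small sketch; the function name is made up, and it assumes the usual "Bus NNN Device NNN: ID ..." layout of lsusb output.

```shell
#!/bin/sh
# Sketch: turn one lsusb output line into the matching /dev/bus/usb path.
usb_path_from_lsusb() {
    # $1: one lsusb line, e.g. "Bus 001 Device 004: ID 0764:0501 ..."
    printf '%s\n' "$1" | awk '{
        sub(":", "", $4)                           # "004:" -> "004"
        printf "/dev/bus/usb/%s/%s\n", $2, $4
    }'
}
```

Usage, assuming GNU stat is available: `stat -c '%G' "$(usb_path_from_lsusb "$(lsusb -d 0764:)")"` should print nut once the udev rules have been applied.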

    • Scan for UPS devices with nut-scanner

      sudo nut-scanner -U
      

      Expected output:

      [nutdev1]
          driver = "usbhid-ups"
          port = "auto"
          vendorid = "0764"
          productid = "0501"
          product = "CP900EPFCLCD"
          vendor = "CPS"
          bus = "001"
      
    • Configure NUT mode in /etc/nut/nut.conf (standalone = UPS is directly connected to this machine; other modes are for network setups where multiple machines share one UPS)

      cat /etc/nut/nut.conf
      

      Should contain:

      MODE=standalone
      
    • Configure UPS device in /etc/nut/ups.conf (declares the UPS to NUT—without this, NUT won't know the UPS exists even if USB sees it)

      cat /etc/nut/ups.conf
      

      Should contain something like:

      [cyberpower]
          driver = usbhid-ups
          port = auto
          desc = "CyberPower CP900EPFCLCD"
          offdelay = 120
          ondelay = 30
      
      • offdelay = 120 — seconds after upsdrvctl shutdown before UPS cuts outlet power (2 min to ensure system is fully halted)
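      • ondelay = 30: seconds the UPS waits before switching its outlets back on once mains returns. Note: the usbhid-ups man page says ondelay should usually be greater than offdelay, so revisit this pair if the UPS ever fails to power back on.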
    • Configure upsd users in /etc/nut/upsd.users (the upsmon daemon authenticates to upsd to get UPS data; "master" means this machine is directly connected and can command the shutdown sequence)

      sudo cat /etc/nut/upsd.users
      

      Should contain:

      [counterweight]
          password = yourpassword
          upsmon master
      
    • Configure upsmon in /etc/nut/upsmon.conf (tells upsmon which UPS to monitor and how to handle events)

      Edit the default file—only add or modify the lines shown below, keep the rest (MINSUPPLIES, POLLFREQ, DEADTIME, etc. have sensible defaults).

      sudo grep -E "^MONITOR|^SHUTDOWNCMD|^POWERDOWNFLAG" /etc/nut/upsmon.conf
      

      Should contain:

      MONITOR cyberpower@localhost 1 counterweight yourpassword master
      SHUTDOWNCMD "/sbin/shutdown -h +0"
      POWERDOWNFLAG /etc/killpower
      

      That's it. When the UPS sets the LB (Low Battery) flag, upsmon automatically triggers shutdown—no custom scripts needed.

    • Verify low battery thresholds

      upsc cyberpower@localhost battery.runtime.low
      upsc cyberpower@localhost battery.charge.low
      

      Expected:

      300
      10
      

      The UPS sets LB flag when runtime < 300s (5 min) OR charge < 10%. If you want different thresholds, check if they're writable:

      upsrw cyberpower@localhost 2>&1 | grep -E "battery.runtime.low|battery.charge.low"
      

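      If writable, adjust with: upsrw -s battery.runtime.low=300 cyberpower@localhost (upsrw prompts for the upsd username and password; the user needs actions = SET in /etc/nut/upsd.users for the change to be accepted)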
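
The threshold rule can be written down as a tiny check, which is handy for reasoning about drill readings. This is a sketch of the rule as described above, not of the UPS firmware: whether the firmware compares with "less than" or "less than or equal" is an assumption, and the function name is made up.

```shell
#!/bin/sh
# Sketch: would the LB flag be expected for a given reading?
lb_expected() {
    # $1 battery.runtime (s)   $2 battery.charge (%)
    # $3 battery.runtime.low   $4 battery.charge.low
    if [ "$1" -lt "$3" ] || [ "$2" -lt "$4" ]; then
        echo "LB"
    else
        echo "no LB"
    fi
}
```

With the defaults on this unit, `lb_expected 286 13 300 10` prints LB: the runtime threshold is crossed even though charge is still above 10%.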

    • Verify late-stage shutdown script exists (no changes needed)

      ls -l /lib/systemd/system-shutdown/nutshutdown
      

      Expected: File exists and is executable. This script is provided by the NUT package and already does what we need—it checks for the killpower flag (via upsmon -K) and runs upsdrvctl shutdown to tell the UPS to cut outlet power. Without this, the server would shut down but the UPS would keep feeding power, so BIOS "restore on AC loss" would never trigger.

    • Start and enable NUT services

      sudo systemctl restart nut-driver-enumerator nut-server nut-monitor
      sudo systemctl enable nut-driver-enumerator nut-server nut-monitor
      

      Expected: All services start without errors. Note: nut-driver-enumerator reads ups.conf and starts the appropriate driver(s) via nut-driver@<upsname>.service.

    • Verify services are running

      systemctl status nut-driver-enumerator nut-server nut-monitor --no-pager
      

      Expected: All three show active (enumerator may show inactive after completing its job—that's OK, check the driver instance instead):

      systemctl status nut-driver@cyberpower.service --no-pager
      
    • Verify upsc receives data from UPS

      upsc cyberpower@localhost
      

      Expected output (partial):

      battery.charge: 100
      battery.runtime: 1800
      device.model: CP900EPFCLCD
      input.voltage: 230.0
      output.voltage: 230.0
      ups.load: 15
      ups.status: OL
      
    • Setup monitoring (Uptime Kuma push monitor)

      • Create a push monitor in Uptime Kuma, note the push URL
      • Create a script /usr/local/bin/ups-heartbeat.sh:
        #!/bin/bash
        STATUS=$(upsc cyberpower@localhost ups.status 2>/dev/null)
        if [[ "$STATUS" == *"OL"* ]]; then
            curl -s "https://uptime.example.com/api/push/xxxxx?status=up&msg=UPS%20on%20mains" > /dev/null
        fi
        # No push when on battery → Uptime Kuma times out → shows DOWN
        
      • Add cron job:
        sudo crontab -e
        # Add: * * * * * /usr/local/bin/ups-heartbeat.sh
        
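      • Make the script executable and verify the heartbeat is working (the script exits 0 even when it skips the push, so also check that the monitor shows UP in Uptime Kuma):
        sudo chmod +x /usr/local/bin/ups-heartbeat.sh
        /usr/local/bin/ups-heartbeat.sh && echo "Heartbeat script ran"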
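
A possible refinement of the heartbeat is to include the actual UPS status in the push message, so Uptime Kuma shows OL versus OL CHRG. The sketch below only builds the URL; the base URL is the same placeholder used in the script above, and the function name is made up. It encodes spaces only, which is enough for NUT status flags.

```shell
#!/bin/sh
# Sketch: build an Uptime Kuma push URL that carries the UPS status in msg.
build_push_url() {
    # $1: push URL without query string (placeholder)
    # $2: raw ups.status string, e.g. "OL CHRG"
    msg=$(printf 'UPS %s' "$2" | sed 's/ /%20/g')   # URL-encode spaces
    printf '%s?status=up&msg=%s\n' "$1" "$msg"
}
```

Inside the heartbeat script it could be used as `curl -fsS --max-time 10 "$(build_push_url "https://uptime.example.com/api/push/xxxxx" "$STATUS")" > /dev/null`, where --max-time keeps a hung request from piling up cron runs.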
        
  • Drills
    • Rely on battery drill
      • Start with everything running and plugged
      • Verify initial status is OL (online)
        upsc cyberpower@localhost ups.status
        
        Expected: OL
      • Start continuous monitoring in a terminal (keep this running throughout the drill)
        while true; do
          echo "$(date): status=$(upsc cyberpower@localhost ups.status 2>/dev/null) charge=$(upsc cyberpower@localhost battery.charge 2>/dev/null)% runtime=$(upsc cyberpower@localhost battery.runtime 2>/dev/null)s"
          sleep 10
        done
        
      • In another terminal, start continuous ping (verifies network stays up throughout)
        ping 8.8.8.8
        
      • Unplug UPS from power line
      • Watch the monitoring output—status should change from OL to OB DISCHRG
        Sat Jan 11 10:00:00 CET 2025: status=OL charge=100% runtime=1800s
        Sat Jan 11 10:00:10 CET 2025: status=OB DISCHRG charge=99% runtime=1750s
        Sat Jan 11 10:00:20 CET 2025: status=OB DISCHRG charge=98% runtime=1700s
        ...
        
      • Uptime Kuma monitor should go DOWN (wait up to 1 minute for next heartbeat)
      • Keep watching the drain. When remaining runtime reaches ~6 minutes (360s), plug the UPS back into mains power (before the 300s threshold triggers LB)
      • Watch monitoring output—status should change to OL CHRG
        Sat Jan 11 10:15:00 CET 2025: status=OB DISCHRG charge=45% runtime=380s
        Sat Jan 11 10:15:10 CET 2025: status=OL CHRG charge=45% runtime=390s
        ...
        
      • Uptime Kuma monitor should go back to UP (wait up to 1 minute for next heartbeat)
      • Verify ping ran continuously without packet loss throughout the drill
      • Stop both monitoring loops (Ctrl+C in each terminal)
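
The monitoring loop in this drill calls upsc three times per sample, so the three values can come from slightly different moments. A sketch of an alternative that parses a single upsc call is shown here; the function name is made up, and it assumes the usual "name: value" output format of upsc.

```shell
#!/bin/sh
# Sketch: format one monitoring line from a single `upsc <ups>` dump on stdin.
format_ups_line() {
    awk -F': ' '
        $1 == "ups.status"      { status  = $2 }
        $1 == "battery.charge"  { charge  = $2 }
        $1 == "battery.runtime" { runtime = $2 }
        END { printf "status=%s charge=%s%% runtime=%ss\n", status, charge, runtime }
    '
}
```

The loop body then becomes `echo "$(date): $(upsc cyberpower@localhost 2>/dev/null | format_ups_line)"`, keeping the same log format as before.
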
    • Power out completely drill
      • Start with everything running and plugged
      • From your laptop, verify initial state via SSH
        ssh nodito 'upsc cyberpower@localhost ups.status'
        
        Expected: OL
      • From your laptop, start continuous monitoring via SSH (logs to local file)
        ssh nodito 'while true; do echo "$(date): status=$(upsc cyberpower@localhost ups.status 2>/dev/null) charge=$(upsc cyberpower@localhost battery.charge 2>/dev/null)% runtime=$(upsc cyberpower@localhost battery.runtime 2>/dev/null)s"; sleep 10; done' 2>&1 | tee ~/ups-shutdown-drill.log
        
      • In another laptop terminal, watch system logs via SSH
        ssh nodito 'journalctl -u nut-monitor -f' 2>&1 | tee ~/ups-shutdown-journal.log
        
      • Unplug UPS from power line
      • Watch the monitoring output as battery drains
      • When runtime drops below 300s (or charge below 10%), the LB flag should appear
        status=OB LB DISCHRG
        
      • Watch for shutdown sequence in journal output
        upsmon[1234]: UPS cyberpower@localhost on battery
        upsmon[1234]: UPS cyberpower@localhost battery is low
        upsmon[1234]: Executing automatic power-fail shutdown
        
      • SSH sessions will die when server shuts down—that's expected. Your logs are saved locally in ~/ups-shutdown-drill.log and ~/ups-shutdown-journal.log
      • After the server shuts down: plug a lamp into the same UPS outlet the server was connected to. Verify the outlet goes dead (lamp turns off) even though the UPS still has battery. This confirms the upsdrvctl shutdown command was sent.
      • Plug the server back in, then plug the UPS back into the power line
      • Verify that server boots automatically (BIOS "restore on AC loss" triggers)
      • After boot, verify NUT is running and UPS is detected
        systemctl status nut-driver-enumerator nut-server nut-monitor --no-pager
        upsc cyberpower@localhost ups.status
        
        Expected: Services running, status shows OL CHRG.
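
After the drill, the locally saved log can be searched for the moment the LB flag first appeared. This is a minimal sketch that assumes the log format produced by the monitoring loop above; the function name is made up.

```shell
#!/bin/sh
# Sketch: print the first log line whose status contains the LB flag.
first_lb_line() {
    # $1: path to a drill log with "status=... charge=...% runtime=...s" lines
    grep -E 'status=[A-Z ]*LB( |$)' "$1" | head -n 1
}
```

Running `first_lb_line ~/ups-shutdown-drill.log` shows the charge and runtime at which the UPS declared low battery, which is useful for comparing against battery.runtime.low and battery.charge.low.
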
    • Lose data connection drill
      • Start with everything running and plugged
      • Verify initial connection
        upsc cyberpower@localhost ups.status
        
        Expected: OL
      • Disconnect the USB cable between server and UPS
      • Validate that NUT detects the communication loss
        upsc cyberpower@localhost
        
        Expected output:
        Error: Data stale
        
        Or after a few seconds:
        Error: Driver not connected
        
      • Check driver status
        systemctl status nut-driver@cyberpower.service --no-pager
        
        Expected: Service may show errors or have restarted.
      • Check system logs for communication loss
        journalctl -u nut-monitor --since "5 minutes ago" | grep -i comm
        
        Expected output:
        upsmon[1234]: Communications with UPS cyberpower@localhost lost
        
      • Validate that Uptime Kuma notifies the issue (the heartbeat script will fail to get status, or you can configure NUT's NOTIFYCMD for COMMBAD events)
      • Reconnect USB cable
      • Verify communication restored
        upsc cyberpower@localhost ups.status
        
        Expected: OL (may take a few seconds for driver to reconnect)
      • Check logs for restoration
        journalctl -u nut-monitor --since "5 minutes ago" | grep -i comm
        
        Expected:
        upsmon[1234]: Communications with UPS cyberpower@localhost established
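
For the NOTIFYCMD option mentioned in this drill, a sketch of the mapping from upsmon events to an Uptime Kuma status could look like this. upsmon exports the event name in the NOTIFYTYPE environment variable when it runs NOTIFYCMD; the function name and the up/down/ignore mapping are my own choices, and events only reach the script if upsmon.conf has a matching line such as NOTIFYFLAG COMMBAD SYSLOG+EXEC.

```shell
#!/bin/sh
# Sketch: decide what to push to Uptime Kuma for a given upsmon NOTIFYTYPE.
kuma_status_for() {
    case "$1" in
        ONLINE|COMMOK)                 echo "up" ;;
        ONBATT|LOWBATT|COMMBAD|NOCOMM) echo "down" ;;
        *)                             echo "ignore" ;;  # e.g. REPLBATT, SHUTDOWN
    esac
}
```

A NOTIFYCMD script would call `kuma_status_for "$NOTIFYTYPE"` and push status=up or status=down to the monitor URL, skipping the push when the result is ignore.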
        

Notes from drill execution

Running on battery

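  • Runtime estimates are really unstable and can jump several minutes up or down between consecutive readings (e.g. from 2092s to 1592s within one minute). Once the charge reading starts moving, it falls roughly linearly (about 4% per minute in the log below).
  • The UPS keeps reporting 100% charge for quite some time while on battery (about 10 minutes in the log below), then the reading starts falling fast. It's misreporting, as charge estimates for lead-acid batteries tend to do.
  • Notifications worked fine.
  • From the test, I conclude that total runtime until shutdown, with medium server load, will probably be around 20 min.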
  • Find below actual log lines from monitoring the UPS status once per minute during the drill.
Sun Jan 11 12:09:44 AM CET 2026: status=OL charge=100% runtime=2610s
Sun Jan 11 12:10:44 AM CET 2026: status=OB DISCHRG charge=100% runtime=2558s
Sun Jan 11 12:11:44 AM CET 2026: status=OB DISCHRG charge=100% runtime=2647s
Sun Jan 11 12:12:44 AM CET 2026: status=OB DISCHRG charge=100% runtime=2360s
Sun Jan 11 12:13:44 AM CET 2026: status=OB DISCHRG charge=100% runtime=2479s
Sun Jan 11 12:14:44 AM CET 2026: status=OB DISCHRG charge=100% runtime=2133s
Sun Jan 11 12:15:44 AM CET 2026: status=OB DISCHRG charge=100% runtime=2214s
Sun Jan 11 12:16:44 AM CET 2026: status=OB DISCHRG charge=100% runtime=2193s
Sun Jan 11 12:17:44 AM CET 2026: status=OB DISCHRG charge=100% runtime=2146s
Sun Jan 11 12:18:44 AM CET 2026: status=OB DISCHRG charge=100% runtime=2054s
Sun Jan 11 12:19:44 AM CET 2026: status=OB DISCHRG charge=100% runtime=2091s
Sun Jan 11 12:20:44 AM CET 2026: status=OB DISCHRG charge=100% runtime=1868s
Sun Jan 11 12:21:44 AM CET 2026: status=OB DISCHRG charge=98% runtime=2107s
Sun Jan 11 12:22:44 AM CET 2026: status=OB DISCHRG charge=96% runtime=2160s
Sun Jan 11 12:23:44 AM CET 2026: status=OB DISCHRG charge=93% runtime=2092s
Sun Jan 11 12:24:44 AM CET 2026: status=OB DISCHRG charge=91% runtime=1592s
Sun Jan 11 12:25:44 AM CET 2026: status=OB DISCHRG charge=87% runtime=1522s
Sun Jan 11 12:26:44 AM CET 2026: status=OB DISCHRG charge=83% runtime=1660s
Sun Jan 11 12:27:44 AM CET 2026: status=OB DISCHRG charge=79% runtime=1540s
Sun Jan 11 12:28:44 AM CET 2026: status=OB DISCHRG charge=75% runtime=1368s
Sun Jan 11 12:29:44 AM CET 2026: status=OB DISCHRG charge=71% runtime=1384s
Sun Jan 11 12:30:44 AM CET 2026: status=OB DISCHRG charge=66% runtime=1254s
Sun Jan 11 12:31:44 AM CET 2026: status=OB DISCHRG charge=61% runtime=1204s
Sun Jan 11 12:32:44 AM CET 2026: status=OB DISCHRG charge=58% runtime=1102s
Sun Jan 11 12:33:44 AM CET 2026: status=OB DISCHRG charge=54% runtime=1053s
Sun Jan 11 12:34:44 AM CET 2026: status=OB DISCHRG charge=49% runtime=943s
Sun Jan 11 12:35:44 AM CET 2026: status=OL CHRG charge=47% runtime=916s
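
A log like the one above can be summarised with a few lines of awk. This is a sketch that assumes one sample per minute and the exact line format shown; the function name is made up.

```shell
#!/bin/sh
# Sketch: count on-battery samples and report the last on-battery reading.
summarize_drill() {
    # stdin: lines like "...: status=OB DISCHRG charge=98% runtime=2107s"
    awk '
        /status=OB/ {
            n++
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^charge=/)  { c = $i; gsub(/[^0-9]/, "", c) }
                if ($i ~ /^runtime=/) { r = $i; gsub(/[^0-9]/, "", r) }
            }
            last_charge = c; last_runtime = r
        }
        END {
            printf "on_battery_samples=%d last_charge=%d%% last_runtime=%ds\n", n, last_charge, last_runtime
        }
    '
}
```

Fed the log above, it reports 25 on-battery samples (roughly 25 minutes), ending at 49% charge and 943s of estimated runtime.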

Controlled shutdown and boot again

  • We ran it as planned, plugging a lamp into the UPS to also see visually how the UPS shuts down.
  • Lesson learned: the UPS doesn't just shut down the Schuko outlet where the server was plugged in. It shuts down the entire UPS device. Once you plug it back into mains power, the UPS starts again (and eventually the server too, once it picks up power).
  • Total runtime until shutdown was 29 minutes.
  • Wake on power worked fine.
  • If the UPS audible alarm is active, the UPS shutdown is extremely noisy. Once it has one minute left before shutting itself down, it beeps every second.
  • Runtime readings remain quite unstable, but the variance decreases as the battery drains. The UPS finally triggered server shutdown at 13% charge and 286s of runtime left, i.e. on the 300s runtime threshold rather than the 10% charge threshold.
  • Charge readings suddenly change drastically when you plug/unplug the UPS from mains. After the UPS shutdown (at 13% charge), I plugged mains back in and suddenly it was reading 40% within one minute. I unplugged from mains again a couple of minutes later and it read 21% charge, and 20 seconds after that it read 14%.

Side quests

  • What is the story of NUT? Who maintains it? Where's the code hosted?
    • NUT (Network UPS Tools) started in the late 1990s. It's open source, community-maintained, and hosted at https://github.com/networkupstools/nut. It's the de facto standard for UPS management on Linux/Unix.
  • About using port = auto: how does Linux find out which device is the UPS?
    • Linux identifies USB devices by vendor ID and product ID via the USB subsystem. When you plug in the UPS, it registers as a USB HID device. NUT's usbhid-ups driver scans connected USB devices looking for known UPS vendor/product ID combinations. "auto" tells it to scan and find the match automatically.
  • About the "low battery" status: how does the CyberPower UPS decide? What's the criteria to be in that status? What are other statuses?
    • The UPS itself determines "low battery" based on internal thresholds, which vary by model and are sometimes configurable on the UPS. On this unit, NUT reports battery.runtime.low = 300 (seconds) and battery.charge.low = 10 (%), and LB is set when either threshold is crossed. Other statuses include: OL (online/on mains), OB (on battery), LB (low battery), RB (replace battery), CHRG (charging), DISCHRG (discharging), BYPASS, CAL (calibrating), OFF, OVER (overloaded), TRIM, BOOST.
  • Does NUT allow querying the state of the UPS more granularly? How can that be done? What info is shared?
    • Yes. Use upsc <upsname> to see all variables the UPS reports: battery charge %, estimated runtime, input/output voltage, load %, temperature, etc. Use upscmd -l <upsname> to list available commands (like beeper toggle, battery test). What's available depends on what your specific UPS exposes over USB.
  • How can I monitor that the UPS is properly plugged in?
    • Run upsc <upsname>: if it returns data, the connection is good. Check service status with systemctl status nut-driver@<upsname> nut-server (on this setup, nut-driver@cyberpower). NUT can also send notifications via NOTIFYCMD when communication is lost (COMMBAD) or restored (COMMOK). For dashboards, you can use nut_exporter for Prometheus/Grafana integration.
  • What is the difference between battery charge % and load % metrics provided by upsc?
    • Battery charge % (battery.charge): How full the battery is—100% means fully charged, 0% means empty. Load % (ups.load): How much of the UPS's output capacity is currently being used. If your 540W UPS is powering 270W of devices, load is ~50%. They're independent: you can have 100% charge with 80% load, or 20% charge with 10% load.
  • What's better: signalling Nodito to shut down at a certain battery level, or at a certain remaining runtime? I would rather ensure graceful shutdown than extend uptime another minute.
    • Remaining runtime is better for your goal because it accounts for actual load: 50% battery at high load might mean 2 minutes, while 50% at low load might mean 10 minutes. However, runtime estimates can be inaccurate on consumer UPS units. Safest approach: just trust the UPS's built-in LB (low battery) flag, which is exactly what NUT's default upsmon does. It's designed to leave enough time for graceful shutdown. If you want extra margin, raise battery.runtime.low above its 300-second default (if the variable is writable) so that LB is set earlier.
  • How can I set things up so that, after a low battery and server shutdown, once the UPS starts getting power again, the server also starts again automatically? The server BIOS is set to boot on power coming back.
    • Your BIOS "restore on AC loss" setting handles the server side. For the UPS side: NUT's default behavior just shuts down the OS, not the UPS itself—the UPS keeps running and will stay on when mains returns. Your CyberPower will automatically restore output when power comes back. The only gotcha: if you configure NUT to send a shutdown command to the UPS itself (via POWERDOWNFLAG/upsdrvctl), make sure "auto-restart on AC restore" is enabled on the UPS (usually the default). With your BIOS set correctly, the chain is: power returns → UPS restores output → server sees power → BIOS boots.
  • Who controls whom? Does the UPS tell the server to shut down, or does the server decide?
    • The server monitors the UPS and decides—the UPS is passive. The UPS just continuously reports its state over USB (battery %, on-battery vs on-mains, low battery flag, etc.). NUT polls this data, and upsmon watches for conditions (like the LB flag) and decides to run the shutdown command. The UPS doesn't "tell" the server anything—it just answers status queries. The only command that goes TO the UPS is optional: after shutdown, NUT can tell the UPS "cut your outlet power."
  • But wait: if NUT runs on the server, how can it send a command to the UPS after the server shuts down?
    • It can't send anything after shutdown—the trick is timing. The command is sent during shutdown, but the UPS delays acting on it. The sequence: (1) upsmon detects low battery and initiates system shutdown. (2) Late in the shutdown process (but before fully off), a shutdown script calls upsdrvctl shutdown. (3) This tells the UPS "cut power in X seconds"—the UPS has an internal delay timer. (4) Server finishes shutting down. (5) UPS waits out its timer, then cuts outlet power. This is what POWERDOWNFLAG in upsmon.conf is for—it creates a flag file that late-stage shutdown scripts check, and if present, they call upsdrvctl shutdown before the system halts.
  • Why do we need the UPS to cut outlet power at all?
    • To enable auto-restart when mains returns. Consider: power goes out → UPS uses battery → battery gets low → server shuts down → but power comes back before UPS is completely drained. The server's power supply never actually lost power (the UPS kept feeding it throughout), so "restore on AC loss" in BIOS never triggers—the server stays off. By having the UPS cut outlet power after server shutdown, the server's PSU sees a power loss event. Then when mains returns and UPS restores outlets, the server sees "power restored" and BIOS boots it.
  • Does the UPS automatically restore outlet power when mains returns after a commanded power cut?
    • Yes. Most UPS units (including CyberPower) have "auto-restart" enabled by default. When mains returns: (1) UPS detects mains power. (2) UPS restores outlet power (either immediately, or after battery reaches a minimum charge—depends on UPS settings). (3) Server sees power → BIOS "restore on AC loss" kicks in → server boots. Some UPS units let you configure this behavior (e.g., "wait until battery is 20% charged before restoring outlets"), but out of the box it should just work.
  • Do I need to write a custom script to handle shutdown when the UPS battery is low?
    • No. The upsmon daemon handles this automatically. It runs constantly in the background, polling the UPS status every few seconds (configured via POLLFREQ). By default, it watches for the LB (Low Battery) flag—when the UPS decides its battery is critically low, it sets this flag, and upsmon sees it and runs SHUTDOWNCMD. The whole point of NUT is that this logic is built-in. Your custom heartbeat script is only for Uptime Kuma notifications; it has nothing to do with shutdown orchestration.
  • The UPS has ethernet surge protection pass-through. Why am I not using it?
    • My WAN connection is FTTH (Fiber To The Home). The cable from the wall to my router is fiber optic with an SC/APC connector (round, smaller than RJ45, with a ceramic ferrule inside)—not copper ethernet. Fiber carries light, not electricity, so it's inherently immune to electrical surges from lightning or power line disturbances. The ethernet surge protection on a UPS is designed for copper cables that run outside the building or between different electrical zones. My only ethernet cable is the LAN connection between the router and Nodito, which is entirely internal, and both devices are plugged into the same UPS anyway. If a surge hit my home's electrical system, both devices would experience it through their power supplies—the ethernet path between them isn't a meaningful risk vector. So the pass-through provides no practical benefit in my setup.
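
As a footnote to the side quest about sending a command to the UPS after shutdown: the late-stage hook boils down to "if the killpower flag is set, tell the UPS to cut power". The sketch below mirrors that logic but is not the script shipped by the NUT package. The UPSMON and UPSDRVCTL overrides are hypothetical and exist only so the function can be dry-run on a machine without NUT.

```shell
#!/bin/sh
# Sketch of the late-stage hook logic (not the packaged nutshutdown script).
late_stage_hook() {
    upsmon_bin=${UPSMON:-/sbin/upsmon}          # hypothetical override for dry runs
    upsdrvctl_bin=${UPSDRVCTL:-/sbin/upsdrvctl} # hypothetical override for dry runs
    # `upsmon -K` exits 0 only when the POWERDOWNFLAG file has been set
    if "$upsmon_bin" -K >/dev/null 2>&1; then
        "$upsdrvctl_bin" shutdown   # UPS starts its offdelay timer, then cuts power
    else
        echo "killpower flag not set; leaving UPS output on"
    fi
}
```

A dry run such as `UPSMON=true UPSDRVCTL=echo late_stage_hook` prints shutdown, showing the branch taken during a real low-battery shutdown without touching the UPS.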