Compare commits


No commits in common. "3f4400cc03c944bee31abe9bafb185ddf991f13a" and "5a51c2e89b5acff1d23767fd4bb4a380fe7bdc6d" have entirely different histories.

9 changed files with 657 additions and 199 deletions


@@ -0,0 +1,364 @@
# AGAPITO1 Replacement Runbook
Continuation of [20260208_second_zfs_degradation.md](20260208_second_zfs_degradation.md).
AGAPITO1 (`ata-ST4000NT001-3M2101_WX11TN0Z`) had a failing SATA PHY and was RMA'd. The ZFS mirror `proxmox-tank-1` has been running degraded on AGAPITO2 alone since Feb 8. The replacement drive (same model, serial `WX120LHQ`) needs to be physically installed and added to the mirror.
**Current state:**
- Pool: `proxmox-tank-1` (mirror-0), DEGRADED
- AGAPITO2 (`WX11TN2P`): ONLINE, on ata4
- Old AGAPITO1 (`WX11TN0Z`): shows REMOVED in pool config
- Physical: drive bay empty, SATA data + power cables still connected to mobo/PSU (should be ata3 port after the cable swap from incident 2)
- New drive: ST4000NT001-3M2101, serial `WX120LHQ`
---
## Phase 1: Pre-shutdown state capture
While server is still running, log current state for reference.
- [x] **1.1** Record pool status
```
zpool status -v proxmox-tank-1
```
Expected: DEGRADED, WX11TN0Z shows REMOVED, WX11TN2P ONLINE.
```
pool: proxmox-tank-1
state: DEGRADED
status: One or more devices have been removed.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: scrub repaired 0B in 06:55:06 with 0 errors on Tue Feb 17 20:40:50 2026
config:
NAME STATE READ WRITE CKSUM
proxmox-tank-1 DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
ata-ST4000NT001-3M2101_WX11TN0Z REMOVED 0 0 0
ata-ST4000NT001-3M2101_WX11TN2P ONLINE 0 0 0
errors: No known data errors
```
- [x] **1.2** Record current SATA layout
```
dmesg -T | grep -E 'ata[0-9]+\.[0-9]+: ATA-|ata[0-9]+: SATA link up' | tail -20
```
Expected: AGAPITO2 visible on ata4. ata3 should show nothing (empty slot).
```
[Tue Feb 17 15:37:28 2026] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[Tue Feb 17 15:37:28 2026] ata4.00: ATA-11: ST4000NT001-3M2101, EN01, max UDMA/133
```
- [x] **1.3** Confirm AGAPITO2 is healthy before we start
```
smartctl -H /dev/disk/by-id/ata-ST4000NT001-3M2101_WX11TN2P
```
Expected: PASSED. If not, stop and investigate before proceeding.
```
SMART overall-health self-assessment test result: PASSED
```
---
## Phase 2: Graceful shutdown
- [x] **2.1** Shut down all VMs gracefully from Proxmox UI or CLI
```
qm list
# For each running VM:
qm shutdown <VMID>
```
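If there are many VMs, a quick loop saves some typing (just a sketch — it assumes every running VM is safe to shut down unattended):
```
# shut down every VM that qm list reports as running
for vmid in $(qm list | awk '$3 == "running" {print $1}'); do
  qm shutdown "$vmid"
done
```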
- [x] **2.2** Verify all VMs are stopped
```
qm list
```
Expected: all show "stopped".
- [x] **2.3** Power down the server
```
shutdown -h now
```
---
## Phase 3: Physical installation
- [x] **3.1** Open the case
- [x] **3.2** Locate the dangling SATA data + power cables (from the old AGAPITO1 slot)
- [x] **3.3** Visually inspect cables for damage — especially the SATA data connector pins
- [x] **3.4** Label the new drive as TOMMY with a marker/sticker. Write serial `WX120LHQ` on the label too.
- [x] **3.5** Seat the new drive in the bay
- [x] **3.6** Connect SATA data cable to the drive — push firmly until it clicks
- [x] **3.7** Connect SATA power cable to the drive — push firmly
- [x] **3.8** Double-check both connectors are fully seated (wiggle test — they shouldn't move)
- [x] **3.9** Close the case
---
## Phase 4: Boot and verify detection
- [x] **4.1** Power on the server, let it boot into Proxmox
- [x] **4.2** Verify the new drive is detected by the kernel
```
dmesg -T | grep -E 'ata[0-9]+\.[0-9]+: ATA-|ata[0-9]+: SATA link up'
```
Expected: new drive detected on ata3 (or whichever port the cable is on), at 6.0 Gbps.
```
[Fri Feb 20 22:57:06 2026] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[Fri Feb 20 22:57:06 2026] ata3.00: ATA-11: ST4000NT001-3M2101, EN01, max UDMA/133
[Fri Feb 20 22:57:07 2026] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[Fri Feb 20 22:57:07 2026] ata4.00: ATA-11: ST4000NT001-3M2101, EN01, max UDMA/133
```
TOMMY on ata3, AGAPITO2 on ata4. Both at 6.0 Gbps, firmware EN01.
- [x] **4.3** Verify the drive appears in `/dev/disk/by-id/`
```
ls -l /dev/disk/by-id/ | grep WX120LHQ
```
Expected: `ata-ST4000NT001-3M2101_WX120LHQ` pointing to some `/dev/sdX`.
```
ata-ST4000NT001-3M2101_WX120LHQ -> ../../sda
```
- [ ] **4.4** Set variables for convenience
```
NEW_DISKID="ata-ST4000NT001-3M2101_WX120LHQ"
NEW_DISKPATH="/dev/disk/by-id/$NEW_DISKID"
OLD_DISKID="ata-ST4000NT001-3M2101_WX11TN0Z"
echo "New: $NEW_DISKID -> $(readlink -f $NEW_DISKPATH)"
```
- [x] **4.5** Confirm drive identity and firmware version with smartctl
```
smartctl -i "$NEW_DISKPATH"
```
Expected: Model ST4000NT001-3M2101, Serial WX120LHQ, Firmware EN01, 4TB capacity.
```
Device Model: ST4000NT001-3M2101
Serial Number: WX120LHQ
Firmware Version: EN01
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
```
- [x] **4.6** Check kernel logs are clean — no SATA errors, link drops, or speed downgrades
```
dmesg -T | grep -E 'ata[0-9]' | grep -iE 'error|fatal|reset|link down|slow|limiting'
```
Expected: nothing. If there are errors here on a brand new drive + known-good cable, **stop and investigate**.
```
[Fri Feb 20 22:57:06 2026] ata1: SATA link down (SStatus 0 SControl 300)
[Fri Feb 20 22:57:06 2026] ata2: SATA link down (SStatus 0 SControl 300)
```
Clean — ata1/ata2 are unused ports. No errors on ata3 or ata4.
---
## Phase 5: Health-check the new drive before trusting data to it
Don't resilver onto a DOA drive.
- [x] **5.1** SMART overall health
```
smartctl -H "$NEW_DISKPATH"
```
Expected: PASSED.
```
SMART overall-health self-assessment test result: PASSED
```
- [x] **5.2** Check SMART attributes baseline
```
smartctl -A "$NEW_DISKPATH" | grep -E 'Reallocated|Pending|Offline_Uncorrect|CRC|Error_Rate'
```
Expected: all counters at 0 (it's a new/refurb drive).
```
1 Raw_Read_Error_Rate ... - 6072
5 Reallocated_Sector_Ct ... - 0
7 Seek_Error_Rate ... - 476
197 Current_Pending_Sector ... - 0
198 Offline_Uncorrectable ... - 0
199 UDMA_CRC_Error_Count ... - 0
```
All critical counters at 0. Read/Seek error rate raw values are normal Seagate encoding.
- [x] **5.3** Run short self-test
```
smartctl -t short "$NEW_DISKPATH"
```
Wait ~2 minutes, then check:
```
smartctl -l selftest "$NEW_DISKPATH"
```
Expected: "Completed without error".
```
# 1 Short offline Completed without error 00% 0 -
```
Passed. 0 power-on hours — fresh drive.
- [x] **5.4** (Decision point) Short test passed. Proceeding.
---
## Phase 6: Add new drive to ZFS mirror
- [x] **6.1** Open a dedicated terminal for kernel log monitoring
```
dmesg -Tw
```
Leave this running throughout the resilver. Watch for ANY `ata` errors.
- [x] **6.2** Replace the old drive with the new one in the pool
```
zpool replace proxmox-tank-1 "$OLD_DISKID" "$NEW_DISKID"
```
This tells ZFS: "the REMOVED drive WX11TN0Z is being replaced by WX120LHQ". Resilvering starts automatically.
- [x] **6.3** Verify resilvering has started
```
zpool status -v proxmox-tank-1
```
Expected: state DEGRADED, new drive shows as part of a `replacing` vdev, resilver in progress.
```
resilver in progress since Fri Feb 20 23:10:58 2026
5.71G / 1.33T scanned at 344M/s, 0B / 1.33T issued
0B resilvered, 0.00% done
replacing-0 DEGRADED 0 0 0
ata-ST4000NT001-3M2101_WX11TN0Z REMOVED 0 0 0
ata-ST4000NT001-3M2101_WX120LHQ ONLINE 0 0 7.73K
ata-ST4000NT001-3M2101_WX11TN2P ONLINE 0 0 0
```
Resilver running. Cksum count on new drive is expected during resilver (unwritten blocks).
- [x] **6.4** Monitor resilver progress periodically
```
watch -n 30 "zpool status -v proxmox-tank-1"
```
Expected: steady progress, no read/write/cksum errors on either drive. Based on previous experience (~500GB at ~100MB/s with VMs down), expect roughly 1-2 hours.
VMs were auto-started on boot. Resilver completed: 1.34T in 03:32:55 with 0 errors.
- [x] **6.5** VMs were already running (auto-start on boot).
---
## Phase 7: Post-resilver verification
Wait for resilver to complete (status will say "resilvered XXG in HH:MM:SS with 0 errors").
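If you don't want to keep a `watch` open, a simple polling loop works too (a sketch — it just greps for the in-progress status text):
```
while zpool status proxmox-tank-1 | grep -q "resilver in progress"; do sleep 300; done
zpool status -v proxmox-tank-1
```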
- [x] **7.1** Check final pool status
```
zpool status -v proxmox-tank-1
```
Expected: ONLINE (or DEGRADED with "too many errors" message requiring a clear — same as last time).
```
state: ONLINE
scan: resilvered 1.34T in 03:32:55 with 0 errors on Sat Feb 21 02:43:53 2026
ata-ST4000NT001-3M2101_WX120LHQ ONLINE 0 0 7.73K
ata-ST4000NT001-3M2101_WX11TN2P ONLINE 0 0 0
```
ONLINE. 7.73K cksum on TOMMY is expected resilver artifact — clearing next.
- [x] **7.2** Clear residual cksum counters
```
zpool clear proxmox-tank-1
```
Counters cleared (status message and cksum count gone on re-check).
```
state: ONLINE
scan: resilvered 1.34T in 03:32:55 with 0 errors on Sat Feb 21 02:43:53 2026
ata-ST4000NT001-3M2101_WX120LHQ ONLINE 0 0 0
ata-ST4000NT001-3M2101_WX11TN2P ONLINE 0 0 0
errors: No known data errors
```
- [x] **7.3** Run a full scrub to verify data integrity
```
zpool scrub proxmox-tank-1
```
Expected: **0 errors on both drives**.
```
scrub repaired 0B in 03:27:50 with 0 errors on Sat Feb 21 11:38:02 2026
ata-ST4000NT001-3M2101_WX120LHQ ONLINE 0 0 0
ata-ST4000NT001-3M2101_WX11TN2P ONLINE 0 0 0
errors: No known data errors
```
- [x] **7.4** Clean status confirmed — 0B repaired, 0 errors, both drives 0/0/0.
- [x] **7.5** Baseline SMART snapshot of the new drive after heavy I/O
```
smartctl -x "$NEW_DISKPATH" | grep -E 'Reallocated|Pending|Offline_Uncorrect|CRC|Hardware Resets|COMRESET|Interface'
```
Expected: 0 reallocated, 0 CRC errors, low hardware reset count.
```
Reallocated_Sector_Ct ... 0
Current_Pending_Sector ... 0
Offline_Uncorrectable ... 0
UDMA_CRC_Error_Count ... 0
Number of Hardware Resets ... 2
Number of Interface CRC Errors ... 0
COMRESET ... 2
```
All clean. 2 hardware resets / COMRESETs from boot — normal.
- ~~**7.6**~~ Skipped — extended SMART self-test is redundant after a clean resilver + scrub. ZFS checksums already verified every data block; the only thing the long test would cover is empty space that ZFS hasn't written to, which ZFS will verify on future use anyway.
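For reference, if the long test is ever wanted later (it runs in the background with the pool online), the commands would be:
```
smartctl -t long "$NEW_DISKPATH"
# a 4TB drive takes several hours; check the result afterwards with:
smartctl -l selftest "$NEW_DISKPATH"
```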
---
## Phase 8: Final state — done
- [x] **8.1** Final pool status — already captured in 7.4. Mirror is healthy:
```
pool: proxmox-tank-1
state: ONLINE
scan: scrub repaired 0B in 03:27:50 with 0 errors on Sat Feb 21 11:38:02 2026
config:
NAME STATE READ WRITE CKSUM
proxmox-tank-1 ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-ST4000NT001-3M2101_WX120LHQ ONLINE 0 0 0
ata-ST4000NT001-3M2101_WX11TN2P ONLINE 0 0 0
errors: No known data errors
```
- [x] **8.2** All VMs running normally — verified from Proxmox UI
- [x] **8.3** Celebrate. Mirror is whole again.
---
## Abort conditions
Stop and investigate if any of these happen:
- New drive not detected after boot (bad seating or DOA)
- SATA errors in `dmesg` during or after boot (bad cable? bad drive?)
- SMART short test fails on new drive (DOA — contact seller)
- Resilver stalls or produces errors on the new drive
- Scrub finds checksum errors on the new drive
---
## Execution summary
Executed 2026-02-20 evening through 2026-02-21 morning. No abort conditions hit — completely clean run.
- TOMMY (`WX120LHQ`) installed on ata3 at 6.0 Gbps, detected first boot
- SMART short test passed, all critical attributes at zero
- Resilver: 1.34T in 03:32:55, 0 errors (VMs were running — auto-start on boot)
- Scrub: repaired 0B in 03:27:50, 0 errors, both drives 0/0/0
- Post-I/O SMART baseline clean: 0 reallocated, 0 CRC errors
- Extended SMART test skipped — redundant after clean resilver + scrub (ZFS checksums already verified all data blocks)
- Pool `proxmox-tank-1` fully healthy. Mirror degradation that started 2026-02-08 is resolved.

dbt_model_optimization.md Normal file

@@ -0,0 +1,292 @@
# Busy man's guide to optimizing dbt models' performance
You have a `dbt` model that takes ages to run in production. For some very valid reason, this is a problem.
This is a small reference guide on things you can try. I suggest you try them from start to end, since they are sorted in descending order of value/complexity ratio.
Before you start working on a model, you might want to check the bonus guide at the bottom of this page to learn how to make sure you don't change the outputs of a model while refactoring it.
If you've tried everything here and things still don't work, don't hesitate to call Pablo.
## 1. Is your model *really* taking too long?
> Before you optimize a model that is taking too long, make sure it actually takes too long.
>
The very first step is to really assess if you do have a problem.
We run our DWH in a Postgres server, and Postgres is a complex system. Postgres is doing many things at all times and it's very stateful, which means you will pretty much never see *exactly* the same performance twice for any given query.
Before going crazy optimizing, I would advise running the model or the entire project a few times and observing the behaviour. It might be that *some day* it took very long for some reason, but usually, it runs just fine.
You also might want to do this at a time when there's little activity in the DWH, like very early or late in the day, so that other users' activity in the DWH doesn't pollute your observations.
If this is a model that is already being run regularly, we can also leverage the statistics collected by the `pg_stat_statements` Postgres extension to check the min, avg, and max run times for it. Ask Pablo to get access to this.
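For reference, something along these lines works against the `pg_stat_statements` view (the column names below are the PostgreSQL 13+ ones — older versions call them `min_time`/`mean_time`/`max_time` — and `%my_model%` is just a placeholder for the table your model builds):
```sql
select
    calls,
    round(min_exec_time::numeric)  as min_ms,
    round(mean_exec_time::numeric) as avg_ms,
    round(max_exec_time::numeric)  as max_ms,
    left(query, 80)                as query_start
from pg_stat_statements
where query ilike '%my_model%'
order by mean_exec_time desc;
```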
## 2. Reducing the amount of data
> Make your query only bring in the data it needs, and not more. Reduce the amount of data as early as possible.
>
This option is a simple optimization trick that can be used in many areas and it's easy to pull off.
The two holy devils of slow queries are large amounts of data and monster lookups/sorts. Both can be drastically reduced by simply reducing the amount of data that goes into the query, typically by applying some smart `WHERE` or creative conditions on a `JOIN` clause. This can be either done in your basic CTEs where you read from other models, or in the main `SELECT` of your model.
Typically, try to make this as *early* as possible in the model. Early here refers to the steps of your query. In your queries, you will typically:
- read a few tables,
- do some `SELECTs`
- then do more crazy logic downstream with more `SELECTs`
- and the party goes on for as long as your case demands
Reducing the amount of data at the end is pointless. You will still need to read a lot of stuff early and have monster `JOIN`s, window functions, `DISTINCT`s, etc. Ideally, you want to do it when you first access an upstream table. If not there, then as early as possible within the logic.
The specifics of how to apply this are absolutely query dependent, so I can't give you magic instructions for the query you have at hand. But let me illustrate the concept with an example:
### Only hosts? Then only hosts
You have a table `stg_my_table` with a lot of data, let's say 100 million records, and each record has the id of a host. In your model, you need to join these records with the host user data to get some columns from there. So right now your query looks something like this (tables are fictional, this is not how things look in the DWH):
```sql
with
stg_my_table as (select * from {{ ref("stg_my_table") }}),
stg_users as (select * from {{ ref("stg_users")}})
select
...
from stg_my_table t
left join
stg_users u
on t.id_host_user = id_user
```
At the time I'm writing this, the real user table in our DWH has like 600,000 records. This means that:
- The CTE `stg_users` will need to fetch 600,000 records, with all their data, and store them.
- Then the left join will have to join 100 million records from `my_table` with the 600,000 user records.
Now, this is not working for you because it takes ages. We can easily improve the situation by applying the principle of this section: reducing the amount of data.
Our user table in the DWH has both hosts and guests. Actually, it has ~1,000 hosts and everything else is just guests. This means that:
- We're fetching around 599,000 guest details that we don't care about at all.
- Every time we join a record from `my_table`, we do so against 600,000 user records when we only truly care about 1,000 of them.
Stupid, isn't it?
Well, imagining that our fictional `stg_users` table had a field called `is_host`, we can rewrite the query this way to get exactly the same result in only a fraction of the time:
```sql
with
stg_my_table as (select * from {{ ref("stg_my_table") }}),
stg_users as (
    select *
    from {{ ref("stg_users")}}
    where is_host = true
)
select
...
from stg_my_table t
left join
stg_users u
on t.id_host_user = id_user
```
It's simple to understand: the CTE will now only fetch the 1,000 records related to hosts, which means we save work both in fetching that data and in the downstream join against `stg_my_table`, which is now much smaller.
## 3. Controlling CTE materialization
> Tell Postgres when to cache intermediate results and when to optimize through them.
>
This one requires a tiny bit of understanding of what happens under the hood, but the payoff is big and the fix is easy to apply.
### What Postgres does with your CTEs
When Postgres runs a CTE, it has two strategies:
- **Materialized**: Postgres runs the CTE query, stores the full result in a temporary buffer, and every downstream reference reads from that buffer. Think of it as Postgres creating a temporary, index-less table with the CTE's output.
- **Not materialized**: Postgres treats the CTE as if it were a view. It doesn't store anything — instead, it folds the CTE's logic into the rest of the query and optimizes everything together. This means it can push filters down, use indexes from the original tables, and skip reading rows it doesn't need.
By default, Postgres decides for you: if a CTE is referenced once, it inlines it. If it's referenced more than once, it materializes it.
The problem is that this default isn't always ideal, especially with how we write dbt models.
### Why this matters for our dbt models
Following our conventions, we always import upstream refs as CTEs at the top of the file:
```sql
with
stg_users as (select * from {{ ref("stg_users") }}),
stg_bookings as (select * from {{ ref("stg_bookings") }}),
some_intermediate_logic as (
select ...
from stg_users
join stg_bookings on ...
where ...
),
some_other_logic as (
select ...
from stg_users
where ...
)
select ...
from some_intermediate_logic
join some_other_logic on ...
```
Notice that `stg_users` is referenced twice — once in `some_intermediate_logic` and once in `some_other_logic`. This means Postgres will materialize it by default. What happens then is:
1. Postgres scans the entire `stg_users` table and copies all 600,000 rows into a temporary buffer.
2. If the buffer exceeds available memory, it spills to disk.
3. Every downstream CTE that reads from `stg_users` does a sequential scan of that buffer. Note this means indices can't be used, even if the original table had them.
4. Any filters that downstream CTEs apply to `stg_users` (like `where is_host = true`) can't be pushed down to the original table scan. Postgres reads all 600,000 rows first, stores them, and only then filters.
All of that, for a `select *` that does absolutely no computation worth caching.
### The fix
You can explicitly control this behaviour by adding `MATERIALIZED` or `NOT MATERIALIZED` to any CTE:
```sql
with
stg_users as not materialized (select * from {{ ref("stg_users") }}),
stg_bookings as not materialized (select * from {{ ref("stg_bookings") }}),
some_intermediate_logic as (
...
),
some_other_logic as (
...
)
select ...
```
With `NOT MATERIALIZED`, Postgres treats those import CTEs as transparent aliases. It can see straight through to the original table, use its indexes, and push filters down.
### When to use which
The rule of thumb is simple:
- **Cheap CTE, referenced multiple times** → `NOT MATERIALIZED`. This is the typical case for our import CTEs at the top of the file. There's no computation to cache, so materializing just wastes resources.
- **Expensive CTE, referenced multiple times** → leave it alone (or explicit `MATERIALIZED`). If a CTE does heavy aggregations, complex joins, or window functions, materializing means that work happens once. Without it, Postgres would repeat the expensive query every time the CTE is referenced.
- **Any CTE referenced only once** → doesn't matter. Postgres inlines it automatically.
If you're unsure whether a CTE is "expensive enough" to warrant materialization, just try both and measure. There's no shame in that.
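To make the contrast concrete, here is a sketch (fictional models and columns): the cheap import CTEs get inlined, while a heavy aggregation that is referenced twice is explicitly cached:
```sql
with
-- cheap import CTEs, referenced more than once below: inline them
stg_users as not materialized (select * from {{ ref("stg_users") }}),
stg_bookings as not materialized (select * from {{ ref("stg_bookings") }}),
-- heavy aggregation, referenced twice: compute it once and reuse the result
bookings_per_user as materialized (
    select id_user, count(*) as n_bookings
    from stg_bookings
    group by id_user
),
hosts as (
    select u.id_user, b.n_bookings
    from stg_users u
    left join bookings_per_user b on b.id_user = u.id_user
    where u.is_host
),
guests as (
    select u.id_user, b.n_bookings
    from stg_users u
    left join bookings_per_user b on b.id_user = u.id_user
    where not u.is_host
)
select * from hosts
union all
select * from guests
```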
## 4. Change upstream materializations
> Materialize upstream models as tables instead of views to reduce computation on the model at hand.
>
Going back to basics, dbt offers [multiple materialization strategies for our models](https://docs.getdbt.com/docs/build/materializations).
Typically, for reasons that we won't cover here, the preferred starting point is to use views. We only go for table or incremental materializations if there are good reasons for it.
If you have a model that is showing terrible performance, it's possible that the fault doesn't sit with the model itself, but rather with an upstream model. Let me give an example.
Imagine we have a situation with three models:
- `stg_my_simple_model`: a model with super simple logic and small data
- `stg_my_crazy_model`: a model with a crazy complex query and lots of data
- `int_my_dependant_model`: an int model that reads from both previous models.
- Where the staging models are set to materialize as views and the int model is set to materialize as a table.
Because the two staging models are set to materialize as views, every time you run `int_my_dependant_model`, you also have to execute the queries of `stg_my_simple_model` and `stg_my_crazy_model`. If the upstream view models are fast, this is not an issue of any kind. But if one of them is a heavy query, it can be.
The point is, you might notice that `int_my_dependant_model` takes 600 seconds to run and think there's something wrong with it, when actually the fault sits with `stg_my_crazy_model`, which is perhaps taking 590 of those 600 seconds.
How can materializations solve this? Well, if `stg_my_crazy_model` were materialized as a table instead of as a view, whenever you ran `int_my_dependant_model` you would simply read from a table with pre-populated results, instead of having to run the `stg_my_crazy_model` query each time. Typically, reading the results will be much faster than running the whole query. So, in summary, by making `stg_my_crazy_model` materialize as a table, you can fix your performance issue in `int_my_dependant_model`.
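In dbt terms this is a one-line config change on the upstream model (a sketch using the fictional model from above; the same thing can also be set per-folder in `dbt_project.yml`):
```sql
-- at the top of models/staging/stg_my_crazy_model.sql
{{ config(materialized='table') }}

-- ...the rest of the model stays exactly as it was
```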
## 5. Switch the model's materialization to `incremental`
> Make the processing of the table happen in small batches instead of on all data to make it more manageable.
>
Imagine we want to count how many bookings were created each month.
As time passes, more and more months and more and more bookings appear in our history, making the size of this problem ever increasing. But then again, once a month has finished, we shouldn't need to go back and revisit history: what's done is done, and only the ongoing month is relevant, right?
dbt offers a materialization strategy named [`incremental`](https://docs.getdbt.com/docs/build/incremental-models), which allows you to only work on a subset of the data. This means that every time you run `dbt run`, your model only works on a certain part of the data, and not all of it. If the nature of your data and your needs allow isolating each run to a small part of all upstream data, this strategy can wildly improve performance.
Explaining the inner details of `incremental` goes beyond the scope of this page. You can check the official docs from `dbt` ([here](https://docs.getdbt.com/docs/build/incremental-models)), ask the team for support or check some of the incremental models that we already have in our project and use them as references.
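Just to give a flavour of the shape, here is a bare-bones sketch for the bookings-per-month example above (fictional tables and columns; the real mechanics are in the docs linked above):
```sql
{{ config(materialized='incremental', unique_key='booking_month') }}

with stg_bookings as (select * from {{ ref("stg_bookings") }})

select
    date_trunc('month', created_at) as booking_month,
    count(*) as n_bookings
from stg_bookings
{% if is_incremental() %}
-- on incremental runs, only reprocess from the last month already in the target table
where created_at >= (select max(booking_month) from {{ this }})
{% endif %}
group by 1
```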
Note that using `incremental` strategies makes life way harder than the simple `view` or `table` ones, so only pick it up if it's truly necessary. Don't make models incremental without trying other optimizations first, or simply because you realise that you *could* use it in a specific model.
![dbt's official docs (wisely) warning you of the dangers of incremental.](image%2039.png)
dbt's official docs (wisely) warning you of the dangers of incremental.
## 6. End of the line: general optimization
The final tip is not really a tip. The above five things are the easy-peasy, low-hanging-fruit stuff that you can try. This doesn't mean that there isn't more you can do, just that I don't know of simpler things to try without deep knowledge of how Postgres works under the hood and a willingness to get your hands *real* dirty.
If you've reached this point and your model is still performing poorly, you either need to put your Data Engineer hat on and really deepen your knowledge… or call Pablo.
## Bonus: how to make sure you didn't screw up and change the output of the model
The topic we are discussing in this guide is making refactors purely for the sake of performance, without changing the output of the given model. We simply want to make the model faster, not change what data it generates.
That being the case, and considering the complexity of the strategies we've presented here, being afraid that you messed up and accidentally changed the output of the model is a very reasonable fear to have. That's a kind of mistake that we definitely want to avoid.
Doing this check manually can be a PITA and very time consuming, which doesn't help at all.
To make your life easier, I'm going to show you a little trick.
### Hashing tables and comparing them
I'll post a snippet of code here that you can run to check whether any pair of tables has *exactly* the same contents. Emphasis on exactly. Changing the slightest bit of content will be detected.
```sql
-- hash of the first table
SELECT md5(array_agg(md5((t1.*)::varchar))::varchar)
FROM (
    SELECT *
    FROM my_first_table
    ORDER BY <whatever field is unique>
) AS t1;

-- hash of the second table
SELECT md5(array_agg(md5((t2.*)::varchar))::varchar)
FROM (
    SELECT *
    FROM my_second_table
    ORDER BY <whatever field is unique>
) AS t2;
```
How this works is: you execute the two queries, which will return a single value each. Some hexadecimal gibberish.
If the output of the two queries is identical, it means their contents are identical. If they are different, it means there's something different between them.
If you don't understand how this works, and you don't care, that's fine. Just use it.
If not knowing does bother you, go down the rabbit holes of hash functions and deterministic serialization.
### Including this in your refactoring workflow
Right, now you know how to make sure that two tables are identical.
This is dramatically useful for your optimization workflow. You can now simply:
- Keep the original model
- Create a copy of it, which is the one you will be working on (the working copy)
- Prepare the magic query to check their contents are identical
- From this point on, you can enter this loop for as long as you want/need:
- Run the magic query to ensure you start from same-output-state
- Modify the working copy model to attempt whatever optimization thingie you wanna try
- Once you are done, run the magic query again.
- If the output is not the same anymore, you screwed up. Start again and avoid whatever mistake you made.
- If the output is still the same, you didnt cause a change in the model output. Either keep on optimizing or call it day.
- Finally, just copy over the working copy model code into the old one and remove the working copy.
I hope that helps. I also recommend doing the loop as frequently as possible. The fewer things you change between executions of the magic query, the easier it is to figure out what caused a difference if one appears.
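If you prefer the check to be a single statement, the same trick can be folded into one query that compares both hashes directly (fictional table names, and `id` stands in for whatever unique field you order by):
```sql
with
original as (
    select md5(array_agg(md5((t.*)::varchar) order by t.id)::varchar) as h
    from my_original_model t
),
working_copy as (
    select md5(array_agg(md5((t.*)::varchar) order by t.id)::varchar) as h
    from my_working_copy_model t
)
select original.h = working_copy.h as outputs_identical
from original, working_copy;
```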
![ Donald Knuth - "[StructuredProgrammingWithGoToStatements](http://web.archive.org/web/20130731202547/http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf)”](image%2040.png)
Donald Knuth - "[StructuredProgrammingWithGoToStatements](http://web.archive.org/web/20130731202547/http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf)”


@@ -1,44 +0,0 @@
I've recently started mining Bitcoin at a scale I never had before, so I thought it would be interesting to jot down a few observations on my recent errands.
My friend Unhosted Marcellus has been following closely the evolution of the [OCEAN mining pool](https://ocean.xyz) since its launch. I hadn't used it personally until recently, so for years all the info I had on it was secondhand. What he was most excited about was the [DATUM Gateway](https://ocean.xyz/docs/datum-setup): the great innovation is that you are building your own block templates, which is something no other pool does. By using OCEAN with DATUM, you enjoy the benefit of more stable mining rewards as opposed to lotto mining, while still being a sovereign miner in the sense that you rely on your own node and you do your own templating. Great news for decentralization.
The reason I had not bothered with setting all of this up so far was... that I really don't mine much. I got gifted a cute [Bitaxe Supra](https://bitronics.store/collections/bitaxe/products/bitaxe-supra) from the [Bitronics Shop](https://bitronics.store/) that produces some humble 600GH/s, so it felt pointless to do all the setup for such a tiny hashrate.
But then, Unhosted Marcellus started to tell me about these new markets started by Braiins called [Hashpower](https://hashpower.braiins.com/). Other articles explain the market better, so I'll leave it up to you to find those to learn about it. Although I must say, if learning is what you want, nothing beats using it. The TLDR is that you can sign up, send sats, and rent hashrate that you can point to your own DATUM gateway. And the surprise (at least for me) is how you can literally rent petahashes for peanuts, when you account for the fact that most of the sats you put towards buying hashrate will come back as mining rewards.
Unhosted finally triggered me with this tweet. Cheeky bastard.
I started toying around with a few PH/s, eventually trying out double digit petahashes. There is this funny feeling to suddenly be controlling the equivalent of tens of thousands of little bitaxes. [[insert many-phs.png here]]
The economics around it are interesting. The bidding prices in Hashpower are usually (not always!) above hashvalue. It's common to pay a 1%-5% premium over hashvalue. So, the most probable thing is that you end up operating at a small loss. This is not strictly guaranteed if you mine with OCEAN, since the luck factor is important and can easily swing rewards +-10%. So unless you mine at a stable rate with a months-long time horizon, luck is going to play a more important role than the premium on the hashrate.
To optimize your outcome, it is important to constantly update your bids in Hashpower. Bids are set at a fixed price in sats, so as the market auction moves every few seconds, you will be either overpaying or end up unserviced.
[[ order book bids here.png ]]
On the first days I was using Hashpower, I would log into it multiple times a day to adjust my bids to stay at the right height of the orderbook. At first it was fun, then it felt tedious, and it started to generate this Twitter-esque addiction feeling I didn't like. I quickly concluded I wanted to automate this out so my only task was to contemplate how pretty my OCEAN hashrate dashboard looked, and I could leave behind pulling levers in Hashpower's webpage like a financial monkey.
[[ insert dreaming-bids.jpeg here ]]
I solved this problem for myself with [hashbidder](https://github.com/counterweightoperator/hashbidder). It's a small CLI tool that I run every couple minutes with cron. The TLDR is you can give it a config file that reads "I want to mine at 5PH/s" and the tool will set your bids with two goals in mind:
- To bring your hashrate in line with your goal (e.g. if you want to be at 5PH/s, and you're currently averaging 3PH/s in the last 24 hours, it will drive your bids to a total of 7PH/s. If instead, you're scoring an average of 10PH/s, it will stop your bids completely to let your average go lower).
- To pay as little as possible, but guarantee you get served. The logic here is to set the price right above the cheapest bid that is being served currently.
The result is quite pleasant. Delivery is choppy because, even with frequent updates, trying to be cheap means you often get dragged into being overbid by others and you stay there for some minutes. But the self-adjusting hashrate compensates for it: if you've been falling behind a lot recently, hashbidder will just hash at a higher hashrate to make up for it. I'm currently targeting 5PH/s, and this is what my OCEAN hashrate timeline looks like.
[[ insert recent-hashrate.png ]]
There are still a few more optimizations I'll add to hashbidder to reduce cost and decrease the volatility of delivery, but they're just marginal improvements. The gist of it is already there and it's doing its work fine.
My next steps are simply to sit and watch. I've decided I will pour 10 million sats during a few months into this setup and then stop to measure what my rewards have totalled to, so I can provide people interested in this with a real-life report of how everything turned out.
Overall, I'm having lots of fun. Setting this up made me excited in a way that felt oddly similar to the first time I was setting up lightning nodes. The night I started up my DATUM gateway and pointed some hashrate to it felt like the night I spun up an LND and started doing some lightning triangles in [Lightning Network+](https://lightningnetwork.plus/).
Some interesting links in case you want to learn more or give it a shot at mining with rented hash yourself:
- [rentsomehash.com](https://rentsomehash.com/), guides on how to setup your DATUM gateway and start mining with rented hash
- A video guide from Matthew Kratter: https://x.com/mattkratter/status/2043692900190753089?s=20
- You can check what Unhosted tweets here, since he's pretty much obsessed with this and doesn't pay attention to anything else: https://x.com/oomahq. Also, some podcasts and articles from him. Many kudos for starting this fire:
- [Interesting tweet #1](https://x.com/oomahq/status/2042692591469367692)
- [Once Bitten! episode](https://fountain.fm/episode/zwmdkwdhy0jQT5dVkhah)
- [The Bitcoin Libertarian episode (in Spanish)](https://www.youtube.com/watch?v=k8ZRNyr3ofA)


@@ -22,7 +22,6 @@
<li><a href="#contact-header">Contact</a></li>
<li><a href="#my-projects-header">My projects</a></li>
<li><a href="#writings-header">Writings</a></li>
<li><a href="#talks-header">Talks</a></li>
</ul>
<hr />
<section>
@@ -127,9 +126,6 @@
<li>
<a href="https://bitcoininfra.contrapeso.xyz" target="_blank" rel="noopener noreferrer">My open access Bitcoin infrastructure that you can use freely.</a> It includes access to the peer port of my Bitcoin node, an Electrum server and a mempool.space instance.
</li>
<li>
<a href="https://github.com/counterweightoperator/hashbidder" target="_blank" rel="noopener noreferrer">hashbidder</a>, a CLI tool to automatically manage your bids on Braiins Hashpower to maintain a target hashrate at minimal cost.
</li>
</ul>
<p>
There are also some other projects that I generally keep private but
@@ -151,10 +147,6 @@
<h2 id="writings-header">Writings</h2>
<p>Sometimes I like to jot down ideas and drop them here.</p>
<ul>
<li>
<a href="writings/my-first-petahash.html" target="_blank"
rel="noopener noreferrer">My first petahash</a>
</li>
<li>
<a href="writings/my-fitness-journey.html" target="_blank"
rel="noopener noreferrer">My fitness journey</a>
@@ -245,25 +237,6 @@
</li>
</ul>
</section>
<hr />
<section>
<h2 id="talks-header">Talks</h2>
<p>Some talks I've given:</p>
<ul>
<li>
<a href="https://www.youtube.com/watch?v=-kQT2-C-Pgs" target="_blank"
rel="noopener noreferrer">¿Es posible la libertad sin desobediencia?</a>
</li>
<li>
<a href="https://www.youtube.com/watch?v=G9sk8fHrZ3Y&t=1360" target="_blank"
rel="noopener noreferrer">Lanzamiento BISQ 2.0 y introducción al KYC</a>
</li>
<li>
<a href="https://youtu.be/3GfkptA8Dgc?t=1629" target="_blank"
rel="noopener noreferrer">Lightning Network para negocios: retos y soluciones</a>
</li>
</ul>
</section>
</main>
<footer>
<p>Pablo Martín Calvo</p>

Binary file not shown (before: 117 KiB)

Binary file not shown (before: 68 KiB)

Binary file not shown (before: 86 KiB)

Binary file not shown (before: 208 KiB)


@@ -1,127 +0,0 @@
<!DOCTYPE HTML>
<html>
<head>
<title>Pablo here</title>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="../styles.css">
</head>
<body>
<main>
<h1>
Hi, Pablo here
</h1>
<p><a href="../index.html">back to home</a></p>
<hr>
<section>
<h2>My first petahash</h2>
<p>I've recently started mining Bitcoin at a scale I never had before, so I thought it would be interesting
to jot down a few observations on my recent errands.</p>
<p>My friend Unhosted Marcellus has been following closely the evolution of the
<a href="https://ocean.xyz">OCEAN mining pool</a> since its launch. I hadn't used it personally until
recently, so for years all the info I had on it was second hand. What he was most excited about was the
<a href="https://ocean.xyz/docs/datum-setup">DATUM Gateway</a>: the great innovation is that you are
building your own block templates, which is something no other pool does. By using OCEAN with DATUM, you
enjoy the benefit of more stable mining rewards as opposed to lotto mining, while still being a sovereign
miner in the sense that you rely on your own node and you do your own templating. Great news for
decentralization.</p>
<p>The reason I had not bothered with setting all of this up so far was... that I really don't mine much. I
got gifted a cute <a href="https://bitronics.store/collections/bitaxe/products/bitaxe-supra">Bitaxe
Supra</a> from the <a href="https://bitronics.store/">Bitronics Shop</a> that produces some humble
600GH/s, so it felt pointless to do all the setup for such a tiny hashrate.</p>
<p>But then, Unhosted Marcellus started to tell me about these new markets started by Braiins called
<a href="https://hashpower.braiins.com/">Hashpower</a>. Other articles explain the market better, so
I'll leave it up to you to find those to learn about it. Although I must say, if learning is what you
want, nothing beats using it. The TLDR is that you can sign up, send sats, and rent hashrate that you
can point to your own DATUM gateway. And the surprise (at least for me) is how you can literally rent
petahashes for peanuts, when you account for the fact that most of the sats you put towards buying
hashrate will come back as mining rewards.</p>
<p>Unhosted finally triggered me with this tweet. Cheeky bastard.</p>
<p>I started toying around with a few PH/s, eventually trying out double digit petahashes. There is this
funny feeling to suddenly be controlling the equivalent of tens of thousands of little bitaxes.</p>
<figure style="width: 75%; margin: 10px auto;">
<img width="100%" height="auto" src="../static/many-phs.png" alt="Double digit petahashes on OCEAN">
<figcaption>Double digit petahashes on OCEAN</figcaption>
</figure>
<p>The economics around it are interesting. The bidding prices in Hashpower are usually (not always!) above
hashvalue. It's common to pay a 1%-5% premium over hashvalue. So, the most probable thing is that you
end up operating at a small loss. This is not strictly guaranteed if you mine with OCEAN, since the luck
factor is important and can easily swing rewards +-10%. So unless you mine at a stable rate with a
months-long time horizon, luck is going to play a more important role than the premium on the
hashrate.</p>
<p>To optimize your outcome, it is important to constantly update your bids in Hashpower. Bids are set at a
fixed price in sats, so as the market auction moves every few seconds, you will be either overpaying or
end up unserviced.</p>
<figure style="width: 75%; margin: 10px auto;">
<img width="100%" height="auto" src="../static/order-book-bids-here.png" alt="Hashpower order book">
<figcaption>Hashpower order book</figcaption>
</figure>
<p>On the first days I was using Hashpower, I would log into it multiple times a day to adjust my bids to
stay at the right height of the order book. At first it was fun, then it felt tedious, and it started to
generate this Twitter-esque addiction feeling I didn't like. I quickly concluded I wanted to automate
this out so my only task was to contemplate how pretty my OCEAN hashrate dashboard looked, and I
could leave behind pulling levers in Hashpower's webpage like a financial monkey.</p>
<figure style="width: 75%; margin: 10px auto;">
<img width="100%" height="auto" src="../static/dreaming-bids.jpeg" alt="Dreaming of automated bids">
<figcaption>Dreaming of automated bids</figcaption>
</figure>
<p>I solved this problem for myself with
<a href="https://github.com/counterweightoperator/hashbidder">hashbidder</a>. It's a small CLI tool
that I run every couple minutes with cron. The TLDR is you can give it a config file that reads "I want
to mine at 5PH/s" and the tool will set your bids with two goals in mind:</p>
<ul>
<li>To bring your hashrate in line with your goal (e.g. if you want to be at 5PH/s, and you're
currently averaging 3PH/s in the last 24 hours, it will drive your bids to a total of 7PH/s. If
instead, you're scoring an average of 10PH/s, it will stop your bids completely to let your average
go lower).</li>
<li>To pay as little as possible, but guarantee you get served. The logic here is to set the price right
above the cheapest bid that is being served currently.</li>
</ul>
<p>The result is quite pleasant. Delivery is choppy because, even with frequent updates, trying to be cheap
means you often get dragged into being overbid by others and you stay there for some minutes. But the
self-adjusting hashrate compensates for it: if you've been falling behind a lot recently, hashbidder
will just hash at a higher hashrate to make up for it. I'm currently targeting 5PH/s, and this is what
my OCEAN hashrate timeline looks like.</p>
<figure style="width: 75%; margin: 10px auto;">
<img width="100%" height="auto" src="../static/recent-hashrate.png" alt="OCEAN hashrate timeline">
<figcaption>OCEAN hashrate timeline at ~5PH/s target</figcaption>
</figure>
<p>There are still a few more optimizations I'll add to hashbidder to reduce cost and decrease the
volatility of delivery, but they're just marginal improvements. The gist of it is already there and it's
doing its work fine.</p>
<p>My next steps are simply to sit and watch. I've decided I will pour 10 million sats during a few months
into this setup and then stop to measure what my rewards have totalled to, so I can provide people
interested in this with a real-life report of how everything turned out.</p>
<p>Overall, I'm having lots of fun. Setting this up made me excited in a way that felt oddly similar to the
first time I was setting up lightning nodes. The night I started up my DATUM gateway and pointed some
hashrate to it felt like the night I spun up an LND and started doing some lightning triangles in
<a href="https://lightningnetwork.plus/">Lightning Network+</a>.</p>
<p>Some interesting links in case you want to learn more or give it a shot at mining with rented hash
yourself:</p>
<ul>
<li><a href="https://rentsomehash.com/">rentsomehash.com</a>, guides on how to set up your DATUM
gateway and start mining with rented hash</li>
<li>A video guide from Matthew Kratter:
<a href="https://x.com/mattkratter/status/2043692900190753089">on X</a></li>
<li>You can check what Unhosted tweets here, since he's pretty much obsessed with this and doesn't pay
attention to anything else: <a href="https://x.com/oomahq">https://x.com/oomahq</a>. Also, some
podcasts and articles from him. Many kudos for starting this fire:
<ul>
<li><a href="https://x.com/oomahq/status/2042692591469367692">Interesting tweet #1</a></li>
<li><a href="https://fountain.fm/episode/zwmdkwdhy0jQT5dVkhah">Once Bitten! episode</a></li>
<li><a href="https://www.youtube.com/watch?v=k8ZRNyr3ofA">The Bitcoin Libertarian episode (in
Spanish)</a></li>
</ul>
</li>
</ul>
<hr>
<p><a href="../index.html">back to home</a></p>
</section>
</main>
</body>
</html>