galoy-personal-notes/dagster-migration-plan.md at 18222dd2bf818651b0bf5fda9d405433632bb272

counterweight/galoy-personal-notes

pablo 18222dd2bf

stuf

2025-10-24 14:33:36 +02:00

End goals

Dagster is part of deployment, gets created in staging with the full E2E data pipeline running
With schedules/automations
And is reachable from Lana UI for reports
Locally we can raise containers with make dev-up

Milestones

Starting point: no dagster
Dagster is deployable in staging and locally
- Dagster gets included in the set of containers that get added to the kubernetes namespace in deployment
- The dagster webserver UI in staging is reachable locally through tunneling
- We can load a dummy code location, but still no code from our own project
- We also include dagster in local make dev-up
We can build a lana dagster image with our project code and deploy it
- We build a hello-world grade docker image for dagster code automatically in CI in lana-bank, which gets added to our container registry
- CI bumps this image in our helm charts and gets deployed to staging automatically
Dagster takes care of EL
- We move the responsibility for doing EL out of lana core-pg from Meltano to Dagster
- This will require:
  - Setting up EL in dagster
  - Adjusting the dbt project's staging layer to stop relying on Meltano fields
- While this is in the works, we will need a code freeze in staging
Dagster takes care of dbt execution
- We swap the responsibility of materializing the dbt DAG from Meltano to Dagster
- While this is in the works, we will need a code freeze in the dbt project
Dagster can generate file reports
- We integrate generate-es-reports in Dagster so that it can generate report files
- But we don't plug it into Lana's UI just yet
- At this point we begin a code freeze in generate-es-reports
Extract report files API out of Airflow
- We set up an independent microservice to handle requesting and delivering report files.
- Same behaviour as the Airflow Flask plugin has today
- Internally, interactions with the bucket contents remain the same. The features regarding requesting and monitoring file creation must be repointed from Airflow to Dagster
- At this point we finish the code freeze in generate-es-reports
Add E2E testing
- At this stage the whole pipeline is running on dagster, so it is the right time to add tests that automatically check that everything runs
Cleanup
- Remove any remaining old Meltano/Airflow references, code, env vars, etc. throughout our repositories
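
For the "reachable locally through tunneling" step in the second milestone, a plain kubectl port-forward might be enough. This is only a sketch: NAMESPACE, RELEASE and LOCAL_PORT are placeholders I made up, and the label selector and the 80 container port are what the Dagster Helm docs show, so they need checking against whatever chart version we actually deploy. By default it only prints what it would do; set DRY_RUN=0 to really open the tunnel.

```shell
#!/usr/bin/env bash
# Sketch: forward the Dagster webserver running in a cluster to localhost.
# NAMESPACE, RELEASE and LOCAL_PORT are placeholder values (assumptions).
NAMESPACE="${NAMESPACE:-default}"
RELEASE="${RELEASE:-dagster}"
LOCAL_PORT="${LOCAL_PORT:-8080}"

tunnel_dagster_ui() {
  # Label selector as shown in the Dagster Helm docs; verify against the deployed chart.
  local selector="app.kubernetes.io/name=dagster,app.kubernetes.io/instance=${RELEASE},component=dagster-webserver"

  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run: only show the commands, do not touch any cluster.
    echo "+ kubectl get pods --namespace ${NAMESPACE} -l ${selector}"
    echo "+ kubectl port-forward --namespace ${NAMESPACE} <webserver-pod> ${LOCAL_PORT}:80"
    return 0
  fi

  # Look up the name of the first webserver pod matching the selector.
  local pod
  pod="$(kubectl get pods --namespace "$NAMESPACE" -l "$selector" \
    -o jsonpath='{.items[0].metadata.name}')"

  # Blocks until interrupted; the UI is served on http://localhost:${LOCAL_PORT} meanwhile.
  kubectl port-forward --namespace "$NAMESPACE" "$pod" "${LOCAL_PORT}:80"
}

tunnel_dagster_ui
```

The same function should work against minikube or staging, depending on which kubectl context is active.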

Other

How to add dagster to deployment?

Understanding how we add Airflow now

I'm going to check how Airflow is currently set up.
From what I understand, the right terraform bits should be spread around galoy-private-charts and galoy-deployments. Let's see what I can find.
Okay, some notes on galoy-private-charts and its relationship to galoy-deployments:
- This repo is a Helm charts factory. We build a chart to deploy a lana-bank instance here.
- The repo defines the chart, then CI tries to check that the chart is deployable with the testflight CI job (lana-bank-testflight). If testflight succeeds, another CI job (bump-lana-bank-in-deployments) updates the chart automatically in galoy-deployments with a bot commit.
- galoy-private-charts
- Also note that some of the images used in this chart come from upstream deps. Basically, code repos like lana-bank build their own images, which get added into our container registry and then referenced from galoy-private-charts. The bumps from lana-bank to galoy-private-charts happen through CI automated commits.

How to add dagster

We should rely on the Helm charts that Dagster provides, so that we stick to their recommendations
- https://docs.dagster.io/deployment/oss/deployment-options/kubernetes/deploying-to-kubernetes
We will need to build our own code location container and upload to our container registry. Kind of what we're currently doing with the meltano image

I'm going to install minikube locally (kubectl and helm are already provided by nix in galoy-private-charts) to try running the helm charts on my machine. I don't want to do full round trips through CI, and possibly break testflight, just to add dagster.

curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube_latest_amd64.deb
sudo dpkg -i minikube_latest_amd64.deb
minikube start --driver=docker
kubectl get nodes

curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
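
With minikube up, a first local install of the upstream chart could look roughly like the sketch below. The chart repo URL and the dagster/dagster chart name are the ones from the Dagster docs linked above. The release name, the namespace and the small run helper are my own placeholders, nothing we have settled on. By default the script only prints the commands; set DRY_RUN=0 to execute them against the current kubectl context.

```shell
#!/usr/bin/env bash
# Sketch: install the upstream Dagster Helm chart into the local minikube cluster.
# RELEASE and NAMESPACE are placeholder names (assumptions).
RELEASE="${RELEASE:-dagster}"
NAMESPACE="${NAMESPACE:-dagster-local}"

# Print each command instead of running it, unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_dagster() {
  # Register the official chart repository and refresh the local index.
  run helm repo add dagster https://dagster-io.github.io/helm
  run helm repo update

  # Install the chart with default values, or upgrade it if the release already exists.
  run helm upgrade --install "$RELEASE" dagster/dagster \
    --namespace "$NAMESPACE" --create-namespace

  # List the pods so we can see whether webserver and daemon come up.
  run kubectl get pods --namespace "$NAMESPACE"
}

install_dagster
```

The default values only get us an example code location, which matches the "dummy code location" milestone; our own image would come later through a values file.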

I discussed this with Kartik on the daily call and he pointed me to this old Blink guide on how to set up a local kubernetes testing env: https://github.com/blinkbitcoin/charts/blob/main/dev/README.md