galoy-personal-notes/dagster-migration-plan.md
2025-10-24 14:33:36 +02:00

## End goals
- Dagster is part of the deployment and gets created in staging with the full E2E data pipeline running
	- With schedules/automations
	- And is reachable from the Lana UI for reports
- Locally, we can bring up the containers with `make dev-up`
## Milestones
- Starting point: no dagster
- Dagster is deployable in staging and locally
	- Dagster gets included in the set of containers that get added to the Kubernetes namespace in deployment
	- The Dagster webserver UI in staging is reachable locally through tunneling
	- We can load a dummy code location, but still no code from our own project (see the minimal sketch after this list)
- We also include dagster in local `make dev-up`
- We can build a lana dagster image with our project code and deploy it
	- We build a hello-world-grade Docker image for Dagster code automatically in CI in `lana-bank`, which gets added to our container registry
	- CI bumps this image in our Helm charts, and it gets deployed to staging automatically
- Dagster takes care of EL
	- We move the responsibility for EL from lana core-pg out of Meltano and into Dagster (see the EL sketch after this list)
- This will require:
- Setting up EL in dagster
- Adjusting the dbt project's `staging` layer to stop relying on Meltano fields
- While this is in the works, we will need a code freeze in staging
- Dagster takes care of dbt execution
	- We move the responsibility for materializing the dbt DAG from Meltano to Dagster (see the dbt sketch after this list)
- While this is in the works, we will need a code freeze in the dbt project
- Dagster can generate file reports
	- We integrate `generate-es-reports` in Dagster so that it can generate report files (see the report sketch after this list)
- But we don't plug it into Lana's UI just yet
- At this point we begin a code freeze in `generate-es-reports`
- Extract report files API out of Airflow
	- We set up an independent microservice to handle the requesting and delivering of report files
		- Same behaviour as what the Airflow Flask plugin does today
	- Internally, interactions with the bucket contents remain the same. The features for requesting and monitoring file creation must be repointed from Airflow to Dagster
- At this point we finish the code freeze in `generate-es-reports`
- Add E2E testing
	- At this stage the whole pipeline is running on Dagster, so it's the right time to add tests that automatically check that everything runs end to end
- Cleanup
- Remove any remaining old Meltano/Airflow references, code, env vars, etc. throughout our repositories
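
Below are a few hedged sketches of what the milestones above could look like in code. They are illustrations only; every module, table, path and cron string in them is a placeholder I'm making up, not something that exists yet.

A minimal dummy code location with one asset and one schedule, enough to prove that deployment, the webserver and schedules work in staging:

```
# definitions.py — hypothetical dummy code location for the first deployment milestone
import dagster as dg


@dg.asset
def hello_world() -> str:
    """Placeholder asset so the webserver has something to show."""
    return "hello from the lana dagster code location"


# a trivial job + schedule just to prove schedules/automations run in staging
hello_job = dg.define_asset_job("hello_job", selection=[hello_world])
hello_schedule = dg.ScheduleDefinition(job=hello_job, cron_schedule="0 * * * *")

defs = dg.Definitions(assets=[hello_world], jobs=[hello_job], schedules=[hello_schedule])
```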
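For the EL swap, one possible shape is a plain asset that extracts a table from lana core-pg and loads it into the warehouse dataset that the dbt `staging` layer reads from. The DSN, table name and loading target here are placeholder assumptions:

```
# el_assets.py — hedged sketch of extract-load from core-pg in Dagster
import dagster as dg
import pandas as pd
import sqlalchemy


@dg.asset
def core_pg_accounts(context: dg.AssetExecutionContext) -> pd.DataFrame:
    # placeholder DSN; in reality this would come from a configured resource/secret
    engine = sqlalchemy.create_engine("postgresql://user:pass@core-pg:5432/lana")
    df = pd.read_sql_table("accounts", engine)
    context.log.info(f"extracted {len(df)} rows from core-pg")
    # the load step would write df to the warehouse table the dbt staging layer expects
    return df


defs = dg.Definitions(assets=[core_pg_accounts])
```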
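For the dbt hand-off, dagster-dbt's `@dbt_assets` decorator lets Dagster materialize the dbt DAG; the project path below is a placeholder for wherever our dbt project ends up living:

```
# dbt_assets.py — sketch of materializing the dbt DAG from Dagster via dagster-dbt
from pathlib import Path

import dagster as dg
from dagster_dbt import DbtCliResource, DbtProject, dbt_assets

dbt_project = DbtProject(project_dir=Path("dbt"))  # placeholder path to the dbt project


# assumes the dbt manifest has already been generated (e.g. via `dbt parse`)
@dbt_assets(manifest=dbt_project.manifest_path)
def lana_dbt_assets(context: dg.AssetExecutionContext, dbt: DbtCliResource):
    # stream `dbt build` results back to Dagster as asset materializations
    yield from dbt.cli(["build"], context=context).stream()


defs = dg.Definitions(
    assets=[lana_dbt_assets],
    resources={"dbt": DbtCliResource(project_dir=dbt_project)},
)
```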
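And for report generation, the simplest integration I can imagine is an asset that just shells out to `generate-es-reports`; the command name and lack of arguments are assumptions about its CLI that still need checking:

```
# report_assets.py — hypothetical wrapper around generate-es-reports
import subprocess

import dagster as dg


@dg.asset
def es_report_files(context: dg.AssetExecutionContext) -> None:
    # assumed invocation; flags, env and output bucket handling stay whatever the tool already does
    result = subprocess.run(["generate-es-reports"], capture_output=True, text=True, check=True)
    context.log.info(result.stdout)
```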
## Other
### How to add dagster to deployment?
#### Understanding how we add Airflow now
- I'm going to check how Airflow is currently set up.
- From what I understand, the right terraform bits should be spread around `galoy-private-charts` and `galoy-deployments`. Let's see what I can find.
- Okay, some notes on `galoy-private-charts` and its relationship to `galoy-deployments`:
	- This repo is a Helm charts factory. We build a chart to deploy a lana-bank instance here.
	- The repo defines the chart, then CI tries to check that the chart is deployable with the testflight CI job (`lana-bank-testflight`). If testflight succeeds, another CI job (`bump-lana-bank-in-deployments`) updates the chart automatically in `galoy-deployments` with a bot commit.
	- Also note that some of the images used in this chart come from upstream deps. Basically, code repos like `lana-bank` build their own images, which get added to our container registry and then referenced from `galoy-private-charts`. The bumps from `lana-bank` to `galoy-private-charts` happen through CI-automated commits.
#### How to add dagster
- We should rely on the Helm charts provided by Dagster and stick to their recommended deployment approach
	- https://docs.dagster.io/deployment/oss/deployment-options/kubernetes/deploying-to-kubernetes
- We will need to build our own code location container and upload it to our container registry, kind of like what we're currently doing with the Meltano image
I'm going to install minikube locally (kubectl and helm are already provided by nix in `galoy-private-charts`) so I can try running the Helm charts on my machine. I don't want to have to do full tours through CI and possibly break testflight just to add Dagster.
```
# install minikube (Debian/Ubuntu) and start a single-node cluster on the docker driver
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube_latest_amd64.deb
sudo dpkg -i minikube_latest_amd64.deb
minikube start --driver=docker
# sanity check that kubectl can reach the new cluster
kubectl get nodes
```
```
# install helm via the official installer script
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
```
I discuss this with Kartik on the daily call and he points me to this old Blink guide on how to set up a local Kubernetes testing env: https://github.com/blinkbitcoin/charts/blob/main/dev/README.md
### How to add dagster to make dev-up?