galoy-personal-notes/dagster-migration-plan.md
2025-10-24 14:33:36 +02:00

End goals

  • Dagster is part of deployment, gets created in staging with the full E2E data pipeline running
  • With schedules/automations
  • And is reachable from Lana UI for reports
  • Locally we can raise containers with make dev-up

Milestones

  • Starting point: no dagster
  • Dagster is deployable in staging and locally
    • Dagster gets included in the set of containers that get added to the kubernetes namespace in deployment
    • The dagster webserver UI in staging is reachable locally through tunneling
    • We can load a dummy code location, but still no code from our own project
    • We also include dagster in local make dev-up
  • We can build a lana dagster image with our project code and deploy it
    • We build a hello-world grade docker image for dagster code automatically in CI in lana-bank, which gets added to our container registry
    • CI bumps this image in our helm charts and gets deployed to staging automatically
  • Dagster takes care of EL
    • We move the responsibility for EL out of lana core-pg from Meltano to Dagster
    • This will require:
      • Setting up EL in dagster
      • Adjusting the dbt project's staging layer to stop relying on Meltano fields
    • While this is in the works, we will need a code freeze in staging
  • Dagster takes care of dbt execution
    • We swap the responsibility of materializing the dbt DAG from Meltano to Dagster
    • While this is in the works, we will need a code freeze in the dbt project
  • Dagster can generate file reports
    • We integrate generate-es-reports in Dagster so that it can generate report files
    • But we don't plug it into Lana's UI just yet
    • At this point we begin a code freeze in generate-es-reports
  • Extract report files API out of Airflow
    • We set up an independent microservice to handle requesting and delivering report files.
    • Same behaviour as the Airflow Flask plugin provides today
    • Internally, interactions with the bucket contents remain the same. The features regarding requesting and monitoring file creation must be repointed from Airflow to Dagster
    • At this point we finish the code freeze in generate-es-reports
  • Add E2E testing
    • At this stage the whole pipeline is running on dagster. This is the right time to add tests that automatically check that everything runs
  • Cleanup
    • Remove any remaining old Meltano/Airflow references, code, env vars, etc. throughout our repositories
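
For the cleanup milestone, a small scan can list what is still left to remove. This is only a minimal sketch: the search pattern, the file suffixes and the function name `find_legacy_references` are assumptions for illustration, not something that exists in our repositories.

```python
import re
from pathlib import Path

# Assumed pattern and file types: adjust to what the repositories actually contain.
LEGACY_PATTERN = re.compile(r"meltano|airflow", re.IGNORECASE)
TEXT_SUFFIXES = {".py", ".sql", ".yml", ".yaml", ".toml", ".sh", ".md", ".tf"}
TEXT_NAMES = {"Makefile", ".env"}


def find_legacy_references(root):
    """Return (relative path, line number, line) for each line mentioning Meltano or Airflow."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        if ".git" in relative.parts:
            continue  # skip git internals
        if path.suffix not in TEXT_SUFFIXES and path.name not in TEXT_NAMES:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # not a text file after all
        for number, line in enumerate(text.splitlines(), start=1):
            if LEGACY_PATTERN.search(line):
                hits.append((str(relative), number, line.strip()))
    return hits


if __name__ == "__main__":
    for path, number, line in find_legacy_references("."):
        print(f"{path}:{number}: {line}")
```

Running it from the root of a repository prints one line per remaining reference, which can double as a checklist while cleaning up.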

Other

How to add dagster to deployment?

Understanding how we add Airflow now

  • I'm going to check how Airflow is currently set up.
  • From what I understand, the right terraform bits should be spread around galoy-private-charts and galoy-deployments. Let's see what I can find.
  • Okay, some notes on galoy-private-charts and its relationship to galoy-deployments:
    • This repo is a Helm charts factory. We build a chart to deploy a lana-bank instance here.
    • The repo defines the chart, then CI tries to check that the chart is deployable with the testflight CI job (lana-bank-testflight). If testflight succeeds, another CI job (bump-lana-bank-in-deployments) updates the chart automatically in galoy-deployments with a bot commit.
    • galoy-private-charts
    • Also note that some of the images used in this chart come from upstream deps. Basically, code repos like lana-bank build their own images, which get added into our container registry and then referenced from galoy-private-charts. The bumps from lana-bank to galoy-private-charts happen through CI automated commits.
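
To make the bump step concrete, here is a rough sketch of what an automated bump boils down to: rewriting an image tag in a chart's values file. The values layout, the `dagster` section name and the `bump_image_tag` helper are all hypothetical; the real CI jobs likely use their own tooling, and I haven't checked how they actually do it.

```python
import re


def bump_image_tag(values_text, section, new_tag):
    """Return values_text with the first `tag:` under top-level `section` set to new_tag."""
    lines = values_text.splitlines()
    in_section = False
    for index, line in enumerate(lines):
        if line.startswith("#"):
            continue  # top-level comment, does not change the current section
        if re.match(r"^\S", line):
            # A non-indented line is a top-level key: we enter or leave a section here.
            in_section = line.split(":", 1)[0] == section
            continue
        match = re.match(r"^(\s+tag:\s*).*$", line)
        if in_section and match:
            lines[index] = f'{match.group(1)}"{new_tag}"'
            return "\n".join(lines) + "\n"
    raise KeyError(f"no tag found under section {section!r}")


# Hypothetical values.yaml content, only for illustration.
EXAMPLE_VALUES = """\
lana:
  image:
    tag: "1.0.0"
dagster:
  image:
    repository: example/lana-dagster
    tag: "0.1.0"
"""

if __name__ == "__main__":
    print(bump_image_tag(EXAMPLE_VALUES, "dagster", "0.2.0"))
```

Only the tag under the requested section changes; other images in the same file keep their versions, which is the property a bot commit needs.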

How to add dagster

I'm going to install minikube locally (kubectl and helm are already provided by nix in galoy-private-charts) so I can try running the helm charts on my machine. I don't want to go through full CI round trips, and possibly break testflight, just to add dagster.

curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube_latest_amd64.deb
sudo dpkg -i minikube_latest_amd64.deb
minikube start --driver=docker
kubectl get nodes
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
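
Before trying the charts, it may help to confirm that the tools are actually on PATH, since some come from nix and some from the manual install above. A minimal sketch, assuming the tool list below is what the local setup needs (the list and the `missing_tools` helper are my own, not part of any repo):

```python
import shutil

# Assumed set of CLIs for the local Kubernetes setup described above.
REQUIRED_TOOLS = ("docker", "minikube", "kubectl", "helm")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All local Kubernetes tools found")
```

The `which` parameter is there only so the check can be exercised without touching the real PATH.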

I discuss this with Kartik on the daily call and he points me to this old Blink guide on how to set up a local kubernetes testing env: https://github.com/blinkbitcoin/charts/blob/main/dev/README.md

How to add dagster to make dev-up?