End goals
- Dagster is part of deployment, gets created in staging with the full E2E data pipeline running
- With schedules/automations
- And is reachable from Lana UI for reports
- Locally we can bring up the containers with `make dev-up`
Milestones
- Starting point: no dagster
- Dagster is deployable in staging and locally
- Dagster gets included in the set of containers that get added to the kubernetes namespace in deployment
- The Dagster webserver UI in staging is reachable locally through tunneling
- We can load a dummy code location, but still no code from our own project
- We also include Dagster in the local `make dev-up`
- We can build a lana dagster image with our project code and deploy it
- We build a hello-world grade Docker image for Dagster code automatically in CI in lana-bank, which gets added to our container registry
- CI bumps this image in our Helm charts and it gets deployed to staging automatically
- Dagster takes care of EL
- We swap the responsibility of doing EL out of lana core-pg from Meltano to Dagster
- This will require:
- Setting up EL in dagster
- Adjusting the dbt project's `staging` layer to stop relying on Meltano fields
- While this is in the works, we will need a code freeze in staging
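The EL step being handed over has roughly this shape; the sketch below is a stdlib-only stand-in (sqlite3 in place of core-pg and the warehouse, table and column names invented) just to show the extract-then-load pattern a Dagster asset would orchestrate:

```python
import sqlite3


def extract_rows(source: sqlite3.Connection, table: str) -> list[tuple]:
    # Extract: full read of the source table (the real EL would be incremental).
    return source.execute(f"SELECT id, payload FROM {table}").fetchall()


def load_rows(dest: sqlite3.Connection, table: str, rows: list[tuple]) -> int:
    # Load: idempotent upsert into the destination, so re-runs are safe.
    dest.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER PRIMARY KEY, payload TEXT)"
    )
    dest.executemany(
        f"INSERT OR REPLACE INTO {table} (id, payload) VALUES (?, ?)", rows
    )
    dest.commit()
    return len(rows)
```

In Dagster this pair would live inside one asset (or use an off-the-shelf EL integration); the key property to keep from the Meltano setup is idempotent re-runs.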
- Dagster takes care of dbt execution
- We swap the responsibility of materializing the dbt DAG from Meltano to Dagster
- While this is in the works, we will need a code freeze in the dbt project
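For the dbt handover, the orchestrator ultimately shells out to the dbt CLI (the `dagster-dbt` integration would normally manage this for us); a hedged sketch of the invocation an orchestrating asset would run, with paths and target names invented:

```python
import subprocess


def dbt_build_command(project_dir: str, profiles_dir: str, target: str) -> list[str]:
    # Assemble the CLI invocation; `dbt build` runs and tests the whole DAG.
    return [
        "dbt", "build",
        "--project-dir", project_dir,
        "--profiles-dir", profiles_dir,
        "--target", target,
    ]


def run_dbt(project_dir: str, profiles_dir: str, target: str) -> int:
    # The orchestrator would capture logs and fail the run on a non-zero exit.
    return subprocess.run(dbt_build_command(project_dir, profiles_dir, target)).returncode
```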
- Dagster can generate file reports
- We integrate `generate-es-reports` in Dagster so that it can generate report files
- But we don't plug it into Lana's UI just yet
- At this point we begin a code freeze in `generate-es-reports`
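Wrapping `generate-es-reports` would likely be a thin asset around the existing generator; since its real interface isn't documented here, this stdlib stand-in is purely illustrative of "produce one report file, return its path":

```python
import csv
import pathlib


def write_report(output_dir: str, name: str, rows: list[dict]) -> str:
    # Stand-in for one generated regulatory report file (assumes rows is non-empty).
    out = pathlib.Path(output_dir) / f"{name}.csv"
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return str(out)
```

A Dagster asset wrapping this would return the file path (or bucket key) as its materialization metadata, which is what the downstream file-delivery service needs.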
- Extract report files API out of Airflow
- We set up an independent microservice to handle requesting and delivering report files
- Same behaviour as what the Airflow Flask plugin is doing today
- Internally, interactions with the bucket contents remain the same. The features regarding requesting and monitoring file creation must be repointed from Airflow to Dagster
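The core of that microservice is listing and serving files from the bucket; a minimal stdlib sketch (a local directory stands in for the bucket, and the route name is invented, not the Airflow plugin's actual API):

```python
import json
import pathlib
from http.server import BaseHTTPRequestHandler

REPORTS_DIR = pathlib.Path("/var/reports")  # stand-in for the bucket


def list_reports(base: pathlib.Path) -> list[str]:
    # What a GET /reports endpoint would return: available report file names.
    return sorted(p.name for p in base.glob("*.csv"))


class ReportsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/reports":
            body = json.dumps(list_reports(REPORTS_DIR)).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)
```

The "request file creation" and "monitor progress" endpoints are the parts that change: they would call Dagster's run-launching API instead of triggering an Airflow DAG.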
- At this point we finish the code freeze in `generate-es-reports`
- Add E2E testing
- At this stage the whole pipeline is running on Dagster, so it's the right time to add tests that automatically check the full run succeeds
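One shape such an E2E smoke check could take: after triggering a full run, assert the expected artifacts exist (artifact names below are invented placeholders):

```python
import pathlib

# Hypothetical artifacts a full pipeline run should leave behind.
EXPECTED_OUTPUTS = ["reports", "warehouse.marker"]


def missing_outputs(base: pathlib.Path, expected: list[str]) -> list[str]:
    # Returns which expected pipeline artifacts are absent after a run;
    # an empty list means the E2E run produced everything.
    return [name for name in expected if not (base / name).exists()]
```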
- Cleanup
- Remove any remaining old Meltano/Airflow references, code, env vars, etc. throughout our repositories
Other
How to add dagster to deployment?
Understanding how we add Airflow now
- I'm going to check how Airflow is currently set up.
- From what I understand, the relevant Terraform bits should be spread around `galoy-private-charts` and `galoy-deployments`. Let's see what I can find.
- Okay, some notes on `galoy-private-charts` and its relationship to `galoy-deployments`:
- This repo is a Helm charts factory. We build a chart to deploy a lana-bank instance here.
- The repo defines the chart, then CI checks that the chart is deployable with the testflight CI job (`lana-bank-testflight`). If testflight succeeds, another CI job (`bump-lana-bank-in-deployments`) updates the chart automatically in `galoy-deployments` with a bot commit.
- Also note that some of the images used in this chart come from upstream deps. Basically, code repos like `lana-bank` build their own images, which get added to our container registry and then referenced from `galoy-private-charts`. The bumps from `lana-bank` to `galoy-private-charts` happen through CI automated commits.
How to add dagster
- We should rely on the provided Helm charts to stick to their recommended setup
- We will need to build our own code location container and upload it to our container registry, similar to what we're currently doing with the Meltano image
I'm going to install minikube locally (kubectl and helm are already provided by nix in galoy-private-charts) to try to run the Helm charts. I don't want to have to do full tours through CI and possibly break testflight just to add Dagster.

    curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube_latest_amd64.deb
    sudo dpkg -i minikube_latest_amd64.deb
    minikube start --driver=docker
    kubectl get nodes
    curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
I discuss with Kartik on the daily call and he points me to this old Blink guide on how to set up a local Kubernetes testing env: https://github.com/blinkbitcoin/charts/blob/main/dev/README.md