## End goals

- Dagster is part of the deployment: it gets created in staging with the full E2E data pipeline running
  - With schedules/automations
  - And reachable from the Lana UI for reports
- Locally, we can bring up the containers with `make dev-up`

## Milestones

- Starting point: no Dagster
- Dagster is deployable in staging and locally
  - Dagster is included in the set of containers that get added to the Kubernetes namespace on deployment
  - The Dagster webserver UI in staging is reachable locally through tunneling
  - We can load a dummy code location, but still no code from our own project
  - We also include Dagster in the local `make dev-up`
- We can build a Lana Dagster image with our project code and deploy it
  - CI in `lana-bank` automatically builds a hello-world-grade Docker image for Dagster code and pushes it to our container registry
  - CI bumps this image in our Helm charts, and it gets deployed to staging automatically
- Dagster takes care of EL
  - Responsibility for doing EL from lana core-pg moves from Meltano to Dagster
  - This will require:
    - Setting up EL in Dagster
    - Adjusting the dbt project's `staging` layer to stop relying on Meltano fields
  - While this is in the works, we will need a code freeze in staging
- Dagster takes care of dbt execution
  - Responsibility for materializing the dbt DAG moves from Meltano to Dagster
  - While this is in the works, we will need a code freeze in the dbt project
- Dagster can generate file reports
  - We integrate `generate-es-reports` into Dagster so that it can generate report files
  - But we don't plug it into Lana's UI just yet
  - At this point we begin a code freeze in `generate-es-reports`
- Extract the report files API out of Airflow
  - We set up an independent microservice to handle requesting and delivering report files
    - Same behaviour as what the Airflow Flask plugin does today
    - Internally, interactions with the bucket contents remain the same. The features for requesting and monitoring file creation must be repointed from Airflow to Dagster
  - At this point we end the code freeze in `generate-es-reports`
- Add E2E testing
  - At this stage the whole pipeline is running on Dagster, so it is the right time to add tests that automatically check that everything runs
- Cleanup
  - Remove any remaining Meltano/Airflow references, code, env vars, etc. throughout our repositories

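
For the tunneling milestone above, here is a minimal sketch of how the staging webserver UI could be forwarded to a local port. The namespace `lana-staging`, the service name `dagster-dagster-webserver`, the service port `80` and the local port `8080` are assumptions, not values taken from our deployment; check the real service name with `kubectl get svc` first. The script only prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: forward the Dagster webserver UI from the cluster to localhost.
# All defaults below are assumptions, not values taken from our deployment.
set -eu

NAMESPACE="${NAMESPACE:-lana-staging}"            # hypothetical namespace
SERVICE="${SERVICE:-dagster-dagster-webserver}"   # assumed service name
SERVICE_PORT="${SERVICE_PORT:-80}"                # assumed service port
LOCAL_PORT="${LOCAL_PORT:-8080}"                  # local port to listen on
DRY_RUN="${DRY_RUN:-1}"                           # 1 = only print the command

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Open the tunnel; the UI is then served on http://localhost:$LOCAL_PORT.
tunnel_webserver() {
  run kubectl --namespace "$NAMESPACE" port-forward \
    "svc/$SERVICE" "$LOCAL_PORT:$SERVICE_PORT"
}

tunnel_webserver
```

Running it with `DRY_RUN=0` keeps the tunnel open until the process is interrupted.
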
## Other

### How to add dagster to deployment?

#### Understanding how we add Airflow now

- I'm going to check how Airflow is currently set up.
- From what I understand, the right terraform bits should be spread around `galoy-private-charts` and `galoy-deployments`. Let's see what I can find.
- Okay, some notes on `galoy-private-charts` and its relationship to `galoy-deployments`:
  - This repo is a Helm charts factory. We build a chart to deploy a lana-bank instance here.
  - The repo defines the chart, then CI checks that the chart is deployable with the testflight CI job (`lana-bank-testflight`). If testflight succeeds, another CI job (`bump-lana-bank-in-deployments`) automatically updates the chart in `galoy-deployments` with a bot commit.
  - `galoy-private-charts`
  - Also note that some of the images used in this chart come from upstream deps. Basically, code repos like `lana-bank` build their own images, which get pushed to our container registry and then referenced from `galoy-private-charts`. The bumps from `lana-bank` to `galoy-private-charts` happen through automated CI commits.

#### How to add dagster

- We should rely on the provided Helm charts, to stick to Dagster's recommendations
  - https://docs.dagster.io/deployment/oss/deployment-options/kubernetes/deploying-to-kubernetes
- We will need to build our own code location container and upload it to our container registry, much like what we're currently doing with the Meltano image

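
The code location image step above could look roughly like this once there is a Dockerfile for the Dagster code. The registry path, image name and tag are placeholders (hypothetical, not our real registry), and the script assumes a `Dockerfile` in the current directory. It only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: build a code location image and push it to a container registry.
# Registry path, image name and tag are hypothetical placeholders.
set -eu

REGISTRY="${REGISTRY:-registry.example.com/lana}"   # hypothetical registry path
IMAGE_NAME="${IMAGE_NAME:-dagster-code-location}"   # hypothetical image name
TAG="${TAG:-dev}"                                   # hypothetical tag
DRY_RUN="${DRY_RUN:-1}"                             # 1 = only print the commands

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Full image reference, e.g. registry.example.com/lana/dagster-code-location:dev
image_ref() {
  echo "$REGISTRY/$IMAGE_NAME:$TAG"
}

# Build from the Dockerfile in the current directory, then push.
build_and_push() {
  run docker build --tag "$(image_ref)" .
  run docker push "$(image_ref)"
}

build_and_push
```

In CI the tag would more likely be derived from the commit being built, so that the chart bump can reference an exact image.
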
I'm going to install minikube locally (kubectl and helm are already provided by nix in `galoy-private-charts`) so I can try running the Helm charts on my machine. I don't want to go through full CI round trips, and possibly break testflight, just to add Dagster.

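```
# Download and install the latest minikube Debian package (amd64)
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube_latest_amd64.deb
sudo dpkg -i minikube_latest_amd64.deb
# Start a local single-node cluster using the Docker driver
minikube start --driver=docker
# Check that the node is up
kubectl get nodes
```

```
# Install Helm 3 with the upstream installer script
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
```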
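
With minikube and helm in place, installing the upstream Dagster chart into the local cluster could look roughly like the sketch below. The chart repository URL and the `dagster/dagster` chart name come from the Dagster Kubernetes docs linked above; the release name `dagster` and the namespace `dagster` are assumptions. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: install the upstream Dagster Helm chart into the current cluster.
# Release name and namespace are assumptions; adjust as needed.
set -eu

RELEASE="${RELEASE:-dagster}"       # assumed Helm release name
NAMESPACE="${NAMESPACE:-dagster}"   # assumed namespace
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print the commands

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_dagster() {
  # Register the upstream chart repository and refresh the local index.
  run helm repo add dagster https://dagster-io.github.io/helm
  run helm repo update
  # Install the chart, or upgrade it if the release already exists.
  run helm upgrade --install "$RELEASE" dagster/dagster \
    --namespace "$NAMESPACE" --create-namespace
  # List the pods to see whether the release came up.
  run kubectl get pods --namespace "$NAMESPACE"
}

install_dagster
```

Using `helm upgrade --install` keeps the script safe to re-run while iterating on chart values.
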

I discussed this with Kartik on the daily call, and he pointed me to this old Blink guide on how to set up a local Kubernetes testing env: https://github.com/blinkbitcoin/charts/blob/main/dev/README.md

### How to add dagster to make dev-up?