# Dagster hello-world
## First shot following the quickstart
I’m going to begin by following this: [https://docs.dagster.io/getting-started/quickstart](https://docs.dagster.io/getting-started/quickstart)

There’s also this guide: [https://docs.dagster.io/guides/running-dagster-locally](https://docs.dagster.io/guides/running-dagster-locally)

- I’ve made a new directory dedicated to this called `dagster-hello-world`.
- I’ll try to use `poetry`, so I start with a good old `poetry init` to get the project started. I just accepted all the defaults, no fancy configs.
- In there, I run `poetry add dagster dagster-webserver` just like that, without setting up a `venv` or anything myself. Straight from the global Python runtime. (Poetry created its own virtualenv anyway, as the output shows.)
- Got these errors back:
```
Creating virtualenv dagster-hello-world-bcH0h4Hq-py3.10 in /home/pablo/.cache/pypoetry/virtualenvs
Using version ^1.8.12 for dagster
Using version ^1.8.12 for dagster-webserver

Updating dependencies
Resolving dependencies... (0.2s)

The current project's supported Python range (>=3.10,<4.0) is not compatible with some of the required packages Python requirement:
- dagster-webserver requires Python <3.13,>=3.8, so it will not be satisfied for Python >=3.13,<4.0

Because no versions of dagster-webserver match >1.8.12,<2.0.0
and dagster-webserver (1.8.12) requires Python <3.13,>=3.8, dagster-webserver is forbidden.
So, because dagster-hello-world depends on dagster-webserver (^1.8.12), version solving failed.

• Check your dependencies Python requirement: The Python requirement can be specified via the `python` or `markers` properties

For dagster-webserver, a possible solution would be to set the `python` property to ">=3.10,<3.13"

https://python-poetry.org/docs/dependency-specification/#python-restricted-dependencies,
https://python-poetry.org/docs/dependency-specification/#using-environment-markers
```
- After fucking around for a bit, I got it working by changing the `pyproject.toml` file. I changed this line:
```toml
[tool.poetry.dependencies]
python = "^3.10"
```
to this (`^` operator removed; the error output suggests `">=3.10,<3.13"` as an alternative):
```toml
[tool.poetry.dependencies]
python = "3.10"
```
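To see why the solver refused `^3.10`, here is a minimal sketch in plain Python that compares the two ranges from the error output as half-open `(major, minor)` intervals. It is not how Poetry resolves versions internally; the helper names and the list of candidate minor versions (3.8 to 3.15) are made up for illustration.

```python
# Half-open ranges of (major, minor): lower bound included, upper bound excluded.
PROJECT_RANGE = ((3, 10), (4, 0))     # what `^3.10` expands to: >=3.10,<4.0
WEBSERVER_RANGE = ((3, 8), (3, 13))   # dagster-webserver 1.8.12: >=3.8,<3.13
SUGGESTED_RANGE = ((3, 10), (3, 13))  # the range suggested in the error output


def in_range(version, bounds):
    """Check whether a (major, minor) tuple falls inside a half-open range."""
    lower, upper = bounds
    return lower <= version < upper


def uncovered(project, dependency, candidates):
    """Versions the project claims to support but the dependency rejects."""
    return [
        v for v in candidates
        if in_range(v, project) and not in_range(v, dependency)
    ]


# Hypothetical candidate interpreters, 3.8 through 3.15.
candidates = [(3, minor) for minor in range(8, 16)]

print(uncovered(PROJECT_RANGE, WEBSERVER_RANGE, candidates))
# → [(3, 13), (3, 14), (3, 15)]
print(uncovered(SUGGESTED_RANGE, WEBSERVER_RANGE, candidates))
# → []
```

The first list is non-empty because the project promises to work on Python 3.13 and later, which the dependency rules out. Any constraint that stays below 3.13 leaves nothing uncovered.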
- Now I tried to run `dagster dev`. Not working at all.

I’m starting from scratch; that was weird and nothing works. I started here: [https://docs.dagster.io/getting-started/install#installing-dagster-using-poetry](https://docs.dagster.io/getting-started/install#installing-dagster-using-poetry)

Now I’m going to start from here instead: [https://docs.dagster.io/getting-started/quickstart](https://docs.dagster.io/getting-started/quickstart)

- First, I run this:
```bash
git clone https://github.com/dagster-io/dagster-quickstart && cd dagster-quickstart
```
- Then (it wasn’t mentioned in the guide) I create a Python `venv` with `python3 -m venv venv`.
- Then I activate the `venv` and run `pip install -e ".[dev]"`.
- Then `dagster dev`... and the UI appears.
- I run the pipelines and materialize the examples as the quickstart indicates. It’s roughly clear.
- I now understand that an `asset` in Dagster lingo is pretty much declaring:
    - That a DAG node exists and has certain features
    - The code that materializes it
- I’m going to need more practice to visualize more clearly how we would use this.
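To make that reading of `asset` concrete, here is a toy model in plain Python. It is only a sketch of the idea (a named node with upstream dependencies, plus the function that materializes it) and is not Dagster's API: `toy_asset`, `ASSETS` and `materialize` are names invented here, and the two example assets are made up.

```python
# Hypothetical registry: asset name -> (upstream asset names, materializing function).
ASSETS = {}


def toy_asset(deps=()):
    """Declare that a DAG node exists; the function name becomes the node name."""
    def register(fn):
        ASSETS[fn.__name__] = (list(deps), fn)
        return fn
    return register


@toy_asset()
def raw_numbers():
    # The code that materializes the `raw_numbers` node.
    return [1, 2, 3]


@toy_asset(deps=["raw_numbers"])
def doubled_numbers(raw_numbers):
    # Receives the materialized value of its upstream node.
    return [n * 2 for n in raw_numbers]


def materialize(name, cache=None):
    """Materialize upstream nodes first, then run this node's own code."""
    cache = {} if cache is None else cache
    if name not in cache:
        deps, fn = ASSETS[name]
        cache[name] = fn(*(materialize(dep, cache) for dep in deps))
    return cache[name]


print(materialize("doubled_numbers"))  # → [2, 4, 6]
```

In this sketch the declaration and the computation live in the same decorated function. That seems to be the point of the asset abstraction: the graph is derived from the definitions instead of being wired up separately.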
## dbt example
The Dagster docs have a guide on how to integrate with a dbt project. I’ll try to do that with our project.

Link: https://docs.dagster.io/integrations/dbt/using-dbt-with-dagster

- I begin by making a copy of our dbt git repo as-is.
- Then I do some pip installs on my main Python interpreter:
```bash
pip install dagster-dbt dagster-webserver
```
- Then I run:
```bash
dagster-dbt project scaffold --project-name sh_dagster_dbt --dbt-project-dir ./dbt-dagster-playground/
```
- OK, apparently this has created an entire new Dagster project at `home/pablo/sh_dagster_dbt`. This new project has a hardcoded filepath reference to the dbt project folder. I must say this is shaky as hell, but well, at least it’s transparent and visible.
- Now, from the root of the Dagster project folder, I run `dagster dev` to get the UI started and I see the whole dbt project displayed. Pretty neat. All dependencies are there, and Dagster even parses the documentation and displays it.
- I have a very tempting `Materialize All` button, which I click like a kid not knowing what will happen.
- Stuff blew up everywhere, apparently complaining about the database not being reachable. The `Materialize All` button tried to run both the models and their tests, and it was trying to run them with the default profile (`dwh_hybrid`). Reasonable, but I would expect some way to pick the profile in Dagster. Can’t find it anywhere.
- I started my local Postgres and tried to materialize some staging models. It works! I can now travel through the asset graph, select any subset of models and run them. The UI shows the last time each model was materialized and the logs of the run.
- I also found a somewhat hidden menu where the config of the run can be modified, including picking which profile should be used. I think I’m getting the pattern here:
    - Run templates can be defined through the UI by drag and drop…
    - … but for the stable stuff, what you want to do is to define it in Python files.
- Anyway, before trying to build a “pipeline” in the old sense, I want to see if I can also add Airbyte to the picture. If I can, then I can try to map dependencies and make a joint pipeline across Airbyte and dbt.
- I have an old locally deployed Airbyte compose, so I’ll just use that to mess around.
- I’ve had to `pip install dagster-airbyte`.
- I’ve created a file in `sh_dagster_dbt/airbyte.py` and added this bit:
```python
from dagster_airbyte import AirbyteResource

airbyte_instance = AirbyteResource(
    host="localhost",
    port="8000",
    # If using basic auth, include username and password:
    username="airbyte",
    password="airbyte",
)
```
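Before wiring this resource into anything, it can help to confirm that the local Airbyte instance answers with the same host, port and credentials. The following is a rough stdlib-only sketch, not part of `dagster-airbyte`. It assumes the instance exposes a health endpoint at `/api/v1/health` returning JSON with an `available` field, which may differ between Airbyte versions. The function names are invented here.

```python
import base64
import json
import urllib.request


def build_health_request(host="localhost", port="8000",
                         username="airbyte", password="airbyte"):
    """Build a GET request for the (assumed) health endpoint with basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(f"http://{host}:{port}/api/v1/health")
    request.add_header("Authorization", f"Basic {token}")
    return request


def airbyte_is_up(request, timeout=5.0):
    """Return True only if the endpoint replies with JSON reporting availability."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError):
        # Connection refused, HTTP errors, timeouts, or a non-JSON body.
        return False
    return isinstance(payload, dict) and bool(payload.get("available"))
```

Calling `airbyte_is_up(build_health_request())` should return `False` quickly when nothing is listening on the port, which makes it a cheap first check before debugging the Dagster side.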
- Through the Airbyte UI, I’ve created a source pointing to our Xero production instance and set up a connection to my local DWH.
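On the hardcoded dbt project path noted earlier in this section: one possible way to make it less shaky is to resolve the directory from an environment variable with a fallback. This is only a sketch of that idea, not what the scaffolded project actually contains; the `DBT_PROJECT_DIR` variable name and the `resolve_dbt_project_dir` helper are assumptions made for this example.

```python
import os
from pathlib import Path


def resolve_dbt_project_dir(default, env_var="DBT_PROJECT_DIR"):
    """Use the environment variable when it is set, else fall back to the default."""
    override = os.environ.get(env_var)
    chosen = Path(override) if override else Path(default)
    # Expand `~` and turn relative paths into absolute ones.
    return chosen.expanduser().resolve()


# Hypothetical default: the dbt copy under the current working directory.
dbt_project_dir = resolve_dbt_project_dir(Path.cwd() / "dbt-dagster-playground")
print(dbt_project_dir)
```

With something like this, the same Dagster project could point at a different checkout of the dbt repo by exporting the variable before running `dagster dev`, without editing any Python file.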
## Links
- Docs: [https://docs.dagster.io/getting-started](https://docs.dagster.io/getting-started)
- MDS template: [https://github.com/dagster-io/dagster/tree/master/examples/assets_modern_data_stack](https://github.com/dagster-io/dagster/tree/master/examples/assets_modern_data_stack)
- General repo: [https://github.com/dagster-io/dagster/tree/master](https://github.com/dagster-io/dagster/tree/master)
- dbt + Dagster tutorial: https://docs.dagster.io/integrations/dbt/using-dbt-with-dagster
- Airbyte + Dagster tutorial: https://docs.dagster.io/integrations/airbyte