sh-notion/notion_data_team_no_files/Dagster hello-world 723420fec494478b9c89d308b0f213a7.md
Pablo Martin a256b48b01 pages
2025-07-11 16:15:17 +02:00

# Dagster hello-world
## First shot following the quickstart
I'm going to begin by following this: [https://docs.dagster.io/getting-started/quickstart](https://docs.dagster.io/getting-started/quickstart)
Even though there's also this guide: [https://docs.dagster.io/guides/running-dagster-locally](https://docs.dagster.io/guides/running-dagster-locally)
- I've made a new directory dedicated to this called `dagster-hello-world`.
- I'll try to use `poetry`, so I start with a good old `poetry init` to get the project started. Just accepted all defaults, no fancy configs.
- In there, I'm running `poetry add dagster dagster-webserver` just like that, with no `venv` or anything. Straight from the global Python runtime.
- Got these errors back:
```
Creating virtualenv dagster-hello-world-bcH0h4Hq-py3.10 in /home/pablo/.cache/pypoetry/virtualenvs
Using version ^1.8.12 for dagster
Using version ^1.8.12 for dagster-webserver
Updating dependencies
Resolving dependencies... (0.2s)
The current project's supported Python range (>=3.10,<4.0) is not compatible with some of the required packages Python requirement:
- dagster-webserver requires Python <3.13,>=3.8, so it will not be satisfied for Python >=3.13,<4.0
Because no versions of dagster-webserver match >1.8.12,<2.0.0
and dagster-webserver (1.8.12) requires Python <3.13,>=3.8, dagster-webserver is forbidden.
So, because dagster-hello-world depends on dagster-webserver (^1.8.12), version solving failed.
• Check your dependencies Python requirement: The Python requirement can be specified via the `python` or `markers` properties
For dagster-webserver, a possible solution would be to set the `python` property to ">=3.10,<3.13"
https://python-poetry.org/docs/dependency-specification/#python-restricted-dependencies,
https://python-poetry.org/docs/dependency-specification/#using-environment-markers
```
- After fucking around for a bit, I got it working by changing the `pyproject.toml` file. The change was this line:
```toml
[tool.poetry.dependencies]
python = "^3.10"
```
to this (`^` operator removed)
```toml
[tool.poetry.dependencies]
python = "3.10"
```
- Then I tried to run `dagster dev`. It didn't work at all.
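The resolver error above boils down to an interval check: every Python version the project claims to support must also be supported by each dependency. Here is a minimal sketch of that check, written only to illustrate the logic. It is not how Poetry implements it, and `is_subrange` and the variable names are made up for this note. The ranges themselves come from the error message.

```python
from typing import Tuple

Version = Tuple[int, int]
# A range is (inclusive lower bound, exclusive upper bound).
Range = Tuple[Version, Version]


def is_subrange(project: Range, package: Range) -> bool:
    """True when every version allowed by `project` is also allowed by `package`."""
    return package[0] <= project[0] and project[1] <= package[1]


dagster_webserver = ((3, 8), (3, 13))  # ">=3.8,<3.13", from the error message
caret_310 = ((3, 10), (4, 0))          # "^3.10" expands to ">=3.10,<4.0"
suggested = ((3, 10), (3, 13))         # ">=3.10,<3.13", the fix Poetry suggests

print(is_subrange(caret_310, dagster_webserver))  # False
print(is_subrange(suggested, dagster_webserver))  # True
```

The caret range fails because it promises support up to (but excluding) 4.0, while `dagster-webserver` stops before 3.13. Any constraint whose upper bound stays below 3.13 passes.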
I'm starting from scratch; that was weird and nothing works. I started here: [https://docs.dagster.io/getting-started/install#installing-dagster-using-poetry](https://docs.dagster.io/getting-started/install#installing-dagster-using-poetry)
Now I'm going to start from here instead: [https://docs.dagster.io/getting-started/quickstart](https://docs.dagster.io/getting-started/quickstart)
- First, I run this
```bash
git clone https://github.com/dagster-io/dagster-quickstart && cd dagster-quickstart
```
- Then (it wasn't mentioned in the guide) I create a Python `venv` with `python3 -m venv venv`
- Then I activate the `venv` and run `pip install -e ".[dev]"`
- Then `dagster dev`... and the UI appears
- I run the pipelines and materialize the examples as the quickstart indicates. It's roughly clear.
- I now understand that an `asset` in Dagster lingo is pretty much declaring:
    - That a DAG node exists and has certain features
    - The code that materializes it
- I'm going to need more practice to visualize more clearly how we would use this.
## dbt example
The Dagster docs have a guide on how to integrate with a dbt project. I'll try to do that with our project.
Link: https://docs.dagster.io/integrations/dbt/using-dbt-with-dagster
- I begin by making a copy of our dbt git repo as-is
- Then I do some pip installs on my main Python interpreter:
```bash
pip install dagster-dbt dagster-webserver
```
- Then I run
```bash
dagster-dbt project scaffold --project-name sh_dagster_dbt --dbt-project-dir ./dbt-dagster-playground/
```
- Ok, apparently this has created an entire new Dagster project at `/home/pablo/sh_dagster_dbt`. This new project has a hardcoded filepath reference to the dbt project folder. I must say this is shaky as hell, but well, at least it's transparent and visible.
- Now, from the root of the dagster project folder, I run `dagster dev` to get the UI started and I see the whole dbt project displayed. Pretty neat. All dependencies are there, and dagster even parses the documentation and displays it.
- I have a very tempting `Materialize All` button, which I click like a kid not knowing what will happen.
- Stuff blew up everywhere, apparently complaining about the database not being reachable. The `Materialize All` button tried to run both the models and their tests, and it did so with the default profile (`dwh_hybrid`). Reasonable, but I would expect some way to pick the profile in Dagster. Can't find it anywhere.
- I started my local postgres and tried to materialize some staging models. It works! I can now travel through the assets graph and select any subset of models and run them. The UI shows the last time the model was materialized and logs on the run.
- I also found a somewhat hidden menu where the config of the run can be modified, including picking which profile should be used. I think I'm getting the pattern here:
    - Run templates can be defined through the UI by drag and drop…
    - … but for the stable stuff, what you want to do is to define it in Python files
- Anyway, before trying to build a “pipeline” in the old sense, I want to see if I can also bring Airbyte into the picture. If I can, then I can try to map dependencies and make a joint pipeline across Airbyte and dbt.
- I have an old locally deployed Airbyte compose, so I'll just use that to mess around.
- I've had to `pip install dagster-airbyte`
- I've created a file in `sh_dagster_dbt/airbyte.py` and added this bit:
```python
from dagster_airbyte import AirbyteResource

# Points Dagster at the locally deployed Airbyte instance.
airbyte_instance = AirbyteResource(
    host="localhost",
    port="8000",
    # If using basic auth, include username and password:
    username="airbyte",
    password="airbyte",
)
```
- Through the Airbyte UI, I've created a source pointing to our Xero production instance and made a connection with my local DWH.
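Two of the pain points above, the hardcoded dbt project path and the search for a way to pick the dbt profile or target, could both be handled in Python by reading the values from the environment. This is a rough sketch under stated assumptions: the variable names `SH_DBT_PROJECT_DIR` and `SH_DBT_TARGET` and the helper `dbt_settings` are hypothetical, and the fallback path is the one passed to the scaffold command earlier. As far as I can tell, `DbtCliResource` from `dagster-dbt` accepts `project_dir` and `target` arguments, so the resulting dict is shaped to be passed to it.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical environment variable names, chosen for this sketch only.
DIR_VAR = "SH_DBT_PROJECT_DIR"
TARGET_VAR = "SH_DBT_TARGET"


def dbt_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build dbt settings from the environment instead of hardcoding them."""
    env = os.environ if env is None else env
    # Fall back to the folder used in the scaffold command above.
    raw_dir = env.get(DIR_VAR, "./dbt-dagster-playground")
    settings = {"project_dir": str(Path(raw_dir).expanduser().resolve())}
    target = env.get(TARGET_VAR)
    if target:
        # Only set a target when asked; otherwise dbt uses the profile default.
        settings["target"] = target
    return settings


# Intended usage, assuming dagster-dbt is installed:
#   from dagster_dbt import DbtCliResource
#   dbt = DbtCliResource(**dbt_settings())
```

With this in place, switching between the local Postgres and another warehouse would be a matter of exporting a different `SH_DBT_TARGET` before running `dagster dev`.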
## Links
- Docs: [https://docs.dagster.io/getting-started](https://docs.dagster.io/getting-started)
- MDS template: [https://github.com/dagster-io/dagster/tree/master/examples/assets_modern_data_stack](https://github.com/dagster-io/dagster/tree/master/examples/assets_modern_data_stack)
- General repo: [https://github.com/dagster-io/dagster/tree/master](https://github.com/dagster-io/dagster/tree/master)
- dbt + dagster tutorial: https://docs.dagster.io/integrations/dbt/using-dbt-with-dagster
- airbyte + dagster tutorial: https://docs.dagster.io/integrations/airbyte