sh-notion/notion_data_team_no_files/Dagster hello-world 723420fec494478b9c89d308b0f213a7.md
Pablo Martin a256b48b01 pages
2025-07-11 16:15:17 +02:00

Dagster hello-world

First shot following the quickstart

I'm going to begin by following this: https://docs.dagster.io/getting-started/quickstart

Even though there's also this guide: https://docs.dagster.io/guides/running-dagster-locally

  • I've made a new directory dedicated to this called dagster-hello-world.

  • I'll try to use poetry, so I start with a good old poetry init to get the project started. Just accepted all defaults, no fancy configs.

  • In there, I'm running a poetry add dagster dagster-webserver just like that, with no venv or anything. Straight from the global Python runtime.

  • Got these errors back

    Creating virtualenv dagster-hello-world-bcH0h4Hq-py3.10 in /home/pablo/.cache/pypoetry/virtualenvs
    Using version ^1.8.12 for dagster
    Using version ^1.8.12 for dagster-webserver
    
    Updating dependencies
    Resolving dependencies... (0.2s)
    
    The current project's supported Python range (>=3.10,<4.0) is not compatible with some of the required packages Python requirement:
      - dagster-webserver requires Python <3.13,>=3.8, so it will not be satisfied for Python >=3.13,<4.0
    
    Because no versions of dagster-webserver match >1.8.12,<2.0.0
     and dagster-webserver (1.8.12) requires Python <3.13,>=3.8, dagster-webserver is forbidden.
    So, because dagster-hello-world depends on dagster-webserver (^1.8.12), version solving failed.
    
      • Check your dependencies Python requirement: The Python requirement can be specified via the `python` or `markers` properties
    
        For dagster-webserver, a possible solution would be to set the `python` property to ">=3.10,<3.13"
    
        https://python-poetry.org/docs/dependency-specification/#python-restricted-dependencies,
        https://python-poetry.org/docs/dependency-specification/#using-environment-markers
    
  • After fucking around for a bit, I got it working by changing the pyproject.toml file. The change was this line:

    [tool.poetry.dependencies]
    python = "^3.10"
    

    to this (^ operator removed)

    [tool.poetry.dependencies]
    python = "3.10"
    
  • Then I tried to run dagster dev. Not working at all.
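
A quick sketch of why the resolver complained, as I read the error above: Poetry seems to want every Python version the project claims to support to also be supported by each dependency. The ranges are copied from the error message; the `is_subset` helper and the tuple encoding are just my illustration, not anything Poetry exposes.

```python
# Versions are modelled as (major, minor) tuples; ranges are half-open [low, high).
Version = tuple[int, int]
Range = tuple[Version, Version]


def is_subset(inner: Range, outer: Range) -> bool:
    """True when every version allowed by `inner` is also allowed by `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


project_caret = ((3, 10), (4, 0))  # python = "^3.10", i.e. >=3.10,<4.0
webserver = ((3, 8), (3, 13))      # dagster-webserver 1.8.12: >=3.8,<3.13
suggested = ((3, 10), (3, 13))     # the range suggested in the error: >=3.10,<3.13

print(is_subset(project_caret, webserver))  # False
print(is_subset(suggested, webserver))      # True
```

So `^3.10` fails because it also promises 3.13 and up, while the range suggested in the error fits inside what dagster-webserver supports.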

I'm starting from scratch; that was weird and nothing works. That attempt started from here: https://docs.dagster.io/getting-started/install#installing-dagster-using-poetry

Now I'm going to start from here instead: https://docs.dagster.io/getting-started/quickstart

  • First, I run this

    git clone https://github.com/dagster-io/dagster-quickstart && cd dagster-quickstart
    
  • Then (it wasn't mentioned in the guide) I create a Python venv with python3 -m venv venv

  • Then I activate the venv and run pip install -e ".[dev]"

  • Then dagster dev... and the UI appears

  • I run the pipelines and materialize the examples as the quickstart indicates. It's roughly clear.

  • I now understand that an asset in Dagster lingo is pretty much declaring:

    • That a DAG node exists and has certain features
    • The code that materializes it
  • I'm going to need more practice to see more clearly how we would use this.
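
To pin down the "node plus the code that materializes it" idea, here is a toy model in plain Python. It is deliberately not Dagster's API: `toy_asset`, `ASSETS` and `materialize_all` are names I made up for illustration, and the two example assets are invented.

```python
from graphlib import TopologicalSorter

ASSETS = {}  # asset name -> (upstream asset names, function that materializes it)


def toy_asset(*deps):
    """Register a function as a graph node (hypothetical, not Dagster's decorator)."""
    def register(fn):
        ASSETS[fn.__name__] = (deps, fn)
        return fn
    return register


@toy_asset()
def raw_numbers():
    return [1, 2, 3]


@toy_asset("raw_numbers")
def doubled(raw_numbers):
    return [n * 2 for n in raw_numbers]


def materialize_all():
    """Materialize every asset, upstream first, and return the results by name."""
    graph = {name: set(deps) for name, (deps, _) in ASSETS.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        deps, fn = ASSETS[name]
        results[name] = fn(*(results[d] for d in deps))
    return results


print(materialize_all())
```

In Dagster itself the registration is done by the `@asset` decorator, and upstream assets are picked up from the function's parameter names.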

dbt example

The Dagster docs have a guide on how to integrate with a dbt project. I'll try to do that with our project.

Link: https://docs.dagster.io/integrations/dbt/using-dbt-with-dagster

  • I begin by making a copy of our dbt git repo as-is

  • Then I do some pip installs on my main Python interpreter:

    pip install dagster-dbt dagster-webserver
    
  • Then I run

    dagster-dbt project scaffold --project-name sh_dagster_dbt --dbt-project-dir ./dbt-dagster-playground/
    
  • Ok, apparently this has created an entire new Dagster project at /home/pablo/sh_dagster_dbt. This new project has a hardcoded filepath reference to the dbt project folder. I must say this is shaky as hell, but well, at least it's transparent and visible.

  • Now, from the root of the dagster project folder, I run dagster dev to get the UI started and I see the whole dbt project displayed. Pretty neat. All dependencies are there, and dagster even parses the documentation and displays it.

  • I have a very tempting Materialize All button, which I click like a kid not knowing what will happen.

  • Stuff blew up everywhere, apparently complaining about the database not being reachable. The Materialize All button tried to run both the models and their tests. That means it was trying to run with the default profile (dwh_hybrid). Reasonable, but I would expect some way to pick the profile in dagster. Can't find it anywhere.

  • I started my local postgres and tried to materialize some staging models. It works! I can now travel through the assets graph and select any subset of models and run them. The UI shows the last time the model was materialized and logs on the run.

  • I also found a hidden menu where the config of the run can be modified, including picking which profile should be used. I think I'm getting the pattern here:

    • Run templates can be defined through the UI by drag and drop…
    • … but for the stable stuff, what you want to do is to define it in Python files
  • Anyways, before trying to build a “pipeline” in the old sense, I want to see if I can also add airbyte here in the picture. If I can, then I can try to map dependencies and make a joint pipeline across airbyte and dbt.

  • I have an old locally deployed airbyte compose, so I'll just use that to mess around.

  • I've had to pip install dagster-airbyte

  • I've created a file in sh_dagster_dbt/airbyte.py and added this bit:

    from dagster_airbyte import AirbyteResource
    
    airbyte_instance = AirbyteResource(
        host="localhost",
        port="8000",
        # If using basic auth, include username and password:
        username="airbyte",
        password="airbyte",
    )
    
  • Through the Airbyte UI, I've created a source pointing to our Xero production instance and made a connection with my local DWH.
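
One way to make the hardcoded dbt path and the default profile less shaky would be to read both from environment variables. A minimal sketch, assuming made-up variable names (`SH_DBT_PROJECT_DIR`, `SH_DBT_PROFILE`); the fallbacks are the folder and profile mentioned above, and `--project-dir` / `--profile` are regular dbt CLI flags. None of this is generated by the scaffold.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DbtSettings:
    project_dir: Path
    profile: str

    def build_args(self) -> list[str]:
        """Arguments for a `dbt build` run against the chosen project and profile."""
        return ["build", "--project-dir", str(self.project_dir), "--profile", self.profile]


def load_dbt_settings(env=os.environ) -> DbtSettings:
    # Both variable names are hypothetical; the defaults come from these notes.
    project_dir = Path(env.get("SH_DBT_PROJECT_DIR", "./dbt-dagster-playground"))
    profile = env.get("SH_DBT_PROFILE", "dwh_hybrid")
    return DbtSettings(project_dir=project_dir, profile=profile)


settings = load_dbt_settings({"SH_DBT_PROFILE": "local"})
print(settings.build_args())
```

The resulting list could be handed to whatever launches dbt, so switching between the local postgres and the real warehouse becomes an environment change rather than a code edit.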