# Choosing

## Intro

The first step is to choose which one we want to go for. The go-to names in the industry are Airflow, Prefect, and Dagster. After a lot of unstructured research over the past year, I've decided to simply narrow it down to Prefect and Dagster. Both options seem more feature-rich, have better integrations with our stack, and have fewer people complaining about them than Airflow. Airflow is what everyone uses and everyone complains about, so we might as well dodge that bullet directly.

Between Prefect and Dagster, I can't pick one yet. I worked a lot with Prefect and I know it's good, but that was on Prefect 1, and they are already on version 3, so things may have changed a lot. On the other hand, I haven't tried Dagster, but I've heard lovely things about it. Apparently, its data asset abstraction makes pipelines and governance considerably better. Plus, it has very nice integrations with Airbyte and dbt, way better than what I've seen in other tools.

To be able to choose between the two, I've decided to run a little hello-world exercise with both of them. The plan is to do the same work in both, document it, discuss it with Uri, and then make a decision. Once that's done, we start planning how we do the production deployment and how we move executions over to it.

## Orchestration hello-world

These are the steps I would like to run with both:

- Try to deploy it locally
- Deploy a local Airbyte and a local DWH alongside it
- Try to set up a full Xero pipeline
    - This means setting up an Airbyte connection and running the dbt pipeline locally for the Xero tables (`dbt run -s models/staging/xero+`)
    - The pipeline should run everything: Airbyte, and all the layers of dbt stuff (rough sketches of what this could look like in each tool are at the end of this page)
    - Also, run the related dbt tests
- Try to set up a full `xexe` pipeline
    - This means triggering runs made with the CLI interface of `xexe` or by importing it as a library, and then running the dbt pipeline locally for the downstream currency-related tables (not including the gazillion DWH tables that depend on them; just currency stuff down to `int_simple_exchange_rates`)
    - Also, run the related dbt tests

Besides that, I might also try to:

- Send alert messages through Slack
- Deploy on Azure (not the final, production deployment by any means)

Some areas where I would like to take thorough notes on the features:

- Retry logic
- Pipeline logs
- dbt logs
- Warnings and alerts, perhaps even incident management
- Scalability features with parametrization
- Secret management
- Pipeline version control
- Triggering and scheduling capabilities
- An API for external services to interact with
- Ownership and governance of pipelines
- How the hell we can play it smart with backfills
- Development and deployment flow

[Dagster hello-world](Dagster%20hello-world%20723420fec494478b9c89d308b0f213a7.md)
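For reference while planning, here is a minimal sketch of what the Xero hello-world could look like in Prefect 3, just shelling out to the dbt CLI. Everything beyond the dbt selector is illustrative: the task names and retry settings are placeholders, and the Airbyte trigger is only stubbed (Prefect has a `prefect-airbyte` integration, but the connection id is environment-specific, so I'm leaving it out of the sketch).

```python
import subprocess

from prefect import flow, task


@task(retries=2, retry_delay_seconds=60)
def run_dbt(selector: str) -> None:
    # Shell out to the dbt CLI; assumes dbt is installed and the
    # project/profile are configured in the working directory.
    subprocess.run(["dbt", "run", "-s", selector], check=True)


@task(retries=2, retry_delay_seconds=60)
def run_dbt_tests(selector: str) -> None:
    subprocess.run(["dbt", "test", "-s", selector], check=True)


@flow(log_prints=True)
def xero_pipeline() -> None:
    # An Airbyte sync trigger would go here first (e.g. via the
    # prefect-airbyte integration); omitted because the connection
    # id is deployment-specific.
    run_dbt("models/staging/xero+")
    run_dbt_tests("models/staging/xero+")


if __name__ == "__main__":
    xero_pipeline()
```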
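And the rough Dagster equivalent, using the `dagster-dbt` integration's pattern of loading dbt models as assets from the project's `manifest.json`. The project path is hypothetical, and I haven't actually run this yet (that's what the hello-world is for), so treat it as a sketch of the shape, not a verified setup.

```python
from pathlib import Path

from dagster import AssetExecutionContext, Definitions
from dagster_dbt import DbtCliResource, dbt_assets

# Hypothetical path; point this at the real dbt project.
DBT_PROJECT_DIR = Path("analytics/dbt")


@dbt_assets(manifest=DBT_PROJECT_DIR / "target" / "manifest.json")
def xero_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    # Each dbt model in the manifest becomes a Dagster asset;
    # "build" runs models and their tests in dependency order.
    yield from dbt.cli(["build"], context=context).stream()


defs = Definitions(
    assets=[xero_dbt_assets],
    resources={"dbt": DbtCliResource(project_dir=str(DBT_PROJECT_DIR))},
)
```

The appeal here is exactly the data asset abstraction mentioned above: instead of writing tasks that call dbt, every dbt model shows up as a first-class asset with its own lineage, which should make governance and backfills easier to reason about.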