# Pains
## Local app/infra

- Starting up the local app and running the E2E tests plus EL takes ~20 minutes.
- Having the Airflow UI to run things is nice. Having Airflow schedules trigger automatically is painful locally because it makes controlling the state of the environment hard.
- Simply checking the E2E flow takes a lot of steps and tools; it feels like there's always something breaking.
## Meltano
- Meltano logs are terrible to read, which makes debugging painful.
- Meltano EL runs take a long time.
- Meltano configuration is painful:
  - Docs are terrible.
  - Wrong configurations often don't raise errors; they just get ignored.
- Meltano's handling of Python environments is sometimes more of an annoyance than a help.
## Data Pipeline needs data
- An anemic dataset makes development hard (many entities with few or no records).
## Improvable practices
- Loading all backend tables into the DW, regardless of whether they are used:
  - More data, worse performance, no gain.
  - More cognitive load when working on the DW ("What is this table? Where is it used? Can I modify it?").
  - "But then I have all backend tables handy" -> Well, let's make adding a backend table trivial (see the sketch after this list).
- dbt:
  - Not documenting models:
    - The next person has no clue what they are, which makes shared ownership hard.
  - Not using exposures:
    - Hard to know which models impact which reports.
    - Hard to know which parts of the DW are truly used.
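To make "adding a backend table should be trivial" concrete: a minimal sketch of allow-list-driven replication, assuming psycopg2 and a DW that already has matching table schemas. The table names, DSNs, and the `copy_table` helper are all hypothetical.

```python
import psycopg2  # assumed driver; any PG driver works

# One line per backend table we actually use downstream -- nothing else
# gets loaded. Adding a table to the DW == appending one reviewed line here.
REPLICATED_TABLES = [
    "users",
    "transactions",
    "kyc_checks",
]

def copy_table(src_dsn: str, dst_dsn: str, table: str) -> None:
    """Full, stateless copy of one table into the DW (schemas assumed equal)."""
    with psycopg2.connect(src_dsn) as src, psycopg2.connect(dst_dsn) as dst:
        with src.cursor() as read, dst.cursor() as write:
            read.execute(f"SELECT * FROM {table}")
            cols = [c.name for c in read.description]
            write.execute(f"TRUNCATE {table}")  # full refresh, no state kept
            write.executemany(
                f"INSERT INTO {table} ({', '.join(cols)}) "
                f"VALUES ({', '.join(['%s'] * len(cols))})",
                read.fetchall(),
            )
```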
## What are our bottlenecks/issues?
- Dealing with a convoluted output definition (understanding the laws + an unclear validation procedure with Vicky).
- Translating the needed reports into SQL transformations of the backend data.
- Breaking changes in backend events/entities that break downstream data dependencies.

What is NOT a bottleneck:

- Data latency
- Scalability of data volume
- Having a pretty interface
## My proposal
- Fall back to a dramatically simpler stack that allows team members working on reports to move fast (see the sketch after this list):
  - Hardcoded Bitfinex CSV
  - Hardcoded Sumsub CSV
  - Use another PG as the DW; move data with a simple, stateless Python script
  - Ignore orchestration, UI delivery, and monitoring for now
- Work together with the backend team to find a convenient way to get good testing data.
- Make an exhaustive list of reports and align with Luis/Vicky on a plan to systematically meet and validate. Track it.
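A minimal sketch of what the stateless script could look like, assuming pandas + SQLAlchemy are acceptable; the DSNs, file paths, and table names are placeholders:

```python
import pandas as pd
from sqlalchemy import create_engine

BACKEND = create_engine("postgresql://user:pass@backend-host/backend")  # placeholder
DW = create_engine("postgresql://user:pass@dw-host/dw")                 # placeholder

def run() -> None:
    # Hardcoded CSV drops: fully re-read and reloaded on every run.
    for name, path in [("bitfinex", "data/bitfinex.csv"), ("sumsub", "data/sumsub.csv")]:
        pd.read_csv(path).to_sql(name, DW, if_exists="replace", index=False)

    # The backend tables we actually need, fully re-copied each run.
    # Stateless by design: no bookmarks or incremental state to corrupt.
    for table in ["users", "transactions"]:
        pd.read_sql(f"SELECT * FROM {table}", BACKEND).to_sql(
            table, DW, if_exists="replace", index=False
        )

if __name__ == "__main__":
    run()
```

Since it's a plain script with no state, anyone can run it from a terminal and get a fully rebuilt DW, which is the whole point of the fallback.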
Then, once...

- We have an ack from Luis/Vicky that all reports are domain-valid
- We have more clarity on the integrations we must run with regulators/government systems after the audit
- The backend data model is more stable

... we take our domain-rich, deployment-poor setup and discuss the optimal tooling and strategies to deliver it with production-grade practices.
## North Star Ideas
- Use an asset-based orchestrator, like Dagster.
- Step away from Meltano; use an EL framework such as dlt and combine it with the orchestrator (see the sketch below).
- Add a visualization tool to the stack, such as Evidence, Metabase, or Lightdash.
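A hedged sketch of what the Dagster + dlt combination could look like; the resource, pipeline, and asset names are made up, and Postgres credentials are assumed to come from dlt's config/env:

```python
import dlt
from dagster import asset, materialize

@dlt.resource(table_name="bitfinex_trades", write_disposition="replace")
def bitfinex_trades():
    # Placeholder extractor; a real source would read the Bitfinex export.
    yield from [{"trade_id": 1, "amount": 100.0}, {"trade_id": 2, "amount": 250.5}]

@asset
def raw_bitfinex() -> str:
    """One logical DW dataset, materialized by a dlt pipeline."""
    pipeline = dlt.pipeline(
        pipeline_name="raw_loads",
        destination="postgres",  # credentials via dlt config/env
        dataset_name="raw",
    )
    info = pipeline.run(bitfinex_trades())
    return str(info)  # keep the return simple so Dagster can store it

if __name__ == "__main__":
    materialize([raw_bitfinex])
```

Each report input becomes a named asset with visible lineage, which directly addresses the "What is this table? Where is it used?" pain above.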