galoy-personal-notes/data-arch.md
2025-07-29 18:26:20 +02:00

Data architecture ramblings

Some notes and thoughts after my first week.

Why a DW?

  • Why do we need a DW at all?
  • One could argue that generating the file-based reports through the app can be solved simply with a queue + job system, which would probably fit easily within lana's current capabilities. Maybe.
  • Are there other requirements I'm not considering here?
  • I'm guessing that reporting in general will always be needed, but I'm not sure we should be the ones deploying and supporting a DW for it. Our ICP will typically already have a DW of its own. Although a batteries-included approach may be appreciated in some situations, I would guess the ICP will almost always simply expect to ingest lana data into its own DW.
  • Perhaps volcano is a special case because they're just starting out. But if so, we should make sure we keep apart what is lana (volcano-agnostic) and what is additional stuff we build specifically for volcano.
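To make the queue + job option a bit more concrete, here is a minimal sketch using only Python's standard library: an in-process `queue.Queue` with one worker thread that renders a report to CSV. `ReportJob`, `render_csv` and the `daily_balances` report are hypothetical names, not part of lana. A real version would presumably hook into whatever job mechanism lana already has and write the file to storage instead of keeping it in memory.

```python
import csv
import io
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class ReportJob:
    """Hypothetical job: which report to build and the rows that feed it."""
    report_name: str
    rows: list
    result: str = ""
    done: threading.Event = field(default_factory=threading.Event)


def render_csv(rows):
    """Render a list of dicts as CSV text, taking the header from the first row."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def worker(jobs):
    """Consume jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        job.result = render_csv(job.rows)  # a real job would write a file here
        job.done.set()
        jobs.task_done()


jobs = queue.Queue()
thread = threading.Thread(target=worker, args=(jobs,), daemon=True)
thread.start()

job = ReportJob("daily_balances", [
    {"account": "A1", "balance": 100},
    {"account": "A2", "balance": 250},
])
jobs.put(job)
job.done.wait(timeout=5)

jobs.put(None)  # ask the worker to stop
thread.join(timeout=5)
print(job.result)
```

The point of the sketch is scale: if this (plus persistence and retries) covers the file-based reports, the DW is not needed for that requirement alone.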

SQL Engine choice

  • Why BigQuery? Why Snowflake?
  • Does it need to be multiple engines? Is it worth it? Can we do it? What are the risks of sticking to only one?
  • Why not another postgres instance in the lana deployment?

Coupling with database

  • Should the app database stay directly exposed to downstream data consumers?
  • Or do we want to introduce an additional layer for decoupling and stable interface?
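One common way to get such a decoupling layer is a set of versioned views (or a dedicated schema) that downstream consumers read instead of the app's own tables. The sketch below uses Python's `sqlite3` as a stand-in for Postgres; `core_customers` and `public_v1_customers` are hypothetical names. It shows that an internal column rename leaves a downstream query unchanged, as long as the view is updated alongside the migration.

```python
import sqlite3


def build_app_db(name_column):
    """Build a stand-in app database whose internal column name can vary."""
    conn = sqlite3.connect(":memory:")
    # Internal table: owned by the app, free to change between releases.
    # name_column is a fixed identifier chosen in this sketch, never user input.
    conn.execute(f"CREATE TABLE core_customers (id TEXT PRIMARY KEY, {name_column} TEXT)")
    conn.execute("INSERT INTO core_customers VALUES ('c1', 'Ada')")
    # Public, versioned interface: only this view is promised to downstream consumers.
    conn.execute(
        "CREATE VIEW public_v1_customers AS "
        f"SELECT id AS customer_id, {name_column} AS name FROM core_customers"
    )
    return conn


def downstream_query(conn):
    """What a downstream consumer (DW loader, BI tool) runs: it only knows the view."""
    return conn.execute("SELECT customer_id, name FROM public_v1_customers").fetchall()


before = downstream_query(build_app_db("full_name"))
after = downstream_query(build_app_db("display_name"))  # internal rename
print(before, after)
```

The cost is that every internal migration now has to keep the public views in sync, which is exactly the "stable interface" contract being discussed.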

Visualization layer

  • Do we want to bring a batteries-included approach to visualizing data about the bank? Should that be the responsibility of the app UI or of a separate reporting solution? Or should we embed reporting within the app?

Other stuff

  • data contracts
  • dbt unit testing
  • data integration testing
  • more solid development practices in data?
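On data contracts: at its simplest, a contract is an agreed list of columns with types and nullability that the producer checks before publishing data downstream. A minimal sketch of that idea in plain Python follows. The `Column` class, `check_contract` and the example columns are hypothetical; in practice this would more likely be expressed with something like dbt's model contracts or a schema tool rather than hand-written checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    type_: type
    nullable: bool = False


# Hypothetical contract for a table that lana would publish downstream.
ACCOUNTS_CONTRACT = [
    Column("account_id", str),
    Column("balance_cents", int),
    Column("closed_at", str, nullable=True),
]


def check_contract(rows, contract):
    """Return a list of violations; an empty list means the rows honour the contract."""
    expected = {c.name for c in contract}
    violations = []
    for i, row in enumerate(rows):
        present = set(row)
        for name in sorted(expected - present):
            violations.append(f"row {i}: missing column {name}")
        for name in sorted(present - expected):
            violations.append(f"row {i}: unexpected column {name}")
        for c in contract:
            if c.name not in row:
                continue  # already reported as missing
            value = row[c.name]
            if value is None:
                if not c.nullable:
                    violations.append(f"row {i}: {c.name} must not be null")
            elif not isinstance(value, c.type_):
                violations.append(
                    f"row {i}: {c.name} expected {c.type_.__name__}, got {type(value).__name__}"
                )
    return violations


good = [{"account_id": "A1", "balance_cents": 100, "closed_at": None}]
bad = [{"account_id": "A2", "balance_cents": "100", "status": "open"}]
print(check_contract(good, ACCOUNTS_CONTRACT))
print(check_contract(bad, ACCOUNTS_CONTRACT))
```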

If you asked me™

  • Add another postgres instance to the deployment service
  • Use that as DW
  • Do pg2pg EL and transform there
  • This makes deployment, testing, etc. much easier
  • If we suddenly need to support multiple engines, approach model building as "write in Postgres SQL, transpile to XYZ". Potentially add tests for the transpiled output
  • On separating what's volcano-only from what's lana in general:
    • Either have one dbt project and use folders/tagging
    • Or do multiple projects
    • Or just build a monster now and slice it up later as needed
  • Optionally, add visualization in the stack
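The proposed flow (EL from the lana Postgres into a second Postgres, then transform there) can be sketched end to end as below. To stay self-contained it uses two in-memory `sqlite3` databases as stand-ins for the two Postgres instances; the `deposits` table, the `raw_` prefix and `account_deposit_totals` are hypothetical. In the real setup the EL step would be done by whatever pg2pg tool we pick, and the transform would live in dbt models rather than Python.

```python
import sqlite3

# Stand-ins for the two Postgres instances (app DB and DW).
app_db = sqlite3.connect(":memory:")
dw = sqlite3.connect(":memory:")

app_db.executescript("""
CREATE TABLE deposits (id INTEGER PRIMARY KEY, account_id TEXT, amount_cents INTEGER);
INSERT INTO deposits (account_id, amount_cents) VALUES ('A1', 1000), ('A1', 500), ('A2', 700);
""")


def extract_load(src, dst, table):
    """EL step: copy a table verbatim into a raw_ table in the DW, no transformation."""
    # table is a fixed identifier chosen in this sketch, never user input.
    cur = src.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    dst.execute(f"CREATE TABLE raw_{table} ({', '.join(cols)})")
    placeholders = ", ".join("?" for _ in cols)
    dst.executemany(f"INSERT INTO raw_{table} VALUES ({placeholders})", cur.fetchall())
    dst.commit()


def transform(conn):
    """T step: build a reporting model inside the DW with plain SQL (what a dbt model would hold)."""
    conn.executescript("""
    CREATE TABLE account_deposit_totals AS
    SELECT account_id, SUM(amount_cents) AS total_cents
    FROM raw_deposits
    GROUP BY account_id;
    """)


extract_load(app_db, dw, "deposits")
transform(dw)
print(dw.execute("SELECT * FROM account_deposit_totals ORDER BY account_id").fetchall())
```

Keeping the EL step dumb (a verbatim copy into `raw_` tables) and putting all the logic in SQL inside the DW is what keeps both halves easy to test and to swap independently.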