Data architecture ramblings

Some notes and thoughts after my first week.

Why a DW?

Why do we need a DW?
You could make a case that generating the file-based reports through the app could be solved simply by a queue + job system, which would probably fit easily within lana's current capabilities.
Are there more requirements that I'm not contemplating here?
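
To make the queue + job idea concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: `ReportJob`, `render_report`, `run_reports` and the report names are hypothetical and are not lana's actual types or API, and the in-process queue stands in for whatever durable queue the app would really use.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class ReportJob:
    # Hypothetical job payload; field names are illustrative only.
    report_name: str
    as_of: str


def render_report(job: ReportJob) -> str:
    # Stand-in for the real work (query the app database, write a file).
    return f"{job.report_name},{job.as_of}\n"


def worker(jobs: queue.Queue, results: dict) -> None:
    # Pull jobs until the None sentinel arrives, storing each rendered report.
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            results[job.report_name] = render_report(job)
        finally:
            jobs.task_done()


def run_reports(requested: list) -> dict:
    # Enqueue every requested report, then wait for the worker to drain the queue.
    jobs: queue.Queue = queue.Queue()
    results: dict = {}
    thread = threading.Thread(target=worker, args=(jobs, results))
    thread.start()
    for job in requested:
        jobs.put(job)
    jobs.put(None)  # sentinel: no more work
    jobs.join()
    thread.join()
    return results
```

Calling `run_reports([ReportJob("balance_sheet", "2024-01-31")])` returns a dict keyed by report name. A real version would need persistence, retries and file storage, which is exactly where the question of extra requirements comes in.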
I'm guessing that reporting in general will always be needed, but I'm concerned about whether we should support deployment for that. Our ICP will typically already have a DW of its own. Although a batteries-included approach may be appreciated in some situations, I would guess the ICP will almost always simply expect to be able to ingest the lana data into its own DW.
Perhaps volcano is a special case because they're starting out, but shouldn't we then make sure we tell apart what is lana in general (volcano-agnostic) from the specific additional stuff we build for volcano?

Why BQ? Why snowflake?
Does it need to be multiple engines? Is it worth it? Can we do it? What are the risks of sticking to only one?
Why not another postgres instance in the lana deployment?

Should the database stay public to the downstream data dependents?
Or do we want to introduce an additional layer for decoupling and a stable interface?

Do we want to bring a batteries-included approach to visualizing data about the bank? Should that be a responsibility of the app UI, or of an additional reporting solution? Or should reporting be embedded within the app?

Add another postgres instance to the deployment service
Use that as DW
Do pg2pg EL and transform there
Make deployment, testing, etc. much easier
If multi-engine support suddenly becomes urgent, approach model building with "write in Postgres SQL, transpile to XYZ". Potentially, add testing
About what's volcano-only vs. what's lana in general:
- Either have one dbt project and use folders/tagging
- Or do multiple projects
- Or just do a monster now and we will slice it in the future as needed
Optionally, add visualization in the stack