data arc ramblings

2025-07-25 14:22:10 +02:00 · 2025-07-25 14:22:10 +02:00 · 170e50c30c
commit 170e50c30c
parent 465d9007a0
1 changed files with 42 additions and 0 deletions
--- a/data-arch.md
+++ b/data-arch.md
@@ -0,0 +1,42 @@
+# Data architecture ramblings
+
+Some notes and thoughts after my first week.
+
+## Why a DW?
+
+- Why do we need a DWH?
+- One could argue that generating the file-based reports through the app can be solved simply by a queue + job system, which would probably fit easily within the current capabilities of `lana`.
+- Are there more requirements that I'm not contemplating here?
+- I'm guessing that reporting in general will always be needed, but I'm unsure whether we should support deploying a DW for it ourselves. Our ICP will typically already have a DW of its own. Although a batteries-included approach may be appreciated in some situations, I'd guess the ICP will almost always simply expect to be able to ingest the `lana` data into its own DW.
+- Perhaps volcano is a special case because they're just starting out. But then, shouldn't we make sure we can tell apart what is `lana` (volcano-agnostic) from the additional stuff we build specifically for volcano?
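The queue + job idea could look roughly like this: the app enqueues report requests, and a small pool of workers renders each one to a file. This is a minimal in-memory sketch using only the Python standard library. `ReportJob`, `render_csv` and `run_reports` are hypothetical names. A real `lana` version would presumably need a persistent queue (for example a jobs table) so that requests survive restarts.

```python
import csv
import io
import queue
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReportJob:
    """Hypothetical report request: render `rows` under `header` as CSV."""
    name: str
    header: list[str]
    rows: list[list[object]]


def render_csv(job: ReportJob) -> str:
    # Render the report in memory; a real job would write to file/object storage.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(job.header)
    writer.writerows(job.rows)
    return buf.getvalue()


def worker(jobs: "queue.Queue[Optional[ReportJob]]", results: dict[str, str]) -> None:
    # Pull jobs until the None sentinel arrives; mark every item as done.
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            results[job.name] = render_csv(job)
        finally:
            jobs.task_done()


def run_reports(report_jobs: list[ReportJob], n_workers: int = 2) -> dict[str, str]:
    jobs: "queue.Queue[Optional[ReportJob]]" = queue.Queue()
    results: dict[str, str] = {}
    threads = [
        threading.Thread(target=worker, args=(jobs, results)) for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for job in report_jobs:
        jobs.put(job)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker, queued after all real jobs
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    out = run_reports([ReportJob("balances", ["account", "balance"], [["A1", 100]])])
    print(out["balances"])
```

Because the queue is FIFO and the sentinels are enqueued after every job, each worker only shuts down once all reports have been picked up.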
+
+## SQL engine choice
+
+- Why BigQuery (BQ)? Why Snowflake?
+- Do we need to support multiple engines? Is it worth it? Can we do it? What are the risks of sticking to only one?
+- Why not another Postgres instance in the `lana` deployment?
+
+## Visualization layer
+
+- Do we want to bring a batteries-included approach to visualizing data about the bank? Should that be a responsibility of the app UI, or of an additional reporting solution? Or should we embed reporting within the app?
+
+## Other stuff
+
+- data contracts
+- dbt unit testing
+- data integration testing
+- more solid development data?
+
+
+## If you asked me™
+
+- Add another Postgres instance to the deployment service
+- Use that as the DW
+- Do Postgres-to-Postgres (pg2pg) EL and transform there
+- This makes deployment, testing, etc. much easier
+- If multi-engine support suddenly becomes a requirement, approach model building as "write in Postgres SQL, transpile to XYZ". Potentially, add testing around the transpilation too.
+- On separating what's volcano-only from what's `lana` in general:
+    - Either have one dbt project and use folders/tagging
+    - Or do multiple projects
+    - Or just build one monster project now and slice it up in the future as needed
+- Optionally, add visualization to the stack
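One way to read "pg2pg EL" is as a small incremental copier. It pulls rows changed since the last watermark from the `lana` app database and upserts them into a raw schema in the DW Postgres, and dbt then transforms from there. Below is a minimal sketch of the two statements such a copier might run. It assumes each copied table has a primary key and an `updated_at`-style cursor column. `TableSpec`, the table names and the `raw` schema are hypothetical. The `%s` / `%(name)s` placeholders assume a psycopg-style driver executes the queries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSpec:
    """Hypothetical description of one table copied from the app DB to the DW."""
    source: str               # table in the lana app database, e.g. "public.loans"
    target: str               # table in the DW's raw schema, e.g. "raw.loans"
    key: str                  # primary key; the target needs a unique constraint on it
    cursor: str               # increasing column used as the watermark, e.g. "updated_at"
    columns: tuple[str, ...]  # columns to copy, in order


def extract_sql(spec: TableSpec) -> str:
    # Run against the app DB; the watermark is bound by the driver, not interpolated.
    cols = ", ".join(spec.columns)
    return (
        f"SELECT {cols} FROM {spec.source} "
        f"WHERE {spec.cursor} > %(watermark)s ORDER BY {spec.cursor}"
    )


def load_sql(spec: TableSpec) -> str:
    # Run against the DW once per extracted row (or batched with executemany).
    cols = ", ".join(spec.columns)
    placeholders = ", ".join(["%s"] * len(spec.columns))
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in spec.columns if c != spec.key)
    return (
        f"INSERT INTO {spec.target} ({cols}) VALUES ({placeholders}) "
        f"ON CONFLICT ({spec.key}) DO UPDATE SET {updates}"
    )
```

Identifiers are interpolated directly into the SQL, so they must come from trusted config, never from user input. A strict `>` watermark can also miss rows from transactions that commit late. If that matters, Postgres logical replication would be the more robust pg2pg option.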
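The "one dbt project + folders/tagging" option boils down to a convention that classifies every model as either generic `lana` or volcano-specific. The snippet below only illustrates such a convention in plain Python. It assumes a hypothetical layout where volcano-only models live under `models/volcano/` or carry a `volcano` tag. `Model` and `split_models` are made-up names, not dbt APIs. In dbt itself the same split would be expressed with node selection, e.g. `dbt run --select tag:volcano` or `dbt run --exclude path:models/volcano`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Model:
    """Hypothetical stand-in for a dbt model: its file path and tags."""
    path: str
    tags: frozenset[str] = field(default_factory=frozenset)


def is_volcano_specific(model: Model) -> bool:
    # Assumed convention: a volcano folder OR a volcano tag marks a model as volcano-only.
    return model.path.startswith("models/volcano/") or "volcano" in model.tags


def split_models(models: list[Model]) -> tuple[list[str], list[str]]:
    """Return (generic lana model paths, volcano-specific model paths)."""
    generic = [m.path for m in models if not is_volcano_specific(m)]
    volcano = [m.path for m in models if is_volcano_specific(m)]
    return generic, volcano
```

Supporting both folders and tags keeps the "monster project now, slice later" path open. Moving a tagged model into its own folder or project later doesn't change which side of the split it falls on.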