# Data architecture ramblings

Some notes and thoughts after my first week.

## Why a DW?

- Why do we need a DWH?
  - You could make a case that generating the file-based reports through the app could be solved simply by a queue + job system, which would probably fit easily within the current capabilities of `lana`. Maybe.
  - Are there more requirements that I'm not contemplating here?
- I'm guessing that reporting in general will always be needed, but I'm concerned about whether we should support deployment for that. Our ICP will typically already have a DW of its own. Although a batteries-included approach may be appreciated in some situations, I would guess that the ICP will almost always simply expect to be able to ingest the `lana` data into its own DW.
- Perhaps volcano is a special case because they're starting out, but should we then make sure we tell apart what is `lana`, volcano-agnostic, and specific additional stuff we build for volcano?

## SQL Engine choice

- Why BQ? Why Snowflake?
- Does it need to be multiple engines? Is it worth it? Can we do it? What are the risks of sticking to only one?
- Why not another postgres instance in the `lana` deployment?

## Coupling with database

- Should the database stay public to the downstream data dependants?
- Or do we want to introduce an additional layer for decoupling and a stable interface?

## Visualization layer

- Do we want to bring a batteries-included approach to visualizing data about the bank? Should that be a responsibility of the app UI, or of an additional reporting solution? Or embedding reporting within the app?

## Other stuff

- data contracts
- dbt unit testing
- data integration testing
- more solid development practices in data?

## If you asked me™

- Add another postgres instance to the deployment service
  - Use that as DW
  - Do pg2pg EL and transform there
  - Make deployment, testing, etc. much easier
- If we encounter multiengine in a rush, approach model building with "write in psql, transpile to XYZ". Potentially, add testing
- About what's only volcano vs. what's `lana` in general
  - Either have one dbt project and use folders/tagging
  - Or do multiple projects
  - Or just do a monster now and we will slice it in the future as needed
- Optionally, add visualization in the stack
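
To make the "pg2pg EL and transform there" idea a bit more concrete, here is a minimal sketch of the shape it could take. Everything in it is an assumption for illustration: the `loans` table, its columns, the `raw_` prefix and the `loan_totals` model are hypothetical and not taken from `lana`, and two in-memory SQLite databases stand in for the two postgres instances so the snippet runs without any infrastructure. In a real setup the transform step would presumably live in dbt rather than in Python.

```python
import sqlite3


def extract_load(source, target, table):
    """EL step: copy every row of `table` from source to target, unmodified.

    The copy lands in `raw_<table>` so that raw data and models stay apart.
    Note: `table` is interpolated into SQL, so it must be a trusted name.
    """
    cursor = source.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    rows = cursor.fetchall()

    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)

    # Full refresh: drop and recreate the raw table on every run.
    target.execute(f"DROP TABLE IF EXISTS raw_{table}")
    target.execute(f"CREATE TABLE raw_{table} ({column_list})")
    target.executemany(
        f"INSERT INTO raw_{table} ({column_list}) VALUES ({placeholders})",
        rows,
    )
    target.commit()
    return len(rows)


def transform(target):
    """T step: build a reporting model inside the warehouse from raw data."""
    target.execute("DROP TABLE IF EXISTS loan_totals")
    target.execute(
        """
        CREATE TABLE loan_totals AS
        SELECT customer_id, SUM(amount) AS total_amount
        FROM raw_loans
        GROUP BY customer_id
        """
    )
    target.commit()


# Stand-in for the application database (hypothetical schema and data).
app_db = sqlite3.connect(":memory:")
app_db.execute("CREATE TABLE loans (id INTEGER, customer_id TEXT, amount INTEGER)")
app_db.executemany(
    "INSERT INTO loans VALUES (?, ?, ?)",
    [(1, "a", 100), (2, "a", 50), (3, "b", 70)],
)

# Stand-in for the additional instance used as DW.
dw = sqlite3.connect(":memory:")

loaded = extract_load(app_db, dw, "loans")
transform(dw)
totals = dw.execute(
    "SELECT customer_id, total_amount FROM loan_totals ORDER BY customer_id"
).fetchall()
print(loaded, totals)
```

The point of the sketch is only the separation of concerns: the EL step knows nothing about reporting, and the transform step only ever reads from the raw copy, which is what would let the app database change without breaking models directly.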