data arc ramblings
parent
465d9007a0
commit
170e50c30c
1 changed files with 42 additions and 0 deletions
42
data-arch.md
Normal file
@@ -0,0 +1,42 @@
# Data architecture ramblings
Some notes and thoughts after my first week.
## Why a DW?
- Why do we need a DW?
- You could make a case that generating the file-based reports through the app could be solved by a simple queue + job system, which would probably fit easily within the current capabilities of `lana`. Maybe.
- Are there more requirements that I'm not contemplating here?
- I'm guessing that reporting in general will always be needed, but I'm concerned about whether we should ship and support a DW deployment for it. Our ICP will typically already have a DW of its own. Although a batteries-included approach may be appreciated in some situations, I would guess that the ICP will almost always simply expect to be able to ingest the `lana` data into its own DW.
- Perhaps volcano is a special case because they're starting out, but then we should make sure we tell apart what is `lana` proper (volcano-agnostic) from the additional stuff we build specifically for volcano.
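
To make the "queue + job system" idea above a bit more concrete, here is a minimal sketch of what it could look like. It is not how `lana` does or would implement it: the language (Python), the `ReportJob` type, and the `render_csv`, `worker` and `run_reports` functions are all hypothetical, and an in-process queue with a single worker thread stands in for whatever durable job system the app would really use.

```python
import csv
import io
import queue
import threading
from dataclasses import dataclass


@dataclass
class ReportJob:
    """A hypothetical request to render one file-based report."""
    name: str
    rows: list  # non-empty list of dicts that share the same keys


def render_csv(job: ReportJob) -> str:
    """Render the job's rows as CSV text (stand-in for real report logic)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(job.rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(job.rows)
    return buffer.getvalue()


def worker(jobs: queue.Queue, results: dict) -> None:
    """Pull jobs until a None sentinel arrives, storing each rendered file."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        results[f"{job.name}.csv"] = render_csv(job)
        jobs.task_done()


def run_reports(job_list: list) -> dict:
    """Enqueue every job, process them on one background worker, return the files."""
    jobs: queue.Queue = queue.Queue()
    results: dict = {}
    thread = threading.Thread(target=worker, args=(jobs, results))
    thread.start()
    for job in job_list:
        jobs.put(job)
    jobs.put(None)  # sentinel: no more work
    jobs.join()
    thread.join()
    return results
```

Calling `run_reports([ReportJob("loans", rows)])` returns a dict mapping a file name to its CSV content. If something this small covers the reporting requirements, that weakens the case for a DW; if it does not, the missing pieces are probably the requirements worth writing down.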
## SQL Engine choice
- Why BigQuery (BQ)? Why Snowflake?
- Does it need to be multiple engines? Is it worth it? Can we do it? What are the risks of sticking to only one?
- Why not another Postgres instance in the `lana` deployment?
## Visualization layer
- Do we want to bring a batteries-included approach to visualizing data about the bank? Should that be a responsibility of the app UI, or of an additional reporting solution? Or should reporting be embedded within the app?
## Other stuff
- data contracts
- dbt unit testing
- data integration testing
- more solid development data?
## If you asked me™
- Add another Postgres instance to the deployment service
- Use that as the DW
- Do Postgres-to-Postgres (pg2pg) EL and transform there
- This makes deployment, testing, etc. much easier
- If we suddenly need multiple engines, approach model building as "write in Postgres SQL, transpile to XYZ". Potentially, add testing
- On telling apart what is volcano-only from what is `lana` in general:
  - Either have one dbt project and use folders/tagging
  - Or do multiple projects
  - Or just build one monster project now and slice it up in the future as needed
- Optionally, add visualization in the stack
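
As a rough illustration of the "EL, then transform in the DW" flow proposed in this list, the sketch below copies a table as-is from an app database into a warehouse database and then builds a model inside the warehouse. Everything in it is an assumption made for illustration: the `loans` table, its columns, the `raw_` prefix and the function names are invented, and two in-memory SQLite databases stand in for the two Postgres instances so that the example runs with the Python standard library alone.

```python
import sqlite3


def extract_load(source, warehouse, table, columns):
    """Copy all rows of `table` from source to warehouse as-is (full refresh)."""
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    rows = source.execute(f"SELECT {col_list} FROM {table}").fetchall()
    warehouse.execute(f"DROP TABLE IF EXISTS raw_{table}")
    warehouse.execute(f"CREATE TABLE raw_{table} ({col_list})")
    warehouse.executemany(
        f"INSERT INTO raw_{table} VALUES ({placeholders})", rows
    )
    warehouse.commit()
    return len(rows)


def transform(warehouse):
    """Build a reporting model inside the warehouse from the raw copy."""
    warehouse.execute("DROP TABLE IF EXISTS loans_by_customer")
    warehouse.execute(
        """
        CREATE TABLE loans_by_customer AS
        SELECT customer_id, COUNT(*) AS loan_count, SUM(amount) AS total_amount
        FROM raw_loans
        GROUP BY customer_id
        """
    )
    warehouse.commit()


def demo():
    """Run EL and transform between two in-memory stand-in databases."""
    app_db = sqlite3.connect(":memory:")  # stand-in for the app's Postgres
    dw = sqlite3.connect(":memory:")      # stand-in for the DW Postgres
    app_db.execute(
        "CREATE TABLE loans (id INTEGER, customer_id INTEGER, amount INTEGER)"
    )
    app_db.executemany(
        "INSERT INTO loans VALUES (?, ?, ?)",
        [(1, 10, 100), (2, 10, 250), (3, 20, 75)],
    )
    app_db.commit()
    loaded = extract_load(app_db, dw, "loans", ["id", "customer_id", "amount"])
    transform(dw)
    model = dw.execute(
        "SELECT customer_id, loan_count, total_amount "
        "FROM loans_by_customer ORDER BY customer_id"
    ).fetchall()
    return loaded, model
```

In a real setup the transform step would presumably live in dbt models rather than in inline SQL, and the full-refresh copy would need to become incremental once tables grow. The point of the sketch is only that, with the same engine on both sides, the whole pipeline can be spun up and tested locally with very little machinery.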