# Log

## 2025-07-29

### Summary for ACE tomorrow

While onboarding, I've inevitably looked at the data pipeline with fresh eyes, and some parts of the stack feel odd for the requirements we must satisfy with Lana.

This doc briefly lays out which parts of the stack I suggest we rethink.

## Brief review of current situation

### Needs we are covering

Currently, we serve one very clear requirement with the data pipeline: building and delivering the UI-generated report files.

Potentially, we could also adopt a "batteries included" BI attitude with Lana, where Lana gets bundled with an ingest+transform+present stack that lets a deployment see some useful business reports on how the bank is doing. This is not a strict requirement for Volcano but rather a nice-to-have feature addition that is completely up to us. It's worth discussing whether this makes sense, given the chance that most of our customers would already have a data warehouse or wider data platform running.

### How we are doing it

We:

- Ingest data into BigQuery with Meltano.
- Run SQL-defined transformations on the data in BigQuery with dbt.
- Generate report files via Python scripts, store them in GCS buckets, and make them accessible through the UI (see the sketch after this list).
- Orchestrate the whole thing with a mix of scheduled runs and UI-triggered actions.
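
To make that report step concrete, here is a rough sketch. It is not our actual script: the dataset, table, and bucket names are made up for illustration.

```python
# Hypothetical sketch of the report-generation step: query a transformed table
# in BigQuery and drop the result as CSV into a GCS bucket for the UI to serve.
# Dataset, table and bucket names are placeholders, not our real ones.
from google.cloud import bigquery, storage


def generate_report(report_id: str = "loan_book", bucket_name: str = "lana-reports") -> str:
    bq = bigquery.Client()
    df = bq.query(f"SELECT * FROM `analytics.{report_id}`").to_dataframe()

    blob_path = f"reports/{report_id}.csv"
    storage.Client().bucket(bucket_name).blob(blob_path).upload_from_string(
        df.to_csv(index=False), content_type="text/csv"
    )
    return blob_path  # the UI can later sign a URL for this object
```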

## Parts that feel great, parts that feel odd

Great:

- Meltano as the choice for doing the Extract & Load (EL) between the app db and any DW.
- dbt as the choice for managing the tangle of SQL-defined transformations.
- Not trying to do these transformations in any way within the Rust codebase.

Odd:

- The BigQuery choice:
  - We are coupled to BigQuery, a GCP-specific tool that is not viable for Volcano, and would not be viable either for many other potential clients out there who are not GCP shops.
  - BigQuery feels like overkill for the data volume one would expect in greenfield banks, unless they have extremely ambitious growth targets.

## A simpler alternative

- Raise another Postgres instance within the deployments and use it to replace what BigQuery is being used for right now.
- Tiny side note: perhaps we could leverage the same Postgres instance we already have for Meltano and Airflow data, just adding a new database there to act as the DW (see the sketch below). There are pros and cons to it.
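
As a sketch of that side note: the "new database on the existing instance" option is essentially one statement. The host, credentials, and database name below are placeholders.

```python
# Hypothetical: add a "dw" database on the Postgres instance that already backs
# Meltano and Airflow; dbt and Meltano targets would then point at it.
import psycopg2

conn = psycopg2.connect(host="data-pg", user="postgres", password="change-me", dbname="postgres")
conn.autocommit = True  # CREATE DATABASE cannot run inside a transaction block
with conn.cursor() as cur:
    cur.execute("CREATE DATABASE dw")
conn.close()
```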

## Another alternative

- Swap BigQuery for Snowflake.

## Comparison

Going for Postgres:

- Pros
  - Familiar technology.
  - We can easily include it in CI jobs, tests, etc. "More control."
  - Probably the most popular db out there, hence pretty much all tooling has connectors and integrations for it.
  - Opens up the possibility of doing the Extract and Load (EL) between app and DW with many of the available Postgres replication options and tools (see the sketch after this list). Could be interesting if we ever face extremely low-latency (under 1 min) needs.
- Cons
  - We need to move from BQ to Postgres: change all SQL, configs, deployment, etc.
  - It's not strictly built for DW needs; it will become challenging if a deployment scales a lot in data volume.
  - May not look flashy to corp IT mgmt who expect all the fancy tools.
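
For the replication pro above, here is a sketch of what native logical replication from the app db into a Postgres DW could look like. Hosts, credentials, and table names are made up, and it assumes wal_level=logical is enabled on the app db.

```python
# Hypothetical sketch of Postgres-to-Postgres logical replication for low-latency EL.
import psycopg2


def run(dsn: str, sql: str) -> None:
    conn = psycopg2.connect(dsn)
    conn.autocommit = True  # CREATE SUBSCRIPTION cannot run inside a transaction block
    with conn.cursor() as cur:
        cur.execute(sql)
    conn.close()


# App (publisher) side: expose the tables we want in the DW.
run("host=app-pg dbname=lana user=postgres",
    "CREATE PUBLICATION lana_pub FOR TABLE public.loans, public.deposits")

# DW (subscriber) side: tables with matching schemas must already exist here.
run("host=dw-pg dbname=dw user=postgres",
    "CREATE SUBSCRIPTION lana_sub "
    "CONNECTION 'host=app-pg dbname=lana user=replicator' "
    "PUBLICATION lana_pub")
```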

Going for Snowflake:

- Pros
  - Allows us to deliver a highly scalable DW (the big question here: will we ever need to do that?).
  - Comes with a lot of additional tooling and goodies around the DW.
  - Makes us look serious (old-fashioned IT mgmt at large corps probably feel more comfortable hearing that we use Snowflake rather than Postgres, even if it might be pointless or detrimental).
- Cons
  - We need to move from BQ to Snowflake: change all SQL, configs, deployment, etc.
  - We need to plug ourselves into it for all CI and local-env work. Might be more convoluted than simply raising a Postgres container.
  - We are coupling our data stack to one vendor, even if it's the top dog of its niche.
  - Although it's very popular in the data/BI niche, it is far from being as popular as Postgres. Connectors and integrations with other tools might not always exist.
  - EL jobs from the app will always need to be driven by external tools like Meltano. Snowflake states that it can do CDC replication from Postgres (https://docs.snowflake.com/en/connectors/postgres6/configure-replication), but we would need to verify whether that is as performant as Postgres-to-Postgres replication.

### Chat with Jose

- Report generation through the UI
  - Is there any feature/value/thingie that we are delivering via the data stack other than the UI-generated reports?
- Reporting
  - Could you give me some guidance on what I could pick?

### More chat with Justin

- How the different containers talk to each other
- Airflow integration
- Reports are triggered from the UI
  - One button to generate all the reports
  - graphql mutation triggerReportRun
  - trigger report run
  - user must have permission
  - job
  - airflow creates a job
  - the reports batch has a unique id
  - then airflow gets called (see the sketch after this list)
  - report API plugin (Flask)
  - reports get dropped into buckets
  - we can query report by
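
My notes above mention a Flask-based report API plugin; as a simpler illustration of that handoff, here is a sketch of the backend passing the batch id to Airflow via its stable REST API. The DAG id, host, credentials, and payload are assumptions, not the actual integration.

```python
# Hypothetical sketch: trigger the report DAG in Airflow, passing the batch id.
import requests


def trigger_report_run(report_batch_id: str) -> dict:
    resp = requests.post(
        "http://airflow:8080/api/v1/dags/generate_reports/dagRuns",
        json={"conf": {"report_batch_id": report_batch_id}},
        auth=("airflow", "airflow"),  # assumes basic-auth API access is enabled
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # includes the dag_run_id we could poll for completion
```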

- meltano also defines a scheduled generation
  - are the scheduled generations visible from the UI? Yes

Three pg instances:

- core -> runtime
  - all application data
  - modular monolith
  - clear partitioning, no dependency rules
  - the most important database
- kratos
  - IAM
  - we probably expect the customer to bring its own
  - core application has no auth
  - we probably
- data (meltano + airflow)
  - airflow

## 2025-07-28

### Catch-up with Justin

- Still would appreciate
  - An architecture overview of the app itself
  - Someone running me through the app as if I was
- My next goals
  - E2E data run with Sebastien
  - Sit down with Jose like "point me to something more specific that I can do"
  - Drive architecture conversation
- BigQuery was the right call for the Blink DWH
- Set up a session with Sebastien and Jose

UI run-through

Config

Chart of accounts (Cala Stuff)

- ledger_account_id: CalaAccountId
- AccountSet (Rollup of multiple accounts) NOT THE SAME THING AS LANA Chart of Accounts

Deposit module depends on DepositLedger, which templates and writes into Cala ledger

Omnibus

### Chat with Vaibhav

- Taking a course in Spanish
- 2.5 years
- Started as an intern
- Summer of Bitcoin

## 2025-07-25

### Chat with Jose

- Intro
  - 4 years with Galoy
  - Originally from Bolivia, but grew up in the US
- Who is abussutil, reach out cold?
  - Alexandre Bussutil
- Any conversations you think I should jump into?
  - 6 or 7 PDFs that have all the questions
  - start from the end and pull from there

## 2025-07-24

### All hands