From 0e7af362700efe8d6484d8c3d820d006d7a8c651 Mon Sep 17 00:00:00 2001
From: pablo
Date: Wed, 30 Jul 2025 16:22:03 +0200
Subject: [PATCH] stuff

---
 log.md | 31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/log.md b/log.md
index f5c54c4..9570045 100644
--- a/log.md
+++ b/log.md
@@ -1,6 +1,16 @@
 # Log
 
 
+## 2025-07-30
+
+### Chat with Andrej
+
+- traveling around Central America after COVID.
+- freelance
+- used to work in a marketing agency
+- working for Galoy since 2022
+- Located in Berlin, El Salvador
+
 ## 2025-07-29
 
 ### Summary for ACE tomorrow
@@ -36,17 +46,23 @@ Odd:
 - BigQuery choice
   - We are coupled to BigQuery, a GCP specific tool that is not viable for Volcano, and would neither be for many other potential clients out there who are not a GCP shop.
   - BigQuery feels overkill for the data volume one would expect in greenfield banks, unless they have extremely ambitious growth targets.
+- GCS buckets for files:
+  - Not a bad tool, just very specific to GCP.
+  - We would probably
 
-## A simpler alternative
+## Alternatives
 
+Regarding the BigQuery choice:
 - Raise another postgres instance within the deployments. Use it to replace what BigQuery is being used for right now.
   - Tiny side note: perhaps we could leverage the same postgres instance we have for meltano and airflow data, just adding a new database there to act as DW. There are pros and cons to it.
+- Or swap BigQuery with Snowflake
 
-## Another alternative
+Regarding the GCS buckets:
+- We could do nothing for now: keep relying on GCS buckets and wait until that becomes a problem.
+- Or we could start using Azure Blob Storage right away, if we know that is what will be needed for Azure.
+- Or we could start using something like https://filesystem-spec.readthedocs.io/en/latest/ in our Python code to stay storage-agnostic.
 
-- Swap Bigquery with Snowflake
-
-## Comparison
+## On Postgres vs Snowflake
 
 Going for postgres:
 - Pros
@@ -71,6 +87,11 @@ Going for Snowflake:
   - Although it's very popular in the data/BI niche, is far from being as popular as Postgres. Connectors and integrations with other tools might not always exist.
   - EL jobs from the app will always need to be driven by external tools like Meltano. Snowflake states that they can do CDC replication (https://docs.snowflake.com/en/connectors/postgres6/configure-replication), but we would need to verify if that is as performant as Postgres to Postgres replication.
 
+## On the need for a highly scalable DW
+
+- Potential customers that are already large organizations will have their own DW/data platform. I don't think they will expect us to run that for them; rather, they will care a lot about Lana being easy to extract data from.
+- Small greenfield bank projects that are just starting out can probably get by on Postgres, data-volume wise, for some time. Unless they expect millions of events per day very early on, we would probably not need a highly scalable DW like BQ/Snowflake from the get-go.
+
 ### Chat with Jose