stuff

Commit 0e7af36270 (parent 97d27ac539): 1 changed file with 26 additions and 5 deletions.

log.md

@@ -1,6 +1,16 @@

# Log

## 2025-07-30

### Chat with Andrej

- traveling around Central America after COVID
- freelance
- used to work in a marketing agency
- working for Galoy since 2022
- located in Berlin, El Salvador

## 2025-07-29

### Summary for ACE tomorrow

@@ -36,17 +46,23 @@ Odd:

- BigQuery choice
  - We are coupled to BigQuery, a GCP-specific tool that is not viable for Volcano, nor would it be for many other potential clients who are not GCP shops.
  - BigQuery feels like overkill for the data volume one would expect in greenfield banks, unless they have extremely ambitious growth targets.
- GCS buckets for files:
  - Not a bad tool, just very specific to GCP.
  - We would probably

## Alternatives

Regarding the BigQuery choice:

- Run another Postgres instance within the deployments and use it for whatever BigQuery is being used for right now.
  - Tiny side note: perhaps we could reuse the Postgres instance we already have for Meltano and Airflow data, just adding a new database there to act as the DW. There are pros and cons to this.
- Or swap BigQuery for Snowflake.

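The side note about reusing the Meltano/Airflow Postgres instance comes down to a few DDL statements. Below is a minimal sketch that only generates the SQL. It assumes we run the statements through whatever driver we already use. The helper and the names `analytics_dw` and `dw_owner` are hypothetical. The sketch also shows one side of the pros and cons. The DW gets its own owner role and connect privileges, but it still shares CPU, memory and disk with Meltano and Airflow.

```python
def dw_database_sql(db_name: str, owner_role: str) -> list[str]:
    """Build SQL that adds a DW database to an existing Postgres instance.

    Hypothetical helper: the names passed in are illustrative only.
    """
    for name in (db_name, owner_role):
        # Only allow simple lowercase identifiers so quoting stays trivial.
        if not (name.isidentifier() and name == name.lower()):
            raise ValueError(f"unsupported identifier: {name!r}")
    return [
        # Dedicated owner role for the DW database.
        f'CREATE ROLE "{owner_role}" LOGIN',
        # CREATE DATABASE cannot run inside a transaction block.
        f'CREATE DATABASE "{db_name}" OWNER "{owner_role}"',
        # New databases grant CONNECT to PUBLIC by default; revoking it keeps
        # the existing Meltano/Airflow roles out unless explicitly granted
        # (superusers are not affected).
        f'REVOKE CONNECT ON DATABASE "{db_name}" FROM PUBLIC',
    ]


for stmt in dw_database_sql("analytics_dw", "dw_owner"):
    print(stmt + ";")
```

With psycopg, the connection running these statements needs autocommit enabled, because `CREATE DATABASE` is rejected inside a transaction block.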
Regarding the GCS buckets:

- We can do nothing for now: just rely on GCS buckets and wait for that to become a problem.
- Or we could start using Azure Blob Storage already, if we know that is what will be needed for Azure.
- Or we could start using something like https://filesystem-spec.readthedocs.io/en/latest/ in our Python code to stay storage-agnostic.

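With the fsspec option, our Python code would deal only in URLs, and the URL scheme would pick the backend. For example, `gs://` goes through gcsfs, `az://` through adlfs, and `file://` to local disk. Below is a minimal sketch assuming CSV exports. The helper names and paths are hypothetical. The built-in in-memory backend (`memory://`) stands in for a bucket so the sketch runs without credentials.

```python
import csv

import fsspec  # cloud backends are separate packages, e.g. gcsfs (gs://) or adlfs (az://)


def write_rows(url: str, rows: list[dict[str, str]]) -> None:
    """Write rows as CSV to any URL that fsspec can resolve."""
    with fsspec.open(url, "w") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]), lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)


def read_rows(url: str) -> list[dict[str, str]]:
    """Read CSV rows back from any URL that fsspec can resolve."""
    with fsspec.open(url, "r") as f:
        return list(csv.DictReader(f))


# Moving from GCS to Azure or local disk would only change this string,
# e.g. "gs://bucket/exports/balances.csv" or "az://container/exports/balances.csv".
EXPORT_URL = "memory://exports/balances.csv"

rows = [{"account": "a1", "balance": "100"}, {"account": "a2", "balance": "250"}]
write_rows(EXPORT_URL, rows)
print(read_rows(EXPORT_URL))
```

The storage choice then lives in configuration rather than in code. That is the point of staying agnostic while we wait to see whether GCS becomes a problem.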
## On Postgres vs Snowflake

Going for Postgres:

- Pros

@@ -71,6 +87,11 @@ Going for Snowflake:

- Although it's very popular in the data/BI niche, it is far from being as popular as Postgres. Connectors and integrations with other tools might not always exist.
- EL jobs from the app will always need to be driven by external tools like Meltano. Snowflake states that it can do CDC replication (https://docs.snowflake.com/en/connectors/postgres6/configure-replication), but we would need to verify whether that is as performant as Postgres-to-Postgres replication.

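One way to do the Postgres-to-Postgres replication mentioned above is Postgres' built-in logical replication. It uses a publication on the app database and a subscription on the DW database. Below is a minimal sketch that only builds the two statements involved. The table, publication, subscription and DSN values are hypothetical. It assumes the app database runs with `wal_level = logical` and that the names are plain unquoted identifiers.

```python
def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def logical_replication_sql(
    tables: list[str], publication: str, subscription: str, source_dsn: str
) -> tuple[str, str]:
    """Return (statement for the app DB, statement for the DW DB)."""
    # Runs on the app's Postgres, which must have wal_level = logical.
    create_pub = f"CREATE PUBLICATION {publication} FOR TABLE {', '.join(tables)};"
    # Runs on the DW Postgres. The tables must already exist there with a
    # compatible schema, since logical replication does not copy DDL.
    # By default the subscription first copies the existing rows.
    create_sub = (
        f"CREATE SUBSCRIPTION {subscription} "
        f"CONNECTION {quote_literal(source_dsn)} PUBLICATION {publication};"
    )
    return create_pub, create_sub


pub_sql, sub_sql = logical_replication_sql(
    tables=["core_deposits", "core_withdrawals"],
    publication="lana_dw_pub",
    subscription="lana_dw_sub",
    source_dsn="host=app-db dbname=lana user=replicator",
)
print(pub_sql)
print(sub_sql)
```

This setup involves no external EL tool, which is the baseline a Snowflake CDC benchmark would be compared against.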
## On the need for a highly scalable DW

- Potential customers that are existing large orgs will already have a DW/data platform. I don't think they expect us to run that for them; rather, they will have a big interest in Lana being easy to extract data from.
- Small greenfield bank projects that are just starting out can probably get by on Postgres, volume-wise, for some time. Unless they expect millions of events per day very early on, we would probably not need a highly scalable DW like BQ/Snowflake from the get-go.

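A back-of-the-envelope estimate helps put "millions of events per day" in perspective. The sketch below assumes an average stored size of about 1 KB per event. That figure is an assumption, not a measurement. It also ignores indexes and compression, so it only gives a rough order of magnitude.

```python
def yearly_volume_gb(events_per_day: int, bytes_per_event: int, days: int = 365) -> float:
    """Raw data volume in GB (10**9 bytes) accumulated over `days`."""
    return events_per_day * bytes_per_event * days / 1e9


# Assumed ~1 KB per stored event; indexes and compression are ignored.
for events_per_day in (10_000, 100_000, 1_000_000):
    gb = yearly_volume_gb(events_per_day, 1_000)
    print(f"{events_per_day:>9,} events/day -> {gb:7.2f} GB/year")
```

Under that assumption, 1 million events per day comes to about 365 GB of raw data per year. Plugging in real event sizes from Lana would make the estimate concrete.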
### Chat with Jose