galoy-personal-notes/log.md

# Log

## 2025-08-13

### Meeting with Luis

- We sit down to discuss UIF:
    - Report docs: (https://drive.google.com/drive/folders/17dhJdF5cUDT50g4-FhAEaonqatH8SMzR)
    - Law: https://drive.google.com/file/d/1gQ8F96C_kmJe7oLvXxML7CODjPE32zun/view
- Open topics
    - Does the documentation in these files cover 100% of what we need to do, UIF wise? Or should we add anything in there?
    - Are these docs up to date?
    - What does "having" these reports mean? Is file generation good enough for an audit? Do we need to worry about integrating with UIF systems earlier than launch?
    - Are there any XSD validation files somewhere, like the ones from the Central de Riesgos? 
        - No.
        - We should build it.
    - Confirm that the Excel templates and the PDF forms are just illutrative and not really relevant for our case
- Propose that I build an inventory of what needs to be built and how it could be reviewed before going crazy with implementing, and that we all agree on it (including Luis)
- best way to communicate

Also generate in Excel so they can be reviewed by Marcos

we are missing transfer365 report details, do we care

reportes de efectivo -> we don't need to do since we don't do cash

otros medios -> cheques, debitos, etc

otros medios electronicos is the only applicable 

we need to clarify tipo de producto

Oficial Cumplimiento -> Marco (quien es Marco?)
    - actividad economica -> uses a different catalogue than the one used for KYC, we need to develop a mapping

you usually need to report the teller's identity. In our case, we can add a generic "banca electronica" identity. Unless it's a "manual", accounting driven (not UI driven) operation, in which case we would need the "oficial del banco" identity. We will need to add a field for usernames, since we right now only have email.

Do we need to create the human readable forms?

Should we check absolutely any transaction above the thresholds

Reporting suerly applies to the current account transactions, but not to the collateral account. We should confirm with Marcos. Perhaps on liquidation?


#### Clean notes after meeting

- On which reports we need to do:
    - We have confirmed with Luis that we need to generate the 5 reports here (https://github.com/GaloyMoney/knowledge-base/pull/14/files).
        - Yet it seems only `07 UIF Método Reporte Diario de Otros Medios Electrónicos.pdf` applies to us given that all of our operations are digital.
        - `03 UIF Método Reporte Diario de Efectivo` and `04 UIF Método Reporte Mensual de Efectivo` only apply to physical cash transactions, so they don't apply to Volcano since it doesn't handle cash.
        - `05 UIF Método Reporte Diario de Otros Medios` and `06 UIF Método Reporte Mensual de Otros Medios` apply to other bank methods such as checks.
    - He also mentioned that there is a sixth report, not listed there, related to the transfer365 payment rail. He doesn't have details on that report at the moment, but thinks it doesn't apply to us since we don't do transfer365 transactions.
- On how we need to deliver it:
    - The end stage in production is to integrate with the UIF systems via XML delivery, but this will only happen when we're actually launching.
    - For the current stage and to satisfy the audit, we will simply produce files.
    - Technically, we need to produce XML files as described by the UIF documentation. But we will also produce CSV-structured exports that are easy to consume by humans so that auditors and regulators can easily check the info (this has been explicitly recommended by Luis).
- On validation:
    - We are not aware of the existence of any `xsd` files that we can use to validate the XML files that we must deliver to the UIF. I would propose building them ourselves according to their spec so we have something to validate against.
- On the applicability to collateral accounts:
    - Luis considers that the transactions on the collateral account don't need to be reported, since they are not really change of ownership but rather just the delivery of a collateral. He equates it to how setting a house as collateral for a mortgage doesn't trigger any reporting to the UIF.
- On identifiying parties:
    - When reporting a transaction, the details of everyone involved must be reported. This means that if a Volcano client receives a bank transfer of some third party to his USD account, we would need to have the personal details of that third party.
    - Given that we don't currently have any way to collect that, we need to either:
        - Expand `lana-bank` features to be able to do that.
        - Or simply have the convention that customers can only send/receive USD from other accounts under their name.
- On identifying tellers:
    - The reports expect us to inform who in the bank handles the transaction.
    - Given that our operations are driven by a digital app, Luis suggests that we simply use some "Electronic Banking" generic identity, since generally there is no human at the wheel.
    - But if we make transactions manually, for instance by having a Volcano employee do a transfer between different customer accounts in the bank manually via accounting, then we would be expected to inform the identity of the particular employee who did that.
- On the professional activity mapping:
    - The UIF has a different taxonomy for customer profession that the one used in KYC. This needs to be added to the customer data in the transaction reports.
    - We need a mapping between the KYC taxonomy and the UIF one so we can translate that in our reports and send the UIF the codes they expect. Either Luis gets it from somewhere or we painfully build it ourselves.

Open actions:
- Confirm to a 100% that we only need to do report `07 UIF Método Reporte Diario de Otros Medios Electrónicos.pdf`.
- Confirm if other reports need to simply be delivered empty, or not at all.
- Discuss with the team if we're comfortable with restricting USD transactioning with other accounts to accounts under the name of the same customer.
- Clarify where do we get the profession code mapping between KYC codes and UIF codes.
- Build the reports.


## 2025-08-08

### Feedback with Justin

Stuff I'm enjoying:
- Great engineering practices and culture
- First row seat on Bitcoin adoption
- International team

Stuff that feels sour:
- Confusion with urgency but not rushing?
- Timezones are a challenge
- For now I've been cautiously backing off, but I'll probably become a bit more noisy
- I would love to interact directly with the client when it makes sense


## 2025-07-30

### Chat with Andrej

- traveling around central america after covid.
- freelance
- used to work in a marketing agency
- working for galoy since 2022
- Located in Berlin in El Salvador

## 2025-07-29

### Summary for ACE tomorrow

While onboarding, I've inevitably looked at the data pipeline with fresh eyes, and it has made me observe some parts of the stack feel strange for the requirements we must satisfy with Lana.

This doc briefly lays out what parts of the stack I suggest we rethink.

## Brief review of current situation 

### Needs we are covering

Currently, we serve one very clear requirement with the data pipeline: building and delivering the UI generated report files.

Potentially, we could also think about having a "batteries" included BI attitude with Lana, where Lana gets bundled with an ingest+transform+present stack that allows a deployment to see some useful business reports on how the bank is doing. This is not a strict requirement for Volcano but rather a nice-to-have feature addition that is completely up to use. It's worth discussing if this makes sense depending on the chances that most of our customers would already have a Datawarehouse or wider data platform already running.

### How we are doing it

We:
- Ingest from Meltano into BigQuery
- We run SQL defined transformations to data in BigQuery with dbt.
- We generate report files via Python scripts, store them in GCS buckets and make them accessible through the UI.
- We orchestrate the whole thing with a mix of scheduled running and UI triggered actions.

## Parts that feel great, parts that feel odd

Great:
- Meltano as a choice to do the Extract&Load between the app db and any DW.
- dbt as a choice for managing the tangle of SQL defined transformations
- To not try to do this transformations in any way within the rust codebase.

Odd:
- BigQuery choice
    - We are coupled to BigQuery, a GCP specific tool that is not viable for Volcano, and would neither be for many other potential clients out there who are not a GCP shop.
    - BigQuery feels overkill for the data volume one would expect in greenfield banks, unless they have extremely ambitious growth targets.
- GCS buckets for files:
    - Not a bad tool, just very specific to GCP.
    - We would probably

## Alternatives

Regarding the BigQuery choice:
- Raise another postgres instance within the deployments. Use it to replace what BigQuery is being used for right now.
  - Tiny side note: perhaps we could leverage the same postgres instance we have for meltano and airflow data, just adding a new database there to act as DW. There are pros and cons to it.
- Or swap Bigquery with Snowflake

Regarding the GCS buckets:
- We can do nothing for now, just rely on GCS buckets and wait for that to be a problem.
- Or we could start using Azure Blob Storage already if we know that will be what's needed for Azure.
- Or we could start using something like https://filesystem-spec.readthedocs.io/en/latest/ in our Python code to keep ourselves agnostic.

## On Postgres vs Snowflake

Going for postgres:
- Pros
    - Familiar technology
    - We can include easily in CI jobs, tests, etc. "More control"
    - Probably the most popular db out there, hence pretty much all tooling has available connectors and integrations to it.
    - Opens up the possibility to do Extract and Load (EL) between app and DW with many of the available Postgres replication options and tools. Could be interesting if we ever face extremely low latency (under 1min) needs.
- Cons
    - We need to move from BQ to Postgres. Change all SQL, configs, deployment, etc.
    - It's not strictly built for DW needs. Will become challenging if some deployment scales a lot in data volume.
    - May not look flashy to corp IT mgmt who expect all the fancy tools.

Going for Snowflake:
- Pros
    - Allows us to deliver highly scalable DW (the big question here is: will we ever need to do that?).
    - Comes with a lot of additional tooling and goodies around the DW.
    - Makes us look serious (old fashioned IT mgmt at large corps probably feels more comfortable hearing that we use Snowflake than Postgres, even if it might be pointless or detrimental)
- Cons
    - We need to move from BQ to Snowflake. Change all SQL, configs, deployment, etc.
    - We need to plug ourselves into it for all CI and local env work. Might be more convoluted than simply raising a Postgres container.
    - We are coupling our data stack with one vendor. It's the top dog of its niche.
    - Although it's very popular in the data/BI niche, is far from being as popular as Postgres. Connectors and integrations with other tools might not always exist.
    - EL jobs from the app will always need to be driven by external tools like Meltano. Snowflake states that they can do CDC replication (https://docs.snowflake.com/en/connectors/postgres6/configure-replication), but we would need to verify if that is as performant as Postgres to Postgres replication.

## On the need for a highly scalable DW

- Potential customers who are existing large orgs will already have a DW/Data Platform. I don't think they expect us to run that for them, but rather will have a big interest in Lana being easy to extract data from.
- Small, greenfield projects around banks that start out can probably survive, data volume wise, with Postgres for some time. Unless they expect to have millions of events per day super early, we would probably not need a highly scalable DW like BQ/Snowflake at the get go.


### Chat with Jose

- Report generation through the UI
  - Is there any feature/value/thingie that we are delivering via the data stack other than the UI generated reports?

- Reporting
    - Could you give me some guidance on what could I pick

### More chat with Justin

- How the different containers talk to each other
- Airflow integration 
    - Reports are trigered from UI.
    - One button to generate all the reports
    - graphql mutation triggerReportRun
    - trigger report run
    - user must have permission
    - job 
    - airflow create a job
    - the reports batch as a unique id
    - then airflow gets called 
    - report api plugin flask
    - reports get dropped into buckets
    - we can query report by 

- meltano also defines a scheduled generation
  - are the scheduled generations visible from the UI? Yes
  - 

Three pg
- core -> run time 
    - all application data
    - modular monolith
    - clear partitionings, no dependency rules
    - the most important database
- kratos
    - IAM
    - we probably expect the customer to bring it's own 
    - core application has no auth 
    - we probably 

- data (meltano + airflow)
  - airflow


## 2025-07-28

### Catch-up with Justin

- Still would appreciate
    - An architecture overview of the app itself
    - Someone running me through the app as if I was 
- My next goals
    - E2E data run with Sebastien
    - Sit down with Jose like "point me to something more specific that I can do"
    - Drive architecture conversation


- Bigquery was the right call for the Blink DWH

- Set up a session with Sebastien and Jose
- 

UI run through


Config

Chart of accounts (Cala Stuff)
- ledger_account_id: CalaAccountId
- AccountSet (Rollup of multiple accounts) NOT THE SAME THING AS LANA Chart of Accounts

Deposit module depends on DepositLedger, which templates and writes into Cala ledger

Omnibus 


### Chat with Vaibhav

- Taking course in Spanish
- 2.5 years
    - Started as an intern
    - Summer of Bitcoin


## 2025-07-25

### Chat with Jose

- Intro
    - 4 years with Galoy
    - Originally from Bolivia, but grew in the US
- Who is abussutil, reach out cold?
    - Alexandre Bussutil
- Any conversations you think I should jump into?

- 6 or 7 PDFs that has all the questions
- start from the end and pull from there

## 2025-07-24

### All hands

ISO27001

### Daily 

I've been:
- Having coffees
- Trying to run pg2bq load project independently
- Discussion with Sebastien
    - Run on postgres
    - Run on snowflake
    - Run on anything?
- Completing more onboarding docs

### Chat with Siddharth

- Graduated last year
- Software Engineer
- First full time job, 2 years here
- Mostly frontend

## 2025-07-23

### Chat with Andrew

- Wallet of satoshi had better timing that Blink
Rhode Island, between Boston and NY
15 years in Consulting for big companies
10 years in San Diego
2019-2020 orange pilled
4 years in Galoy

Bitcoin is a life raft for banks

Regulated vs pirate

"Whatever needs to be done"
- Marketing, Sales, CM,
- Right now: Go to market
Spoke with 20 banks in the last 6 months

slow and steady, not fiat attitude

content marketing: blink blog was really good

### Chat with Nicolas

- I think I'm up to date with company goals, but maybe you want to give me your view?
- Manage audit vs data work

"One year ago we decide to throw all our eggs in one basket"
"Maybe in three months we have nothing to do"

Currently El Salvador only has a commercial banking law, doesn't have any investment banking law. Keeps getting pushed backed because of compliance with IMF. Law might be approved any time.

Teenage Sex

Opportunities to do something in the US with the team that started Silvergate

Before it was Fulgur and Tether, but Tether dropped out earlier this year.

- IBEX fun story

- I'm surprised by the length of the tenures

- About the 1 or 2 engineers per quarter comment you made, how come?

Vitaly, major shareholder of Bitfinex, 10K, 50K BTC
John Carlo, guy owning Tether

Get in the calls with Vicky ASAP

I like the Wild West, Godfather 2 in Cuba feeling of this project

### ACE

- We pick the "make me a new snapshot row everytime a entity state rollup happens" way.
- Docker Hub rate limiting: we decide to just leave the docker hub image for now and see if the rate limiting problem comes back or not


### Chat with Sebastien

- Japan since 5 years ago. 5 years in washington before.
- 8yo daughter that climbs, also has a 16yo daughter
- 4 years in Galoy
- Designed to Stablesats
- Equity derivatives at Lehman brothers and Morgan Stanley


Jose got kidnapped by language skills

### Ongoing 

10/365, daily interest 
Sandipan working on app-driven report generation 

Challenges
- Loosely, conversation driven requirements.
- Lack of testing approach to tell if things are right or not
- Trouble keeping up with all the breaking changes of the backend 
- Snowflake? Postgres? Anything else?


### Chat with Jiri

- Joined in April '25
- Backend, using Rust for the first time
- Used to work with Scala
- In Taiwan
- From Czech Republic
- Flying license

### Chat with Justin

- Rollup drama update

- Getting Bigquery set up


## 2025-07-22

### Daily call

- Meeting the team
- Soaking in info
    - Review Looms from the team
    - Reading through data requirements and dbt artifacts
        - https://docs.google.com/spreadsheets/d/1r2HEl4_VYJh4spajmiDz_A1DXAnfshPg/edit?gid=1297332161#gid=1297332161
        - https://www.bcr.gob.sv/
        - https://norms-llm-dylanwilson32.replit.app/
        - https://github.com/GaloyMoney/lana-bank/tree/main/meltano/transform/models/outputs/el-salvador-ssf
- Doing
    - Working out where to change in Terraform to get BigQuery env
    - Just managed to set up local env
        - Had some issues with nix
    - Filling in onboarding details


Onboarding stuff
    - I've started and merged this PR to onboarding (on-call): https://github.com/GaloyMoney/onboarding/pull/18
    - Started this one, still pending review (code-review): https://github.com/GaloyMoney/onboarding/pull/17

### Chat with Sebastien

From Sebastien

    If you haven't figured this out yet, the data flows like this:
    staging (stg_* files) -> intermediate (int_* files) -> output (misc. but often report_* files)
    Under staging, the 'rollups' folder hosts the "rolled up" source data from the backend, raw.
    Under intermediate, the 'rollups' folder hosts the expanded & type casted version of the above raw data and should be the source of all(most?) our transformations
    The backend is architecture'd to stream events for most "objects"/"entity" 
    (as concisely explained here https://www.youtube.com/watch?v=lg6aF5PP4Tc)
    and so the rollups are the snapshots of the state of those "objects"/"entity" as a one per row table (chronological reduce / event summarization) mentioned in the video around 4:20.
    The reduce process are backend side, done as triggers for now on "objects"/"entity" events table in PG and visible a sql migrations under lana/app/migrations/<date>_*_events_rollup.sqlI think you should be familiar with that process given the interview take home, but that's what we adopted for now and it's all automated...
    so if a field name changes in the backend for example, and the pipeline falls out of sync it breaks and we can address it, rather than silently failing.
    So anyways all the concepts of the bank and data we might be interested in analyzing, reporting, etc. are implicitly documented with the above 3 set of rollup sql.
    The credit facility object is probably the most interesting and easy to start with as it is a vanilla bank loan or dumbed down line of credit.

### Coffee with Kartik

- Where are you based?
    - Bangalore, been there for two years
- How long you've been around?
    - 5 years, started out when it was just Nicolas and him for engineering?
- What's your main focus?
    - Deployment and infrastructure, CI/CD pipelines
    - We should probably talk for unit testing stuff
- What were you doing before Galoy?
    - Uni
- Quite a rollercoaster, right?

### Coffee with Arvin

- Where are you based?
    - Trinidad
- How long you've been around?
    - July'21
- What's your main focus?
    - Mostly backend stuff
- What were you doing before Galoy?

Into Bitcoin for quite while. Lots of FOSS.
Big El Salvador focus back in the day.
Team was more operational back with Blink.
Split Blink, people could choose.

## 2025-07-21

### Call with Justin

About rotation:
- Rotation is organized in weekly pairs of devs. Each dev takes a 12 hour shift.
- The oncall dev is responsible for:
    - Being the first responder to any alerts that happen in production.
    - Triaging the alerts to judge the severity and whether end-users are being affected or not.
    - Either tackling the issue themselves or gathering the colleagues with the required knowledge whenever possible.

### Daily Call

There's a Kanban in Volcano WIP. Kind of Adhoc.