counterweight 2025-06-29 11:57:50 +02:00
parent 3805399476
commit 3bc34af589
Signed by: counterweight
GPG key ID: 883EDBAA726BD96C
7 changed files with 5845 additions and 0 deletions


@@ -132,3 +132,89 @@ First bits I'll explore before moving on:
* Get more familiar with how the tests are written.
I'll clear those first and then take it from there.
### First steps: getting env running and doing a bit of research
#### Run the env
* Installing nix and running the dev shell was smooth, no hiccups.
* Running the dev deployment and the tests that seed some data also worked fine. I can reach the pg from the host. I see there are three different postgres instances and I'm accessing the `core-pg` one.
* I can see events in the tables, so we're all set.
#### The existing data
* There are two relevant tables: `core_users` and `core_user_events`. The first one is the snapshot of the aggregate's current state, the other one contains the individual events.
* About `core_user_events`:
+ The PK is the id (which is FK to the `core_users` id) and the sequence.
+ It shows the `event_type` in a column and has the event details in JSONB in `event`.
+ It only has one timestamp called `recorded_at`.
* About `core_users`:
+ The PK is the user id.
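
To make the two table shapes concrete, here is a minimal sketch of what the notes above describe. It uses SQLite through Python's `sqlite3` as a stand-in for Postgres, so `event` is plain `TEXT` instead of JSONB, and every column type is my assumption rather than something copied from the migration. `make_db` is a made-up helper name.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build an in-memory approximation of the two tables (types are assumed)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when asked to.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE core_users (
            id TEXT PRIMARY KEY               -- the user id
        );
        CREATE TABLE core_user_events (
            id          TEXT    NOT NULL REFERENCES core_users (id),
            sequence    INTEGER NOT NULL,
            event_type  TEXT    NOT NULL,
            event       TEXT    NOT NULL,     -- JSONB in the real table
            recorded_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
            PRIMARY KEY (id, sequence)        -- one row per user and sequence
        );
        """
    )
    return conn
```

The composite key means a second event with the same `(id, sequence)` is rejected, which is what makes the sequence usable as an ordering for a rollup.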
#### Make sense out of the repo
* The user entity is defined in `core/access/src/user/`.
* I deduced there can be four events:
+ `Initialized`
+ `AuthenticationIdUpdated`
+ `RoleGranted`
+ `RoleRevoked`
* There is a schema definition file at `lana/entity-rollups/schemas/user_event_schema.json`.
+ It holds definitions for all four events, including the fields, their data types and which fields are mandatory. It also specifies which value of the (event) `type` enum each one can hold, which is what tells the events apart.
+ There's a common subobject to all four events called `AuditInfo`. This is defined separately and referenced in each event's schema.
+ Looking at other schemas beyond the `user` ones, it seems the `AuditInfo` bit is repeated in all of them. I'm wondering why there isn't a common definition across all of them? I would guess maybe nesting another level of file referencing is a hassle with the tooling we are using.
* I've managed to run unit and e2e tests using the readme instructions. I decided to not run the app tests since that seems quite outside my current scope.
* That was nice and magic, but how do they work? I'm going to take a look at the Makefile.
* I couldn't make sense of unit tests, so I jumped into the e2e because I think they are more relevant for the task.
* It was easy to find that all the tests are defined in the `bats` folder as a collection of bash scripts that reference some common bits in `helpers.bash`. The one that seems to be dealing with users is called `superuser.bats`.
* The tests basically interact with lana by `curl`-ing the graphql API. They create users and afterwards check that they have the right properties.
* I want to dig deeper and check whether the graphql API is reading from the rolled-up aggregates (I'm assuming that's the case). If so, I guess my code could make a new aggregate, and my PR could extend the e2e tests to check I'm rolling things up properly.
+ Okay, there's this file in `lana/admin-server/src/graphql/schema.rs` that has a monster object called **`mutation`**.
+ It uses a macro called `exec_mutation` defined in `lana/admin-server/src/graphql/macros.rs`. It seems to be a chokepoint for persistence.
+ The file `repo.rs` in `lana/core/access/src/user/` defines the `User` entity and its columns. Seems like a duplicate from the JSON schema I found earlier? Perhaps that JSON schema gets generated from this code?
- Right, I found this recent PR in `es_entity`: https://github.com/GaloyMoney/cala/pull/444/files
- It makes `es_entity` generate JSON Schema from the defined entities.
+ It all uses `es_entity`, Justin's own Event Sourcing framework (lol I'm so out of my league here) (https://github.com/GaloyMoney/cala/tree/17060a3157b879a40676dc381cdba30e7b5a0010/lib/es-entity).
+ I can't make sense of how the entity table gets filled in.
* `lana/admin-server/entity-rollups/src/main.rs` is a script that updates the json schema files.
+ It looks for breaking changes in the schemas when it runs and cancels the writing of the schemas if a breaking change is detected.
+ The script can be executed by calling the `update-schemas` command defined in the Makefile.
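
I haven't read how the script decides what counts as breaking, so the following is only a sketch of the general idea with my own, assumed definition: a property disappears, a property changes type, or a field becomes newly required. `breaking_changes` and the flat `properties` / `required` schema shape are hypothetical simplifications, not the script's actual logic.

```python
def breaking_changes(old: dict, new: dict) -> list[str]:
    """Compare two simplified JSON Schemas and list incompatible changes.

    Assumed rule set (not taken from the real script): removing a property,
    changing its type, or requiring a field that used to be optional.
    """
    problems: list[str] = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})

    for name, spec in old_props.items():
        if name not in new_props:
            problems.append(f"property removed: {name}")
        elif spec.get("type") != new_props[name].get("type"):
            problems.append(f"type changed: {name}")

    # Fields that old events may lack but the new schema insists on.
    newly_required = set(new.get("required", [])) - set(old.get("required", []))
    for name in sorted(newly_required):
        problems.append(f"newly required: {name}")

    return problems
```

A caller would write the new schema file only when the returned list is empty, which matches the cancel-on-breaking-change behaviour described above.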
* It's funky because there is one migration file that defines all aggregate and event tables. And it definitely doesn't look auto-generated. Could it be that there's this whole house of cards in Rust, and then you need to manually make things match on Postgres by rolling the table definition yourself? I'm going to look at the git blame of the file to check past PRs and confirm.
* ChatGPT told me that whenever one of the aggregates/entities gets mutated, the current code atomically updates both the snapshot and the events table. So the app is kind of playing event sourcing and traditional normalized CRUD model at the same time. Now I'm kind of getting a hunch on why Justin is interested in doing the rollups in Postgres.
### Action time
Okay, I've fucked around a decent bit around the repo. Time to shoot a bit or I'll go crazy.
High level overview of milestones:
* Build the naive rolling up of the user events
+ Add the target table in a migration
+ Add the trigger and function
+ Piggyback on the bats tests to check that my rollup works as intended
* Generate the `CREATE TABLE` statement for the rollup dynamically from the existing definitions in the repo instead of hardcoding it.
* Generate the trigger function dynamically from the event definitions.
* Think on how this could be generalized to other events.
* Think on how to deal with the evolution of aggregates and events
+ New event to modify initially immutable property
+ New fields
+ Breaking changes?
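
For the second milestone, a rough sketch of how the `CREATE TABLE` could be derived from event schemas instead of being hardcoded. Everything here is an assumption for illustration: the function name `rollup_ddl`, the type mapping, the `UUID` id, the table name, and the flat `properties` shape (the real schema file references `AuditInfo` separately, which this sketch would simply map to `JSONB`).

```python
# Hypothetical mapping from JSON Schema types to Postgres column types.
PG_TYPES = {
    "string": "TEXT",
    "integer": "BIGINT",
    "number": "NUMERIC",
    "boolean": "BOOLEAN",
}


def rollup_ddl(table: str, event_schemas: list[dict]) -> str:
    """Union the properties of all event schemas into one nullable column list."""
    columns: dict[str, str] = {}
    for schema in event_schemas:
        for name, spec in schema.get("properties", {}).items():
            if name in ("type", "id"):
                # `type` is the event discriminator; `id` becomes the key below.
                continue
            # Anything that is not a known scalar (objects, $ref) falls back to JSONB.
            columns.setdefault(name, PG_TYPES.get(spec.get("type"), "JSONB"))

    lines = ["id UUID PRIMARY KEY"]  # assumed id type
    lines += [f"{name} {pg_type}" for name, pg_type in columns.items()]
    body = ",\n    ".join(lines)
    return f"CREATE TABLE {table} (\n    {body}\n);"
```

Columns are left nullable on purpose, since no single event is guaranteed to set every field. Usage would look like `rollup_ddl("core_users_rollup", schemas)` with `schemas` loaded from the JSON schema file.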
Issues:
* No strict definition of the entity itself. Fields must be derived from events and what they perform.
* I see you guys don't have a `role` column in `core_users` because the code itself doesn't do stuff
A few notes during execution:
* I'm assuming a user can only have one role. I would do things differently if roles were stackable.
* I'm also assuming that revoking a role leaves the user with no role and also that it doesn't require specifying the role to be revoked, as I see in `bank_manager_lifecycle` in `lana/app/tests/user.rs`.
+ Which is kind of confirmed in the tests in `core/access/src/user/entity.rs`.
+ Even though the definition of the `UserEvent` enum in the same file does mention that `RoleRevoked` does require a `RoleId`. So does the `user_event_schema.json` entry for the event (which makes sense since that JSON is derived from `UserEvent` in the first place).
+ This seems to confirm you don't want to have to specify an ID when revoking, even though what I find in the code is sort of contradictory: https://github.com/GaloyMoney/lana-bank/pull/2094
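
The assumptions above boil down to a small fold over the events, which is what the trigger function would have to compute. This is a plain-Python sketch of that logic, not the trigger itself. The snake_case `event_type` values and the payload keys `authentication_id` and `role_id` are my guesses at the naming, and `UserRollup` / `apply_event` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserRollup:
    """Rolled-up state of one user (field names are assumed)."""
    id: str
    authentication_id: Optional[str] = None
    role_id: Optional[str] = None
    last_sequence: int = 0


def apply_event(state: UserRollup, event_type: str, event: dict, sequence: int) -> UserRollup:
    """Apply one event to the rollup, under the single-role assumption."""
    if sequence <= state.last_sequence:
        return state  # already applied; makes replays harmless

    if event_type == "initialized":
        pass  # the row exists from here on; nothing else assumed
    elif event_type == "authentication_id_updated":
        state.authentication_id = event.get("authentication_id")
    elif event_type == "role_granted":
        state.role_id = event["role_id"]  # a new grant overwrites the old role
    elif event_type == "role_revoked":
        state.role_id = None  # any role id in the payload is ignored
    else:
        raise ValueError(f"unknown event type: {event_type}")

    state.last_sequence = sequence
    return state
```

If roles turned out to be stackable, `role_id` would become a set and `role_revoked` would need to read the role from the payload instead of ignoring it.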
### Later / to-dos
* It would be worth looking at the staging layer of the dbt project to understand how these event aggregates are being integrated right now into BigQuery.
+ Sometimes they get the events, sometimes they get the aggregate, sometimes they get both.