# Challenge Scratch

My notes for the technical challenge.
## Original request from Justin

Link to the gist: https://gist.github.com/bodymindarts/9919abc26a1dbbcd71d426030000df62
> # Write a trigger function that 'rolls up' the customer events in PG
>
> We would like to ask you to work on a POC for a solution to the 'event sourced entities' issue I explained in the interview.
>
> The approach we would like you to explore is to write a postgres trigger function that creates / updates a rollup of the entity events and gets committed atomically.
>
> In this task we will focus only on the 'user' entities.
>
> Anytime an event gets added to the `core_user_events` table, the PG function should atomically write to the rollup table with the data representing the current snapshot of a user. Please commit a working example to a fork of our lana repository and send us a link to it when you are ready.
>
> In a follow up interview we will dive into the approach you took and what thoughts you have around topics such as maintainability and extendability.
>
> The following are instructions on how to get the system into a useful state for experimenting:
> ## Dependencies
>
> Nix:
> - Recommended install method using https://github.com/DeterminateSystems/nix-installer
>
> ```
> curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install
> ```
>
> docker
>
> ## Dev shell
>
> Clone the repo: `git clone git@github.com:GaloyMoney/lana-bank.git && cd lana-bank`
>
> Use `nix develop` to enter the dev shell or `direnv allow` if you have direnv installed - this will install all the dev dependencies you should need.
> ## Run tests
>
> start dependencies:
>
> ```
> make reset-deps run-server
> ```
>
> Compilation can take a while but eventually you should see the following logs:
>
> ```
> Starting customer server on port 5254
> Starting admin server on port 5253
> ```
>
> Run the 'superuser' end to end tests in another shell:
>
> ```
> bats -t bats/superuser.bats
> ```
>
> This will seed the database with some user events. You can see the result via:
>
> ```
> $ psql postgres://user:password@localhost:5433/pg -c 'select count(*) from core_user_events;'
>  count
> -------
>      6
> (1 row)
> ```
>
> ## Solution
>
> Create a new migration file via:
>
> ```
> cd lana/app
> cargo sqlx migrate add
> ```
>
> you can add / remove the migration via
>
> ```
> cd lana/app
> cargo sqlx migrate run
> cargo sqlx migrate revert
> ```
>
> rerun `make reset-deps run-server` to wipe the database state. We are looking forward to your ideas - happy coding!
## My notes

### Understanding the request
- We need to modify the project so that we materialize the state of users on each user event.
- This needs to be done with a postgres trigger.
- Our changes in Postgres must be delivered as a new migration.
- The whole new state of the repo must live in a fork on GitHub, which I'll send over.
Interesting bits:
- How to ensure atomicity of the rollup state (no way for an event to be added without the rollup being updated).
- How to possibly create the rollup in a lana instance that is already running and has events? (I consider this to be out of scope in terms of implementing, but I should have an opinion.)
- How to do something that's not hardcoded to the user events, but rather is extensible to other events? Potentially leverage some schema definitions to dynamically generate Postgres triggers? Maybe we need some artifact where we declare which events must be rolled up, plus whatever config is needed to do so? I'll probably begin with a naive approach hardcoded to users and then reflect on how that can be extended to other aggregates.
- I'll try to understand how the project is tested, to see where I can add tests ensuring that my rollup works fine.
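One reassuring Postgres fact for the atomicity point: a row-level trigger fires inside the same transaction as the statement that caused it, so the event insert and the rollup write commit or roll back together. A throwaway psql check I could run once a trigger exists — all the column values (and the `'initialized'` event type string) are invented for illustration, and I'm ignoring the FK to `core_users` for the sake of the sketch:

```sql
-- Assumes a rollup trigger on core_user_events is already installed.
BEGIN;

INSERT INTO core_user_events (id, sequence, event_type, event, recorded_at)
VALUES (
    gen_random_uuid(),
    1,
    'initialized',                       -- guessed event_type value
    '{"email": "test@example.com"}'::jsonb,
    now()
);

-- Inside this transaction the rollup row written by the trigger is visible...
ROLLBACK;
-- ...and after the rollback, neither the event nor the rollup row exists.
```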
First bits I'll explore before moving on:
- Getting the environment working.
- Run the tests.
- Get more familiar with the user events and how schemas are defined and used in the repo.
- Get more familiar with how the tests are written.
I'll clear those first and then take it from there.
## First steps: getting env running and doing a bit of research

### Run the env
- Installing nix and running the dev shell was smooth, no hiccups.
- Running the dev deployment and the tests that seed some data also worked fine. I can reach the pg from the host. I see there are three different postgres instances and I'm accessing the `core-pg` one.
- I can see events in the tables, so we're all set.
### The existing data
- There are two relevant tables: `core_users` and `core_user_events`. The first one is the aggregate's current state snapshot, the other one contains the individual events.
- About `core_user_events`:
  - The PK is the id (which is an FK to the `core_users` id) plus the sequence.
  - It shows the `event_type` in a column and has the event details as JSONB in `event`.
  - It only has one timestamp, called `recorded_at`.
- About `core_users`:
  - The PK is the user id.
### Make sense out of the repo
- The user entity is defined in `core/access/src/user/`.
- I deduced there can be four events:
  - `Initialized`
  - `AuthenticationIdUpdated`
  - `RoleGranted`
  - `RoleRevoked`
- There is a schema definition file at `lana/entity-rollups/schemas/user_event_schema.json`.
  - It holds definitions for all four events, including the fields, their data types and which fields are mandatory. It also specifies which value of the (event) `type` enum each one holds, which is what tells the events apart.
  - There's a common subobject in all four events called `AuditInfo`. This is defined separately and referenced in each event's schema.
  - Looking at other schemas beyond the `user` ones, it seems the `AuditInfo` bit is repeated in all of them. I'm wondering why there isn't a common definition across all of them? I would guess maybe nesting another level of file referencing is a hassle with the tooling we are using.
- I've managed to run unit and e2e tests using the readme instructions. I decided not to run the app tests, since those seem quite outside my current scope.
- That was nice and magic, but how do they work? I'm going to take a look at the Makefile.
- I couldn't make sense of the unit tests, so I jumped into the e2e ones because I think they are more relevant for the task.
- I could easily find that all the tests are defined in the `bats` folder as a collection of bash scripts, with references to some common bits in `helpers.bash`. The one that seems to be dealing with users is called `superuser.bats`.
- The tests basically interact with lana by `curl`-ing the graphql API. They create users and afterwards check that they have the right properties.
- I want to dig deeper and check whether the graphql API is reading from the rolled-up aggregates (I'm assuming that's the case). If so, I guess my code could build a new aggregate, and my PR could extend the e2e tests to check that I'm rolling things up properly.
  - Okay, there's this file in `lana/admin-server/src/graphql/schema.rs` that has a monster object called `mutation`.
    - It uses a macro called `exec_mutation` defined in `lana/admin-server/src/graphql/macros.rs`. It seems to be a chokepoint for persistence.
  - The file `repo.rs` in `lana/core/access/src/user/` defines the `User` entity and its columns. Seems like a duplicate of the JSON schema I found earlier? Perhaps that JSON schema gets generated from this code?
    - Right, I found this recent PR in `es_entity`: https://github.com/GaloyMoney/cala/pull/444/files
    - It makes `es_entity` generate JSON Schema from the defined entities.
  - It all uses `es_entity`, Justin's own Event Sourcing framework (lol I'm so out of my league here) (17060a3157/lib/es-entity).
  - I can't make sense of how the entity table gets filled in.
- `lana/admin-server/entity-rollups/src/main.rs` is a script that updates the JSON schema files.
  - It looks for breaking changes in the schemas when it runs, and cancels the writing of the schemas if a breaking change is detected.
  - The script can be executed by calling the `update-schemas` command defined in the Makefile.
- It's funky because there is one migration file that defines all the aggregate and event tables. And it definitely doesn't look auto-generated. Could it be that this whole house of cards lives in Rust, and then you need to manually make things match on Postgres by rolling the table definitions yourself? I'm going to look at the git blame of the file to check past PRs and confirm.
- ChatGPT told me that whenever one of the aggregates/entities gets mutated, the current code atomically updates both the snapshot and the events table. So the app is kind of playing event sourcing and a traditional normalized CRUD model at the same time. Now I'm getting a hunch about why Justin is interested in doing the rollups in Postgres.
## Action time
Okay, I've fucked around a decent bit around the repo. Time to shoot a bit or I'll go crazy.
High level overview of milestones:
- Build the naive rolling up of the user events
  - Add the target table in a migration
  - Add the trigger and function
  - Piggyback on the bats tests to check that my rollup works as intended
- Generate the `CREATE TABLE` statement for the rollup dynamically from the existing definitions in the repo instead of hardcoding it.
- Generate the trigger function dynamically from the event definitions.
- Think about how this could be generalized to other events.
- Think about how to deal with the evolution of aggregates and events
  - New event to modify an initially immutable property
  - New fields
  - Breaking changes?
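For the first milestone, a sketch of what the migration could contain. Everything on the rollup side (table name, columns, the snake_case `event_type` values) is a guessed name until I check the real schema; only `core_user_events` and its columns come from my notes above:

```sql
-- Naive, hardcoded rollup of user events. All names on the rollup
-- side are assumptions, including the event_type string values.
CREATE TABLE user_rollup (
    id                UUID PRIMARY KEY,   -- same id as core_users
    email             VARCHAR,
    authentication_id UUID,
    role_id           UUID,               -- NULL when no role / revoked
    last_sequence     INT NOT NULL,
    created_at        TIMESTAMPTZ NOT NULL,
    modified_at       TIMESTAMPTZ NOT NULL
);

CREATE FUNCTION user_rollup_fn() RETURNS TRIGGER
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO user_rollup
        (id, email, authentication_id, role_id,
         last_sequence, created_at, modified_at)
    VALUES (
        NEW.id,
        NEW.event ->> 'email',
        (NEW.event ->> 'authentication_id')::UUID,
        (NEW.event ->> 'role_id')::UUID,
        NEW.sequence,
        NEW.recorded_at,
        NEW.recorded_at
    )
    ON CONFLICT (id) DO UPDATE SET
        -- keep old values when the event doesn't carry the field
        email             = COALESCE(EXCLUDED.email, user_rollup.email),
        authentication_id = COALESCE(EXCLUDED.authentication_id,
                                     user_rollup.authentication_id),
        role_id = CASE NEW.event_type
                      WHEN 'role_granted' THEN (NEW.event ->> 'role_id')::UUID
                      WHEN 'role_revoked' THEN NULL
                      ELSE user_rollup.role_id
                  END,
        last_sequence = EXCLUDED.last_sequence,
        modified_at   = EXCLUDED.modified_at;
    RETURN NEW;
END;
$$;

CREATE TRIGGER user_rollup_trg
AFTER INSERT ON core_user_events
FOR EACH ROW EXECUTE FUNCTION user_rollup_fn();
```

The `ON CONFLICT` upsert is what makes this naive version work per user id; the `CASE` on `event_type` is where all the hardcoding lives, and it's exactly the part a later milestone would generate from the schema definitions instead.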
Issues:
- No strict definition of the entity itself. Fields must be derived from the events and what they do.
- I see you guys don't have a `role` column in `core_users`, because the code itself doesn't do stuff with it.
A few notes during execution:
- I'm assuming a user can only have one role. Would do things differently if roles were stackable.
- I'm also assuming that revoking a role leaves the user with no role, and also that it doesn't require specifying the role to be revoked, as I see in `lana/app/tests/user.rs/bank_manager_lifecycle`.
  - Which is kind of confirmed by the tests in `core/access/src/user/entity.rs`.
  - Even though the definition of the `UserEvent` enum in the same file does mention that `RoleRevoked` requires a `RoleId`. So does the `user_event_schema.json` entry for the event (which makes sense, since that JSON is derived from `UserEvent` in the first place).
  - This seems to confirm you don't want to have to specify an ID when revoking, even though what I find in the code is sort of contradictory: https://github.com/GaloyMoney/lana-bank/pull/2094
## Later / to-dos
- It would be worth looking at the staging layer of the dbt project to understand how these event aggregates are being integrated into BigQuery right now.
  - Sometimes they get the events, sometimes they get the aggregate, sometimes they get both.