# Challenge Scratch

My notes for the technical challenge.
## Original request from Justin

Link to the gist: https://gist.github.com/bodymindarts/9919abc26a1dbbcd71d426030000df62
> # Write a trigger function that 'rolls up' the customer events in PG
>
> We would like to ask you to work on a POC for a solution to the 'event sourced entities' issue I explained in the interview.
>
> The approach we would like you to explore is to write a postgres trigger function that creates / updates a rollup of the entity events and gets committed atomically.
>
> In this task we will focus only on the 'user' entities.
>
> Anytime an event gets added to the `core_user_events` table, the PG function should atomically write to the rollup table with the data representing the current snapshot of a user. Please commit a working example to a fork of our lana repository and send us a link to it when you are ready.
>
> In a follow up interview we will dive into the approach you took and what thoughts you have around topics such as maintainability and extendability.
>
> The following are instructions on how to get the system into a useful state for experimenting:
> ## Dependencies
>
> Nix:
> - Recommended install method using https://github.com/DeterminateSystems/nix-installer
>
> ```
> curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install
> ```
>
> docker
>
> ## Dev shell
>
> Clone the repo: `git clone git@github.com:GaloyMoney/lana-bank.git && cd lana-bank`
>
> Use `nix develop` to enter the dev shell or `direnv allow` if you have direnv installed - this will install all the dev dependencies you should need.
> ## Run tests
>
> start dependencies:
>
> ```
> make reset-deps run-server
> ```
>
> Compilation can take a while but eventually you should see the following logs:
>
> ```
> Starting customer server on port 5254
> Starting admin server on port 5253
> ```
>
> Run the 'superuser' end to end tests in another shell:
>
> ```
> bats -t bats/superuser.bats
> ```
>
> This will seed the database with some user events. You can see the result via:
>
> ```
> $ psql postgres://user:password@localhost:5433/pg -c 'select count(*) from core_user_events;'
>  count
> -------
>      6
> (1 row)
> ```
>
> ## Solution
>
> Create a new migration file via:
>
> ```
> cd lana/app
> cargo sqlx migrate add
> ```
>
> you can add / remove the migration via
>
> ```
> cd lana/app
> cargo sqlx migrate run
> cargo sqlx migrate revert
> ```
>
> rerun `make reset-deps run-server` to wipe the database state. We are looking forward to your ideas - happy coding!
## My notes

### Understanding the request
- We need to modify the project so that we materialize the state of users on each user event.
- This needs to be done with a postgres trigger.
- Our changes in Postgres must be delivered as a new migration.
- The whole new state of the repo must live in a fork on GitHub, which I'll send over.
Interesting bits:
- How to ensure atomicity of the rollup state (no way for an event to be added without the rollup being updated).
- How to possibly create the rollup in a lana instance that is already running and has events? (I consider this to be out of scope in terms of implementing, but I should have an opinion.)
- How to do something that's not hardcoded to the user events, but rather is extensible to other events? Potentially leverage some schema definitions to dynamically generate Postgres triggers? Maybe we need some artifact where we declare which events must be rolled up, plus whatever config is needed to do so? I'll probably begin with a naive approach hardcoded to users and then reflect on how that can be extended to other aggregates.
- I'll try to understand how the project is tested, to see where I can add tests ensuring that my rollup works fine.
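One reassuring Postgres fact for the atomicity point: a row-level trigger fires inside the same transaction as the statement that caused it, so the event insert and the rollup write commit or roll back together. A throwaway psql check I could run once a trigger exists — all the column values (and the `'initialized'` event type string) are invented for illustration, and I'm ignoring the FK to `core_users` for the sake of the sketch:

```sql
-- Assumes a rollup trigger on core_user_events is already installed.
BEGIN;

INSERT INTO core_user_events (id, sequence, event_type, event, recorded_at)
VALUES (
    gen_random_uuid(),
    1,
    'initialized',                       -- guessed event_type value
    '{"email": "test@example.com"}'::jsonb,
    now()
);

-- Inside this transaction the rollup row written by the trigger is visible...
ROLLBACK;
-- ...and after the rollback, neither the event nor the rollup row exists.
```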
First bits I'll explore before moving on:
- Getting the environment working.
- Run the tests.
- Get more familiar with the user events and how schemas are defined and used in the repo.
- Get more familiar with how the tests are written.
I'll clear those first and then take it from there.
## First steps: getting env running and doing a bit of research

### Run the env
- Installing nix and running the dev shell was smooth, no hiccups.
- Running the dev deployment and the tests that seed some data also worked fine. I can reach the pg from the host. I see there are three different postgres instances and I'm accessing the `core-pg` one.
- I can see events in the tables, so we're all set.
### The existing data
- There are two relevant tables: `core_users` and `core_user_events`. The first one is the aggregate's current state snapshot, the other one contains the individual events.
- About `core_user_events`:
  - The PK is the id (which is an FK to the `core_users` id) plus the sequence.
  - It shows the `event_type` in a column and has the event details as JSONB in `event`.
  - It only has one timestamp, called `recorded_at`.
- About `core_users`:
  - The PK is the user id.
### Make sense out of the repo
- The user entity is defined in `core/access/src/user/`.
- I deduced there can be four events:
  - `Initialized`
  - `AuthenticationIdUpdated`
  - `RoleGranted`
  - `RoleRevoked`
- There is a schema definition file at `lana/entity-rollups/schemas/user_event_schema.json`.
  - It holds definitions for all four events, including the fields, their data types and which fields are mandatory. It also specifies which value of the (event) `type` enum each one holds, which is what tells the events apart.
  - There's a common subobject in all four events called `AuditInfo`. This is defined separately and referenced in each event's schema.
  - Looking at other schemas beyond the `user` ones, it seems the `AuditInfo` bit is repeated in all of them. I'm wondering why there isn't a common definition across all of them? I would guess maybe nesting another level of file referencing is a hassle with the tooling we are using.
- I've managed to run unit and e2e tests using the readme instructions. I decided not to run the app tests, since those seem quite outside my current scope.
- That was nice and magic, but how do they work? I'm going to take a look at the Makefile.
- I couldn't make sense of the unit tests, so I jumped into the e2e ones because I think they are more relevant for the task.
- I could easily find that all the tests are defined in the `bats` folder as a collection of bash scripts, with references to some common bits in `helpers.bash`. The one that seems to be dealing with users is called `superuser.bats`.
- The tests basically interact with lana by `curl`-ing the graphql API. They create users and afterwards check that they have the right properties.
- I want to dig deeper and check whether the graphql API is reading from the rolled-up aggregates (I'm assuming that's the case). If so, I guess my code could build a new aggregate, and my PR could extend the e2e tests to check that I'm rolling things up properly.
  - Okay, there's this file in `lana/admin-server/src/graphql/schema.rs` that has a monster object called `mutation`.
    - It uses a macro called `exec_mutation` defined in `lana/admin-server/src/graphql/macros.rs`. It seems to be a chokepoint for persistence.
  - The file `repo.rs` in `lana/core/access/src/user/` defines the `User` entity and its columns. Seems like a duplicate of the JSON schema I found earlier? Perhaps that JSON schema gets generated from this code?
    - Right, I found this recent PR in `es_entity`: https://github.com/GaloyMoney/cala/pull/444/files
    - It makes `es_entity` generate JSON Schema from the defined entities.
  - It all uses `es_entity`, Justin's own Event Sourcing framework (lol I'm so out of my league here) (17060a3157/lib/es-entity).
  - I can't make sense of how the entity table gets filled in.
- `lana/admin-server/entity-rollups/src/main.rs` is a script that updates the JSON schema files.
  - It looks for breaking changes in the schemas when it runs, and cancels the writing of the schemas if a breaking change is detected.
  - The script can be executed by calling the `update-schemas` command defined in the Makefile.
- It's funky because there is one migration file that defines all the aggregate and event tables. And it definitely doesn't look auto-generated. Could it be that this whole house of cards lives in Rust, and then you need to manually make things match on Postgres by rolling the table definitions yourself? I'm going to look at the git blame of the file to check past PRs and confirm.
- ChatGPT told me that whenever one of the aggregates/entities gets mutated, the current code atomically updates both the snapshot and the events table. So the app is kind of playing event sourcing and a traditional normalized CRUD model at the same time. Now I'm getting a hunch about why Justin is interested in doing the rollups in Postgres.
## Action time
Okay, I've fucked around a decent bit around the repo. Time to shoot a bit or I'll go crazy.
High level overview of milestones:
- Build the naive rolling up of the user events
  - Add the target table in a migration
  - Add the trigger and function
  - Piggyback on the bats tests to check that my rollup works as intended
- Generate the `CREATE TABLE` statement for the rollup dynamically from the existing definitions in the repo instead of hardcoding it.
- Generate the trigger function dynamically from the event definitions.
- Think about how this could be generalized to other events.
- Think about how to deal with the evolution of aggregates and events
  - New event to modify an initially immutable property
  - New fields
  - Breaking changes?
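For the first milestone, a sketch of what the migration could contain. Everything on the rollup side (table name, columns, the snake_case `event_type` values) is a guessed name until I check the real schema; only `core_user_events` and its columns come from my notes above:

```sql
-- Naive, hardcoded rollup of user events. All names on the rollup
-- side are assumptions, including the event_type string values.
CREATE TABLE user_rollup (
    id                UUID PRIMARY KEY,   -- same id as core_users
    email             VARCHAR,
    authentication_id UUID,
    role_id           UUID,               -- NULL when no role / revoked
    last_sequence     INT NOT NULL,
    created_at        TIMESTAMPTZ NOT NULL,
    modified_at       TIMESTAMPTZ NOT NULL
);

CREATE FUNCTION user_rollup_fn() RETURNS TRIGGER
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO user_rollup
        (id, email, authentication_id, role_id,
         last_sequence, created_at, modified_at)
    VALUES (
        NEW.id,
        NEW.event ->> 'email',
        (NEW.event ->> 'authentication_id')::UUID,
        (NEW.event ->> 'role_id')::UUID,
        NEW.sequence,
        NEW.recorded_at,
        NEW.recorded_at
    )
    ON CONFLICT (id) DO UPDATE SET
        -- keep old values when the event doesn't carry the field
        email             = COALESCE(EXCLUDED.email, user_rollup.email),
        authentication_id = COALESCE(EXCLUDED.authentication_id,
                                     user_rollup.authentication_id),
        role_id = CASE NEW.event_type
                      WHEN 'role_granted' THEN (NEW.event ->> 'role_id')::UUID
                      WHEN 'role_revoked' THEN NULL
                      ELSE user_rollup.role_id
                  END,
        last_sequence = EXCLUDED.last_sequence,
        modified_at   = EXCLUDED.modified_at;
    RETURN NEW;
END;
$$;

CREATE TRIGGER user_rollup_trg
AFTER INSERT ON core_user_events
FOR EACH ROW EXECUTE FUNCTION user_rollup_fn();
```

The `ON CONFLICT` upsert is what makes this naive version work per user id; the `CASE` on `event_type` is where all the hardcoding lives, and it's exactly the part a later milestone would generate from the schema definitions instead.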
Issues:
- No strict definition of the entity itself. Fields must be derived from the events and what they do.
- I see you guys don't have a `role` column in `core_users`, because the code itself doesn't do stuff with it.
A few notes during execution:
- I'm assuming a user can only have one role. Would do things differently if roles were stackable.
- I'm also assuming that revoking a role leaves the user with no role, and also that it doesn't require specifying the role to be revoked, as I see in `lana/app/tests/user.rs/bank_manager_lifecycle`.
  - Which is kind of confirmed by the tests in `core/access/src/user/entity.rs`.
  - Even though the definition of the `UserEvent` enum in the same file does mention that `RoleRevoked` requires a `RoleId`. So does the `user_event_schema.json` entry for the event (which makes sense, since that JSON is derived from `UserEvent` in the first place).
  - This seems to confirm you don't want to have to specify an ID when revoking, even though what I find in the code is sort of contradictory: https://github.com/GaloyMoney/lana-bank/pull/2094
## Later / to-dos
- It would be worth looking at the staging layer of the dbt project to understand how these event aggregates are being integrated into BigQuery right now.
  - Sometimes they get the events, sometimes they get the aggregate, sometimes they get both.