# Challenge Scratch

My notes for the technical challenge.

## Original request from Justin

Link to the gist: https://gist.github.com/bodymindarts/9919abc26a1dbbcd71d426030000df62

> # Write a trigger function that 'rolls up' the customer events in PG
>
> We would like to ask you to work on a POC for a solution to the 'event sourced entities' issue I explained in the interview.
>
> The approach we would like you to explore is to write a postgres trigger function that creates / updates a rollup of the entity events and gets committed atomically.
>
> In this task we will focus only on the 'user' entities.
>
> Anytime an event gets added to `core_user_events` table the PG function should atomically write to the rollup table with the data representing the current snapshot of a user.
>
> Please commit a working example to a fork of our lana repository and send us a link to it when you are ready.
>
> In a follow up interview we will dive into the approach you took and what thoughts you have around topics such as maintainability and extendability.
>
> The following are instructions on how to get the system into a usefull state for experimenting:
>
> ## Dependencies
>
> Nix:
> - Recommended install method using https://github.com/DeterminateSystems/nix-installer
>
> ```
> curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install
> ```
>
> docker
>
> ## Dev shell
>
> Clone the repo: `git clone git@github.com:GaloyMoney/lana-bank.git && cd lana-bank`
>
> Use `nix develop` to enter the dev shell or `direnv allow` if you have direnv installed - this will install all the dev dependencies you should need.
> ## Run tests
>
> start dependencies:
>
> ```
> make reset-deps run-server
> ```
>
> Compilation can take a while but eventually you should see the following logs:
>
> ```
> Starting customer server on port 5254
> Starting admin server on port 5253
> ```
>
> Run the 'superuser' end to end tests in another shell:
>
> ```
> bats -t bats/superuser.bats
> ```
>
> This will seed the database with some user events.
> You can see the result via:
>
> ```
> $ psql postgres://user:password@localhost:5433/pg -c 'select count(*) from core_user_events;'
>  count
> -------
>      6
> (1 row)
> ```
>
> ## Solution
>
> Create a new migration file via:
>
> ```
> cd lana/app
> cargo sqlx migrate add
> ```
>
> you can add / remove the migration via
>
> ```
> cd lana/app
> cargo sqlx migrate run
> cargo sqlx migrate revert
> ```
>
> rerun `make reset-deps run-server` to wipe the database state.
>
> We are looking forward to your ideas - happy coding!

## My notes

### Understanding the request

* We need to modify the project so that we materialize the state of users on each user event.
* This needs to be done with a Postgres trigger.
* Our changes in Postgres must be delivered as a new migration.
* The whole new state of the repo must live in a fork on GitHub, which I'll send over.

Interesting bits:

* How to ensure atomicity of the rollup state (there should be no way for an event to be added without the rollup being updated).
* How could the rollup be created in a lana instance that is already running and has events? (I consider implementing this out of scope, but I should have an opinion.)
* How to do something that's not hardcoded to the user events, but rather is extensible to other events? Potentially leverage some schema definitions to dynamically generate Postgres triggers? Maybe we need some artifact where we declare which events must be rolled up, plus whatever config is needed to do so?

I'll probably begin with a naive, hardcoded approach to users and then reflect on how that can be extended to other aggregates.
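On the atomicity bit: this should mostly come for free, because a Postgres trigger runs inside the same transaction as the statement that fired it, so the rollup write commits or rolls back together with the event insert. A sketch of that guarantee (the payload and the `'initialized'` `event_type` value here are made-up placeholders, not taken from the repo):

```sql
BEGIN;

-- This insert fires the rollup trigger within the same transaction,
-- so the snapshot row is written before we ever reach COMMIT...
INSERT INTO core_user_events (id, sequence, event_type, event)
VALUES ('00000000-0000-0000-0000-000000000001', 1, 'initialized', '{}'::jsonb);

-- ...and if we roll back (or anything fails), both the event row and
-- the rollup row disappear together: no window for inconsistency.
ROLLBACK;
```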
* I'll try to understand how the project is tested, to see where I can add tests ensuring that my rollup works fine.

First bits I'll explore before moving on:

* Getting the environment working.
* Running the tests.
* Getting more familiar with the user events and how schemas are defined and used in the repo.
* Getting more familiar with how the tests are written.

I'll clear those first and then take it from there.

### First steps: getting the env running and doing a bit of research

#### Run the env

* Installing nix and running the dev shell was smooth, no hiccups.
* Running the dev deployment and the tests that seed some data also worked fine. I can reach the pg instance from the host. I see there are three different Postgres instances and I'm accessing the `core-pg` one.
* I can see events in the tables, so we're all set.

#### The existing data

* There are two relevant tables: `core_users` and `core_user_events`. The first one is the aggregate's current state snapshot; the other contains the individual events.
* About `core_user_events`:
  + The PK is the id (which is an FK to the `core_users` id) plus the sequence.
  + It shows the `event_type` in a column and has the event details as JSONB in `event`.
  + It only has one timestamp, called `recorded_at`.
* About `core_users`:
  + The PK is the user id.

#### Make sense of the repo

* The user entity is defined in `core/access/src/user/`.
* I deduced there can be four events:
  + `Initialized`
  + `AuthenticationIdUpdated`
  + `RoleGranted`
  + `RoleRevoked`
* There is a schema definition file at `lana/entity-rollups/schemas/user_event_schema.json`.
  + It holds definitions for all four events, including the fields, their data types and which fields are mandatory. It also specifies which value of the (event) `type` enum each can hold, which tells apart which event it is.
  + There's a subobject common to all four events called `AuditInfo`. This is defined separately and referenced in each event's schema.
  + Looking at other schemas beyond the `user` ones, it seems the `AuditInfo` bit is repeated in all of them. I wonder why there isn't a common definition shared across them? My guess is that nesting another level of file referencing is a hassle with the tooling in use.
* I've managed to run the unit and e2e tests using the readme instructions. I decided not to run the app tests, since that seems well outside my current scope.
* That was nice and magic, but how do they work? I'm going to take a look at the Makefile.
* I couldn't make sense of the unit tests, so I jumped into the e2e ones because I think they are more relevant for the task.
* I could easily find that all the tests are defined in the `bats` folder as a collection of bash scripts with references to some common bits in `helpers.bash`. The one that seems to deal with users is called `superuser.bats`.
* The tests basically interact with lana by `curl`-ing the GraphQL API. They create users and afterwards check that they have the right properties.
* I want to dig deeper and check whether the GraphQL API is reading from the rolled-up aggregates (I'm assuming that's the case). If so, I guess my code could build a new aggregate, and my PR could extend the e2e tests to verify I'm rolling things up properly.
  + Okay, there's this file in `lana/admin-server/src/graphql/schema.rs` that has a monster object called `mutation`.
  + It uses a macro called `exec_mutation` defined in `lana/admin-server/src/graphql/macros.rs`. It seems to be a chokepoint for persistence.
  + The file `repo.rs` in `lana/core/access/src/user/` defines the `User` entity and its columns. Seems like a duplicate of the JSON schema I found earlier? Perhaps that JSON schema gets generated from this code?
    - Right, I found this recent PR in `es_entity`: https://github.com/GaloyMoney/cala/pull/444/files
    - It makes `es_entity` generate JSON Schema from the defined entities.
  + It all uses `es_entity`, Justin's own event sourcing framework (lol, I'm so out of my league here): https://github.com/GaloyMoney/cala/tree/17060a3157b879a40676dc381cdba30e7b5a0010/lib/es-entity
  + I can't make sense of how the entity table gets filled in.
* `lana/admin-server/entity-rollups/src/main.rs` is a script that updates the JSON schema files.
  + It looks for breaking changes in the schemas when it runs, and cancels the writing of the schemas if a breaking change is detected.
  + The script can be executed by calling the `update-schemas` command defined in the Makefile.
* It's funky because there is one migration file that defines all aggregate and event tables, and it definitely doesn't look auto-generated. Could it be that this whole house of cards lives in Rust, and then you need to manually make things match in Postgres by rolling the table definitions yourself? I'm going to look at the git blame of the file and check past PRs to confirm.
* ChatGPT told me that whenever one of the aggregates/entities gets mutated, the current code atomically updates both the snapshot and the events table. So the app is kind of playing event sourcing and a traditional normalized CRUD model at the same time. Now I'm getting a hunch about why Justin is interested in doing the rollups in Postgres.

### Action time

Okay, I've fucked around the repo a decent bit. Time to shoot a bit or I'll go crazy.

High level overview of milestones:

* Build the naive rolling up of the user events:
  + Add the target table in a migration
  + Add the trigger and function
  + Piggyback on the bats tests to check that my rollup works as intended
* Generate the `CREATE TABLE` statement for the rollup dynamically from the existing definitions in the repo instead of hardcoding it.
* Generate the trigger function dynamically from the event definitions.
* Think about how this could be generalized to other events.
* Think about how to deal with the evolution of aggregates and events:
  + New events that modify initially immutable properties
  + New fields
  + Breaking changes?

Issues:

* There is no strict definition of the entity itself. Fields must be derived from the events and what they do.
* I see you guys don't have a `role` column in `core_users`, because the code itself doesn't materialize that state there.

A few notes during execution:

* I'm assuming a user can only have one role. I would do things differently if roles were stackable.
* I'm also assuming that revoking a role leaves the user with no role, and that it doesn't require specifying the role to be revoked, as I see in `bank_manager_lifecycle` in `lana/app/tests/user.rs`.
  + Which is kind of confirmed by the tests in `core/access/src/user/entity.rs`.
  + Even though the definition of the `UserEvent` enum in the same file does say that `RoleRevoked` requires a `RoleId`. So does the `user_event_schema.json` entry for the event (which makes sense, since that JSON is derived from `UserEvent` in the first place).
  + This seems to confirm you don't want to have to specify an ID when revoking, even though what I find in the code is sort of contradictory: https://github.com/GaloyMoney/lana-bank/pull/2094

### Later / to-dos

* It would be worth looking at the staging layer of the dbt project to understand how these event aggregates are being integrated into BigQuery right now.
  + Sometimes they get the events, sometimes the aggregate, sometimes both.
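For the first "Action time" milestone, here is a minimal sketch of what the migration could look like. Everything below is an assumption to be checked against the real repo: the `core_users_rollup` table name and its columns are invented, the `event_type` values are guessed to be snake_case versions of the Rust variant names, and the JSONB field names (`authentication_id`, `role_id`) are guesses too. It also bakes in my single-role assumption from the notes above.

```sql
-- Hypothetical rollup table; the real column set must be derived from the events.
CREATE TABLE core_users_rollup (
    id UUID PRIMARY KEY,
    authentication_id UUID,
    role_id UUID,
    last_sequence INT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    updated_at TIMESTAMPTZ NOT NULL
);

CREATE FUNCTION rollup_user_event() RETURNS TRIGGER AS $$
BEGIN
    CASE NEW.event_type
        WHEN 'initialized' THEN
            INSERT INTO core_users_rollup (id, last_sequence, created_at, updated_at)
            VALUES (NEW.id, NEW.sequence, NEW.recorded_at, NEW.recorded_at);
        WHEN 'authentication_id_updated' THEN
            UPDATE core_users_rollup
               SET authentication_id = (NEW.event->>'authentication_id')::UUID,
                   last_sequence = NEW.sequence,
                   updated_at = NEW.recorded_at
             WHERE id = NEW.id;
        WHEN 'role_granted' THEN
            UPDATE core_users_rollup
               SET role_id = (NEW.event->>'role_id')::UUID,
                   last_sequence = NEW.sequence,
                   updated_at = NEW.recorded_at
             WHERE id = NEW.id;
        WHEN 'role_revoked' THEN
            -- Single-role assumption: revoking simply clears the role.
            UPDATE core_users_rollup
               SET role_id = NULL,
                   last_sequence = NEW.sequence,
                   updated_at = NEW.recorded_at
             WHERE id = NEW.id;
        ELSE
            NULL;  -- Ignore unknown event types rather than failing the insert.
    END CASE;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER user_events_rollup
AFTER INSERT ON core_user_events
FOR EACH ROW EXECUTE FUNCTION rollup_user_event();
```

Since the trigger fires in the same transaction as the event insert, the atomicity requirement is satisfied by construction; the later milestones would replace this hand-written `CASE` with one generated from the JSON schema files.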