# 20240919-01 - dbt test failure because wrong configuration in schema file
Managed by: Uri
## Summary
- Components involved: data-dwh-dbt-project
- Started at: 2024-09-18 12:41 CEST
- Detected at: 2024-09-19 08:43 CEST
- Mitigated at: 2024-09-19 09:01 CEST
## Summary
Buggy code was committed and merged into master on 18 September and went unnoticed. In the scheduled production run on the morning of the 19th, dbt test failed because it couldn't compile the test. The fix was to remove the buggy configuration in the schema entry of `core__bookings`, merge, and re-run dbt test in prod.
## Impact
Limited impact: the only failure was a single test on the `core__bookings` model in reporting.
## Timeline
- 2024-09-18 12:41 CEST - Faulty commit [923bfa70](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project/commit/923bfa70919bf552304150e8cc3ec9af7cdbe708?refName=refs%2Fheads%2Fmaster&path=%2Fmodels%2Freporting%2Fcore%2Fschema.yml&_a=contents) is created
- 2024-09-18 16:30 CEST - Branch containing the faulty commit in pull request [!2877](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project/pullrequest/2877?_a=files&path=/models/reporting/core/schema.yml) is merged into production
- 2024-09-19 08:43 CEST - Data team sees the alert in the `#data-alerts` Slack channel
- 2024-09-19 08:48 CEST - Data team accesses the production logs of dbt tests to notice the failure, specifically:
> Compilation Error in test not_nullgit_core__bookings_id_booking (models/reporting/core/schema.yml)
>
- 2024-09-19 08:51 CEST - The faulty commit [923bfa70](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project/commit/923bfa70919bf552304150e8cc3ec9af7cdbe708?refName=refs%2Fheads%2Fmaster&path=%2Fmodels%2Freporting%2Fcore%2Fschema.yml&_a=contents) is spotted. Uri proceeds to create a PR to remove the issue.
- 2024-09-19 09:00 CEST - The fix is merged into production in commit [feaedb2a](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project/commit/feaedb2a06bd37555217ee0fb645c8f5a07b070d?refName=refs%2Fheads%2Fmaster)
- 2024-09-19 09:01 CEST - Successful re-run of the dbt tests with the fix in place.
## Root Cause(s)
An unintentional human error modified a line in the schema entry of `core__bookings`, in the test section for `id_booking`, changing the `not_null` test to `not_nullgit p`. This change, in commit [923bfa70](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project/commit/923bfa70919bf552304150e8cc3ec9af7cdbe708?refName=refs%2Fheads%2Fmaster&path=%2Fmodels%2Freporting%2Fcore%2Fschema.yml&_a=contents), was pushed after the data team members had already reviewed and approved PR [!2877](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project/pullrequest/2877?_a=files&path=/models/reporting/core/schema.yml), so it went unnoticed.
![image.png](image%2061.png)
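To illustrate the kind of typo involved, here is a minimal sketch of a check that compares the test names declared in a schema entry against the four generic tests that ship with dbt (`unique`, `not_null`, `accepted_values`, `relationships`). The dictionary is a simplified, hypothetical stand-in for a parsed `schema.yml`, and the function name `find_unknown_tests` is an assumption for illustration. It is not something that exists in the project. Custom or package-provided tests would need to be added to the allowed set.

```python
# Generic tests that ship with dbt itself.
BUILTIN_TESTS = {"unique", "not_null", "accepted_values", "relationships"}


def find_unknown_tests(schema, allowed=BUILTIN_TESTS):
    """Return (model, column, test) triples whose test name is not in `allowed`."""
    unknown = []
    for model in schema.get("models", []):
        for column in model.get("columns", []):
            for test in column.get("tests", []):
                # A test is either a plain string or a one-key mapping with arguments.
                name = test if isinstance(test, str) else next(iter(test))
                if name not in allowed:
                    unknown.append((model["name"], column["name"], name))
    return unknown


# Hypothetical, simplified stand-in for the core__bookings entry after the faulty commit.
faulty_schema = {
    "models": [
        {
            "name": "core__bookings",
            "columns": [{"name": "id_booking", "tests": ["not_nullgit p"]}],
        }
    ]
}

print(find_unknown_tests(faulty_schema))
# → [('core__bookings', 'id_booking', 'not_nullgit p')]
```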
## Resolution and recovery
The fix was straightforward: change `not_nullgit p` back to `not_null`, merge into prod, and re-run the dbt tests, which then passed.
## Lessons Learned
What went well:
- dbt test alerts work well and the team effectively checks the channel once an alert is raised.
What went badly:
- Improper self-review and cross-review of code before merging. Personally, I didn't check the PR since it was already approved by Pablo. At the same time, this approval came before the faulty commit. We should all be more careful and sceptical when merging into production, especially if we leave an approval on the PR.
Where did we get lucky:
- Minimal impact: it was just a single failing test in the reporting schema that would have passed anyway. However, the situation could have been worse if the bug had been directly in model code.
## Action Items
- Review and re-review PRs regardless of whether they have already been approved.
- Check commits made after the approval.
- When merging into prod, run both the normal execution of dbt (`run_dbt.sh`) and the tests (`run_tests.sh`). This would have surfaced the issue earlier.
- Automate CI checks on the dbt project (try to compile the project and perhaps also run tests on every PR, and block merging if either fails).
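
The last action item could be sketched roughly as follows. This is only an illustration under stated assumptions: `dbt compile` and `dbt test` are real dbt CLI commands, but the helper `run_checks`, the `CI_COMMANDS` list, and the idea that the CI agent has dbt installed and a reachable profile/target are all assumptions, not a description of the current setup.

```python
import subprocess

# Commands to run on each PR, in order. Running them assumes dbt is installed
# on the CI agent and that a profile/target it can reach is configured.
CI_COMMANDS = [
    ["dbt", "compile"],
    ["dbt", "test"],
]


def run_checks(commands, runner=subprocess.run):
    """Run each command in order; return the first non-zero exit code, else 0."""
    for cmd in commands:
        result = runner(cmd)
        if result.returncode != 0:
            print(f"CI check failed: {' '.join(cmd)} (exit code {result.returncode})")
            return result.returncode
    return 0
```

A pipeline step could call `sys.exit(run_checks(CI_COMMANDS))` (after `import sys`) so that a non-zero exit code fails the build, and merging could then be made conditional on that step passing.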