add a few more conventions to readme

Pablo Martin 2024-05-31 10:35:57 +02:00
parent ae10d30cd1
commit d81c0b5e74


@@ -75,12 +75,29 @@ We organize models in four folders:
## Conventions
- dbt practices:
- Always use CTEs in your models to read from `source` and `ref` relations.
- We follow [snake case](https://en.wikipedia.org/wiki/Snake_case).
- Columns and naming
- We follow [snake case](https://en.wikipedia.org/wiki/Snake_case) for column names and table names.
- Identifier columns should begin with `id_`, not finish with `_id`.
- Use question-like column names for binary, bool, and flag columns (e.g. not `active` but `is_active`, not `verified` but `has_been_verified`, not `imported` but `was_imported`).
- Datetime columns should end in either `_utc` or `_local`. If they end in `_local`, the table should contain a `local_timezone` column that holds the [timezone identifier](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones).
- We work with many currencies and lack a single main one. Hence, any money field is ambiguous on its own. To address this, any table that has money-related columns should also have a column named `currency`. We currently have no policy for tables where a single record has columns in different currencies. If you face this, assemble the data team and decide on something.
- Folder structures and naming
- All models live in `models`, in one of `staging`, `intermediate` or `reporting`.
- Staging model names should be prefixed with `stg_` and intermediate model names with `int_`.
- Split schema and domain with a double underscore (e.g. `stg_core__booking`).
- Always use sources to read into staging models.
- SQL formatting should be done with `sqlfmt`.
When in doubt, do what the dbt guys would do: <https://docs.getdbt.com/best-practices>
Or GitLab: <https://handbook.gitlab.com/handbook/business-technology/data-team/platform/dbt-guide/>
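
As an illustration of the conventions above, a staging model could look roughly like the sketch below. The source name `core`, the table `booking`, the file path, and every column name are hypothetical and only serve to show the naming rules. It also assumes the raw `created_at` value is already in UTC.

```sql
-- models/staging/stg_core__booking.sql (hypothetical model)
with raw_booking as (

    -- staging models read from a source
    select * from {{ source('core', 'booking') }}

),

renamed as (

    select
        id as id_booking,                -- identifiers begin with id_
        user_id as id_user,
        verified as has_been_verified,   -- flags read like a yes/no question
        created_at as created_at_utc,    -- datetimes end in _utc or _local
        total_price,                     -- a money column...
        currency                         -- ...so the table carries a currency column
    from raw_booking

)

select * from renamed
```

Downstream intermediate and reporting models would then read this model through `{{ ref('stg_core__booking') }}` inside a CTE rather than querying the table directly.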
## Testing Standards
- All tables in staging need Primary Key and Null tests.
- Tables in reporting should have more thorough testing. What to look for is up to you, but it should provide strong confidence in the quality of data.
- Tests will be run after every `dbt run`.
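
Primary key and null checks are usually declared with dbt's built-in `unique` and `not_null` generic tests. For illustration, the same check can also be written as a singular SQL test, which fails whenever the query returns rows. This is a minimal sketch: the file path, the model `stg_core__booking`, and the column `id_booking` are assumed names, not ones taken from this project.

```sql
-- tests/assert_stg_core__booking_pk.sql (hypothetical singular test)
-- dbt marks the test as failed if this query returns any row.
select
    id_booking,
    count(*) as occurrences
from {{ ref('stg_core__booking') }}
group by id_booking
-- a valid primary key is never duplicated and never null
having count(*) > 1 or id_booking is null
```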
## How to schedule