DWH dbt
Welcome to Superhog's DWH dbt project. Here we model the entire DWH.
How to set up your environment
- Pre-requisites
  - You need a Unix-like environment. That can be Linux, macOS or WSL.
  - You need to install Python `>=3.10` and `poetry`.
  - All docs will assume you are using VSCode.
- Prepare SSH tunnels
  - We currently use SSH tunnels to reach both the `dev` and `prd` instances. You can ask Pablo how to set these up.
  - You will need to activate the tunnels in order to run the dbt models on the databases. It will probably pay off to make them easy to activate from your terminal, for example with a shell alias.
- Set up
  - Create an entry for this project in your `profiles.yml` file at `~/.dbt/profiles.yml`. You have a suggested template at `profiles.yml.example`.
  - Make sure that the `profiles.yml` host and port settings are consistent with the tunnels.
  - Use `poetry install` to get dependencies in place.
- Check
  - Ensure you are running in the project venv, either by setting the VSCode Python interpreter to the one created by `poetry`, or by running `poetry shell` in the console when in the root dir.
  - Turn on your tunnel to `dev` and run `dbt debug`. If it runs well, you are all set. If it fails, there's something wrong with your setup. Grab the terminal output and pull the thread.
- Complements
  - If you are in VSCode, you most probably want to have this extension installed: dbt Power User
  - It is advised to use this autoformatter and to automatically run it on save.
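
Because `dbt debug` can only succeed while the tunnel is active, a quick port check can help tell a tunnel problem apart from a profile problem. The following is a minimal sketch and not part of this repo: the function name `tunnel_is_up` and the default port `5432` are assumptions, so use the host and port from your own `profiles.yml`.

```python
import socket


def tunnel_is_up(host: str = "localhost", port: int = 5432, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    The defaults are placeholders (assumptions); they should match the
    host and port that your profiles.yml entry points at.
    """
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `tunnel_is_up()` returns `False`, the tunnel is most likely not active; if it returns `True` and `dbt debug` still fails, the profile settings or credentials are the more likely suspects.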
Branching strategy
This repo follows a trunk-based development philosophy (https://trunkbaseddevelopment.com/).
Open a feature branch (feature/your-branch-name) for any changes and make it short-lived. It's fine and encouraged to build incrementally towards a mart level table with multiple PRs as long as you keep the model buildable along the way.
Project organization
We organize models in four folders:
- `sync`
  - Dedicated to sources.
  - One `.yml` per `sync` schema.
  - No SQL models go here.
- `staging`
  - Pretty much this: https://docs.getdbt.com/best-practices/how-we-structure/2-staging
  - All models are prefixed with `stg_`.
  - Avoid `SELECT *`. We don't know what dirty stuff can come from the `sync` schemas.
- `intermediate`
  - Pretty much this: https://docs.getdbt.com/best-practices/how-we-structure/3-intermediate
  - It's strictly forbidden to expose tables here to end users.
  - Make an effort to practice DRY.
- `reporting`
  - Pretty much this: https://docs.getdbt.com/best-practices/how-we-structure/4-marts
  - For now, we follow a monolithic approach and just have one `reporting` schema. When this becomes insufficient, we will consider splitting it into several schemas.
  - Make an effort to keep this layer stable, like you would do with a library's API, so that downstream dependencies don't break without control.
Conventions
- Always use CTEs in your models to `source` and `ref` other models.
- We follow snake case.
- Identifier columns should begin with `id_`, not finish with `_id`.
- Use binary question-like column names for binary, bool, and flag columns (e.g. not `active` but `is_active`, not `verified` but `has_been_verified`, not `imported` but `was_imported`).
- Datetime columns should either finish in `_utc` or `_local`. If they finish in `_local`, the table should contain a `local_timezone` column that contains the timezone identifier.
- We work with many currencies and lack a single main one. Hence, any money fields will be ambiguous on their own. To address this, any table that has money-related columns should also have a column named `currency`. We currently have no policy for tables where a single record has columns in different currencies. If you face this, assemble the data team and decide on something.
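
To make the column naming rules above concrete, here is a minimal sketch of a checker. It is not tooling that exists in this repo. The function name `check_columns`, the type labels `"boolean"` and `"timestamp"`, and the accepted boolean prefixes (`is_`, `has_`, `was_`, taken from the examples above) are all assumptions made for illustration.

```python
import re

# Assumed prefixes, taken from the examples in the conventions list.
BOOLEAN_PREFIXES = ("is_", "has_", "was_")


def check_columns(columns: dict[str, str]) -> list[str]:
    """Return convention violations for a mapping of column name -> type label."""
    problems = []
    for name, col_type in columns.items():
        # Snake case: lowercase letters, digits and underscores only.
        if not re.fullmatch(r"[a-z][a-z0-9_]*", name):
            problems.append(f"{name}: not snake case")
        # Identifiers begin with id_, they do not finish with _id.
        if name.endswith("_id"):
            problems.append(f"{name}: use an id_ prefix instead of an _id suffix")
        # Boolean columns should read like a yes/no question.
        if col_type == "boolean" and not name.startswith(BOOLEAN_PREFIXES):
            problems.append(f"{name}: boolean name should read like a question")
        # Datetime columns finish in _utc or _local.
        if col_type == "timestamp" and not name.endswith(("_utc", "_local")):
            problems.append(f"{name}: datetime should finish in _utc or _local")
    # Tables with _local datetimes need a local_timezone column.
    has_local = any(
        col_type == "timestamp" and name.endswith("_local")
        for name, col_type in columns.items()
    )
    if has_local and "local_timezone" not in columns:
        problems.append("table: _local datetime present but no local_timezone column")
    return problems
```

For example, `check_columns({"booking_id": "integer", "active": "boolean"})` reports one problem per column, while `{"id_booking": "integer", "is_active": "boolean"}` passes cleanly. The money and `currency` rule is left out on purpose, since money columns cannot be recognized from their names alone.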
Stuff that we haven't done but we would like to
- Automate formatting with git pre-commit.
- Define conventions on testing (and enforce them).
- Define conventions on documentation (and enforce them).
- Replace SSH tunneling with a Wireguard VPN access.
- Prepare a quick way to replicate parts of the `prd` DWH on our local machines.