Latest commit: Pablo Martin, 4e430a94a9, "typo", 2024-01-18 17:36:20 +01:00

Name                   Last commit message                  Last updated
.vscode                renames                              2024-01-18 17:34:39 +01:00
analyses               start project                        2024-01-18 11:24:35 +01:00
macros                 first table reading from sync_core   2024-01-18 12:20:14 +01:00
models                 typo                                 2024-01-18 17:36:20 +01:00
seeds                  start project                        2024-01-18 11:24:35 +01:00
snapshots              start project                        2024-01-18 11:24:35 +01:00
tests                  start project                        2024-01-18 11:24:35 +01:00
.gitignore             start project                        2024-01-18 11:24:35 +01:00
dbt_project.yml        many things                          2024-01-18 17:25:41 +01:00
poetry.lock            start project                        2024-01-18 11:24:35 +01:00
profiles.yml.example   many things                          2024-01-18 17:25:41 +01:00
pyproject.toml         start project                        2024-01-18 11:24:35 +01:00
README.md              many things                          2024-01-18 17:25:41 +01:00

DWH dbt

Welcome to Superhog's DWH dbt project. Here we model the entire DWH.

How to set up your environment

  • Pre-requisites
    • You need a Unix-like environment: Linux, macOS or WSL.
    • You need to install Python >=3.10 and poetry.
    • All docs will assume you are using VSCode.
  • Prepare SSH tunnels
    • We currently use SSH tunnels to reach both the dev and prd instances. You can ask Pablo how to set these up.
    • You will need to activate the tunnels in order to run the dbt models on the databases. It will probably pay off to make them easy to activate from your terminal, for example with a shell alias.
  • Set up
    • Create an entry for this project in your profiles.yml file at ~/.dbt/profiles.yml. A suggested template is available in profiles.yml.example.
    • Make sure that the profiles.yml host and port settings are consistent with the tunnels.
    • Use poetry install to get dependencies in place.
  • Check
    • Ensure you are running in the project venv, either by setting the VSCode Python interpreter to the one created by poetry, or by running poetry shell in the console from the root dir.
    • Turn on your tunnel to dev and run dbt debug. If it runs well, you are all set. If it fails, there's something wrong with your setup. Grab the terminal output and pull the thread.
  • Complements
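
When dbt debug fails, it can help to first rule out the tunnel itself. The following is a minimal sketch, not part of this repo, that checks whether anything is accepting TCP connections on the local end of the tunnel. The function name tunnel_is_up and the default host and port are assumptions for illustration; use the host and port from your own profiles.yml entry.

```python
import socket


def tunnel_is_up(host: str = "localhost", port: int = 5432, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    The defaults are placeholders (assumption): replace them with the
    host and port configured in your ~/.dbt/profiles.yml entry.
    """
    try:
        # create_connection raises OSError (e.g. connection refused, timeout)
        # when nothing is listening on the given address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling tunnel_is_up() before dbt debug only tells you that the local port is open. It does not verify credentials or that the remote database is healthy, so dbt debug remains the real check.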

Branching strategy

This repo follows a trunk-based development philosophy (https://trunkbaseddevelopment.com/).

Open a feature branch (feature/your-branch-name) for any change and keep it short-lived. It's fine, and encouraged, to build incrementally towards a mart-level table with multiple PRs, as long as you keep the model buildable along the way.

Project organization

We organize models in four folders:

Conventions

  • Always use CTEs in your models to source and ref other models.
  • We follow snake case.
  • Identifier columns should begin with id_, not finish with _id.
  • Use binary question-like column names for binary, bool, and flag columns (e.g. is_active rather than active, has_been_verified rather than verified, was_imported rather than imported).
  • Datetime columns should end in either _utc or _local. If they end in _local, the table should contain a local_timezone column that contains the timezone identifier.
  • We work with many currencies and lack a single main one. Hence, any money field is ambiguous on its own. To address this, any table that has money-related columns should also have a column named currency. We currently have no policy for tables where a single record has columns in different currencies. If you face this, assemble the data team and decide on something.
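
Several of the naming conventions above are mechanical enough to check automatically. The following is a rough sketch of such a check, assuming a table is described as a mapping from column name to a simplified type label. The function check_columns, the type labels ("bool", "timestamp") and the accepted boolean prefixes are illustrative assumptions, not something this project ships.

```python
import re

# Lowercase words separated by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
# Assumption: question-like prefixes taken from the examples in the conventions.
BOOL_PREFIXES = ("is_", "has_", "was_")


def check_columns(columns: dict[str, str]) -> list[str]:
    """Return a list of convention violations for one table.

    `columns` maps column name -> simplified type label (hypothetical
    labels such as "bool" or "timestamp").
    """
    problems = []
    for name, col_type in columns.items():
        if not SNAKE_CASE.match(name):
            problems.append(f"{name}: not snake case")
        if name.endswith("_id"):
            problems.append(f"{name}: identifiers should begin with id_")
        if col_type == "bool" and not name.startswith(BOOL_PREFIXES):
            problems.append(f"{name}: use a question-like name such as is_{name}")
        if col_type == "timestamp" and not name.endswith(("_utc", "_local")):
            problems.append(f"{name}: datetime columns should end in _utc or _local")
    # Table-level rule: _local datetimes need a companion local_timezone column.
    has_local = any(
        t == "timestamp" and n.endswith("_local") for n, t in columns.items()
    )
    if has_local and "local_timezone" not in columns:
        problems.append("table: _local datetime columns require a local_timezone column")
    return problems
```

For example, check_columns({"booking_id": "int", "active": "bool"}) reports two problems, one per column. The currency rule is left out on purpose, since money columns cannot be reliably recognized from their names alone.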

Stuff that we haven't done but we would like to

  • Automate formatting with git pre-commit.
  • Define conventions on testing (and enforce them).
  • Define conventions on documentation (and enforce them).
  • Replace SSH tunneling with WireGuard VPN access.
  • Prepare a quick way to replicate parts of the prd DWH on our local machines.