instructions for local dwh

Pablo Martin 2024-02-15 15:36:04 +01:00
parent 7e60c6e6e3
commit ab5902ff1f
2 changed files with 34 additions and 0 deletions


@@ -4,6 +4,8 @@ Welcome to Superhog's DWH dbt project. Here we model the entire DWH.
## How to set up your environment
### Basics
- Pre-requisites
  - You need a Unix-like environment: Linux, macOS, or WSL.
  - You need to have Python `>=3.10` installed.
@@ -23,6 +25,24 @@ Welcome to Superhog's DWH dbt project. Here we model the entire DWH.
- If you are in VSCode, you will most likely want this extension installed: [dbt Power User](https://marketplace.visualstudio.com/items?itemName=innoverio.vscode-dbt-power-user)
- We advise using [this autoformatter](https://sqlfmt.com/) and automatically [running it on save](https://docs.sqlfmt.com/integrations/vs-code); if you prefer to format from the terminal, see the sketch after this list.
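A minimal CLI sketch, assuming sqlfmt's published package name (`shandy-sqlfmt`) and that your SQL lives under `models/`:

```bash
# Install sqlfmt in an isolated environment
# (package name assumed to be shandy-sqlfmt)
pipx install shandy-sqlfmt

# Preview the changes it would make, without touching any files
sqlfmt --diff models/

# Format every SQL file under models/ in place
sqlfmt models/
```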
### Local DWH
Having a database where you can run your WIP models is very useful to ease development. Obviously, we can't do that in production, and a shared dev instance would have us stepping on each other's toes.
To overcome these issues, we rely on local clones of the DWH. The idea is to have a PostgreSQL instance running on your laptop. You run your `dbt run` commands for testing and validate the outcome of your work there. Once you are confident and have tested properly, you can open a PR to master.
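The day-to-day loop looks roughly like this (the `local` target name is hypothetical; use whatever you define in your `profiles.yml`):

```bash
# Build only the model you are working on, against your local database
dbt run --select my_model --target local

# Run the tests defined for that model
dbt test --select my_model --target local
```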
You will find a docker compose file named `dev-dwh.docker-compose.yml`. It simply starts a PostgreSQL 16 database on your machine. You can bring it up, adjust it to your needs, and adapt the `profiles.yml` file to point to it when you are developing locally.
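Bringing it up is a one-liner, assuming a reasonably recent Docker with the compose plugin:

```bash
# Start the local PostgreSQL 16 instance in the background
docker compose -f dev-dwh.docker-compose.yml up -d

# Tear it down when you are done (add -v to also drop the data volume)
docker compose -f dev-dwh.docker-compose.yml down
```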
The only missing bit to make your local deployment behave like the production DWH is the source data from the source systems. The current policy is to generate a dump from the production database with what you need and restore it into your local Postgres. That way, you are using accurate and representative data to do your work.
For example, if you are working on models that use data from Core, you can dump and restore from your terminal with something roughly like this:
```bash
# Dump only the sync_core schema from production in tar format (-F t);
# -W forces a password prompt
pg_dump -h superhog-dwh-prd.postgres.database.azure.com -U airbyte_user -W -F t dwh -n sync_core > core.dump

# Restore the dump into the dwh database of your local instance
pg_restore -h localhost -U postgres -W -d dwh core.dump
```
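To sanity-check that the restore landed where you expect, listing the tables in the schema is enough (this assumes the default `postgres` superuser of your local instance):

```bash
# List the tables that arrived in the sync_core schema
psql -h localhost -U postgres -d dwh -c '\dt sync_core.*'
```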
## Branching strategy
This repo follows a trunk-based development philosophy (<https://trunkbaseddevelopment.com/>).