diff --git a/README.md b/README.md
index 42d6e75..068d88a 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,8 @@ Welcome to Superhog's DWH dbt project. Here we model the entire DWH.
 
 ## How to set up your environment
 
+### Basics
+
 - Pre-requisites
   - You need a Linux environment. That can be Linux, macOS or WSL.
   - You need to have Python `>=3.10` installed.
@@ -23,6 +25,24 @@ Welcome to Superhog's DWH dbt project. Here we model the entire DWH.
 - If you are in VSCode, you most probably want to have this extension installed: [dbt Power User](https://marketplace.visualstudio.com/items?itemName=innoverio.vscode-dbt-power-user)
 - It is advised to use [this autoformatter](https://sqlfmt.com/) and to automatically [run it on save](https://docs.sqlfmt.com/integrations/vs-code).
 
+### Local DWH
+
+Having a database where you can run your WIP models makes development much easier. Obviously, we can't do that in production. We could use a shared dev instance, but then we would step on each other's toes when developing.
+
+To overcome these issues, we rely on local clones of the DWH. The idea is to have a PostgreSQL instance running on your laptop. You run your `dbt run` commands there for testing and validate the outcome of your work. When you are confident and have tested properly, you can open a PR to master.
+
+You will find a docker compose file named `dev-dwh.docker-compose.yml`. It simply starts a PostgreSQL 16 database on your device. You can start it, adjust it to your needs, and adapt the `profiles.yml` file to point to it when you are developing locally.
+
+The only missing piece to make your local deployment resemble the production DWH is the data from the source systems. The current policy is to generate a dump from the production database with what you need and restore it in your local PostgreSQL instance. That way, you work with accurate and representative data.
+
+For example, if you are working on models that use data from Core, you can dump and restore from your terminal with something roughly like this:
+
+```bash
+pg_dump -h superhog-dwh-prd.postgres.database.azure.com -U airbyte_user -W -F t dwh -n sync_core > core.dump
+
+pg_restore -h localhost -U postgres -W -d dwh core.dump
+```
+
 ## Branching strategy
 
 This repo works in a trunk-based-development philosophy ().
diff --git a/dev-dwh.docker-compose.yml b/dev-dwh.docker-compose.yml
new file mode 100644
index 0000000..32a493b
--- /dev/null
+++ b/dev-dwh.docker-compose.yml
@@ -0,0 +1,14 @@
+version: '3.8'
+services:
+  dwh-local:
+    image: postgres:16
+    environment:
+      - POSTGRES_USER=postgres
+      - POSTGRES_PASSWORD=postgres
+    ports:
+      - '5432:5432'
+    volumes:
+      - dwh-local:/var/lib/postgresql/data
+volumes:
+  dwh-local:
+    driver: local
\ No newline at end of file
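
The README change asks developers to adapt `profiles.yml` to point at the local instance but does not show what that looks like. A minimal sketch of such a target for the dbt Postgres adapter might be the following. The profile name `dwh_dbt`, the target name `local`, the schema `working`, and the thread count are hypothetical placeholders, not values taken from this repo. Host, port, user, and password mirror the compose file in this diff. `dbname: dwh` matches the `pg_restore -d dwh` command above and assumes that database has already been created locally: the compose file does not set `POSTGRES_DB`, so out of the box only the default `postgres` database exists.

```yaml
# Hypothetical profiles.yml fragment; names marked "assumed" are placeholders.
dwh_dbt:                 # assumed profile name; must match `profile:` in dbt_project.yml
  target: local          # assumed target name
  outputs:
    local:
      type: postgres
      host: localhost    # the compose file publishes the container on localhost
      port: 5432         # host port from the '5432:5432' mapping
      user: postgres     # POSTGRES_USER in the compose file
      password: postgres # POSTGRES_PASSWORD in the compose file
      dbname: dwh        # assumed to exist already; create it before restoring
      schema: working    # assumed default schema for your models
      threads: 4         # assumed; tune to your machine
```

With a target like this in place, `dbt run --target local` would build models against the local database rather than a shared instance.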