Merged PR 3034: FDW local setup

# Description

This PR changes the documentation on how we set up our local Postgres instance to leverage Foreign Data Wrappers (FDW), giving us a more convenient development workflow in dbt.

@<Joaquin Ossa> and @<Oriol Roqué Paniagua> : for this review, I want to ask you to:
- Take a look at the entire `dev-env/local_dwh.md` file with a critical eye and make suggestions, spot mistakes, look for gaps, etc.
- Also follow and use the section "Filling `dwh` with copied data" in `dev-env/local_dwh.md` (so we know the instructions are complete). This part is new and we haven't discussed it yet, but I hope the instructions are self-explanatory. Please try to follow them and validate that you can run things as described before approving this PR.
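
For context, this is roughly the kind of `postgres_fdw` setup the new instructions revolve around. A minimal sketch only: the server name, host, credentials, and schema below are placeholders, not the actual values, which live in `dev-env/local_dwh.md`:

```bash
# Hypothetical sketch: all names and credentials here are placeholders.
psql -h localhost -U postgres -d dwh <<'SQL'
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Point a foreign server at the production DWH.
CREATE SERVER prd_dwh FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'superhog-dwh-prd.postgres.database.azure.com', dbname 'dwh');

-- Map the local role to a (read-only) production role.
CREATE USER MAPPING FOR postgres SERVER prd_dwh
    OPTIONS (user 'readonly_user', password 'changeme');

-- Expose a remote schema locally as foreign tables.
CREATE SCHEMA IF NOT EXISTS sync_xero_superhog_limited;
IMPORT FOREIGN SCHEMA sync_xero_superhog_limited
    FROM SERVER prd_dwh INTO sync_xero_superhog_limited;
SQL
```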
Pablo Martín · 2024-10-04 13:35:02 +00:00 · commit 167875a8e8 · 5 changed files with 321 additions and 21 deletions


@@ -31,21 +31,9 @@ Welcome to Superhog's DWH dbt project. Here we model the entire DWH.
### Local DWH
Having a database where you can run your WIP models is very useful to ease development. Obviously, we can't do that in production. We could do it in a shared dev instance, but then we would step on each other's toes when developing.
Running a local version of the DWH allows you to test things as you develop: a must if you want to push changes to master without breaking everything.
To overcome these issues, we rely on local clones of the DWH. The idea is to have a PostgreSQL instance running on your laptop. You run your `dbt run` statements for testing and validate the outcome of your work there. When you are confident and have tested properly, you can open a PR to master.
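For instance, assuming a `local` target in `profiles.yml` pointing at your laptop's instance (the target and model names here are hypothetical), a typical iteration could look like:

```bash
# Build and test only the model you are iterating on, against the local DWH.
dbt run --select my_wip_model --target local
dbt test --select my_wip_model --target local
```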
You will find a Docker Compose file named `dev-dwh.docker-compose.yml`. It simply starts a PostgreSQL 16 database on your device. You can bring it up, adjust it to your needs, and adapt the `profiles.yml` file to point to it when you are developing locally. Bear in mind the file comes with Postgres server settings based on the laptops used by the team in August 2024; they might be more or less relevant to you. In case of doubt, you might want to use https://pgtune.leopard.in.ua/.
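Bringing it up could look like this (the user and database names are assumptions; check the compose file for the actual values):

```bash
# Start the local Postgres 16 instance in the background.
docker compose -f dev-dwh.docker-compose.yml up -d

# Sanity check: connect and confirm the server answers.
psql -h localhost -U postgres -d dwh -c 'SELECT version();'
```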
The only missing bit to make your local deployment behave like the production DWH is the source data from the source systems. The current policy is to generate a dump from the production database with what you need and restore it into your local Postgres. That way, you are using accurate and representative data to do your work.
For example, if you are working on models that use data from Xero, you can dump and restore from your terminal with something roughly like this:
```bash
# Dump only the Xero sync schema from production (tar format; -W prompts for the password).
pg_dump -h superhog-dwh-prd.postgres.database.azure.com -U airbyte_user -W -F t dwh -n sync_xero_superhog_limited > xero.dump
# Restore it into the local instance.
pg_restore -h localhost -U postgres -W -d dwh xero.dump
```
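For larger schemas, a custom-format dump can be restored in parallel, which is usually noticeably faster. A variant sketch of the same transfer; `--no-owner` avoids ownership errors when the production roles don't exist locally:

```bash
# Custom format (-F c) allows pg_restore to run with parallel workers (-j 4).
pg_dump -h superhog-dwh-prd.postgres.database.azure.com -U airbyte_user -W \
    -F c -n sync_xero_superhog_limited dwh > xero.dump
pg_restore -h localhost -U postgres -W -d dwh -j 4 --no-owner xero.dump
```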
You can read on how to set this up in `dev-env/local_dwh.md`.
## Branching strategy