# CI
You can set up CI pipelines for the project if you want. This lets you run certain checks on PRs and on commits to master, which helps minimize errors and ensures certain quality levels are met.
The details here are specific to Azure DevOps. If you need to set things up in a different Git/CI environment, you'll have to adapt these instructions yourself.
## CI VM Setup
### Requirements
These instructions assume that:
- You have a VM ready to be set up as the CI server.
- You can SSH into it.
- The VM has Docker and Docker Compose installed and ready to run.
- The VM has `psql` installed.
- The VM has the Azure CI agent installed.
- You have cloned this repository in the home folder of the user you use in that VM.
- The DWH production instance has a dedicated CI user that can read from all sync schemas as well as `staging`, `intermediate` and `reporting`, and you have the credentials.
If you don't have this, it probably means you need to review our Infrastructure repository where we describe how to set a VM up with all of this.
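If you want a quick sanity check before going further, something along these lines can confirm that the command-line tools from the list above are on the `PATH`. This is only a sketch: `check_cmd` is a helper made up for this example, and it doesn't verify the Azure agent, SSH access or the DWH credentials.

```shell
# Report whether a command is available on PATH; returns non-zero if not.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok:      $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Tools named in the requirements list.
missing=0
for tool in docker psql; do
    check_cmd "$tool" || missing=$((missing + 1))
done

# Docker Compose can be a Docker plugin (`docker compose`) or a
# standalone binary (`docker-compose`), so accept either.
if docker compose version >/dev/null 2>&1 || command -v docker-compose >/dev/null 2>&1; then
    echo "ok:      docker compose"
else
    echo "missing: docker compose"
    missing=$((missing + 1))
fi

echo "missing tools: $missing"
```

If the last line reports anything other than 0, fix the VM before moving on to the next section.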
### Setting things up
- SSH into the CI VM.
- Create a folder named `dbt-ci` in the user's home directory.
- Create a copy of the `ci/ci.env` file there naming it `.env` (assuming you're in the repo root dir, `cp ci/ci.env ~/dbt-ci/.env`) and fill it with values of your choice.
- Copy the `docker-compose.yml` file into `dbt-ci`. Modify your copy with values for the Postgres server parameters. Which values to set depends on your hardware. If you don't want to or can't decide on values for these parameters, you can just comment out the lines.
- Enter the `ci` folder and execute the script named `ci-vm-setup.sh` with the `.env` file you just filled in sourced (you can run this: `(set -a && source ~/dbt-ci/.env && set +a && bash ci-vm-setup.sh)`). This script will take care of most of the setup that needs to be executed, including:
  - Preparing the postgres database.
  - Setting up the dockerized postgres with the right database, FDW, etc.
  - Preparing the `profiles.yml` file.
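In case the `set -a` one-liner used to run `ci-vm-setup.sh` looks cryptic, here is a small standalone demo of the same pattern. It uses a throwaway env file and a made-up variable (`EXAMPLE_PG_PORT`, which is not taken from `ci.env`), and the POSIX `.` in place of `source` so it also runs under plain `sh`.

```shell
# Throwaway env file with one hypothetical variable.
tmp_env="$(mktemp)"
echo 'EXAMPLE_PG_PORT=5432' > "$tmp_env"

# `set -a` marks every variable assigned after it for export, so the
# child process started at the end of the chain can read the value.
# (`.` is the POSIX spelling of bash's `source`.)
inside="$(set -a && . "$tmp_env" && set +a && sh -c 'echo "$EXAMPLE_PG_PORT"')"

# Everything above ran in a subshell, so the calling shell never sees
# the variable. The parentheses in the README one-liner do the same job.
outside="${EXAMPLE_PG_PORT:-unset}"

echo "inside subshell: $inside"
echo "calling shell:   $outside"

rm -f "$tmp_env"
```

The point of wrapping everything in a subshell is that the values from `.env` reach the setup script without lingering in your interactive session afterwards.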
### Testing
- If the infra was set up correctly and you followed the previous steps, you should be ready to roll.
- You might want to activate pipeline executions in DevOps if you had them off while preparing everything.
- Once that's done:
  - Create a branch in this repository.
  - Add some silly change to any dbt model.
  - Open a PR in DevOps from the branch.
- If everything is fine, you should see in DevOps the pipeline getting triggered automatically and walking through all the steps described in `.azure-pipelines.master.yml`.
- Once you make a commit to `master` or merge a PR to `master`, you should also see the pipeline defined in `.azure-pipelines.master.yml` getting triggered automatically.
### What the hell are these files
A small inventory of the funky files here:
- `ci-vm-setup.sh`: executes some setup steps that are needed the first time you prepare the CI VM.
- `ci.env`: template for the `.env` file that needs to be placed in the CI VM.
- `ci.profiles.yml`: template for the dbt `profiles.yml` file that needs to be placed in the CI VM.
- `ci-requirements.txt`: CI-specific Python packages that need to be installed in CI runs (but not for running or developing on this project).
- `docker-compose.yml`: the Docker Compose file that defines the Postgres that runs in the CI VM.
- `postgres-initial-setup.sql`: a SQL file that completes setup steps required in the CI Postgres in the one-off initial setup.
- `sqlfluff-check.sh`: a script to check a folder's SQL files and validate them. Fails if any SQL is not parseable.
- `.sqlfluff`: some config for sqlfluff.
- `build-master-artifacts.sh`: a script that generates the `manifest.json` for the master branch and places it in a target folder.
- `.azure-pipelines.blablabla.yml`: the actual pipeline definitions for Azure.