deployment instructions

Pablo Martin 2024-02-06 12:09:47 +01:00
parent 3fa607aa4f
commit 1f1fd95634
2 changed files with 34 additions and 4 deletions


@@ -33,10 +33,6 @@ Open a feature branch (`feature/your-branch-name`) for any changes and make it s
We organize models in four folders:
- `sync`
- Dedicated to sources.
- `staging`
- Pretty much this: <https://docs.getdbt.com/best-practices/how-we-structure/2-staging>
- One `.yml` per `sync` schema, with naming `_<sourcename>_sources.yml`. For example, for Core, `_core_sources.yml`.
@@ -60,6 +56,26 @@ We organize models in four folders:
- Datetime columns should finish in either `_utc` or `_local`. If they finish in `_local`, the table should contain a `local_timezone` column that contains the [timezone identifier](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones).
- We work with many currencies and lack a single main one. Hence, any money field is ambiguous on its own. To address this, any table that has money-related columns should also have a column named `currency`. We currently have no policy for tables where a single record has columns in different currencies. If you face this, assemble the data team and decide on something.
## How to schedule
We currently use a minimal setup: the project runs on a VM within our infra via a simple cron job. These instructions are written for Azure VMs running Ubuntu 22.04; you might need to adjust details if you are running somewhere else.
To deploy (see the condensed command sketch after this list):
- Prepare a VM with Ubuntu 22.04
- You need to have Python `>=3.10` installed.
- You must be able to reach the DWH server through the network.
- On the VM, set up git credentials for the project (for example, with an SSH key), clone the repository into the `azureuser` home directory, and check out `main`.
- Create a virtual environment for the project with `python3 -m venv venv`.
- Activate the virtual environment and run `pip install -r requirements.txt`
- Create an entry for this project in the `profiles.yml` file at `~/.dbt/profiles.yml`. There is a suggested template at `profiles.yml.example`. Make sure that the `profiles.yml` host and port settings are consistent with whatever networking approach you've taken.
- There's a script in the root of this project called `run_dbt.sh`. Place it in `~/run_dbt.sh`. Adjust the paths in the script if you need to.
- Create a cron entry with `crontab -e` that runs the script. For example: `0 2 * * * /bin/bash /home/azureuser/run_dbt.sh` to run the dbt models every day at 2AM.
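Below is a condensed sketch of the steps above as shell commands. It assumes the default paths used by `run_dbt.sh` (`/home/azureuser/data-dwh-dbt-project`) and the 2AM schedule from the example; the git remote is a placeholder you need to replace with the project's actual remote.

```bash
# On the VM, as azureuser, with git credentials already set up
cd /home/azureuser
git clone <your-git-remote> data-dwh-dbt-project  # <your-git-remote> is a placeholder
cd data-dwh-dbt-project
git checkout main

# Virtual environment and dependencies
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# dbt profile: start from the suggested template, then adjust host/port to your networking setup
mkdir -p ~/.dbt
cp profiles.yml.example ~/.dbt/profiles.yml

# Scheduler: copy the run script and append a cron entry (every day at 2AM)
cp run_dbt.sh /home/azureuser/run_dbt.sh
(crontab -l 2>/dev/null; echo "0 2 * * * /bin/bash /home/azureuser/run_dbt.sh") | crontab -
```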
To monitor:
- The script writes output to a `dbt_run.log` file. You can check the contents to see what happened in the past runs. The exact location of the log file depends on how you set up the `run_dbt.sh` script. If you are unsure of where your logs are being written, check the script to find out.
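For example, assuming the default log path from `run_dbt.sh`:

```bash
# Look at the most recent output
tail -n 100 /home/azureuser/dbt_run.log

# Search past runs for failures
grep -iE "error|fail" /home/azureuser/dbt_run.log
```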
## Stuff that we haven't done but we would like to
- Automate formatting with git pre-commit.

run_dbt.sh Normal file

@@ -0,0 +1,14 @@
#!/bin/bash
cd /home/azureuser/data-dwh-dbt-project || exit 1
# Update from git; timestamp each line of output and append it to the log
echo "Updating dbt project from git." | while IFS= read -r line; do printf '%s %s\n' "$(date)" "$line"; done >> /home/azureuser/dbt_run.log
git pull 2>&1 | while IFS= read -r line; do printf '%s %s\n' "$(date)" "$line"; done >> /home/azureuser/dbt_run.log
# Activate venv
source venv/bin/activate
# Run dbt; redirect stderr into the pipe so errors are also timestamped and logged
echo "Triggering dbt run" | while IFS= read -r line; do printf '%s %s\n' "$(date)" "$line"; done >> /home/azureuser/dbt_run.log
dbt run 2>&1 | while IFS= read -r line; do printf '%s %s\n' "$(date)" "$line"; done >> /home/azureuser/dbt_run.log