From 1f1fd956341e8645cbcc301f4d2025bd8bc5ad83 Mon Sep 17 00:00:00 2001
From: Pablo Martin
Date: Tue, 6 Feb 2024 12:09:47 +0100
Subject: [PATCH] deployment instructions

---
 README.md  | 24 ++++++++++++++++++++----
 run_dbt.sh | 14 ++++++++++++++
 2 files changed, 34 insertions(+), 4 deletions(-)
 create mode 100644 run_dbt.sh

diff --git a/README.md b/README.md
index bf58b5e..42d6e75 100644
--- a/README.md
+++ b/README.md
@@ -33,10 +33,6 @@ Open a feature branch (`feature/your-branch-name`) for any changes and make it s
 
 We organize models in four folders:
 
-- `sync`
-  - Dedicated to sources.
-
-
 - `staging`
   - Pretty much this:
   - One `.yml` per `sync` schema, with naming `__sources.yml`. For example, for Core, `_core_sources.yml`.
@@ -60,6 +56,26 @@ We organize models in four folders:
 - Datetime columns should either finish in `_utc` or `_local`. If they finish in `_local`, the table should contain a `local_timezone` column that contains the [timezone identifier](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones).
 - We work with many currencies and lack a single main one. Hence, any money fields are ambiguous on their own. To address this, any table that has money-related columns should also have a column named `currency`. We currently have no policy for tables where a single record has columns in different currencies. If you face this, assemble the data team and decide on something.
 
+## How to schedule
+
+We currently use a minimal setup where we run the project from a VM within our infra with a simple cron job. These instructions are written for Azure VMs running Ubuntu 22.04; you might need to adjust some details if you are running somewhere else.
+
+To deploy:
+
+- Prepare a VM with Ubuntu 22.04.
+- Install Python `>=3.10`.
+- Make sure the VM can reach the DWH server through the network.
+- On the VM, set up git credentials for the project (for example, with an SSH key), clone the git project into the `azureuser` home dir, and check out `main`.
+- Create a virtual environment for the project with `python3 -m venv venv`.
+- Activate the virtual environment and run `pip install -r requirements.txt`.
+- Create an entry for this project in the `profiles.yml` file at `~/.dbt/profiles.yml`. A suggested template is available at `profiles.yml.example`. Make sure that the `profiles.yml` host and port settings are consistent with whatever networking approach you've taken.
+- There's a script in the root of this project called `run_dbt.sh`. Place it at `~/run_dbt.sh`. Adjust the paths in the script if you need to.
+- Create a cron entry with `crontab -e` that runs the script. For example, `0 2 * * * /bin/bash /home/azureuser/run_dbt.sh` runs the dbt models every day at 2AM.
+
+To monitor:
+
+- The script writes output to a `dbt_run.log` file. You can check its contents to see what happened in past runs. The exact location of the log file depends on how you set up the `run_dbt.sh` script. If you are unsure where your logs are being written, check the script to find out.
+
 ## Stuff that we haven't done but we would like to
 
 - Automate formatting with git pre-commit.

diff --git a/run_dbt.sh b/run_dbt.sh
new file mode 100644
index 0000000..543e533
--- /dev/null
+++ b/run_dbt.sh
@@ -0,0 +1,14 @@
+#!/bin/bash
+
+cd /home/azureuser/data-dwh-dbt-project || exit 1
+
+# Update from git (2>&1 before the pipe so git's stderr output is logged too)
+echo "Updating dbt project from git." | while IFS= read -r line; do printf '%s %s\n' "$(date)" "$line"; done >> /home/azureuser/dbt_run.log
+git pull 2>&1 | while IFS= read -r line; do printf '%s %s\n' "$(date)" "$line"; done >> /home/azureuser/dbt_run.log
+
+# Activate venv
+source venv/bin/activate
+
+# Run dbt
+echo "Triggering dbt run" | while IFS= read -r line; do printf '%s %s\n' "$(date)" "$line"; done >> /home/azureuser/dbt_run.log
+dbt run 2>&1 | while IFS= read -r line; do printf '%s %s\n' "$(date)" "$line"; done >> /home/azureuser/dbt_run.log
\ No newline at end of file
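
---

As a quick sanity check on the timestamp-logging pipeline that `run_dbt.sh` uses, the standalone sketch below runs the same `while IFS= read -r line` pattern against a known two-line input, writing to a temporary file instead of `/home/azureuser/dbt_run.log` (the temp path is an assumption for testing only):

```shell
#!/bin/bash
set -eu

# Stand-in for /home/azureuser/dbt_run.log in the real script.
LOG_FILE=$(mktemp)

# Same pipeline shape as run_dbt.sh: every line the command prints
# becomes one timestamped line appended to the log.
printf 'first line\nsecond line\n' | while IFS= read -r line; do
    printf '%s %s\n' "$(date)" "$line"
done >> "$LOG_FILE"

cat "$LOG_FILE"

# Two input lines should yield exactly two timestamped log lines.
test "$(wc -l < "$LOG_FILE")" -eq 2 && echo "OK"

rm -f "$LOG_FILE"
```

Because `read -r` consumes one line per iteration, multi-line output from `git pull` or `dbt run` is prefixed line by line rather than as a single blob, which keeps the log greppable by date.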