deployment instructions
This commit is contained in: parent 3fa607aa4f, commit 1f1fd95634
2 changed files with 34 additions and 4 deletions
README.md (24 changes)
@@ -33,10 +33,6 @@ Open a feature branch (`feature/your-branch-name`) for any changes and make it s
We organize models in four folders:
- `sync`
  - Dedicated to sources.
- `staging`
  - Pretty much this: <https://docs.getdbt.com/best-practices/how-we-structure/2-staging>
  - One `.yml` per `sync` schema, named `_<sourcename>_sources.yml`. For example, for Core, `_core_sources.yml`.
@@ -60,6 +56,26 @@ We organize models in four folders:
- Datetime columns should end in either `_utc` or `_local`. If they end in `_local`, the table should also contain a `local_timezone` column holding the [timezone identifier](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones).
- We work with many currencies and lack a single main one. Hence, any money field is ambiguous on its own. To address this, any table with money-related columns should also have a column named `currency`. We currently have no policy for tables where a single record has columns in different currencies; if you face this, assemble the data team and decide on something.
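The `_<sourcename>_sources.yml` convention above can be illustrated with a small sketch. The folder layout, schema, and table names below are hypothetical, made up for illustration only:

```shell
# Sketch: generate a minimal _core_sources.yml for the Core sync schema.
# Paths, schema, and table names here are assumptions, not project facts.
mkdir -p models/staging/core
cat > models/staging/core/_core_sources.yml <<'EOF'
version: 2

sources:
  - name: core            # hypothetical source name
    schema: sync_core     # hypothetical sync schema
    tables:
      - name: customers
      - name: orders
EOF
```

One such file per `sync` schema keeps source declarations easy to find.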
## How to schedule
We currently use a minimal setup where we run the project from a VM within our infra with a simple cron job. These instructions are written for Azure VMs running Ubuntu 22.04; you might need to change details if you are running somewhere else.
To deploy:
- Prepare a VM with Ubuntu 22.04.
  - You need Python `>=3.10` installed.
  - You must be able to reach the DWH server through the network.
- On the VM, set up git credentials for the project (for example, with an SSH key), clone the git project into the `azureuser` home directory, and check out `main`.
- Create a virtual environment for the project with `python3 -m venv venv`.
- Activate the virtual environment and run `pip install -r requirements.txt`.
- Create an entry for this project in the `profiles.yml` file at `~/.dbt/profiles.yml`. A suggested template is available at `profiles.yml.example`. Make sure the `profiles.yml` host and port settings are consistent with whatever networking approach you've taken.
- There's a script in the root of this project called `run_dbt.sh`. Place it at `~/run_dbt.sh`. Adjust the paths in the script if you need to.
- Create a cron entry with `crontab -e` that runs the script. For example: `0 2 * * * /bin/bash /home/azureuser/run_dbt.sh` to run the dbt models every day at 2AM.
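The manual steps above can be sketched as one script. This is a hedged sketch, not the project's official tooling: the remote URL argument and invocation are assumptions, and the `~/.dbt/profiles.yml` entry still has to be created by hand as described above.

```shell
#!/bin/bash
# Sketch of the deployment steps; adjust paths for your setup.
set -euo pipefail

PROJECT_DIR="${PROJECT_DIR:-$HOME/data-dwh-dbt-project}"
CRON_ENTRY='0 2 * * * /bin/bash /home/azureuser/run_dbt.sh'

deploy() {
  # Clone (git credentials must already be set up) and pin to main.
  git clone "$1" "$PROJECT_DIR"
  cd "$PROJECT_DIR"
  git checkout main

  # Virtual environment and dependencies.
  python3 -m venv venv
  ./venv/bin/pip install -r requirements.txt

  # Note: the ~/.dbt/profiles.yml entry is still a manual step.

  # Append the cron entry without clobbering existing entries.
  ( crontab -l 2>/dev/null; echo "$CRON_ENTRY" ) | crontab -
}

# Deploy only when invoked as "script.sh run <remote-url>"; otherwise just
# print the cron entry that would be installed.
if [ "${1:-}" = "run" ]; then
  deploy "${2:?usage: run <remote-url>}"
else
  echo "$CRON_ENTRY"
fi
```

Appending to the existing crontab (rather than overwriting it) keeps any other jobs on the VM intact.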
To monitor:
- The script writes output to a `dbt_run.log` file. You can check its contents to see what happened in past runs. The exact location of the log file depends on how you set up the `run_dbt.sh` script; if you are unsure where your logs are being written, check the script to find out.
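A quick way to inspect past runs is a `tail`/`grep` over the log. In this sketch the log path is simulated with a temp file so it runs anywhere; on the VM you would point `LOG` at the real `dbt_run.log` location from your `run_dbt.sh`:

```shell
# Stand-in for ~/dbt_run.log so the sketch runs anywhere; on the VM, set
# LOG to the path your run_dbt.sh actually writes to.
LOG="$(mktemp)"
echo "$(date) Triggering dbt run" >> "$LOG"

tail -n 20 "$LOG"                      # skim the most recent output
grep -c 'Triggering dbt run' "$LOG"    # count how many runs were started
```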
## Stuff that we haven't done but we would like to
- Automate formatting with git pre-commit.
run_dbt.sh (new file, 14 lines)
@@ -0,0 +1,14 @@
#!/bin/bash
cd /home/azureuser/data-dwh-dbt-project || exit 1
# Update from git
echo "Updating dbt project from git." | while IFS= read -r line; do printf '%s %s\n' "$(date)" "$line"; done >> /home/azureuser/dbt_run.log 2>&1
git pull 2>&1 | while IFS= read -r line; do printf '%s %s\n' "$(date)" "$line"; done >> /home/azureuser/dbt_run.log
# Activate venv
source venv/bin/activate
# Run dbt
echo "Triggering dbt run" | while IFS= read -r line; do printf '%s %s\n' "$(date)" "$line"; done >> /home/azureuser/dbt_run.log 2>&1
dbt run 2>&1 | while IFS= read -r line; do printf '%s %s\n' "$(date)" "$line"; done >> /home/azureuser/dbt_run.log
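The date-prefixing pipeline repeated in the script above could be factored into a helper function. A sketch, with `LOGFILE` standing in for the hardcoded log path (the real dbt call is shown commented out so the sketch runs anywhere):

```shell
#!/bin/bash
# Sketch: reusable helper for the repeated "prefix each line with a date"
# pipeline. LOGFILE defaults to a temp file here for illustration.
LOGFILE="${LOGFILE:-$(mktemp)}"

log() {
  # Read stdin line by line and prepend a timestamp before appending.
  while IFS= read -r line; do
    printf '%s %s\n' "$(date)" "$line"
  done >> "$LOGFILE"
}

echo "Triggering dbt run" | log
# dbt run 2>&1 | log   # the real call would be piped the same way
```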