Merged PR 3630: Build docs

# Description

This PR adds a script to build the project documentation and includes some notes on how to call it and serve the docs.

# Checklist

NA

# Other
NA

Related work items: #23999
Pablo Martín 2024-11-26 09:58:12 +00:00
commit ab0ac1c47e
2 changed files with 78 additions and 3 deletions


@@ -108,20 +108,32 @@ To deploy:
- Activate the virtual environment and run `pip install -r requirements.txt`
- Also run `dbt deps` to install the dbt packages required by the project.
- Create an entry for this project in the `profiles.yml` file at `~/.dbt/profiles.yml`. You have a suggested template at `profiles.yml.example`. Make sure that the `profiles.yml` host and port settings are consistent with whatever networking approach you've taken.
- There are two scripts in the root of this project called `run_dbt.sh` and `run_tests.sh`. Place them in the running user's home folder. Adjust the paths of the script if you want/need to.
- There are three scripts in the root of this project called `run_dbt.sh`, `run_tests.sh` and `run_docs.sh`. Place them in the running user's home folder. Adjust the paths of the script if you want/need to.
- `run_dbt.sh` and `run_tests.sh` don't take any CLI arguments. `run_docs.sh` takes one: the folder where you would like the docs to be placed. So, if you want docs at `/some/path/for/docs/`, you would call the script like this: `/bin/bash run_docs.sh /some/path/for/docs/`.
- The scripts are designed to send both success and failure messages to slack channels upon completion. To properly set this up, you will need to place a file called `slack_webhook_urls.txt` on the same path you put the script files. The slack webhooks file should have two lines: `SLACK_ALERT_WEBHOOK_URL=<url-of-webhook-for-failures>` and `SLACK_RECEIPT_WEBHOOK_URL=<url-of-webhook-for-successful-runs>`. Setting up the slack channels and webhooks is outside of the scope of this readme.
- Create a cron entry with `crontab -e` that runs the scripts. For example: `0 2 * * * /bin/bash /home/azureuser/run_dbt.sh` to run the dbt models every day at 2AM, and `15 2 * * * /bin/bash /home/azureuser/run_tests.sh` to run the tests fifteen minutes later.
- Create a cron entry with `crontab -e` that runs the scripts. For example, you can use the following line to sequentially build the documentation, run the models and then test the DWH, with each step running only if the previous one succeeds:
```bash
# This goes in your crontab file
15 6 * * * /bin/bash /home/azureuser/run_docs.sh /home/azureuser/dbtdocs && /bin/bash /home/azureuser/run_dbt.sh && /bin/bash /home/azureuser/run_tests.sh
```
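To make the `slack_webhook_urls.txt` format from the deploy steps more concrete, here is a minimal sketch that writes a sample file and loads it with the same idiom `run_docs.sh` uses. The `example.invalid` URLs and the temporary folder are placeholders for illustration only; in a real setup the file sits next to the scripts and holds your actual webhook URLs.

```bash
#!/bin/bash
# Sketch only: the URLs below are hypothetical placeholders, not real webhooks.
demo_dir=$(mktemp -d)

# Two KEY=value lines, one per webhook. Lines starting with '#' are skipped by the loader.
cat > "$demo_dir/slack_webhook_urls.txt" <<'EOF'
# Slack webhooks used by the run_*.sh scripts
SLACK_ALERT_WEBHOOK_URL=https://example.invalid/alerts
SLACK_RECEIPT_WEBHOOK_URL=https://example.invalid/receipts
EOF

# Same loading idiom as run_docs.sh: drop comment lines, export the rest.
export $(grep -v '^#' "$demo_dir/slack_webhook_urls.txt" | xargs)

echo "Failures go to:  $SLACK_ALERT_WEBHOOK_URL"
echo "Successes go to: $SLACK_RECEIPT_WEBHOOK_URL"

# Clean up the temporary demo folder.
rm -r "$demo_dir"
```

Note that this loading idiom relies on the values containing no spaces or quotes, which should hold for webhook URLs.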
To monitor:
- The model building script writes output to a `dbt_run.log` file. You can check the contents to see what happened in the past runs. The exact location of the log file depends on how you set up the `run_dbt.sh` script. If you are unsure of where your logs are being written, check the script to find out.
- Same applies to the test script, except it will write into a separate `dbt_test.log`.
- And the docs script will write in `dbt_docs.log`.
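As a rough sketch of how you could check the three logs mentioned above in one go, the snippet below prints the tail of each file. It assumes all logs live in the same folder (`/home/azureuser` here, because that is where `run_docs.sh` writes `dbt_docs.log`); the `LOG_DIR` variable and the `show_recent_log` helper are hypothetical names introduced for this example, so adjust them to wherever your scripts actually write.

```bash
#!/bin/bash
# Assumption: all three logs are in the same folder. Override with LOG_DIR if not.
log_dir="${LOG_DIR:-/home/azureuser}"

# Print the last N lines (default 20) of a log file, or a notice if it is missing.
show_recent_log() {
    local log_file="$1"
    local line_count="${2:-20}"
    if [ -f "$log_file" ]; then
        echo "== Last $line_count lines of $log_file =="
        tail -n "$line_count" "$log_file"
    else
        echo "== $log_file not found =="
    fi
}

for name in dbt_run.log dbt_test.log dbt_docs.log; do
    show_recent_log "$log_dir/$name"
done
```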
To maintain:
- Remember to update dbt package dependencies when including new packages.
## Serving the docs with a web server
Once you build the docs with `run_docs.sh`, the target folder will contain a set of static files (HTML and JSON). To browse them, you will need to serve them with a web server like Caddy or Nginx.
This goes beyond the scope of this project: to understand how you can serve these, refer to our [infra script repo](https://guardhog.visualstudio.com/Data/_git/data-infra-script). Specifically, the bits around the web gateway setup.
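Before pointing a web server at the folder, it can help to confirm that the build actually left the files the docs site needs. The sketch below checks for `index.html`, `manifest.json` and `catalog.json`, which `dbt docs generate` writes into `target/`. The `check_docs_dir` helper and the default path are assumptions made for this example, and the commented-out `python3 -m http.server` line is only meant as a quick local preview, not as a substitute for the Caddy or Nginx setup.

```bash
#!/bin/bash
# Files that `dbt docs generate` produces and the static docs site relies on.
required_files="index.html manifest.json catalog.json"

# Return 0 if every required file exists in the given folder, 1 otherwise.
check_docs_dir() {
    local docs_dir="$1"
    local missing=0
    for f in $required_files; do
        if [ ! -f "$docs_dir/$f" ]; then
            echo "Missing: $docs_dir/$f"
            missing=1
        fi
    done
    return $missing
}

# Assumption: docs were copied to the folder used in the crontab example.
docs_dir="${1:-/home/azureuser/dbtdocs}"
if check_docs_dir "$docs_dir"; then
    echo "$docs_dir looks ready to be served."
    # Quick local preview only (requires Python 3.7+):
    # python3 -m http.server 8080 --directory "$docs_dir"
else
    echo "$docs_dir is incomplete; run run_docs.sh first."
fi
```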
## Stuff that we haven't done but we would like to
- Automate formatting with git pre-commit.
- Prepare a quick way to replicate parts of the `prd` dwh in our local machines.

run_docs.sh Normal file

@@ -0,0 +1,63 @@
#!/bin/bash
exec >> /home/azureuser/dbt_docs.log 2>&1
# Load the Slack webhook URLs from the file that sits next to this script
script_dir=$(dirname "$0")
webhooks_file="slack_webhook_urls.txt"
env_file="$script_dir/$webhooks_file"
if [ -f "$env_file" ]; then
export $(grep -v '^#' "$env_file" | xargs)
else
echo "Error: $webhooks_file file not found in the script directory."
exit 1
fi
# Messages to be sent to Slack
slack_failure_message=":rotating_light::rotating_light::rotating_light: One or more failures in dbt docs build in production. :rotating_light::rotating_light::rotating_light:"
slack_success_message=":white_check_mark::white_check_mark::white_check_mark: dbt docs built successfully in production. :white_check_mark::white_check_mark::white_check_mark:"
# Initialize the failure flag
has_any_step_failed=0
cd /home/azureuser/data-dwh-dbt-project
# Update from git
echo "Updating dbt project from git."
git checkout master
git pull
# Activate venv
source venv/bin/activate
# Generate the docs
echo "Triggering dbt docs generate"
dbt docs generate
if [ $? -ne 0 ]; then
has_any_step_failed=1
fi
# Read the first argument as the target directory
docs_final_target_directory=$1
# Check if the target directory is provided
if [ -z "$docs_final_target_directory" ]; then
echo "Error: No target directory provided."
exit 1
fi
# Copy the generated docs to the target directory
echo "Copying generated docs to $docs_final_target_directory."
cp -r target/* "$docs_final_target_directory"
if [ $? -ne 0 ]; then
has_any_step_failed=1
fi
# Check if any step failed and send the matching Slack message
if [ "$has_any_step_failed" -eq 1 ]; then
curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"$slack_failure_message\"}" "$SLACK_ALERT_WEBHOOK_URL"
# Exit non-zero so that callers chaining scripts with && stop at this step
exit 1
else
curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"$slack_success_message\"}" "$SLACK_RECEIPT_WEBHOOK_URL"
fi