diff --git a/README.md b/README.md
index 4cad5fe..6d95825 100644
--- a/README.md
+++ b/README.md
@@ -108,20 +108,32 @@ To deploy:
 - Activate the virtual environment and run `pip install -r requirements.txt`
 - Also run `dbt deps` to install the dbt packages required by the project.
 - Create an entry for this project `profiles.yml` file at `~/.dbt/profiles.yml`. You have a suggested template at `profiles.yml.example`. Make sure that the `profiles.yml` host and port settings are consistent with whatever networking approach you've taken.
-- There are two scripts in the root of this project called `run_dbt.sh` and `run_tests.sh`. Place them in the running user's home folder. Adjust the paths of the script if you want/need to.
+- There are three scripts in the root of this project called `run_dbt.sh`, `run_tests.sh` and `run_docs.sh`. Place them in the running user's home folder. Adjust the paths inside the scripts if you need to.
+- `run_dbt.sh` and `run_tests.sh` don't take any CLI arguments. `run_docs.sh` takes one: the folder where you would like the docs to be placed. So, if you want the docs at `/some/path/for/docs/`, you would call the script like this: `/bin/bash run_docs.sh /some/path/for/docs/`.
 - The scripts are designed to send both success and failure messages to slack channels upon completion. To properly set this up, you will need to place a file called `slack_webhook_urls.txt` on the same path you put the script files. The slack webhooks file should have two lines: `SLACK_ALERT_WEBHOOK_URL=` and `SLACK_RECEIPT_WEBHOOK_URL=`. Setting up the slack channels and webhooks is outside of the scope of this readme.
-- Create a cron entry with `crontab -e` that runs the scripts. For example: `0 2 * * * /bin/bash /home/azureuser/run_dbt.sh` to run the dbt models every day at 2AM, and `15 2 * * * /bin/bash /home/azureuser/run_tests.sh` to run the tests fifteen minutes later.
+- Create a cron entry with `crontab -e` that runs the scripts. For example, you can use the following line to sequentially build the documentation, run the models and then test the DWH, making each step happen only if the previous one succeeds:
+
+  ```bash
+  # This goes in your crontab file
+  15 6 * * * /bin/bash /home/azureuser/run_docs.sh /home/azureuser/dbtdocs && /bin/bash /home/azureuser/run_dbt.sh && /bin/bash /home/azureuser/run_tests.sh
+  ```
 
 To monitor:
 
 - The model building script writes output to a `dbt_run.log` file. You can check the contents to see what happened in the past runs. The exact location of the log file depends on how you set up the `run_dbt.sh` script. If you are unsure of where your logs are being written, check the script to find out.
 - Same applies to the test script, except it will write into a separate `dbt_test.log`.
+- And the docs script writes to `dbt_docs.log`.
 
 To maintain:
 
 - Remember to update dbt package dependencies when including new packages.
 
+## Serving the docs with a web server
+
+Once you build the docs with `run_docs.sh`, you will have a folder of static files. To view them, you will need to serve them with a web server like Caddy or Nginx.
+
+This goes beyond the scope of this project: to understand how you can serve these files, refer to our [infra script repo](https://guardhog.visualstudio.com/Data/_git/data-infra-script), specifically the parts around the web gateway setup.
+
 ## Stuff that we haven't done but we would like to
 
 - Automate formatting with git pre-commit.
-- Prepare a quick way to replicate parts of the `prd` dwh in our local machines.
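Reviewer note on the `profiles.yml` step in the hunk above: the diff asks the deployer to create an entry without showing its shape. As a hedged illustration only — the project's `profiles.yml.example` is the authoritative template, and the profile name, warehouse type, and credentials below are invented placeholders — a dbt profile entry generally looks like this:

```yaml
# Hypothetical ~/.dbt/profiles.yml entry. Every value here is a placeholder;
# copy profiles.yml.example from the repo instead of this sketch.
my_project:            # must match the `profile:` key in dbt_project.yml
  target: prd
  outputs:
    prd:
      type: postgres   # assumed warehouse type; may differ for this project
      host: localhost  # keep host/port consistent with your networking setup
      port: 5432       # (e.g. an SSH tunnel would typically point at localhost)
      user: dbt_user
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: dwh
      schema: analytics
      threads: 4
```

Reading the password via `env_var()` keeps credentials out of the file itself; whether this project does so depends on `profiles.yml.example`.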
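Reviewer note on the `slack_webhook_urls.txt` bullet: the file is two plain `KEY=VALUE` lines, which means a POSIX shell can consume it directly. The sketch below is an assumption about how a script like `run_docs.sh` *could* use it — the `/tmp` path, dummy URLs, and `notify` helper are illustrative, not the repo's actual code:

```bash
#!/usr/bin/env bash
# Sketch only: reading slack_webhook_urls.txt and posting a completion
# message. The file contents below are dummy values for illustration.
set -euo pipefail

# The expected file format: two KEY=VALUE lines.
cat > /tmp/slack_webhook_urls.txt <<'EOF'
SLACK_ALERT_WEBHOOK_URL=https://hooks.slack.com/services/T000/B000/alert
SLACK_RECEIPT_WEBHOOK_URL=https://hooks.slack.com/services/T000/B000/receipt
EOF

# Because the file is plain KEY=VALUE with no spaces, it can be sourced
# directly, which defines both variables in the current shell.
source /tmp/slack_webhook_urls.txt

# Post a JSON payload to a Slack incoming webhook.
notify() {  # usage: notify <webhook_url> <message>
  curl -fsS -X POST -H 'Content-type: application/json' \
    --data "{\"text\": \"$2\"}" "$1"
}

# A real script would call notify "$SLACK_RECEIPT_WEBHOOK_URL" on success
# and notify "$SLACK_ALERT_WEBHOOK_URL" on failure; echo here so the
# sketch runs without network access.
echo "would notify receipt channel at: $SLACK_RECEIPT_WEBHOOK_URL"
```

Sourcing the file rather than parsing it line by line is the design choice being illustrated; it only works because the format is exactly shell-compatible `KEY=VALUE` pairs.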