Compare commits

11 commits: 8bc525e4c2 ... bc3a364891

| Author | SHA1 | Date |
|---|---|---|
| | bc3a364891 | |
| | ddc0a6a3f4 | |
| | 2f14b3305c | |
| | a1b67d20f1 | |
| | 7488400cbb | |
| | 717590513f | |
| | a1429ccec8 | |
| | 900c73b076 | |
| | ad67a79a24 | |
| | e0e97709c0 | |
| | b9fe9a0552 | |

19 changed files with 2040 additions and 138 deletions

README.md (31 lines changed)

@@ -134,6 +134,37 @@ Once you build the docs with `run_docs.sh`, you will have a bunch of files. To o

This goes beyond the scope of this project: to understand how you can serve these, refer to our [infra script repo](https://guardhog.visualstudio.com/Data/_git/data-infra-script). Specifically, the bits around the web gateway setup.

## Detecting (and dropping) orphan models in the DWH

If you remove a model from the dbt project, but that model had already been materialized as a table or view in the DWH, the DWH object won't go away on its own. You'll have to explicitly drop it.

To make your life easier, this repo includes a utility script for this purpose: `find_orphan_models_in_db.sh`.

You can use this script to detect and identify any orphan models. It can be run one-off, or scheduled with Slack messaging so you get automated alerts whenever an orphan model appears.

The script is designed to be called from the same machine where you execute the regular `dbt run` calls. You can try to use it on your local machine, but there are multiple gotchas which might lead to confusion.

To use it:

- *Note that this assumes you've set up the project in the VM as described in previous sections. If you deviate in naming, paths, etc., you'll probably have to adjust some references here.*
- In the VM, copy it from the project repo into the home folder: `cp find_orphan_models_in_db.sh ~/find_orphan_models_in_db.sh` and make it executable: `chmod 700 ~/find_orphan_models_in_db.sh`.
- The script takes two positional arguments: a comma-separated list of schemas to review, and a path to dbt's `manifest.json`.
- Typically, if you call it from the VM, you would do: `./find_orphan_models_in_db.sh staging,intermediate,reporting data-dwh-dbt-project/target/manifest.json`.
- There is an optional `--slack` flag that will send success/failure messages to Slack channels. The necessary configuration is the same as described in the "How to schedule" section, so if you've already set up the dbt run, test and docs commands, you don't need to take any other steps to start sending Slack messages.
- Example usage (an illustrative run is sketched below): `./find_orphan_models_in_db.sh --slack staging,intermediate,reporting data-dwh-dbt-project/target/manifest.json`.

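For orientation, here is a sketch of what a run might print. The progress lines come from the script's own `echo` statements; the model names in the output are purely illustrative:

```bash
$ ./find_orphan_models_in_db.sh staging,intermediate,reporting data-dwh-dbt-project/target/manifest.json
✅ jq is installed
✅ yq is installed
✅ postgresql-client is installed
🔐 Loading DB credentials from /home/azureuser/.dbt/profiles.yml...
🗃️ Reading current tables/views from PostgreSQL...
🔎 Scanning schema: staging
🔎 Scanning schema: intermediate
🔎 Scanning schema: reporting
📦 Extracting model output names from dbt manifest...
📊 Comparing DBT models vs Postgres state...

✅ Relevant models (in both DB and DBT):
staging.stg_athena__verifications

⚠️ Stale models (in DB but NOT in DBT):
reporting.old_dropped_model
```
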
How to schedule:

- Simply add a cronjob in the VM with the command:

```bash
COMMAND="0 9 * * * /bin/bash /home/azureuser/find_orphan_models_in_db.sh --slack staging,intermediate,reporting /home/azureuser/data-dwh-dbt-project/target/manifest.json"
(crontab -u $USER -l; echo "$COMMAND") | crontab -u $USER -
```

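To confirm the job was registered, a generic crontab check (not specific to this repo) is:

```bash
# List the current user's crontab and look for the orphan-check entry
crontab -l | grep find_orphan_models_in_db
```
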
Note some caveats:

- `sync` models are not checked.
- If, for any reason, you add tables or views that are unrelated to the dbt project in the monitored schemas, these will be identified as orphans by this script. Be careful: you might drop them accidentally if you don't pay attention. The simple solution to this is... don't use dbt schemas for non-dbt purposes.

## CI

CI can be set up to review PRs and make the developer experience more solid and less error-prone.

find_orphan_models_in_db.sh (new file, 146 lines)

@@ -0,0 +1,146 @@

```bash
#!/bin/bash
set -euo pipefail

STARTING_DIR="/home/azureuser"
cd "$STARTING_DIR"

# === CONFIGURATION ===
DBT_PROJECT="dwh_dbt"
DBT_TARGET="prd"
PROFILE_YML="$STARTING_DIR/.dbt/profiles.yml"

# === Flag defaults ===
SEND_SLACK=false

# === Parse flags ===
while [[ $# -gt 0 ]]; do
    case "$1" in
        -s|--slack)
            SEND_SLACK=true
            shift
            ;;
        -*)
            echo "❌ Unknown option: $1"
            exit 1
            ;;
        *)
            break
            ;;
    esac
done

# === Positional arguments ===
SCHEMAS="$1"
MANIFEST_PATH="$2"
shift 2
IFS=',' read -r -a SCHEMA_ARRAY <<< "$SCHEMAS"

# === Tool check/install ===
install_tool_if_missing() {
    TOOL_CALL_NAME=$1
    TOOL_APT_NAME=$2
    if ! command -v "$TOOL_CALL_NAME" &>/dev/null; then
        echo "🔧 Installing missing tool: $TOOL_APT_NAME"
        sudo apt-get update -qq
        sudo apt-get install -y "$TOOL_APT_NAME"
    else
        echo "✅ $TOOL_APT_NAME is installed"
    fi
}

install_tool_if_missing jq jq
install_tool_if_missing yq yq
install_tool_if_missing psql postgresql-client

# === Slack webhook setup ===
script_dir=$(dirname "$0")
webhooks_file="slack_webhook_urls.txt"
env_file="$script_dir/$webhooks_file"

if [ -f "$env_file" ]; then
    export $(grep -v '^#' "$env_file" | xargs)
else
    echo "Error: $webhooks_file file not found in the script directory."
    exit 1
fi

# === Load DB credentials from profiles.yml ===
echo "🔐 Loading DB credentials from $PROFILE_YML..."
DB_NAME=$(yq e ".${DBT_PROJECT}.outputs.${DBT_TARGET}.dbname" "$PROFILE_YML")
DB_USER=$(yq e ".${DBT_PROJECT}.outputs.${DBT_TARGET}.user" "$PROFILE_YML")
DB_HOST=$(yq e ".${DBT_PROJECT}.outputs.${DBT_TARGET}.host" "$PROFILE_YML")
DB_PORT=$(yq e ".${DBT_PROJECT}.outputs.${DBT_TARGET}.port" "$PROFILE_YML")
export PGPASSWORD=$(yq e ".${DBT_PROJECT}.outputs.${DBT_TARGET}.pass" "$PROFILE_YML")

# === Get list of tables/views from Postgres ===
echo "🗃️ Reading current tables/views from PostgreSQL..."

POSTGRES_OBJECTS=()
for SCHEMA in "${SCHEMA_ARRAY[@]}"; do
    echo "🔎 Scanning schema: $SCHEMA"
    TABLES=$(psql -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -d "$DB_NAME" -Atc "
        SELECT LOWER(table_schema || '.' || table_name)
        FROM information_schema.tables
        WHERE table_schema = '$SCHEMA'
          AND table_type IN ('BASE TABLE', 'VIEW')
          AND table_name NOT LIKE 'pg_%'
        ORDER BY table_schema, table_name;
    ")
    while IFS= read -r tbl; do
        tbl_cleaned=$(echo "$tbl" | tr -d '[:space:]')
        [[ -n "$tbl_cleaned" ]] && POSTGRES_OBJECTS+=("$tbl_cleaned")
    done <<< "$TABLES"
done

POSTGRES_OBJECTS=($(printf "%s\n" "${POSTGRES_OBJECTS[@]}" | sort -u))

# === Parse manifest.json for dbt model output names ===
echo "📦 Extracting model output names from dbt manifest..."

DBT_OBJECTS=()
DBT_ENTRIES=$(jq -r '
    .nodes | to_entries[] |
    select(.value.resource_type == "model" or .value.resource_type == "seed") |
    .value.schema + "." + .value.alias
' "$MANIFEST_PATH")

while IFS= read -r entry; do
    entry_cleaned=$(echo "$entry" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
    [[ -n "$entry_cleaned" ]] && DBT_OBJECTS+=("$entry_cleaned")
done <<< "$DBT_ENTRIES"

DBT_OBJECTS=($(printf "%s\n" "${DBT_OBJECTS[@]}" | sort -u))

# === Compare ===
echo "📊 Comparing DBT models vs Postgres state..."

# comm -12 keeps lines present in both sorted lists; comm -23 keeps lines
# only in the first (Postgres) list, i.e. the orphan candidates.
RELEVANT_MODELS=($(comm -12 <(printf "%s\n" "${POSTGRES_OBJECTS[@]}" | sort) <(printf "%s\n" "${DBT_OBJECTS[@]}" | sort)))
STALE_MODELS=($(comm -23 <(printf "%s\n" "${POSTGRES_OBJECTS[@]}" | sort) <(printf "%s\n" "${DBT_OBJECTS[@]}" | sort)))

# === Output ===
echo ""
echo "✅ Relevant models (in both DB and DBT):"
printf "%s\n" "${RELEVANT_MODELS[@]}" | sort

echo ""
echo "⚠️ Stale models (in DB but NOT in DBT):"
printf "%s\n" "${STALE_MODELS[@]}" | sort

# === Format stale models for Slack ===
if [ "$SEND_SLACK" = true ]; then
    echo "✅ Sending slack message with results."
    if [ ${#STALE_MODELS[@]} -eq 0 ]; then
        SLACK_MSG=":white_check_mark::white_check_mark::white_check_mark: dbt models reviewed. No stale models found in the database! :white_check_mark::white_check_mark::white_check_mark:"
        curl -X POST -H 'Content-type: application/json' \
            --data "{\"text\":\"$SLACK_MSG\"}" \
            "$SLACK_RECEIPT_WEBHOOK_URL"
    else
        SLACK_MSG=":rotating_light::rotating_light::rotating_light: Stale models detected in Postgres (not in dbt manifest): :rotating_light::rotating_light::rotating_light:\n"
        for model in "${STALE_MODELS[@]}"; do
            SLACK_MSG+="- \`$model\`\n"
        done
        curl -X POST -H 'Content-type: application/json' \
            --data "{\"text\":\"$SLACK_MSG\"}" \
            "$SLACK_ALERT_WEBHOOK_URL"
    fi
fi
```

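The `export $(grep -v '^#' "$env_file" | xargs)` line expects `slack_webhook_urls.txt` to hold plain `KEY=value` pairs (lines starting with `#` are skipped). A minimal sketch of that file, using the two variable names the script actually reads; the URLs are placeholders, not real webhooks:

```bash
# slack_webhook_urls.txt (placeholder URLs)
SLACK_ALERT_WEBHOOK_URL=https://hooks.slack.com/services/T000/B000/alert-placeholder
SLACK_RECEIPT_WEBHOOK_URL=https://hooks.slack.com/services/T000/B000/receipt-placeholder
```
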
@@ -0,0 +1,101 @@

```sql
/*
Dear DWH modeller.
Be aware that this model is heavily opinionated due to many data quality issues, affecting both
Athena Verifications and Guesty Claims.

We will consider a User to be a property manager email.
If a Booking is duplicated at PM email, then we will dedup it.
If a Booking is duplicated among several PM emails, then it will be considered as different Bookings.
If a Booking has several Claims, all of them will be considered, and the claim amount will be aggregated.

Keep in mind that the model uses a snapshot of Guesty Resolutions from 1st of July 2025.
This also means that the conditions for the User to be considered a high-risk client are hardcoded.
*/
with
    stg_athena__verifications as (
        select
            -- Be aware that the same id booking can happen for more than one PM...
            property_manager_email,
            id_booking,
            -- In case of booking duplicates per PM email, just retrieve the first
            -- creation
            min(created_date_utc) as created_date_utc
        from {{ ref("stg_athena__verifications") }}
        where id_booking is not null
        group by 1, 2
    ),
    stg_seed__guesty_resolutions as (
        select
            id_booking,
            to_date(claim_date, 'DD/MM/YYYY') as claim_date,
            case
                when claim_amount ~ '^[0-9]+(\.[0-9]+)?$'
                then cast(claim_amount as decimal)
                else null
            end as claim_amount,
            claim_currency
        from {{ ref("stg_seed__guesty_resolutions_snapshot_20250701") }}
    ),
    int_daily_currency_exchange_rates as (
        select * from {{ ref("int_daily_currency_exchange_rates") }}
    ),
    users_3_months_activity as (
        -- 1. The User has been using the agreed services for at least (3) months
        -- (considered as 1st of July 2025)
        select
            property_manager_email,
            min(created_date_utc) as first_verification_created_per_pm,
            count(distinct id_booking) as total_count_of_bookings_per_pm
        from stg_athena__verifications sav
        group by 1
    ),
    users_with_claims as (
        select
            u.property_manager_email,
            u.first_verification_created_per_pm,
            u.total_count_of_bookings_per_pm,
            count(r.id_booking) as count_of_claims,
            round(sum(r.claim_amount * er.rate), 0) as total_claim_amount_in_gbp,
            1.0 * count(r.id_booking) / u.total_count_of_bookings_per_pm as claim_rate
        from users_3_months_activity u
        inner join
            stg_athena__verifications v
            on u.property_manager_email = v.property_manager_email
        left join stg_seed__guesty_resolutions r on v.id_booking = r.id_booking
        left join
            int_daily_currency_exchange_rates er
            on r.claim_currency = er.from_currency
            and er.to_currency = 'GBP'
            and r.claim_date = er.rate_date_utc
        group by 1, 2, 3
    ),
    rule_logic as (
        select
            *,
            case
                when first_verification_created_per_pm < '2025-04-01'
                then true
                else false
            end as has_been_using_services_for_at_least_3_months,
            case
                when total_claim_amount_in_gbp > 2300 then true else false
            end as exceeds_claim_amount_in_gbp,
            case
                when count_of_claims >= 5 then true else false
            end as exceeds_claim_count,
            case when claim_rate >= 0.07 then true else false end as exceeds_claim_rate
        from users_with_claims
    )
select
    *,
    case
        when
            has_been_using_services_for_at_least_3_months
            and exceeds_claim_amount_in_gbp
            and exceeds_claim_count
            and exceeds_claim_rate
        then true
        else false
    end as user_exceeds_all_indicators
from rule_logic
```

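The `claim_amount ~ '^[0-9]+(\.[0-9]+)?$'` guard above only casts strings that look like plain non-negative numbers; everything else becomes null. A quick standalone Postgres check with toy values (not taken from the seed):

```sql
select v as claim_amount,
       v ~ '^[0-9]+(\.[0-9]+)?$' as is_castable  -- true only for digits with an optional decimal part
from (values ('123'), ('45.67'), ('£45'), ('1,200'), ('n/a')) as t(v);
-- '£45', '1,200' and 'n/a' fail the pattern and would be nulled out by the model
```
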
```diff
@@ -1,3 +1,5 @@
+{{ config(materialized="table") }}
+
 {% set ok_status = "Approved" %}
 with
 int_athena__verifications as (select * from {{ ref("int_athena__verifications") }}),
```

```diff
@@ -259,3 +259,28 @@ models:
         description: "Date of checkout for the booking"
         data_tests:
           - not_null
+
+  - name: int_athena__high_risk_client_detector
+    description: |
+      This model is used to detect high-risk clients based on their booking and claim history for
+      Guesty (Athena).
+      This is based on some business rules that might change in the future.
+      This is also based on a snapshot that might require updates in the future.
+
+      Current rules, based on the Data Request on July 1st 2025 by Chloe from Resolutions, are:
+      A User is considered a high-risk client if they fall into the below criteria:
+      1. The User has been using the agreed services for at least (3) months
+      2. The aggregated amount of claims filed by the User exceeds a total of £2300
+      3. The User has filed at least (5) claims
+      4. The User has a claim ratio of (7%) or higher throughout their entire use of agreed services, including any claim that has received a guarantee payment
+    columns:
+      - name: property_manager_email
+        data_type: character varying
+        description: |
+          Email of the property manager.
+          This is used to identify the property manager for the booking.
+          It is used to group bookings and claims by property manager.
+          It is unique and not null.
+        data_tests:
+          - not_null
+          - unique
```

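To make the four rules concrete, here is a toy check of the thresholds as the model computes them; the numbers are invented and do not come from the seed:

```sql
select
    100 as total_count_of_bookings_per_pm,       -- hypothetical PM with 100 bookings
    7 as count_of_claims,
    1.0 * 7 / 100 as claim_rate,                 -- 0.07, which meets the >= 0.07 rule
    2500 > 2300 as exceeds_claim_amount_in_gbp,  -- hypothetical £2500 aggregated claims
    7 >= 5 as exceeds_claim_count;
```
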
```diff
@@ -1,3 +1,5 @@
+{{ config(materialized="table") }}
+
 with
 stg_check_in_hero__checkins as (
     select * from {{ ref("stg_check_in_hero__checkins") }}
```

```diff
@@ -4817,14 +4817,6 @@ models:
       - name: product_name
         data_type: character varying
         description: Type of payment verification, categorizing the transaction.
-        data_tests:
-          - accepted_values:
-              values:
-                - "WAIVER"
-                - "DEPOSIT"
-                - "CHECKINCOVER"
-                - "FEE"
-                - "UNKNOWN"

       - name: is_host_taking_waiver_risk
         data_type: boolean
```

```diff
@@ -4875,38 +4867,28 @@ models:
         description: |
           The total amount of the payment in GBP.
           This includes taxes if applicable.
-        data_tests:
-          - not_null

       - name: amount_without_taxes_in_txn_currency
         data_type: numeric
         description: |
           The net amount of the payment without taxes, in local currency.
-        data_tests:
-          - not_null

       - name: amount_without_taxes_in_gbp
         data_type: numeric
         description: |
           The net amount of the payment without taxes, in GBP.
-        data_tests:
-          - not_null

       - name: tax_amount_in_txn_currency
         data_type: numeric
         description: |
           The tax portion of the payment, in local currency.
           Will be 0 if no taxes apply.
-        data_tests:
-          - not_null

       - name: tax_amount_in_gbp
         data_type: numeric
         description: |
           The tax portion of the payment, in GBP. Will be 0 if no
           taxes apply.
-        data_tests:
-          - not_null

       - name: amount_due_to_host_in_txn_currency
         data_type: numeric
```

```diff
@@ -1,3 +1,5 @@
+{{ config(materialized="table") }}
+
 {% set guesty_id_deal = "17814677813" %}
 with
 int_edeposit__verification_fees as (
```

```diff
@@ -1,3 +1,5 @@
+{{ config(materialized="table") }}
+
 {% set ok_status = ("Approved", "Flagged") %}
 {% set rejected_status = "Rejected" %}
 {% set rejected_fee = 0.25 %}
```

```diff
@@ -1,3 +1,5 @@
+{{ config(materialized="table") }}
+
 {% set rejected_status = "REJECTED" %}
 {% set approved_flagged_status = ("APPROVED", "FLAGGED") %}
 {% set basic_protection = "BASIC PROTECTION" %}
```

```diff
@@ -12,6 +12,8 @@ select
         then 'Waiver'
         when product_name = 'DEPOSIT'
         then 'Deposit'
+        when product_name = 'STAYDISRUPT'
+        then 'StayDisrupt'
         when product_name = 'UNKNOWN'
         then null
         else product_name
```

```diff
@@ -1530,6 +1530,7 @@ models:
               - "Waiver"
               - "Deposit"
               - "CheckInCover"
+              - "StayDisrupt"
               - "Fee"

       - name: is_host_taking_waiver_risk
```

```diff
@@ -403,28 +403,15 @@ models:
         data_type: numeric
         description: "Amount of the guest contribution, in case they did,
           in local currency."
-        data_tests:
-          - dbt_expectations.expect_column_values_to_be_between:
-              min_value: 0
-              strictly: false
-              where: not is_incident_missing_details

       - name: guest_contribution_currency
         data_type: text
         description: "Currency of the guest contribution."
-        data_tests:
-          - not_null:
-              where: "guest_contribution_amount_in_txn_currency > 0 and not is_incident_missing_details"

       - name: guest_contribution_amount_in_gbp
         data_type: numeric
         description: "Amount of the guest contribution, in case they did,
           in GBP."
-        data_tests:
-          - dbt_expectations.expect_column_values_to_be_between:
-              min_value: 0
-              strictly: false
-              where: not is_incident_missing_details

       - name: is_guest_contacted_about_damage
         data_type: boolean
```

```diff
@@ -179,18 +179,10 @@ models:
         data_type: numeric
         description: "Amount of the guest contribution, in case they did,
           in local currency."
-        data_tests:
-          - dbt_expectations.expect_column_values_to_be_between:
-              min_value: 0
-              strictly: false
-              where: not is_incident_missing_details

       - name: guest_contribution_currency
         data_type: text
         description: "Currency of the guest contribution."
-        data_tests:
-          - not_null:
-              where: "guest_contribution_amount_in_txn_currency > 0 and not is_incident_missing_details"

       - name: is_guest_contacted_about_damage
         data_type: boolean
```

```diff
@@ -465,19 +457,11 @@ models:
         data_type: numeric
         description: "Claim amount in local currency if the host is seeking
           compensation from another platform."
-        data_tests:
-          - dbt_expectations.expect_column_values_to_be_between:
-              min_value: 0
-              strictly: false
-              where: "not is_incident_missing_details"

       - name: third_party_claim_currency
         data_type: text
         description: "Currency of the claim amount if the host is seeking
           compensation from another platform."
-        data_tests:
-          - not_null:
-              where: "third_party_claim_amount_in_txn_currency > 0 and not is_incident_missing_details"

       - name: cosmos_db_timestamp_utc
         data_type: timestamp
```

```diff
@@ -8,7 +8,7 @@
 {% set tests_or_cancelled_incidents = "ARCHIVED" %}

 -- Some incidents have insufficient details which might create data quality issues.
-{% set insufficient_details_incidents = "INSUFFICIENT DETAILS" %}
+{% set insufficient_details_incidents = ("INSUFFICIENT DETAILS", "INCOMPLETE") %}

 with
 raw_incident as (select * from {{ source("resolutions", "incident") }}),
```

```diff
@@ -21,7 +21,7 @@ select
     {{ adapter.quote("documents") }} ->> 'VerificationId' as id_verification,
     {{ adapter.quote("documents") }} ->> 'CurrentStatusName' as current_status_name,
     upper({{ adapter.quote("documents") }} ->> 'CurrentStatusName')
-    = '{{ insufficient_details_incidents }}' as is_incident_missing_details,
+    in {{ insufficient_details_incidents }} as is_incident_missing_details,
     ({{ adapter.quote("documents") }} ->> 'IsSubmissionComplete')::boolean
     as is_submission_complete,
     {{ adapter.quote("documents") }} ->> 'CurrentAgentName' as current_agent_name,
```

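For context on the two hunks above: when dbt renders a Jinja tuple, Python's tuple formatting produces a parenthesized, single-quoted list, so the comparison compiles to a standard SQL `IN`. Roughly (illustrative compiled output, not taken from the repo):

```sql
-- approximate compiled form of the changed expression
upper("documents" ->> 'CurrentStatusName')
    in ('INSUFFICIENT DETAILS', 'INCOMPLETE') as is_incident_missing_details,
```
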
run_tests.sh (77 lines changed)

```diff
@@ -1,17 +1,11 @@
 #!/bin/bash

-# === Logging setup ===
-TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
-LOG_FILE="/home/azureuser/dbt_test_logs/dbt_tests_${TIMESTAMP}.log"
-exec >> "$LOG_FILE" 2>&1
-
-echo "=== dbt test run started at $TIMESTAMP ==="
+exec >> /home/azureuser/dbt_tests.log 2>&1

-# === Slack webhook setup ===
+# Define the Slack webhook URL
 script_dir=$(dirname "$0")
 webhooks_file="slack_webhook_urls.txt"
 env_file="$script_dir/$webhooks_file"

 if [ -f "$env_file" ]; then
     export $(grep -v '^#' "$env_file" | xargs)
 else
```

```diff
@@ -19,79 +13,34 @@ else
     exit 1
 fi

+# Messages to be sent to Slack
 slack_failure_message=":rotating_light::rotating_light::rotating_light: One or more failures in dbt tests in production. :rotating_light::rotating_light::rotating_light:"
 slack_success_message=":white_check_mark::white_check_mark::white_check_mark: dbt tests executed successfully in production. :white_check_mark::white_check_mark::white_check_mark:"

+# Initialize the failure flag
 has_any_step_failed=0

-# === Navigate to project ===
-cd /home/azureuser/data-dwh-dbt-project
+cd /home/azureuser/data-dwh-dbt-project || exit 1

-# === Update from Git ===
+# Update from git
 echo "Updating dbt project from git."
 git checkout master
 git pull

-# === Activate virtual environment ===
+# Activate venv
 source venv/bin/activate

-# === Run dbt tests ===
+# Run tests
 echo "Triggering dbt test"
 dbt test
 if [ $? -ne 0 ]; then
     has_any_step_failed=1
 fi

-# === Handle success ===
+# Check if any step failed and send a Slack message
+if [ $has_any_step_failed -eq 1 ]; then
+    curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"$slack_failure_message\"}" $SLACK_ALERT_WEBHOOK_URL
+fi
 if [ $has_any_step_failed -eq 0 ]; then
-    curl -X POST -H 'Content-type: application/json' \
-        --data "{\"text\":\"$slack_success_message\"}" \
-        "$SLACK_RECEIPT_WEBHOOK_URL"
-    exit 0
+    curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"$slack_success_message\"}" $SLACK_RECEIPT_WEBHOOK_URL
 fi
-
-# === Handle failures: parse log and send individual Slack messages ===
-echo "Parsing log file for test failures..."
-
-grep -E "Failure in test|Got [0-9]+ result|compiled code at" "$LOG_FILE" | while read -r line; do
-    if [[ "$line" =~ Failure\ in\ test\ ([^[:space:]]+)\ \((.*)\) ]]; then
-        TEST_NAME="${BASH_REMATCH[1]}"
-        echo "==> Detected failure: $TEST_NAME"
-    fi
-
-    if [[ "$line" =~ Got\ ([0-9]+)\ result ]]; then
-        FAILED_ROWS="${BASH_REMATCH[1]}"
-    fi
-
-    if [[ "$line" =~ compiled\ code\ at\ (.*) ]]; then
-        RELATIVE_PATH="${BASH_REMATCH[1]}"
-        COMPILED_SQL_FILE="/home/azureuser/data-dwh-dbt-project/${RELATIVE_PATH}"
-        # Check sqlfluff availability
-        if ! command -v sqlfluff >/dev/null 2>&1; then
-            echo "ERROR: sqlfluff is not installed or not in PATH"
-            SQL_QUERY="sqlfluff not found on system"
-        elif [ -f "$COMPILED_SQL_FILE" ]; then
-            echo "File exists, attempting to format with sqlfluff..."
-            FORMATTED_SQL=$(sqlfluff render "$COMPILED_SQL_FILE" --dialect postgres 2>&1)
-            if [ -n "$FORMATTED_SQL" ]; then
-                echo "We have formatted SQL"
-                SQL_QUERY=$(echo "$FORMATTED_SQL" | sed 's/"/\\"/g')
-            else
-                echo "sqlfluff returned empty result, falling back to raw file content"
-                SQL_QUERY=$(<"$COMPILED_SQL_FILE" sed 's/"/\\"/g')
-            fi
-        else
-            echo "ERROR: File not found: $COMPILED_SQL_FILE"
-            SQL_QUERY="Could not find compiled SQL file: $COMPILED_SQL_FILE"
-        fi
-
-        # === Send Slack message for this failed test ===
-        echo "Sending message for failed test $TEST_NAME"
-        SLACK_MESSAGE=":rotating_light: *Test Failure Detected!* :rotating_light:\n\n*Test:* \`$TEST_NAME\`\n*Failed Rows:* $FAILED_ROWS\n*Query:*\n\`\`\`\n$SQL_QUERY\n\`\`\`"
-
-        curl -X POST -H 'Content-type: application/json' \
-            --data "{\"text\":\"$SLACK_MESSAGE\"}" \
-            "$SLACK_ALERT_WEBHOOK_URL"
-    fi
-done
```

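For reference on the parsing block removed above: bash's `=~` operator fills the `BASH_REMATCH` array with capture groups. A standalone sketch of how the `Got N result` line was matched (the sample line is invented):

```bash
line="Got 3 results, configured to fail if != 0"
if [[ "$line" =~ Got\ ([0-9]+)\ result ]]; then
    echo "failed rows: ${BASH_REMATCH[1]}"  # prints: failed rows: 3
fi
```
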
```diff
@@ -427,3 +427,48 @@ seeds:
           Name of the hubspot account owner.
         data_tests:
           - not_null
+
+  - name: stg_seed__guesty_resolutions_snapshot_20250701
+    description: |
+      A snapshot of Guesty Resolutions data as of 2025-07-01.
+      This is a static snapshot and we currently have no intent of keeping it up to date.
+      The data was shared by Chloe from Resolutions in a static file.
+
+      The fields described are those that are used in downstream models.
+
+    columns:
+      - name: id_booking
+        data_type: character varying
+        description: |
+          The internal ID of this booking in Guesty. Matches the booking ID
+          in the Guesty verifications table.
+          It can contain duplicated bookings, and this is out of our scope.
+          It cannot be null.
+        data_tests:
+          - not_null
+
+      - name: claim_date
+        data_type: character varying
+        description: |
+          When the claim was received by Truvi, in dd/mm/yyyy format.
+          It cannot be null.
+        data_tests:
+          - not_null
+
+      - name: claim_amount
+        data_type: character varying
+        description: |
+          The amount of the claim, in the currency specified in claim_currency.
+          It's text by default since it might contain data quality issues.
+          The conversion to decimal is done in dependent models.
+          It cannot be null.
+        data_tests:
+          - not_null
+
+      - name: claim_currency
+        data_type: character varying
+        description: |
+          The currency specified in the claim amount.
+          It cannot be null.
+        data_tests:
+          - not_null
```

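For completeness: a seed like this is materialized with `dbt seed`, and can be targeted individually; the flags below are standard dbt CLI options:

```bash
# Load (or rebuild) just this snapshot seed
dbt seed --select stg_seed__guesty_resolutions_snapshot_20250701 --full-refresh
```
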
seeds/stg_seed__guesty_resolutions_snapshot_20250701.csv (new file, 1639 lines)

File diff suppressed because it is too large

```diff
@@ -371,18 +371,18 @@ id_metric,metric_name,target_date,target_eom_value,target_ytd_value,target_eofy_
 16,Revenue Churn Rate,2025-01-31,0.0262,0.0231,0.0234
 16,Revenue Churn Rate,2025-02-28,0.0189,0.0227,0.0234
 16,Revenue Churn Rate,2025-03-31,0.0300,0.0234,0.0234
-16,Revenue Churn Rate,2025-04-30,0.0300,0.0300,0.0300
-16,Revenue Churn Rate,2025-05-31,0.0300,0.0300,0.0300
-16,Revenue Churn Rate,2025-06-30,0.0300,0.0300,0.0300
-16,Revenue Churn Rate,2025-07-31,0.0300,0.0300,0.0300
-16,Revenue Churn Rate,2025-08-31,0.0300,0.0300,0.0300
-16,Revenue Churn Rate,2025-09-30,0.0300,0.0300,0.0300
-16,Revenue Churn Rate,2025-10-31,0.0300,0.0300,0.0300
-16,Revenue Churn Rate,2025-11-30,0.0300,0.0300,0.0300
-16,Revenue Churn Rate,2025-12-31,0.0300,0.0300,0.0300
-16,Revenue Churn Rate,2026-01-31,0.0300,0.0300,0.0300
-16,Revenue Churn Rate,2026-02-28,0.0300,0.0300,0.0300
-16,Revenue Churn Rate,2026-03-31,0.0300,0.0300,0.0300
+16,Revenue Churn Rate,2025-04-30,0.0100,0.0100,0.0100
+16,Revenue Churn Rate,2025-05-31,0.0100,0.0100,0.0100
+16,Revenue Churn Rate,2025-06-30,0.0100,0.0100,0.0100
+16,Revenue Churn Rate,2025-07-31,0.0100,0.0100,0.0100
+16,Revenue Churn Rate,2025-08-31,0.0100,0.0100,0.0100
+16,Revenue Churn Rate,2025-09-30,0.0100,0.0100,0.0100
+16,Revenue Churn Rate,2025-10-31,0.0100,0.0100,0.0100
+16,Revenue Churn Rate,2025-11-30,0.0100,0.0100,0.0100
+16,Revenue Churn Rate,2025-12-31,0.0100,0.0100,0.0100
+16,Revenue Churn Rate,2026-01-31,0.0100,0.0100,0.0100
+16,Revenue Churn Rate,2026-02-28,0.0100,0.0100,0.0100
+16,Revenue Churn Rate,2026-03-31,0.0100,0.0100,0.0100
 17,Revenue Churn,2024-04-30,3762,3762,121454
 17,Revenue Churn,2024-05-31,3029,6791,121454
 17,Revenue Churn,2024-06-30,4583,11374,121454
```

```diff
@@ -395,18 +395,18 @@ id_metric,metric_name,target_date,target_eom_value,target_ytd_value,target_eofy_
 17,Revenue Churn,2025-01-31,11880,97326,121454
 17,Revenue Churn,2025-02-28,8961,106287,121454
 17,Revenue Churn,2025-03-31,15167,121454,121454
-17,Revenue Churn,2025-04-30,15781,15781,229212
-17,Revenue Churn,2025-05-31,16358,32139,229212
-17,Revenue Churn,2025-06-30,17005,49144,229212
-17,Revenue Churn,2025-07-31,17679,66823,229212
-17,Revenue Churn,2025-08-31,18397,85221,229212
-17,Revenue Churn,2025-09-30,18998,104219,229212
-17,Revenue Churn,2025-10-31,19566,123785,229212
-17,Revenue Churn,2025-11-30,20199,143985,229212
-17,Revenue Churn,2025-12-31,20761,164746,229212
-17,Revenue Churn,2026-01-31,21087,185833,229212
-17,Revenue Churn,2026-02-28,21690,207522,229212
-17,Revenue Churn,2026-03-31,21690,229212,229212
+17,Revenue Churn,2025-04-30,5260,5260,76404
+17,Revenue Churn,2025-05-31,5453,10713,76404
+17,Revenue Churn,2025-06-30,5668,16381,76404
+17,Revenue Churn,2025-07-31,5893,22274,76404
+17,Revenue Churn,2025-08-31,6132,28407,76404
+17,Revenue Churn,2025-09-30,6333,34740,76404
+17,Revenue Churn,2025-10-31,6522,41262,76404
+17,Revenue Churn,2025-11-30,6733,47995,76404
+17,Revenue Churn,2025-12-31,6920,54915,76404
+17,Revenue Churn,2026-01-31,7029,61944,76404
+17,Revenue Churn,2026-02-28,7230,69174,76404
+17,Revenue Churn,2026-03-31,7230,76404,76404
 18,Booking Fee per Billable Booking,2024-04-30,4.410,4.410,3.769
 18,Booking Fee per Billable Booking,2024-05-31,4.687,4.553,3.769
 18,Booking Fee per Billable Booking,2024-06-30,3.825,4.310,3.769
```
