Compare commits

...

11 commits

Author SHA1 Message Date
Oriol Roqué Paniagua
bc3a364891 Merged PR 5677: Athena/Guesty high risk clients
# Description

* Adds the new snapshot for Guesty Claims, up to 1st July 2025.
* Creates a model named `int_athena__high_risk_client_detector` that handles the following logic:

1. The User has been using the agreed services for at least (3) months
2. The aggregated amount of claims filed by the User exceeds a total of £2300
3. The User has filed at least (5) claims
4. The User has a claim ratio of (7%) or higher throughout their entire use of agreed services, including any claim that has received a guarantee payment

It's heavily opinionated due to the lack of clear requirements and poor data quality in both Athena verifications and Guesty claims. Please check the inline comments for more info.

With this model and these conditions, only 2 users would be tagged as high risk.
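As a hedged illustration only (the actual logic lives in the dbt model), the four conditions combine into a single predicate; the thresholds and the 2025-04-01 cutoff (three months before the snapshot date) mirror the description above:

```python
from datetime import date

# Illustrative sketch, not the dbt model itself. Thresholds mirror the
# PR description; the cutoff is three months before the 2025-07-01 snapshot.
CUTOFF = date(2025, 4, 1)

def is_high_risk(first_verification: date, total_claims_gbp: float,
                 claim_count: int, claim_ratio: float) -> bool:
    return (
        first_verification < CUTOFF   # 1. at least 3 months of activity
        and total_claims_gbp > 2300   # 2. aggregated claims exceed £2300
        and claim_count >= 5          # 3. at least 5 claims filed
        and claim_ratio >= 0.07       # 4. claim ratio of 7% or higher
    )
```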

# Checklist

- [X] The edited models and dependents run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've run and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #31687
2025-07-11 10:28:24 +00:00
Pablo Martin
ddc0a6a3f4 Merged PR 5678: Revert 'Prettify alerts in test script'
# Description

We revert this script to its previous state: the current implementation is too fragile, and we don't have the capacity to make it robust enough right now.

Reverts !5551

Related work items: #31476
2025-07-11 09:20:01 +00:00
Oriol Roqué Paniagua
2f14b3305c Merged PR 5652: Remove third party and guest involvements tests
# Description

Remove third party and guest involvements tests from the Resolutions models, following what we discussed with Ant in the #resolutions-data channel.

This fixes the alerts around resolutions.

# Checklist

- [X] The edited models and dependents run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've run and passed them.
- [ ] I have checked for DRY opportunities with other models and docs.
- [ ] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #31843
2025-07-09 12:31:15 +00:00
Pablo Martin
a1b67d20f1 change materialization of heavy tables 2025-07-09 11:33:54 +02:00
Pablo Martin
7488400cbb fix bugs in orphan model detection 2025-07-08 17:05:45 +02:00
Pablo Martin
717590513f Merged PR 5617: Orphan Models Script
# Description

This PR adds a script to look for orphan models in the DWH. The `README.md` has been expanded to explain how to use and schedule this script.
2025-07-08 12:45:30 +00:00
Oriol Roqué Paniagua
a1429ccec8 Merged PR 5634: Adapt Revenue Churn Rate Targets from 3% to 1%
# Description

Adapt the Revenue Churn Rate targets from 3% to 1%, which also reduces the Revenue Churn targets to one third of their previous values.

This aligns our targets with those on the Finance side.

# Checklist

- [X] The edited models and dependents run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've run and passed them.
- [ ] I have checked for DRY opportunities with other models and docs.
- [ ] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #31351
2025-07-07 12:54:39 +00:00
Oriol Roqué Paniagua
900c73b076 Merged PR 5632: Resolution incidents in status Incomplete now have reduced test coverage
# Description

Resolution incidents in status Incomplete now have reduced test coverage.

This fixes today's data alert.

# Checklist

- [X] The edited models and dependents run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've run and passed them.
- [ ] I have checked for DRY opportunities with other models and docs.
- [ ] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #31843
2025-07-07 12:53:52 +00:00
Pablo Martin
ad67a79a24 script and docs 2025-07-04 12:25:21 +02:00
Joaquin Ossa
e0e97709c0 Merged PR 5607: stay confident inclusion
# Description

Removed tests in the intermediate model and added StayDisrupt as an accepted value for `product_name`

# Checklist

- [x] The edited models and dependents run properly with production data.
- [x] The edited models are sufficiently documented.
- [x] The edited models contain PK tests, and I've run and passed them.
- [ ] I have checked for DRY opportunities with other models and docs.
- [ ] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #31721
2025-07-02 13:42:45 +00:00
Joaquin
b9fe9a0552 stay confident inclusion 2025-07-02 14:37:44 +02:00
19 changed files with 2040 additions and 138 deletions

View file

@@ -134,6 +134,37 @@ Once you build the docs with `run_docs.sh`, you will have a bunch of files. To o
This goes beyond the scope of this project: to understand how you can serve these, refer to our [infra script repo](https://guardhog.visualstudio.com/Data/_git/data-infra-script). Specifically, the bits around the web gateway set up.
## Detecting (and dropping) orphan models in the DWH
If you remove a model from the dbt project, but that model has already been materialized as a table or view in the DWH, the DWH object won't go away on its own. You'll have to explicitly drop it.
In order to make your life easier, we have a utility script in this repo for this purpose: `find_orphan_models_in_db.sh`.
You can use this script to detect and identify any orphan models. It can be run one-off or scheduled with Slack messaging, so you get automated alerts any time an orphan model appears.
The script is designed to be called from the same machine where you execute the regular `dbt run` calls. You can try to use it on your local machine, but there are multiple gotchas which might lead to confusion.
To use it:
- *Note that this assumes you've set up the project in the VM as described in previous sections. If you deviate in naming, paths, etc., you'll probably have to adjust some references here.*
- In the VM, copy it from the project repo into the home folder: `cp find_orphan_models_in_db.sh ~/find_orphan_models_in_db.sh` and make it executable: `chmod 700 ~/find_orphan_models_in_db.sh`.
- The script takes two positional arguments: a comma-separated list of schemas to review, and a path to dbt's `manifest.json`.
- Typically, if you call it from the VM, you would do: `./find_orphan_models_in_db.sh staging,intermediate,reporting data-dwh-dbt-project/target/manifest.json`.
- There is an optional `--slack` flag that will send success/failure messages to Slack channels. The necessary configuration is the same as described in the "How to schedule" section, so if you've already set up the dbt run, test and docs commands, you don't need to take any other steps to start sending Slack messages.
- Example usage: `./find_orphan_models_in_db.sh --slack staging,intermediate,reporting data-dwh-dbt-project/target/manifest.json`.
How to schedule:
- Simply add a cronjob in the VM with the command:
```bash
COMMAND="0 9 * * * /bin/bash /home/azureuser/find_orphan_models_in_db.sh --slack staging,intermediate,reporting /home/azureuser/data-dwh-dbt-project/target/manifest.json"
(crontab -u $USER -l; echo "$COMMAND" ) | crontab -u $USER -
```
Note some caveats:
- `sync` models are not checked.
- If, for any reason, you add tables or views unrelated to the dbt project to the monitored schemas, this script will identify them as orphans. Be careful: you might drop them accidentally if you don't pay attention. The simple solution to this is... don't use dbt schemas for non-dbt purposes.
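For reference, the comparison at the heart of the script (dbt manifest vs. database objects) can be sketched in Python; this is a simplified stand-in, assuming the standard `manifest.json` layout with `nodes` entries carrying `schema`, `alias`, and `resource_type`:

```python
import json

def find_orphans(manifest_path, db_objects):
    # Collect schema.alias for every model and seed in the manifest,
    # lowercased to match the script's normalization.
    with open(manifest_path) as f:
        nodes = json.load(f)["nodes"].values()
    dbt_objects = {
        f"{n['schema']}.{n['alias']}".lower()
        for n in nodes
        if n["resource_type"] in ("model", "seed")
    }
    # Anything present in the DB but absent from the manifest is orphan.
    return sorted(set(db_objects) - dbt_objects)
```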
## CI
CI can be set up to review PRs and make the developer experience more solid and less error-prone.

find_orphan_models_in_db.sh (new file, 146 lines)
View file

@@ -0,0 +1,146 @@
#!/bin/bash
set -euo pipefail
STARTING_DIR="/home/azureuser"
cd "$STARTING_DIR"
# === CONFIGURATION ===
DBT_PROJECT="dwh_dbt"
DBT_TARGET="prd"
PROFILE_YML="$STARTING_DIR/.dbt/profiles.yml"
# === Flag defaults ===
SEND_SLACK=false
# === Parse flags ===
while [[ $# -gt 0 ]]; do
case "$1" in
-s|--slack)
SEND_SLACK=true
shift
;;
-*)
echo "❌ Unknown option: $1"
exit 1
;;
*)
break
;;
esac
done
# === Positional arguments ===
SCHEMAS="$1"
MANIFEST_PATH="$2"
shift 2
IFS=',' read -r -a SCHEMA_ARRAY <<< "$SCHEMAS"
# === Tool check/install ===
install_tool_if_missing() {
TOOL_CALL_NAME=$1
TOOL_APT_NAME=$2
if ! command -v "$TOOL_CALL_NAME" &>/dev/null; then
echo "🔧 Installing missing tool: $TOOL_APT_NAME"
sudo apt-get update -qq
sudo apt-get install -y "$TOOL_APT_NAME"
else
echo "$TOOL_APT_NAME is installed"
fi
}
install_tool_if_missing jq jq
install_tool_if_missing yq yq
install_tool_if_missing psql postgresql-client
# === Slack webhook setup ===
script_dir=$(dirname "$0")
webhooks_file="slack_webhook_urls.txt"
env_file="$script_dir/$webhooks_file"
if [ -f "$env_file" ]; then
export $(grep -v '^#' "$env_file" | xargs)
else
echo "Error: $webhooks_file file not found in the script directory."
exit 1
fi
# === Load DB credentials from profiles.yml ===
echo "🔐 Loading DB credentials from $PROFILE_YML..."
DB_NAME=$(yq e ".${DBT_PROJECT}.outputs.${DBT_TARGET}.dbname" "$PROFILE_YML")
DB_USER=$(yq e ".${DBT_PROJECT}.outputs.${DBT_TARGET}.user" "$PROFILE_YML")
DB_HOST=$(yq e ".${DBT_PROJECT}.outputs.${DBT_TARGET}.host" "$PROFILE_YML")
DB_PORT=$(yq e ".${DBT_PROJECT}.outputs.${DBT_TARGET}.port" "$PROFILE_YML")
export PGPASSWORD=$(yq e ".${DBT_PROJECT}.outputs.${DBT_TARGET}.pass" "$PROFILE_YML")
# === Get list of tables/views from Postgres ===
echo "🗃️ Reading current tables/views from PostgreSQL..."
POSTGRES_OBJECTS=()
for SCHEMA in "${SCHEMA_ARRAY[@]}"; do
echo "🔎 Scanning schema: $SCHEMA"
TABLES=$(psql -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -d "$DB_NAME" -Atc "
SELECT LOWER(table_schema || '.' || table_name)
FROM information_schema.tables
WHERE table_schema = '$SCHEMA'
AND table_type IN ('BASE TABLE', 'VIEW')
AND table_name NOT LIKE 'pg_%'
ORDER BY table_schema, table_name;
")
while IFS= read -r tbl; do
tbl_cleaned=$(echo "$tbl" | tr -d '[:space:]')
[[ -n "$tbl_cleaned" ]] && POSTGRES_OBJECTS+=("$tbl_cleaned")
done <<< "$TABLES"
done
POSTGRES_OBJECTS=($(printf "%s\n" "${POSTGRES_OBJECTS[@]}" | sort -u))
# === Parse manifest.json for dbt model output names ===
echo "📦 Extracting model output names from dbt manifest..."
DBT_OBJECTS=()
DBT_ENTRIES=$(jq -r '
.nodes | to_entries[] |
select(.value.resource_type == "model" or .value.resource_type == "seed") |
.value.schema + "." + .value.alias
' "$MANIFEST_PATH")
while IFS= read -r entry; do
entry_cleaned=$(echo "$entry" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
[[ -n "$entry_cleaned" ]] && DBT_OBJECTS+=("$entry_cleaned")
done <<< "$DBT_ENTRIES"
DBT_OBJECTS=($(printf "%s\n" "${DBT_OBJECTS[@]}" | sort -u))
# === Compare ===
echo "📊 Comparing DBT models vs Postgres state..."
RELEVANT_MODELS=($(comm -12 <(printf "%s\n" "${POSTGRES_OBJECTS[@]}" | sort) <(printf "%s\n" "${DBT_OBJECTS[@]}" | sort)))
STALE_MODELS=($(comm -23 <(printf "%s\n" "${POSTGRES_OBJECTS[@]}" | sort) <(printf "%s\n" "${DBT_OBJECTS[@]}" | sort)))
# === Output ===
echo ""
echo "✅ Relevant models (in both DB and DBT):"
printf "%s\n" "${RELEVANT_MODELS[@]}" | sort
echo ""
echo "⚠️ Stale models (in DB but NOT in DBT):"
printf "%s\n" "${STALE_MODELS[@]}" | sort
# === Format stale models for Slack ===
if [ "$SEND_SLACK" = true ]; then
echo "✅ Sending slack message with results."
if [ ${#STALE_MODELS[@]} -eq 0 ]; then
SLACK_MSG=":white_check_mark::white_check_mark::white_check_mark: dbt models reviewed. No stale models found in the database! :white_check_mark::white_check_mark::white_check_mark:"
curl -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"$SLACK_MSG\"}" \
"$SLACK_RECEIPT_WEBHOOK_URL"
else
SLACK_MSG=":rotating_light::rotating_light::rotating_light: Stale models detected in Postgres (not in dbt manifest): :rotating_light::rotating_light::rotating_light:\n"
for model in "${STALE_MODELS[@]}"; do
SLACK_MSG+="- \`$model\`\n"
done
curl -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"$SLACK_MSG\"}" \
"$SLACK_ALERT_WEBHOOK_URL"
fi
fi

View file

@@ -0,0 +1,101 @@
/*
Dear DWH modeller.
Be aware that this model is heavily opinionated due to many data quality issues, affecting both
Athena Verifications and Guesty Claims.
We will consider a User to be a property manager email.
If a Booking is duplicated at PM email, then we will dedup it.
If a Booking is duplicated among several PM emails, then it will be considered as different Bookings.
If a Booking has several Claims, all of them will be considered, and the claim amount will be aggregated.
Keep in mind that the model uses a snapshot of Guesty Resolutions from 1st of July 2025.
This also means that the conditions for the User to be considered a high-risk client are hardcoded.
*/
with
stg_athena__verifications as (
select
-- Be aware that the same id booking can happen for more than one PM...
property_manager_email,
id_booking,
-- In case of booking duplicates per PM email, just retrieve the first
-- creation
min(created_date_utc) as created_date_utc
from {{ ref("stg_athena__verifications") }}
where id_booking is not null
group by 1, 2
),
stg_seed__guesty_resolutions as (
select
id_booking,
to_date(claim_date, 'DD/MM/YYYY') as claim_date,
case
when claim_amount ~ '^[0-9]+(\.[0-9]+)?$'
then cast(claim_amount as decimal)
else null
end as claim_amount,
claim_currency
from {{ ref("stg_seed__guesty_resolutions_snapshot_20250701") }}
),
int_daily_currency_exchange_rates as (
select * from {{ ref("int_daily_currency_exchange_rates") }}
),
users_3_months_activity as (
-- 1. The User has been using the agreed services for at least (3) months
-- (considered as 1st of July 2025)
select
property_manager_email,
min(created_date_utc) as first_verification_created_per_pm,
count(distinct id_booking) as total_count_of_bookings_per_pm
from stg_athena__verifications sav
group by 1
),
users_with_claims as (
select
u.property_manager_email,
u.first_verification_created_per_pm,
u.total_count_of_bookings_per_pm,
count(r.id_booking) as count_of_claims,
round(sum(r.claim_amount * er.rate), 0) as total_claim_amount_in_gbp,
1.0 * count(r.id_booking) / u.total_count_of_bookings_per_pm as claim_rate
from users_3_months_activity u
inner join
stg_athena__verifications v
on u.property_manager_email = v.property_manager_email
left join stg_seed__guesty_resolutions r on v.id_booking = r.id_booking
left join
int_daily_currency_exchange_rates er
on r.claim_currency = er.from_currency
and er.to_currency = 'GBP'
and r.claim_date = er.rate_date_utc
group by 1, 2, 3
),
rule_logic as (
select
*,
case
when first_verification_created_per_pm < '2025-04-01'
then true
else false
end as has_been_using_services_for_at_least_3_months,
case
when total_claim_amount_in_gbp > 2300 then true else false
end as exceeds_claim_amount_in_gbp,
case
when count_of_claims >= 5 then true else false
end as exceeds_claim_count,
case when claim_rate >= 0.07 then true else false end as exceeds_claim_rate
from users_with_claims
)
select
*,
case
when
has_been_using_services_for_at_least_3_months
and exceeds_claim_amount_in_gbp
and exceeds_claim_count
and exceeds_claim_rate
then true
else false
end as user_exceeds_all_indicators
from rule_logic
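The `claim_amount` guard in the model above keeps only plain numeric strings and nulls out the rest; the same cleanup can be sketched in Python (hypothetical helper, same regex as the model):

```python
import re

# Same pattern the model uses to reject malformed claim amounts.
NUMERIC = re.compile(r"^[0-9]+(\.[0-9]+)?$")

def parse_claim_amount(raw: str):
    # Mirrors the model's CASE expression: cast clean values, null the rest.
    return float(raw) if NUMERIC.fullmatch(raw) else None
```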

View file

@@ -1,3 +1,5 @@
{{ config(materialized="table") }}
{% set ok_status = "Approved" %}
with
int_athena__verifications as (select * from {{ ref("int_athena__verifications") }}),

View file

@@ -259,3 +259,28 @@ models:
description: "Date of checkout for the booking"
data_tests:
- not_null
- name: int_athena__high_risk_client_detector
description: |
This model is used to detect high-risk clients based on their booking and claim history for
Guesty (Athena).
This is based on some business rules that might change in the future.
This is also based on a snapshot that might require updates in the future.
Current rules, based on the Data Request on July 1st 2025 by Chloe from Resolutions, are:
A User is considered a high-risk client if they fall into the below criteria:
1. The User has been using the agreed services for at least (3) months
2. The aggregated amount of claims filed by the User exceeds a total of £2300
3. The User has filed at least (5) claims
4. The User has a claim ratio of (7%) or higher throughout their entire use of agreed services, including any claim that has received a guarantee payment
columns:
- name: property_manager_email
data_type: character varying
description: |
Email of the property manager.
This is used to identify the property manager for the booking.
It is used to group bookings and claims by property manager.
It is unique and not null.
data_tests:
- not_null
- unique

View file

@@ -1,3 +1,5 @@
{{ config(materialized="table") }}
with
stg_check_in_hero__checkins as (
select * from {{ ref("stg_check_in_hero__checkins") }}

View file

@@ -4817,14 +4817,6 @@ models:
- name: product_name
data_type: character varying
description: Type of payment verification, categorizing the transaction.
data_tests:
- accepted_values:
values:
- "WAIVER"
- "DEPOSIT"
- "CHECKINCOVER"
- "FEE"
- "UNKNOWN"
- name: is_host_taking_waiver_risk
data_type: boolean
@@ -4875,38 +4867,28 @@ models:
description: |
The total amount of the payment in GBP.
This includes taxes if applicable.
data_tests:
- not_null
- name: amount_without_taxes_in_txn_currency
data_type: numeric
description: |
The net amount of the payment without taxes, in local currency.
data_tests:
- not_null
- name: amount_without_taxes_in_gbp
data_type: numeric
description: |
The net amount of the payment without taxes, in GBP.
data_tests:
- not_null
- name: tax_amount_in_txn_currency
data_type: numeric
description: |
The tax portion of the payment, in local currency.
Will be 0 if no taxes apply.
data_tests:
- not_null
- name: tax_amount_in_gbp
data_type: numeric
description: |
The tax portion of the payment, in GBP. Will be 0 if no
taxes apply.
data_tests:
- not_null
- name: amount_due_to_host_in_txn_currency
data_type: numeric

View file

@@ -1,3 +1,5 @@
{{ config(materialized="table") }}
{% set guesty_id_deal = "17814677813" %}
with
int_edeposit__verification_fees as (

View file

@@ -1,3 +1,5 @@
{{ config(materialized="table") }}
{% set ok_status = ("Approved", "Flagged") %}
{% set rejected_status = "Rejected" %}
{% set rejected_fee = 0.25 %}

View file

@@ -1,3 +1,5 @@
{{ config(materialized="table") }}
{% set rejected_status = "REJECTED" %}
{% set approved_flagged_status = ("APPROVED", "FLAGGED") %}
{% set basic_protection = "BASIC PROTECTION" %}

View file

@@ -12,6 +12,8 @@ select
then 'Waiver'
when product_name = 'DEPOSIT'
then 'Deposit'
when product_name = 'STAYDISRUPT'
then 'StayDisrupt'
when product_name = 'UNKNOWN'
then null
else product_name

View file

@@ -1530,6 +1530,7 @@ models:
- "Waiver"
- "Deposit"
- "CheckInCover"
- "StayDisrupt"
- "Fee"
- name: is_host_taking_waiver_risk

View file

@@ -403,28 +403,15 @@ models:
data_type: numeric
description: "Amount of the guest contribution, in case they did,
in local currency."
data_tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
where: not is_incident_missing_details
- name: guest_contribution_currency
data_type: text
description: "Currency of the guest contribution."
data_tests:
- not_null:
where: "guest_contribution_amount_in_txn_currency > 0 and not is_incident_missing_details"
- name: guest_contribution_amount_in_gbp
data_type: numeric
description: "Amount of the guest contribution, in case they did,
in GBP."
data_tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
where: not is_incident_missing_details
- name: is_guest_contacted_about_damage
data_type: boolean

View file

@@ -179,18 +179,10 @@ models:
data_type: numeric
description: "Amount of the guest contribution, in case they did,
in local currency."
data_tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
where: not is_incident_missing_details
- name: guest_contribution_currency
data_type: text
description: "Currency of the guest contribution."
data_tests:
- not_null:
where: "guest_contribution_amount_in_txn_currency > 0 and not is_incident_missing_details"
- name: is_guest_contacted_about_damage
data_type: boolean
@@ -465,19 +457,11 @@ models:
data_type: numeric
description: "Claim amount in local currency if the host is seeking
compensation from another platform."
data_tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
where: "not is_incident_missing_details"
- name: third_party_claim_currency
data_type: text
description: "Currency of the claim amount if the host is seeking
compensation from another platform."
data_tests:
- not_null:
where: "third_party_claim_amount_in_txn_currency > 0 and not is_incident_missing_details"
- name: cosmos_db_timestamp_utc
data_type: timestamp

View file

@@ -8,7 +8,7 @@
{% set tests_or_cancelled_incidents = "ARCHIVED" %}
-- Some incidents have insufficient details which might create data quality issues.
{% set insufficient_details_incidents = "INSUFFICIENT DETAILS" %}
{% set insufficient_details_incidents = ("INSUFFICIENT DETAILS", "INCOMPLETE") %}
with
raw_incident as (select * from {{ source("resolutions", "incident") }}),
@@ -21,7 +21,7 @@ select
{{ adapter.quote("documents") }} ->> 'VerificationId' as id_verification,
{{ adapter.quote("documents") }} ->> 'CurrentStatusName' as current_status_name,
upper({{ adapter.quote("documents") }} ->> 'CurrentStatusName')
= '{{ insufficient_details_incidents }}' as is_incident_missing_details,
in {{ insufficient_details_incidents }} as is_incident_missing_details,
({{ adapter.quote("documents") }} ->> 'IsSubmissionComplete')::boolean
as is_submission_complete,
{{ adapter.quote("documents") }} ->> 'CurrentAgentName' as current_agent_name,
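The status check in this diff now tests membership in a tuple instead of equality with a single string. When the Jinja `set` tuple is rendered into the SQL, its Python repr happens to be valid SQL `IN` syntax; a sketch of the mechanism, with plain string formatting standing in for Jinja:

```python
# Sketch only: a two-or-more element Python tuple renders as a valid
# SQL IN list. (A single-element tuple would render with a trailing
# comma, so keep at least two statuses in the set.)
insufficient_details_incidents = ("INSUFFICIENT DETAILS", "INCOMPLETE")
sql = f"upper(current_status_name) in {insufficient_details_incidents}"
```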

View file

@@ -1,17 +1,11 @@
#!/bin/bash
# === Logging setup ===
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
LOG_FILE="/home/azureuser/dbt_test_logs/dbt_tests_${TIMESTAMP}.log"
exec >> "$LOG_FILE" 2>&1
exec >> /home/azureuser/dbt_tests.log 2>&1
echo "=== dbt test run started at $TIMESTAMP ==="
# === Slack webhook setup ===
# Define the Slack webhook URL
script_dir=$(dirname "$0")
webhooks_file="slack_webhook_urls.txt"
env_file="$script_dir/$webhooks_file"
if [ -f "$env_file" ]; then
export $(grep -v '^#' "$env_file" | xargs)
else
@@ -19,79 +19,34 @@ else
exit 1
fi
# Messages to be sent to Slack
slack_failure_message=":rotating_light::rotating_light::rotating_light: One or more failures in dbt tests in production. :rotating_light::rotating_light::rotating_light:"
slack_success_message=":white_check_mark::white_check_mark::white_check_mark: dbt tests executed successfully in production. :white_check_mark::white_check_mark::white_check_mark:"
# Initialize the failure flag
has_any_step_failed=0
# === Navigate to project ===
cd /home/azureuser/data-dwh-dbt-project || exit 1
cd /home/azureuser/data-dwh-dbt-project
# === Update from Git ===
# Update from git
echo "Updating dbt project from git."
git checkout master
git pull
# === Activate virtual environment ===
# Activate venv
source venv/bin/activate
# === Run dbt tests ===
# Run tests
echo "Triggering dbt test"
dbt test
if [ $? -ne 0 ]; then
has_any_step_failed=1
fi
# === Handle success ===
# Check if any step failed and send a Slack message
if [ $has_any_step_failed -eq 1 ]; then
curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"$slack_failure_message\"}" $SLACK_ALERT_WEBHOOK_URL
fi
if [ $has_any_step_failed -eq 0 ]; then
curl -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"$slack_success_message\"}" \
"$SLACK_RECEIPT_WEBHOOK_URL"
exit 0
curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"$slack_success_message\"}" $SLACK_RECEIPT_WEBHOOK_URL
fi
# === Handle failures: parse log and send individual Slack messages ===
echo "Parsing log file for test failures..."
grep -E "Failure in test|Got [0-9]+ result|compiled code at" "$LOG_FILE" | while read -r line; do
if [[ "$line" =~ Failure\ in\ test\ ([^[:space:]]+)\ \((.*)\) ]]; then
TEST_NAME="${BASH_REMATCH[1]}"
echo "==> Detected failure: $TEST_NAME"
fi
if [[ "$line" =~ Got\ ([0-9]+)\ result ]]; then
FAILED_ROWS="${BASH_REMATCH[1]}"
fi
if [[ "$line" =~ compiled\ code\ at\ (.*) ]]; then
RELATIVE_PATH="${BASH_REMATCH[1]}"
COMPILED_SQL_FILE="/home/azureuser/data-dwh-dbt-project/${RELATIVE_PATH}"
# Check sqlfluff availability
if ! command -v sqlfluff >/dev/null 2>&1; then
echo "ERROR: sqlfluff is not installed or not in PATH"
SQL_QUERY="sqlfluff not found on system"
elif [ -f "$COMPILED_SQL_FILE" ]; then
echo "File exists, attempting to format with sqlfluff..."
FORMATTED_SQL=$(sqlfluff render "$COMPILED_SQL_FILE" --dialect postgres 2>&1)
if [ -n "$FORMATTED_SQL" ]; then
echo "We have formatted SQL"
SQL_QUERY=$(echo "$FORMATTED_SQL" | sed 's/"/\\"/g')
else
echo "sqlfluff returned empty result, falling back to raw file content"
SQL_QUERY=$(<"$COMPILED_SQL_FILE" sed 's/"/\\"/g')
fi
else
echo "ERROR: File not found: $COMPILED_SQL_FILE"
SQL_QUERY="Could not find compiled SQL file: $COMPILED_SQL_FILE"
fi
# === Send Slack message for this failed test ===
echo "Sending message for failed test $TEST_NAME"
SLACK_MESSAGE=":rotating_light: *Test Failure Detected!* :rotating_light:\n\n*Test:* \`$TEST_NAME\`\n*Failed Rows:* $FAILED_ROWS\n*Query:*\n\`\`\`\n$SQL_QUERY\n\`\`\`"
curl -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"$SLACK_MESSAGE\"}" \
"$SLACK_ALERT_WEBHOOK_URL"
fi
done

View file

@@ -427,3 +427,48 @@ seeds:
Name of the hubspot account owner.
data_tests:
- not_null
- name: stg_seed__guesty_resolutions_snapshot_20250701
description: |
A snapshot of Guesty Resolutions data as of 2025-07-01.
This is a static snapshot and we currently have no intention of keeping it up to date.
The data was shared by Chloe from Resolutions in a static file.
The fields described are those that are used in following models.
columns:
- name: id_booking
data_type: character varying
description: |
The internal ID of this booking in Guesty. Matches with the booking ID
in the Guesty verifications table.
It can contain duplicated bookings; handling those is out of our scope.
It cannot be null.
data_tests:
- not_null
- name: claim_date
data_type: character varying
description: |
When the claim was received by Truvi, in dd/mm/yyyy format.
It cannot be null.
data_tests:
- not_null
- name: claim_amount
data_type: character varying
description: |
The amount of the claim in the currency specified in claim_currency.
It's text by default since it might contain data quality issues.
The conversion to decimal is done in dependent models.
It cannot be null.
data_tests:
- not_null
- name: claim_currency
data_type: character varying
description: |
The currency specified in the claim amount.
It cannot be null.
data_tests:
- not_null

File diff suppressed because it is too large

View file

@@ -371,18 +371,18 @@ id_metric,metric_name,target_date,target_eom_value,target_ytd_value,target_eofy_
16,Revenue Churn Rate,2025-01-31,0.0262,0.0231,0.0234
16,Revenue Churn Rate,2025-02-28,0.0189,0.0227,0.0234
16,Revenue Churn Rate,2025-03-31,0.0300,0.0234,0.0234
16,Revenue Churn Rate,2025-04-30,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2025-05-31,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2025-06-30,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2025-07-31,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2025-08-31,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2025-09-30,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2025-10-31,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2025-11-30,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2025-12-31,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2026-01-31,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2026-02-28,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2026-03-31,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2025-04-30,0.0100,0.0100,0.0100
16,Revenue Churn Rate,2025-05-31,0.0100,0.0100,0.0100
16,Revenue Churn Rate,2025-06-30,0.0100,0.0100,0.0100
16,Revenue Churn Rate,2025-07-31,0.0100,0.0100,0.0100
16,Revenue Churn Rate,2025-08-31,0.0100,0.0100,0.0100
16,Revenue Churn Rate,2025-09-30,0.0100,0.0100,0.0100
16,Revenue Churn Rate,2025-10-31,0.0100,0.0100,0.0100
16,Revenue Churn Rate,2025-11-30,0.0100,0.0100,0.0100
16,Revenue Churn Rate,2025-12-31,0.0100,0.0100,0.0100
16,Revenue Churn Rate,2026-01-31,0.0100,0.0100,0.0100
16,Revenue Churn Rate,2026-02-28,0.0100,0.0100,0.0100
16,Revenue Churn Rate,2026-03-31,0.0100,0.0100,0.0100
17,Revenue Churn,2024-04-30,3762,3762,121454
17,Revenue Churn,2024-05-31,3029,6791,121454
17,Revenue Churn,2024-06-30,4583,11374,121454
@@ -395,18 +395,18 @@ id_metric,metric_name,target_date,target_eom_value,target_ytd_value,target_eofy_
17,Revenue Churn,2025-01-31,11880,97326,121454
17,Revenue Churn,2025-02-28,8961,106287,121454
17,Revenue Churn,2025-03-31,15167,121454,121454
17,Revenue Churn,2025-04-30,15781,15781,229212
17,Revenue Churn,2025-05-31,16358,32139,229212
17,Revenue Churn,2025-06-30,17005,49144,229212
17,Revenue Churn,2025-07-31,17679,66823,229212
17,Revenue Churn,2025-08-31,18397,85221,229212
17,Revenue Churn,2025-09-30,18998,104219,229212
17,Revenue Churn,2025-10-31,19566,123785,229212
17,Revenue Churn,2025-11-30,20199,143985,229212
17,Revenue Churn,2025-12-31,20761,164746,229212
17,Revenue Churn,2026-01-31,21087,185833,229212
17,Revenue Churn,2026-02-28,21690,207522,229212
17,Revenue Churn,2026-03-31,21690,229212,229212
17,Revenue Churn,2025-04-30,5260,5260,76404
17,Revenue Churn,2025-05-31,5453,10713,76404
17,Revenue Churn,2025-06-30,5668,16381,76404
17,Revenue Churn,2025-07-31,5893,22274,76404
17,Revenue Churn,2025-08-31,6132,28407,76404
17,Revenue Churn,2025-09-30,6333,34740,76404
17,Revenue Churn,2025-10-31,6522,41262,76404
17,Revenue Churn,2025-11-30,6733,47995,76404
17,Revenue Churn,2025-12-31,6920,54915,76404
17,Revenue Churn,2026-01-31,7029,61944,76404
17,Revenue Churn,2026-02-28,7230,69174,76404
17,Revenue Churn,2026-03-31,7230,76404,76404
18,Booking Fee per Billable Booking,2024-04-30,4.410,4.410,3.769
18,Booking Fee per Billable Booking,2024-05-31,4.687,4.553,3.769
18,Booking Fee per Billable Booking,2024-06-30,3.825,4.310,3.769
