Compare commits

...

11 commits

Author SHA1 Message Date
Oriol Roqué Paniagua
bc3a364891 Merged PR 5677: Athena/Guesty high risk clients
# Description

* Adds the new snapshot for Guesty Claims, up to 1st July 2025.
* Creates a model named `int_athena__high_risk_client_detector` that handles the following logic:

1. The User has been using the agreed services for at least (3) months
2. The aggregated amount of claims filed by the User exceeds a total of £2300
3. The User has filed at least (5) claims
4. The User has a claim ratio of (7%) or higher throughout their entire use of agreed services, including any claim that has received a guarantee payment

It's heavily opinionated due to the lack of clear requirements and poor data quality in both Athena verifications and Guesty claims. Please check the inline comments for more info.

With this model and these conditions, only 2 users would be tagged as high risk.
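As a hedged illustration only (the actual logic lives in the dbt model), the four conditions combine into a single predicate; the thresholds and the 2025-04-01 cutoff (three months before the snapshot date) mirror the description above:

```python
from datetime import date

# Illustrative sketch, not the dbt model itself. Thresholds mirror the
# PR description; the cutoff is three months before the 2025-07-01 snapshot.
CUTOFF = date(2025, 4, 1)

def is_high_risk(first_verification: date, total_claims_gbp: float,
                 claim_count: int, claim_ratio: float) -> bool:
    return (
        first_verification < CUTOFF   # 1. at least 3 months of activity
        and total_claims_gbp > 2300   # 2. aggregated claims exceed £2300
        and claim_count >= 5          # 3. at least 5 claims filed
        and claim_ratio >= 0.07       # 4. claim ratio of 7% or higher
    )
```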

# Checklist

- [X] The edited models and dependents run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've run and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #31687
2025-07-11 10:28:24 +00:00
Pablo Martin
ddc0a6a3f4 Merged PR 5678: Revert 'Prettify alerts in test script'
# Description

We revert this script to its previous state: the current implementation is too fragile, and we don't have the capacity to make it robust enough right now.

Reverts !5551

Related work items: #31476
2025-07-11 09:20:01 +00:00
Oriol Roqué Paniagua
2f14b3305c Merged PR 5652: Remove third party and guest involvements tests
# Description

Remove third party and guest involvements tests from the Resolutions models, following what we discussed with Ant in the #resolutions-data channel.

This fixes the alerts around resolutions.

# Checklist

- [X] The edited models and dependents run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've run and passed them.
- [ ] I have checked for DRY opportunities with other models and docs.
- [ ] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #31843
2025-07-09 12:31:15 +00:00
Pablo Martin
a1b67d20f1 change materialization of heavy tables 2025-07-09 11:33:54 +02:00
Pablo Martin
7488400cbb fix bugs in orphan model detection 2025-07-08 17:05:45 +02:00
Pablo Martin
717590513f Merged PR 5617: Orphan Models Script
# Description

This PR adds a script to look for orphan models in the DWH. The `README.md` has been expanded to explain how to use and schedule this script.
2025-07-08 12:45:30 +00:00
Oriol Roqué Paniagua
a1429ccec8 Merged PR 5634: Adapt Revenue Churn Rate Targets from 3% to 1%
# Description

Adapt the Revenue Churn Rate targets from 3% to 1%, which also reduces the Revenue Churn targets to one third of their previous values.

This aligns our targets with those on the Finance side.

# Checklist

- [X] The edited models and dependents run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've run and passed them.
- [ ] I have checked for DRY opportunities with other models and docs.
- [ ] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #31351
2025-07-07 12:54:39 +00:00
Oriol Roqué Paniagua
900c73b076 Merged PR 5632: Resolution incidents in status Incomplete now have reduced test coverage
# Description

Resolution incidents in status Incomplete now have reduced test coverage.

This fixes today's data alert.

# Checklist

- [X] The edited models and dependents run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've run and passed them.
- [ ] I have checked for DRY opportunities with other models and docs.
- [ ] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #31843
2025-07-07 12:53:52 +00:00
Pablo Martin
ad67a79a24 script and docs 2025-07-04 12:25:21 +02:00
Joaquin Ossa
e0e97709c0 Merged PR 5607: stay confident inclusion
# Description

Removed tests in the intermediate model and added StayDisrupt as an accepted value for `product_name`

# Checklist

- [x] The edited models and dependents run properly with production data.
- [x] The edited models are sufficiently documented.
- [x] The edited models contain PK tests, and I've run and passed them.
- [ ] I have checked for DRY opportunities with other models and docs.
- [ ] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #31721
2025-07-02 13:42:45 +00:00
Joaquin
b9fe9a0552 stay confident inclusion 2025-07-02 14:37:44 +02:00
19 changed files with 2040 additions and 138 deletions

View file

@@ -134,6 +134,37 @@ Once you build the docs with `run_docs.sh`, you will have a bunch of files. To o
This goes beyond the scope of this project: to understand how you can serve these, refer to our [infra script repo](https://guardhog.visualstudio.com/Data/_git/data-infra-script). Specifically, the bits around the web gateway set up.
## Detecting (and dropping) orphan models in the DWH
If you remove a model from the dbt project, but that model has already been materialized as a table or view in the DWH, the DWH object won't go away on its own. You'll have to explicitly drop it.
In order to make your life easier, we have a utility script in this repo for this purpose: `find_orphan_models_in_db.sh`.
You can use this script to detect and identify any orphan models. It can be run one-off or scheduled with Slack messaging, so you get automated alerts any time an orphan model appears.
The script is designed to be called from the same machine where you execute the regular `dbt run` calls. You can try to use it on your local machine, but there are multiple gotchas which might lead to confusion.
To use it:
- *Note that this assumes you've set up the project in the VM as described in previous sections. If you deviate in naming, paths, etc., you'll probably have to adjust some references here.*
- In the VM, copy it from the project repo into the home folder: `cp find_orphan_models_in_db.sh ~/find_orphan_models_in_db.sh` and make it executable: `chmod 700 ~/find_orphan_models_in_db.sh`.
- The script takes two positional arguments: a comma-separated list of schemas to review, and a path to dbt's `manifest.json`.
- Typically, if you call it from the VM, you would do: `./find_orphan_models_in_db.sh staging,intermediate,reporting data-dwh-dbt-project/target/manifest.json`.
- There is an optional `--slack` flag that will send success/failure messages to Slack channels. The necessary configuration is the same as described in the "How to schedule" section, so if you've already set up the dbt run, test and docs commands, you don't need to take any other steps to start sending Slack messages.
- Example usage: `./find_orphan_models_in_db.sh --slack staging,intermediate,reporting data-dwh-dbt-project/target/manifest.json`.
How to schedule:
- Simply add a cronjob in the VM with the command:
```bash
COMMAND="0 9 * * * /bin/bash /home/azureuser/find_orphan_models_in_db.sh --slack staging,intermediate,reporting /home/azureuser/data-dwh-dbt-project/target/manifest.json"
(crontab -u $USER -l; echo "$COMMAND" ) | crontab -u $USER -
```
Note some caveats:
- `sync` models are not checked.
- If, for any reason, you add tables or views unrelated to the dbt project to the monitored schemas, this script will identify them as orphans. Be careful: you might drop them accidentally if you don't pay attention. The simple solution to this is... don't use dbt schemas for non-dbt purposes.
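For reference, the comparison at the heart of the script (dbt manifest vs. database objects) can be sketched in Python; this is a simplified stand-in, assuming the standard `manifest.json` layout with `nodes` entries carrying `schema`, `alias`, and `resource_type`:

```python
import json

def find_orphans(manifest_path, db_objects):
    # Collect schema.alias for every model and seed in the manifest,
    # lowercased to match the script's normalization.
    with open(manifest_path) as f:
        nodes = json.load(f)["nodes"].values()
    dbt_objects = {
        f"{n['schema']}.{n['alias']}".lower()
        for n in nodes
        if n["resource_type"] in ("model", "seed")
    }
    # Anything present in the DB but absent from the manifest is orphan.
    return sorted(set(db_objects) - dbt_objects)
```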
## CI
CI can be set up to review PRs and make the developer experience more solid and less error-prone.

find_orphan_models_in_db.sh (new file, 146 lines)
View file

@@ -0,0 +1,146 @@
#!/bin/bash
set -euo pipefail
STARTING_DIR="/home/azureuser"
cd "$STARTING_DIR"
# === CONFIGURATION ===
DBT_PROJECT="dwh_dbt"
DBT_TARGET="prd"
PROFILE_YML="$STARTING_DIR/.dbt/profiles.yml"
# === Flag defaults ===
SEND_SLACK=false
# === Parse flags ===
while [[ $# -gt 0 ]]; do
case "$1" in
-s|--slack)
SEND_SLACK=true
shift
;;
-*)
echo "❌ Unknown option: $1"
exit 1
;;
*)
break
;;
esac
done
# === Positional arguments ===
SCHEMAS="$1"
MANIFEST_PATH="$2"
shift 2
IFS=',' read -r -a SCHEMA_ARRAY <<< "$SCHEMAS"
# === Tool check/install ===
install_tool_if_missing() {
TOOL_CALL_NAME=$1
TOOL_APT_NAME=$2
if ! command -v "$TOOL_CALL_NAME" &>/dev/null; then
echo "🔧 Installing missing tool: $TOOL_APT_NAME"
sudo apt-get update -qq
sudo apt-get install -y "$TOOL_APT_NAME"
else
echo "$TOOL_APT_NAME is installed"
fi
}
install_tool_if_missing jq jq
install_tool_if_missing yq yq
install_tool_if_missing psql postgresql-client
# === Slack webhook setup ===
script_dir=$(dirname "$0")
webhooks_file="slack_webhook_urls.txt"
env_file="$script_dir/$webhooks_file"
if [ -f "$env_file" ]; then
export $(grep -v '^#' "$env_file" | xargs)
else
echo "Error: $webhooks_file file not found in the script directory."
exit 1
fi
# === Load DB credentials from profiles.yml ===
echo "🔐 Loading DB credentials from $PROFILE_YML..."
DB_NAME=$(yq e ".${DBT_PROJECT}.outputs.${DBT_TARGET}.dbname" "$PROFILE_YML")
DB_USER=$(yq e ".${DBT_PROJECT}.outputs.${DBT_TARGET}.user" "$PROFILE_YML")
DB_HOST=$(yq e ".${DBT_PROJECT}.outputs.${DBT_TARGET}.host" "$PROFILE_YML")
DB_PORT=$(yq e ".${DBT_PROJECT}.outputs.${DBT_TARGET}.port" "$PROFILE_YML")
export PGPASSWORD=$(yq e ".${DBT_PROJECT}.outputs.${DBT_TARGET}.pass" "$PROFILE_YML")
# === Get list of tables/views from Postgres ===
echo "🗃️ Reading current tables/views from PostgreSQL..."
POSTGRES_OBJECTS=()
for SCHEMA in "${SCHEMA_ARRAY[@]}"; do
echo "🔎 Scanning schema: $SCHEMA"
TABLES=$(psql -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -d "$DB_NAME" -Atc "
SELECT LOWER(table_schema || '.' || table_name)
FROM information_schema.tables
WHERE table_schema = '$SCHEMA'
AND table_type IN ('BASE TABLE', 'VIEW')
AND table_name NOT LIKE 'pg_%'
ORDER BY table_schema, table_name;
")
while IFS= read -r tbl; do
tbl_cleaned=$(echo "$tbl" | tr -d '[:space:]')
[[ -n "$tbl_cleaned" ]] && POSTGRES_OBJECTS+=("$tbl_cleaned")
done <<< "$TABLES"
done
POSTGRES_OBJECTS=($(printf "%s\n" "${POSTGRES_OBJECTS[@]}" | sort -u))
# === Parse manifest.json for dbt model output names ===
echo "📦 Extracting model output names from dbt manifest..."
DBT_OBJECTS=()
DBT_ENTRIES=$(jq -r '
.nodes | to_entries[] |
select(.value.resource_type == "model" or .value.resource_type == "seed") |
.value.schema + "." + .value.alias
' "$MANIFEST_PATH")
while IFS= read -r entry; do
entry_cleaned=$(echo "$entry" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
[[ -n "$entry_cleaned" ]] && DBT_OBJECTS+=("$entry_cleaned")
done <<< "$DBT_ENTRIES"
DBT_OBJECTS=($(printf "%s\n" "${DBT_OBJECTS[@]}" | sort -u))
# === Compare ===
echo "📊 Comparing DBT models vs Postgres state..."
RELEVANT_MODELS=($(comm -12 <(printf "%s\n" "${POSTGRES_OBJECTS[@]}" | sort) <(printf "%s\n" "${DBT_OBJECTS[@]}" | sort)))
STALE_MODELS=($(comm -23 <(printf "%s\n" "${POSTGRES_OBJECTS[@]}" | sort) <(printf "%s\n" "${DBT_OBJECTS[@]}" | sort)))
# === Output ===
echo ""
echo "✅ Relevant models (in both DB and DBT):"
printf "%s\n" "${RELEVANT_MODELS[@]}" | sort
echo ""
echo "⚠️ Stale models (in DB but NOT in DBT):"
printf "%s\n" "${STALE_MODELS[@]}" | sort
# === Format stale models for Slack ===
if [ "$SEND_SLACK" = true ]; then
echo "✅ Sending slack message with results."
if [ ${#STALE_MODELS[@]} -eq 0 ]; then
SLACK_MSG=":white_check_mark::white_check_mark::white_check_mark: dbt models reviewed. No stale models found in the database! :white_check_mark::white_check_mark::white_check_mark:"
curl -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"$SLACK_MSG\"}" \
"$SLACK_RECEIPT_WEBHOOK_URL"
else
SLACK_MSG=":rotating_light::rotating_light::rotating_light: Stale models detected in Postgres (not in dbt manifest): :rotating_light::rotating_light::rotating_light:\n"
for model in "${STALE_MODELS[@]}"; do
SLACK_MSG+="- \`$model\`\n"
done
curl -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"$SLACK_MSG\"}" \
"$SLACK_ALERT_WEBHOOK_URL"
fi
fi

View file

@@ -0,0 +1,101 @@
/*
Dear DWH modeller.
Be aware that this model is heavily opinionated due to many data quality issues, affecting both
Athena Verifications and Guesty Claims.
We will consider a User to be a property manager email.
If a Booking is duplicated at PM email, then we will dedup it.
If a Booking is duplicated among several PM emails, then it will be considered as different Bookings.
If a Booking has several Claims, all of them will be considered, and the claim amount will be aggregated.
Keep in mind that the model uses a snapshot of Guesty Resolutions from 1st of July 2025.
This also means that the conditions for the User to be considered a high-risk client are hardcoded.
*/
with
stg_athena__verifications as (
select
-- Be aware that the same id booking can happen for more than one PM...
property_manager_email,
id_booking,
-- In case of booking duplicates per PM email, just retrieve the first
-- creation
min(created_date_utc) as created_date_utc
from {{ ref("stg_athena__verifications") }}
where id_booking is not null
group by 1, 2
),
stg_seed__guesty_resolutions as (
select
id_booking,
to_date(claim_date, 'DD/MM/YYYY') as claim_date,
case
when claim_amount ~ '^[0-9]+(\.[0-9]+)?$'
then cast(claim_amount as decimal)
else null
end as claim_amount,
claim_currency
from {{ ref("stg_seed__guesty_resolutions_snapshot_20250701") }}
),
int_daily_currency_exchange_rates as (
select * from {{ ref("int_daily_currency_exchange_rates") }}
),
users_3_months_activity as (
-- 1. The User has been using the agreed services for at least (3) months
-- (considered as 1st of July 2025)
select
property_manager_email,
min(created_date_utc) as first_verification_created_per_pm,
count(distinct id_booking) as total_count_of_bookings_per_pm
from stg_athena__verifications sav
group by 1
),
users_with_claims as (
select
u.property_manager_email,
u.first_verification_created_per_pm,
u.total_count_of_bookings_per_pm,
count(r.id_booking) as count_of_claims,
round(sum(r.claim_amount * er.rate), 0) as total_claim_amount_in_gbp,
1.0 * count(r.id_booking) / u.total_count_of_bookings_per_pm as claim_rate
from users_3_months_activity u
inner join
stg_athena__verifications v
on u.property_manager_email = v.property_manager_email
left join stg_seed__guesty_resolutions r on v.id_booking = r.id_booking
left join
int_daily_currency_exchange_rates er
on r.claim_currency = er.from_currency
and er.to_currency = 'GBP'
and r.claim_date = er.rate_date_utc
group by 1, 2, 3
),
rule_logic as (
select
*,
case
when first_verification_created_per_pm < '2025-04-01'
then true
else false
end as has_been_using_services_for_at_least_3_months,
case
when total_claim_amount_in_gbp > 2300 then true else false
end as exceeds_claim_amount_in_gbp,
case
when count_of_claims >= 5 then true else false
end as exceeds_claim_count,
case when claim_rate >= 0.07 then true else false end as exceeds_claim_rate
from users_with_claims
)
select
*,
case
when
has_been_using_services_for_at_least_3_months
and exceeds_claim_amount_in_gbp
and exceeds_claim_count
and exceeds_claim_rate
then true
else false
end as user_exceeds_all_indicators
from rule_logic
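The `claim_amount` guard in the model above keeps only plain numeric strings and nulls out the rest; the same cleanup can be sketched in Python (hypothetical helper, same regex as the model):

```python
import re

# Same pattern the model uses to reject malformed claim amounts.
NUMERIC = re.compile(r"^[0-9]+(\.[0-9]+)?$")

def parse_claim_amount(raw: str):
    # Mirrors the model's CASE expression: cast clean values, null the rest.
    return float(raw) if NUMERIC.fullmatch(raw) else None
```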

View file

@@ -1,3 +1,5 @@
{{ config(materialized="table") }}
{% set ok_status = "Approved" %}
with
int_athena__verifications as (select * from {{ ref("int_athena__verifications") }}),

View file

@@ -259,3 +259,28 @@ models:
description: "Date of checkout for the booking"
data_tests:
- not_null
- name: int_athena__high_risk_client_detector
description: |
This model is used to detect high-risk clients based on their booking and claim history for
Guesty (Athena).
This is based on some business rules that might change in the future.
This is also based on a snapshot that might require updates in the future.
Current rules, based on the Data Request on July 1st 2025 by Chloe from Resolutions, are:
A User is considered a high-risk client if they fall into the below criteria:
1. The User has been using the agreed services for at least (3) months
2. The aggregated amount of claims filed by the User exceeds a total of £2300
3. The User has filed at least (5) claims
4. The User has a claim ratio of (7%) or higher throughout their entire use of agreed services, including any claim that has received a guarantee payment
columns:
- name: property_manager_email
data_type: character varying
description: |
Email of the property manager.
This is used to identify the property manager for the booking.
It is used to group bookings and claims by property manager.
It is unique and not null.
data_tests:
- not_null
- unique

View file

@@ -1,3 +1,5 @@
{{ config(materialized="table") }}
with
stg_check_in_hero__checkins as (
select * from {{ ref("stg_check_in_hero__checkins") }}

View file

@@ -4817,14 +4817,6 @@ models:
- name: product_name
data_type: character varying
description: Type of payment verification, categorizing the transaction.
data_tests:
- accepted_values:
values:
- "WAIVER"
- "DEPOSIT"
- "CHECKINCOVER"
- "FEE"
- "UNKNOWN"
- name: is_host_taking_waiver_risk
data_type: boolean
@@ -4875,38 +4867,28 @@ models:
description: |
The total amount of the payment in GBP.
This includes taxes if applicable.
data_tests:
- not_null
- name: amount_without_taxes_in_txn_currency
data_type: numeric
description: |
The net amount of the payment without taxes, in local currency.
data_tests:
- not_null
- name: amount_without_taxes_in_gbp
data_type: numeric
description: |
The net amount of the payment without taxes, in GBP.
data_tests:
- not_null
- name: tax_amount_in_txn_currency
data_type: numeric
description: |
The tax portion of the payment, in local currency.
Will be 0 if no taxes apply.
data_tests:
- not_null
- name: tax_amount_in_gbp
data_type: numeric
description: |
The tax portion of the payment, in GBP. Will be 0 if no
taxes apply.
data_tests:
- not_null
- name: amount_due_to_host_in_txn_currency
data_type: numeric

View file

@@ -1,3 +1,5 @@
{{ config(materialized="table") }}
{% set guesty_id_deal = "17814677813" %}
with
int_edeposit__verification_fees as (

View file

@@ -1,3 +1,5 @@
{{ config(materialized="table") }}
{% set ok_status = ("Approved", "Flagged") %}
{% set rejected_status = "Rejected" %}
{% set rejected_fee = 0.25 %}

View file

@@ -1,3 +1,5 @@
{{ config(materialized="table") }}
{% set rejected_status = "REJECTED" %}
{% set approved_flagged_status = ("APPROVED", "FLAGGED") %}
{% set basic_protection = "BASIC PROTECTION" %}

View file

@@ -12,6 +12,8 @@ select
then 'Waiver'
when product_name = 'DEPOSIT'
then 'Deposit'
when product_name = 'STAYDISRUPT'
then 'StayDisrupt'
when product_name = 'UNKNOWN'
then null
else product_name

View file

@@ -1530,6 +1530,7 @@ models:
- "Waiver"
- "Deposit"
- "CheckInCover"
- "StayDisrupt"
- "Fee"
- name: is_host_taking_waiver_risk

View file

@@ -403,28 +403,15 @@ models:
data_type: numeric
description: "Amount of the guest contribution, in case they did,
in local currency."
data_tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
where: not is_incident_missing_details
- name: guest_contribution_currency
data_type: text
description: "Currency of the guest contribution."
data_tests:
- not_null:
where: "guest_contribution_amount_in_txn_currency > 0 and not is_incident_missing_details"
- name: guest_contribution_amount_in_gbp
data_type: numeric
description: "Amount of the guest contribution, in case they did,
in GBP."
data_tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
where: not is_incident_missing_details
- name: is_guest_contacted_about_damage
data_type: boolean

View file

@@ -179,18 +179,10 @@ models:
data_type: numeric
description: "Amount of the guest contribution, in case they did,
in local currency."
data_tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
where: not is_incident_missing_details
- name: guest_contribution_currency
data_type: text
description: "Currency of the guest contribution."
data_tests:
- not_null:
where: "guest_contribution_amount_in_txn_currency > 0 and not is_incident_missing_details"
- name: is_guest_contacted_about_damage
data_type: boolean
@@ -465,19 +457,11 @@ models:
data_type: numeric
description: "Claim amount in local currency if the host is seeking
compensation from another platform."
data_tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
where: "not is_incident_missing_details"
- name: third_party_claim_currency
data_type: text
description: "Currency of the claim amount if the host is seeking
compensation from another platform."
data_tests:
- not_null:
where: "third_party_claim_amount_in_txn_currency > 0 and not is_incident_missing_details"
- name: cosmos_db_timestamp_utc
data_type: timestamp

View file

@@ -8,7 +8,7 @@
{% set tests_or_cancelled_incidents = "ARCHIVED" %}
-- Some incidents have insufficient details which might create data quality issues.
{% set insufficient_details_incidents = "INSUFFICIENT DETAILS" %}
{% set insufficient_details_incidents = ("INSUFFICIENT DETAILS", "INCOMPLETE") %}
with
raw_incident as (select * from {{ source("resolutions", "incident") }}),
@@ -21,7 +21,7 @@ select
{{ adapter.quote("documents") }} ->> 'VerificationId' as id_verification,
{{ adapter.quote("documents") }} ->> 'CurrentStatusName' as current_status_name,
upper({{ adapter.quote("documents") }} ->> 'CurrentStatusName')
= '{{ insufficient_details_incidents }}' as is_incident_missing_details,
in {{ insufficient_details_incidents }} as is_incident_missing_details,
({{ adapter.quote("documents") }} ->> 'IsSubmissionComplete')::boolean
as is_submission_complete,
{{ adapter.quote("documents") }} ->> 'CurrentAgentName' as current_agent_name,
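The status check in this diff now tests membership in a tuple instead of equality with a single string. When the Jinja `set` tuple is rendered into the SQL, its Python repr happens to be valid SQL `IN` syntax; a sketch of the mechanism, with plain string formatting standing in for Jinja:

```python
# Sketch only: a two-or-more element Python tuple renders as a valid
# SQL IN list. (A single-element tuple would render with a trailing
# comma, so keep at least two statuses in the set.)
insufficient_details_incidents = ("INSUFFICIENT DETAILS", "INCOMPLETE")
sql = f"upper(current_status_name) in {insufficient_details_incidents}"
```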

View file

@@ -1,17 +1,11 @@
#!/bin/bash
# === Logging setup ===
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
LOG_FILE="/home/azureuser/dbt_test_logs/dbt_tests_${TIMESTAMP}.log"
exec >> "$LOG_FILE" 2>&1
exec >> /home/azureuser/dbt_tests.log 2>&1
echo "=== dbt test run started at $TIMESTAMP ==="
# === Slack webhook setup ===
# Define the Slack webhook URL
script_dir=$(dirname "$0")
webhooks_file="slack_webhook_urls.txt"
env_file="$script_dir/$webhooks_file"
if [ -f "$env_file" ]; then
export $(grep -v '^#' "$env_file" | xargs)
else
@@ -19,79 +19,34 @@ else
exit 1
fi
# Messages to be sent to Slack
slack_failure_message=":rotating_light::rotating_light::rotating_light: One or more failures in dbt tests in production. :rotating_light::rotating_light::rotating_light:"
slack_success_message=":white_check_mark::white_check_mark::white_check_mark: dbt tests executed successfully in production. :white_check_mark::white_check_mark::white_check_mark:"
# Initialize the failure flag
has_any_step_failed=0
# === Navigate to project ===
cd /home/azureuser/data-dwh-dbt-project || exit 1
cd /home/azureuser/data-dwh-dbt-project
# === Update from Git ===
# Update from git
echo "Updating dbt project from git."
git checkout master
git pull
# === Activate virtual environment ===
# Activate venv
source venv/bin/activate
# === Run dbt tests ===
# Run tests
echo "Triggering dbt test"
dbt test
if [ $? -ne 0 ]; then
has_any_step_failed=1
fi
# === Handle success ===
# Check if any step failed and send a Slack message
if [ $has_any_step_failed -eq 1 ]; then
curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"$slack_failure_message\"}" $SLACK_ALERT_WEBHOOK_URL
fi
if [ $has_any_step_failed -eq 0 ]; then
curl -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"$slack_success_message\"}" \
"$SLACK_RECEIPT_WEBHOOK_URL"
exit 0
curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"$slack_success_message\"}" $SLACK_RECEIPT_WEBHOOK_URL
fi
# === Handle failures: parse log and send individual Slack messages ===
echo "Parsing log file for test failures..."
grep -E "Failure in test|Got [0-9]+ result|compiled code at" "$LOG_FILE" | while read -r line; do
if [[ "$line" =~ Failure\ in\ test\ ([^[:space:]]+)\ \((.*)\) ]]; then
TEST_NAME="${BASH_REMATCH[1]}"
echo "==> Detected failure: $TEST_NAME"
fi
if [[ "$line" =~ Got\ ([0-9]+)\ result ]]; then
FAILED_ROWS="${BASH_REMATCH[1]}"
fi
if [[ "$line" =~ compiled\ code\ at\ (.*) ]]; then
RELATIVE_PATH="${BASH_REMATCH[1]}"
COMPILED_SQL_FILE="/home/azureuser/data-dwh-dbt-project/${RELATIVE_PATH}"
# Check sqlfluff availability
if ! command -v sqlfluff >/dev/null 2>&1; then
echo "ERROR: sqlfluff is not installed or not in PATH"
SQL_QUERY="sqlfluff not found on system"
elif [ -f "$COMPILED_SQL_FILE" ]; then
echo "File exists, attempting to format with sqlfluff..."
FORMATTED_SQL=$(sqlfluff render "$COMPILED_SQL_FILE" --dialect postgres 2>&1)
if [ -n "$FORMATTED_SQL" ]; then
echo "We have formatted SQL"
SQL_QUERY=$(echo "$FORMATTED_SQL" | sed 's/"/\\"/g')
else
echo "sqlfluff returned empty result, falling back to raw file content"
SQL_QUERY=$(<"$COMPILED_SQL_FILE" sed 's/"/\\"/g')
fi
else
echo "ERROR: File not found: $COMPILED_SQL_FILE"
SQL_QUERY="Could not find compiled SQL file: $COMPILED_SQL_FILE"
fi
# === Send Slack message for this failed test ===
echo "Sending message for failed test $TEST_NAME"
SLACK_MESSAGE=":rotating_light: *Test Failure Detected!* :rotating_light:\n\n*Test:* \`$TEST_NAME\`\n*Failed Rows:* $FAILED_ROWS\n*Query:*\n\`\`\`\n$SQL_QUERY\n\`\`\`"
curl -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"$SLACK_MESSAGE\"}" \
"$SLACK_ALERT_WEBHOOK_URL"
fi
done

View file

@@ -427,3 +427,48 @@ seeds:
Name of the hubspot account owner.
data_tests:
- not_null
- name: stg_seed__guesty_resolutions_snapshot_20250701
description: |
A snapshot of Guesty Resolutions data as of 2025-07-01.
This is a static snapshot and we currently have no intention of keeping it up to date.
The data was shared by Chloe from Resolutions in a static file.
The fields described are those that are used in following models.
columns:
- name: id_booking
data_type: character varying
description: |
The internal ID of this booking in Guesty. Matches with the booking ID
in the Guesty verifications table.
It can contain duplicated bookings; handling those is out of our scope.
It cannot be null.
data_tests:
- not_null
- name: claim_date
data_type: character varying
description: |
When the claim was received by Truvi, in dd/mm/yyyy format.
It cannot be null.
data_tests:
- not_null
- name: claim_amount
data_type: character varying
description: |
The amount of the claim in the currency specified in claim_currency.
It's text by default since it might contain data quality issues.
The conversion to decimal is done in dependent models.
It cannot be null.
data_tests:
- not_null
- name: claim_currency
data_type: character varying
description: |
The currency specified in the claim amount.
It cannot be null.
data_tests:
- not_null

File diff suppressed because it is too large

View file

@@ -371,18 +371,18 @@ id_metric,metric_name,target_date,target_eom_value,target_ytd_value,target_eofy_
16,Revenue Churn Rate,2025-01-31,0.0262,0.0231,0.0234
16,Revenue Churn Rate,2025-02-28,0.0189,0.0227,0.0234
16,Revenue Churn Rate,2025-03-31,0.0300,0.0234,0.0234
16,Revenue Churn Rate,2025-04-30,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2025-05-31,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2025-06-30,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2025-07-31,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2025-08-31,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2025-09-30,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2025-10-31,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2025-11-30,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2025-12-31,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2026-01-31,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2026-02-28,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2026-03-31,0.0300,0.0300,0.0300
16,Revenue Churn Rate,2025-04-30,0.0100,0.0100,0.0100
16,Revenue Churn Rate,2025-05-31,0.0100,0.0100,0.0100
16,Revenue Churn Rate,2025-06-30,0.0100,0.0100,0.0100
16,Revenue Churn Rate,2025-07-31,0.0100,0.0100,0.0100
16,Revenue Churn Rate,2025-08-31,0.0100,0.0100,0.0100
16,Revenue Churn Rate,2025-09-30,0.0100,0.0100,0.0100
16,Revenue Churn Rate,2025-10-31,0.0100,0.0100,0.0100
16,Revenue Churn Rate,2025-11-30,0.0100,0.0100,0.0100
16,Revenue Churn Rate,2025-12-31,0.0100,0.0100,0.0100
16,Revenue Churn Rate,2026-01-31,0.0100,0.0100,0.0100
16,Revenue Churn Rate,2026-02-28,0.0100,0.0100,0.0100
16,Revenue Churn Rate,2026-03-31,0.0100,0.0100,0.0100
17,Revenue Churn,2024-04-30,3762,3762,121454
17,Revenue Churn,2024-05-31,3029,6791,121454
17,Revenue Churn,2024-06-30,4583,11374,121454
@@ -395,18 +395,18 @@ id_metric,metric_name,target_date,target_eom_value,target_ytd_value,target_eofy_
17,Revenue Churn,2025-01-31,11880,97326,121454
17,Revenue Churn,2025-02-28,8961,106287,121454
17,Revenue Churn,2025-03-31,15167,121454,121454
17,Revenue Churn,2025-04-30,15781,15781,229212
17,Revenue Churn,2025-05-31,16358,32139,229212
17,Revenue Churn,2025-06-30,17005,49144,229212
17,Revenue Churn,2025-07-31,17679,66823,229212
17,Revenue Churn,2025-08-31,18397,85221,229212
17,Revenue Churn,2025-09-30,18998,104219,229212
17,Revenue Churn,2025-10-31,19566,123785,229212
17,Revenue Churn,2025-11-30,20199,143985,229212
17,Revenue Churn,2025-12-31,20761,164746,229212
17,Revenue Churn,2026-01-31,21087,185833,229212
17,Revenue Churn,2026-02-28,21690,207522,229212
17,Revenue Churn,2026-03-31,21690,229212,229212
17,Revenue Churn,2025-04-30,5260,5260,76404
17,Revenue Churn,2025-05-31,5453,10713,76404
17,Revenue Churn,2025-06-30,5668,16381,76404
17,Revenue Churn,2025-07-31,5893,22274,76404
17,Revenue Churn,2025-08-31,6132,28407,76404
17,Revenue Churn,2025-09-30,6333,34740,76404
17,Revenue Churn,2025-10-31,6522,41262,76404
17,Revenue Churn,2025-11-30,6733,47995,76404
17,Revenue Churn,2025-12-31,6920,54915,76404
17,Revenue Churn,2026-01-31,7029,61944,76404
17,Revenue Churn,2026-02-28,7230,69174,76404
17,Revenue Churn,2026-03-31,7230,76404,76404
18,Booking Fee per Billable Booking,2024-04-30,4.410,4.410,3.769
18,Booking Fee per Billable Booking,2024-05-31,4.687,4.553,3.769
18,Booking Fee per Billable Booking,2024-06-30,3.825,4.310,3.769
