Compare commits

11 commits: 8bc525e4c2 ... bc3a364891

| Author | SHA1 | Date |
|---|---|---|
| | bc3a364891 | |
| | ddc0a6a3f4 | |
| | 2f14b3305c | |
| | a1b67d20f1 | |
| | 7488400cbb | |
| | 717590513f | |
| | a1429ccec8 | |
| | 900c73b076 | |
| | ad67a79a24 | |
| | e0e97709c0 | |
| | b9fe9a0552 | |

19 changed files with 2040 additions and 138 deletions

README.md (31 lines changed)

@@ -134,6 +134,37 @@ Once you build the docs with `run_docs.sh`, you will have a bunch of files. To o

This goes beyond the scope of this project: to understand how you can serve these, refer to our [infra script repo](https://guardhog.visualstudio.com/Data/_git/data-infra-script). Specifically, the bits around the web gateway setup.

## Detecting (and dropping) orphan models in the DWH

If you remove a model from the dbt project, but that model had already been materialized as a table or view in the DWH, the DWH object won't go away on its own. You'll have to explicitly drop it.

To make your life easier, this repo includes a utility script for this purpose: `find_orphan_models_in_db.sh`.

You can use this script to detect and identify any orphan models. It can be run one-off, or scheduled with Slack messaging so you get automated alerts whenever an orphan model appears.

The script is designed to be called from the same machine where you execute the regular `dbt run` calls. You can try to use it on your local machine, but there are multiple gotchas which might lead to confusion.

To use it:

- *Note that this assumes you've set up the project in the VM as described in previous sections. If you deviate in naming, paths, etc., you'll probably have to adjust some references here.*
- In the VM, copy it from the project repo into the home folder: `cp find_orphan_models_in_db.sh ~/find_orphan_models_in_db.sh` and make it executable: `chmod 700 ~/find_orphan_models_in_db.sh`.
- The script takes two positional arguments: a comma-separated list of schemas to review, and a path to dbt's `manifest.json`.
- Typically, if you call it from the VM, you would do: `./find_orphan_models_in_db.sh staging,intermediate,reporting data-dwh-dbt-project/target/manifest.json`.
- There is an optional `--slack` flag that will send success/failure messages to Slack channels. The necessary configuration is the same as described in the "How to schedule" section, so if you've already set up the dbt run, test and docs commands, you don't need to take any other steps to start sending Slack messages.
- Example usage (an illustrative run is sketched below): `./find_orphan_models_in_db.sh --slack staging,intermediate,reporting data-dwh-dbt-project/target/manifest.json`.

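For orientation, here is a sketch of what a run might print. The progress lines come from the script's own `echo` statements; the model names in the output are purely illustrative:

```bash
$ ./find_orphan_models_in_db.sh staging,intermediate,reporting data-dwh-dbt-project/target/manifest.json
✅ jq is installed
✅ yq is installed
✅ postgresql-client is installed
🔐 Loading DB credentials from /home/azureuser/.dbt/profiles.yml...
🗃️ Reading current tables/views from PostgreSQL...
🔎 Scanning schema: staging
🔎 Scanning schema: intermediate
🔎 Scanning schema: reporting
📦 Extracting model output names from dbt manifest...
📊 Comparing DBT models vs Postgres state...

✅ Relevant models (in both DB and DBT):
staging.stg_athena__verifications

⚠️ Stale models (in DB but NOT in DBT):
reporting.old_dropped_model
```
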
How to schedule:

- Simply add a cronjob in the VM with the command:

```bash
COMMAND="0 9 * * * /bin/bash /home/azureuser/find_orphan_models_in_db.sh --slack staging,intermediate,reporting /home/azureuser/data-dwh-dbt-project/target/manifest.json"
(crontab -u $USER -l; echo "$COMMAND") | crontab -u $USER -
```

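To confirm the job was registered, a generic crontab check (not specific to this repo) is:

```bash
# List the current user's crontab and look for the orphan-check entry
crontab -l | grep find_orphan_models_in_db
```
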
Note some caveats:

- `sync` models are not checked.
- If, for any reason, you add tables or views that are unrelated to the dbt project in the monitored schemas, these will be identified as orphans by this script. Be careful: you might drop them accidentally if you don't pay attention. The simple solution to this is... don't use dbt schemas for non-dbt purposes.

## CI

CI can be set up to review PRs and make the developer experience more solid and less error-prone.

find_orphan_models_in_db.sh (new file, 146 lines)

@@ -0,0 +1,146 @@

```bash
#!/bin/bash
set -euo pipefail

STARTING_DIR="/home/azureuser"
cd "$STARTING_DIR"

# === CONFIGURATION ===
DBT_PROJECT="dwh_dbt"
DBT_TARGET="prd"
PROFILE_YML="$STARTING_DIR/.dbt/profiles.yml"

# === Flag defaults ===
SEND_SLACK=false

# === Parse flags ===
while [[ $# -gt 0 ]]; do
    case "$1" in
        -s|--slack)
            SEND_SLACK=true
            shift
            ;;
        -*)
            echo "❌ Unknown option: $1"
            exit 1
            ;;
        *)
            break
            ;;
    esac
done

# === Positional arguments ===
SCHEMAS="$1"
MANIFEST_PATH="$2"
shift 2
IFS=',' read -r -a SCHEMA_ARRAY <<< "$SCHEMAS"

# === Tool check/install ===
install_tool_if_missing() {
    TOOL_CALL_NAME=$1
    TOOL_APT_NAME=$2
    if ! command -v "$TOOL_CALL_NAME" &>/dev/null; then
        echo "🔧 Installing missing tool: $TOOL_APT_NAME"
        sudo apt-get update -qq
        sudo apt-get install -y "$TOOL_APT_NAME"
    else
        echo "✅ $TOOL_APT_NAME is installed"
    fi
}

install_tool_if_missing jq jq
install_tool_if_missing yq yq
install_tool_if_missing psql postgresql-client

# === Slack webhook setup ===
script_dir=$(dirname "$0")
webhooks_file="slack_webhook_urls.txt"
env_file="$script_dir/$webhooks_file"

if [ -f "$env_file" ]; then
    export $(grep -v '^#' "$env_file" | xargs)
else
    echo "Error: $webhooks_file file not found in the script directory."
    exit 1
fi

# === Load DB credentials from profiles.yml ===
echo "🔐 Loading DB credentials from $PROFILE_YML..."
DB_NAME=$(yq e ".${DBT_PROJECT}.outputs.${DBT_TARGET}.dbname" "$PROFILE_YML")
DB_USER=$(yq e ".${DBT_PROJECT}.outputs.${DBT_TARGET}.user" "$PROFILE_YML")
DB_HOST=$(yq e ".${DBT_PROJECT}.outputs.${DBT_TARGET}.host" "$PROFILE_YML")
DB_PORT=$(yq e ".${DBT_PROJECT}.outputs.${DBT_TARGET}.port" "$PROFILE_YML")
export PGPASSWORD=$(yq e ".${DBT_PROJECT}.outputs.${DBT_TARGET}.pass" "$PROFILE_YML")

# === Get list of tables/views from Postgres ===
echo "🗃️ Reading current tables/views from PostgreSQL..."

POSTGRES_OBJECTS=()
for SCHEMA in "${SCHEMA_ARRAY[@]}"; do
    echo "🔎 Scanning schema: $SCHEMA"
    TABLES=$(psql -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -d "$DB_NAME" -Atc "
        SELECT LOWER(table_schema || '.' || table_name)
        FROM information_schema.tables
        WHERE table_schema = '$SCHEMA'
          AND table_type IN ('BASE TABLE', 'VIEW')
          AND table_name NOT LIKE 'pg_%'
        ORDER BY table_schema, table_name;
    ")
    while IFS= read -r tbl; do
        tbl_cleaned=$(echo "$tbl" | tr -d '[:space:]')
        [[ -n "$tbl_cleaned" ]] && POSTGRES_OBJECTS+=("$tbl_cleaned")
    done <<< "$TABLES"
done

POSTGRES_OBJECTS=($(printf "%s\n" "${POSTGRES_OBJECTS[@]}" | sort -u))

# === Parse manifest.json for dbt model output names ===
echo "📦 Extracting model output names from dbt manifest..."

DBT_OBJECTS=()
DBT_ENTRIES=$(jq -r '
    .nodes | to_entries[] |
    select(.value.resource_type == "model" or .value.resource_type == "seed") |
    .value.schema + "." + .value.alias
' "$MANIFEST_PATH")

while IFS= read -r entry; do
    entry_cleaned=$(echo "$entry" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
    [[ -n "$entry_cleaned" ]] && DBT_OBJECTS+=("$entry_cleaned")
done <<< "$DBT_ENTRIES"

DBT_OBJECTS=($(printf "%s\n" "${DBT_OBJECTS[@]}" | sort -u))

# === Compare ===
echo "📊 Comparing DBT models vs Postgres state..."

# comm -12 keeps lines present in both sorted lists; comm -23 keeps lines
# only in the first (Postgres) list, i.e. the orphan candidates.
RELEVANT_MODELS=($(comm -12 <(printf "%s\n" "${POSTGRES_OBJECTS[@]}" | sort) <(printf "%s\n" "${DBT_OBJECTS[@]}" | sort)))
STALE_MODELS=($(comm -23 <(printf "%s\n" "${POSTGRES_OBJECTS[@]}" | sort) <(printf "%s\n" "${DBT_OBJECTS[@]}" | sort)))

# === Output ===
echo ""
echo "✅ Relevant models (in both DB and DBT):"
printf "%s\n" "${RELEVANT_MODELS[@]}" | sort

echo ""
echo "⚠️ Stale models (in DB but NOT in DBT):"
printf "%s\n" "${STALE_MODELS[@]}" | sort

# === Format stale models for Slack ===
if [ "$SEND_SLACK" = true ]; then
    echo "✅ Sending slack message with results."
    if [ ${#STALE_MODELS[@]} -eq 0 ]; then
        SLACK_MSG=":white_check_mark::white_check_mark::white_check_mark: dbt models reviewed. No stale models found in the database! :white_check_mark::white_check_mark::white_check_mark:"
        curl -X POST -H 'Content-type: application/json' \
            --data "{\"text\":\"$SLACK_MSG\"}" \
            "$SLACK_RECEIPT_WEBHOOK_URL"
    else
        SLACK_MSG=":rotating_light::rotating_light::rotating_light: Stale models detected in Postgres (not in dbt manifest): :rotating_light::rotating_light::rotating_light:\n"
        for model in "${STALE_MODELS[@]}"; do
            SLACK_MSG+="- \`$model\`\n"
        done
        curl -X POST -H 'Content-type: application/json' \
            --data "{\"text\":\"$SLACK_MSG\"}" \
            "$SLACK_ALERT_WEBHOOK_URL"
    fi
fi
```

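The `export $(grep -v '^#' "$env_file" | xargs)` line expects `slack_webhook_urls.txt` to hold plain `KEY=value` pairs (lines starting with `#` are skipped). A minimal sketch of that file, using the two variable names the script actually reads; the URLs are placeholders, not real webhooks:

```bash
# slack_webhook_urls.txt (placeholder URLs)
SLACK_ALERT_WEBHOOK_URL=https://hooks.slack.com/services/T000/B000/alert-placeholder
SLACK_RECEIPT_WEBHOOK_URL=https://hooks.slack.com/services/T000/B000/receipt-placeholder
```
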
@@ -0,0 +1,101 @@

```sql
/*
Dear DWH modeller.
Be aware that this model is heavily opinionated due to many data quality issues, affecting both
Athena Verifications and Guesty Claims.

We will consider a User to be a property manager email.
If a Booking is duplicated at PM email, then we will dedup it.
If a Booking is duplicated among several PM emails, then it will be considered as different Bookings.
If a Booking has several Claims, all of them will be considered, and the claim amount will be aggregated.

Keep in mind that the model uses a snapshot of Guesty Resolutions from 1st of July 2025.
This also means that the conditions for the User to be considered a high-risk client are hardcoded.
*/
with
    stg_athena__verifications as (
        select
            -- Be aware that the same id booking can happen for more than one PM...
            property_manager_email,
            id_booking,
            -- In case of booking duplicates per PM email, just retrieve the first
            -- creation
            min(created_date_utc) as created_date_utc
        from {{ ref("stg_athena__verifications") }}
        where id_booking is not null
        group by 1, 2
    ),
    stg_seed__guesty_resolutions as (
        select
            id_booking,
            to_date(claim_date, 'DD/MM/YYYY') as claim_date,
            case
                when claim_amount ~ '^[0-9]+(\.[0-9]+)?$'
                then cast(claim_amount as decimal)
                else null
            end as claim_amount,
            claim_currency
        from {{ ref("stg_seed__guesty_resolutions_snapshot_20250701") }}
    ),
    int_daily_currency_exchange_rates as (
        select * from {{ ref("int_daily_currency_exchange_rates") }}
    ),
    users_3_months_activity as (
        -- 1. The User has been using the agreed services for at least (3) months
        -- (considered as 1st of July 2025)
        select
            property_manager_email,
            min(created_date_utc) as first_verification_created_per_pm,
            count(distinct id_booking) as total_count_of_bookings_per_pm
        from stg_athena__verifications sav
        group by 1
    ),
    users_with_claims as (
        select
            u.property_manager_email,
            u.first_verification_created_per_pm,
            u.total_count_of_bookings_per_pm,
            count(r.id_booking) as count_of_claims,
            round(sum(r.claim_amount * er.rate), 0) as total_claim_amount_in_gbp,
            1.0 * count(r.id_booking) / u.total_count_of_bookings_per_pm as claim_rate
        from users_3_months_activity u
        inner join
            stg_athena__verifications v
            on u.property_manager_email = v.property_manager_email
        left join stg_seed__guesty_resolutions r on v.id_booking = r.id_booking
        left join
            int_daily_currency_exchange_rates er
            on r.claim_currency = er.from_currency
            and er.to_currency = 'GBP'
            and r.claim_date = er.rate_date_utc
        group by 1, 2, 3
    ),
    rule_logic as (
        select
            *,
            case
                when first_verification_created_per_pm < '2025-04-01'
                then true
                else false
            end as has_been_using_services_for_at_least_3_months,
            case
                when total_claim_amount_in_gbp > 2300 then true else false
            end as exceeds_claim_amount_in_gbp,
            case
                when count_of_claims >= 5 then true else false
            end as exceeds_claim_count,
            case when claim_rate >= 0.07 then true else false end as exceeds_claim_rate
        from users_with_claims
    )
select
    *,
    case
        when
            has_been_using_services_for_at_least_3_months
            and exceeds_claim_amount_in_gbp
            and exceeds_claim_count
            and exceeds_claim_rate
        then true
        else false
    end as user_exceeds_all_indicators
from rule_logic
```

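The `claim_amount ~ '^[0-9]+(\.[0-9]+)?$'` guard above only casts strings that look like plain non-negative numbers; everything else becomes null. A quick standalone Postgres check with toy values (not taken from the seed):

```sql
select v as claim_amount,
       v ~ '^[0-9]+(\.[0-9]+)?$' as is_castable  -- true only for digits with an optional decimal part
from (values ('123'), ('45.67'), ('£45'), ('1,200'), ('n/a')) as t(v);
-- '£45', '1,200' and 'n/a' fail the pattern and would be nulled out by the model
```
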
```diff
@@ -1,3 +1,5 @@
+{{ config(materialized="table") }}
+
 {% set ok_status = "Approved" %}
 with
 int_athena__verifications as (select * from {{ ref("int_athena__verifications") }}),
```

```diff
@@ -259,3 +259,28 @@ models:
         description: "Date of checkout for the booking"
         data_tests:
           - not_null
+
+  - name: int_athena__high_risk_client_detector
+    description: |
+      This model is used to detect high-risk clients based on their booking and claim history for
+      Guesty (Athena).
+      This is based on some business rules that might change in the future.
+      This is also based on a snapshot that might require updates in the future.
+
+      Current rules, based on the Data Request on July 1st 2025 by Chloe from Resolutions, are:
+      A User is considered a high-risk client if they fall into the below criteria:
+      1. The User has been using the agreed services for at least (3) months
+      2. The aggregated amount of claims filed by the User exceeds a total of £2300
+      3. The User has filed at least (5) claims
+      4. The User has a claim ratio of (7%) or higher throughout their entire use of agreed services, including any claim that has received a guarantee payment
+    columns:
+      - name: property_manager_email
+        data_type: character varying
+        description: |
+          Email of the property manager.
+          This is used to identify the property manager for the booking.
+          It is used to group bookings and claims by property manager.
+          It is unique and not null.
+        data_tests:
+          - not_null
+          - unique
```

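To make the four rules concrete, here is a toy check of the thresholds as the model computes them; the numbers are invented and do not come from the seed:

```sql
select
    100 as total_count_of_bookings_per_pm,       -- hypothetical PM with 100 bookings
    7 as count_of_claims,
    1.0 * 7 / 100 as claim_rate,                 -- 0.07, which meets the >= 0.07 rule
    2500 > 2300 as exceeds_claim_amount_in_gbp,  -- hypothetical £2500 aggregated claims
    7 >= 5 as exceeds_claim_count;
```
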
```diff
@@ -1,3 +1,5 @@
+{{ config(materialized="table") }}
+
 with
 stg_check_in_hero__checkins as (
     select * from {{ ref("stg_check_in_hero__checkins") }}
```

```diff
@@ -4817,14 +4817,6 @@ models:
       - name: product_name
         data_type: character varying
         description: Type of payment verification, categorizing the transaction.
-        data_tests:
-          - accepted_values:
-              values:
-                - "WAIVER"
-                - "DEPOSIT"
-                - "CHECKINCOVER"
-                - "FEE"
-                - "UNKNOWN"

       - name: is_host_taking_waiver_risk
         data_type: boolean
```

```diff
@@ -4875,38 +4867,28 @@ models:
         description: |
           The total amount of the payment in GBP.
           This includes taxes if applicable.
-        data_tests:
-          - not_null

       - name: amount_without_taxes_in_txn_currency
         data_type: numeric
         description: |
           The net amount of the payment without taxes, in local currency.
-        data_tests:
-          - not_null

       - name: amount_without_taxes_in_gbp
         data_type: numeric
         description: |
           The net amount of the payment without taxes, in GBP.
-        data_tests:
-          - not_null

       - name: tax_amount_in_txn_currency
         data_type: numeric
         description: |
           The tax portion of the payment, in local currency.
           Will be 0 if no taxes apply.
-        data_tests:
-          - not_null

       - name: tax_amount_in_gbp
         data_type: numeric
         description: |
           The tax portion of the payment, in GBP. Will be 0 if no
           taxes apply.
-        data_tests:
-          - not_null

       - name: amount_due_to_host_in_txn_currency
         data_type: numeric
```

```diff
@@ -1,3 +1,5 @@
+{{ config(materialized="table") }}
+
 {% set guesty_id_deal = "17814677813" %}
 with
 int_edeposit__verification_fees as (
```

```diff
@@ -1,3 +1,5 @@
+{{ config(materialized="table") }}
+
 {% set ok_status = ("Approved", "Flagged") %}
 {% set rejected_status = "Rejected" %}
 {% set rejected_fee = 0.25 %}
```

```diff
@@ -1,3 +1,5 @@
+{{ config(materialized="table") }}
+
 {% set rejected_status = "REJECTED" %}
 {% set approved_flagged_status = ("APPROVED", "FLAGGED") %}
 {% set basic_protection = "BASIC PROTECTION" %}
```

```diff
@@ -12,6 +12,8 @@ select
         then 'Waiver'
         when product_name = 'DEPOSIT'
         then 'Deposit'
+        when product_name = 'STAYDISRUPT'
+        then 'StayDisrupt'
         when product_name = 'UNKNOWN'
         then null
         else product_name
```

```diff
@@ -1530,6 +1530,7 @@ models:
               - "Waiver"
               - "Deposit"
               - "CheckInCover"
+              - "StayDisrupt"
               - "Fee"

       - name: is_host_taking_waiver_risk
```

```diff
@@ -403,28 +403,15 @@ models:
         data_type: numeric
         description: "Amount of the guest contribution, in case they did,
           in local currency."
-        data_tests:
-          - dbt_expectations.expect_column_values_to_be_between:
-              min_value: 0
-              strictly: false
-              where: not is_incident_missing_details

       - name: guest_contribution_currency
         data_type: text
         description: "Currency of the guest contribution."
-        data_tests:
-          - not_null:
-              where: "guest_contribution_amount_in_txn_currency > 0 and not is_incident_missing_details"

       - name: guest_contribution_amount_in_gbp
         data_type: numeric
         description: "Amount of the guest contribution, in case they did,
           in GBP."
-        data_tests:
-          - dbt_expectations.expect_column_values_to_be_between:
-              min_value: 0
-              strictly: false
-              where: not is_incident_missing_details

       - name: is_guest_contacted_about_damage
         data_type: boolean
```

```diff
@@ -179,18 +179,10 @@ models:
         data_type: numeric
         description: "Amount of the guest contribution, in case they did,
           in local currency."
-        data_tests:
-          - dbt_expectations.expect_column_values_to_be_between:
-              min_value: 0
-              strictly: false
-              where: not is_incident_missing_details

       - name: guest_contribution_currency
         data_type: text
         description: "Currency of the guest contribution."
-        data_tests:
-          - not_null:
-              where: "guest_contribution_amount_in_txn_currency > 0 and not is_incident_missing_details"

       - name: is_guest_contacted_about_damage
         data_type: boolean
```

```diff
@@ -465,19 +457,11 @@ models:
         data_type: numeric
         description: "Claim amount in local currency if the host is seeking
           compensation from another platform."
-        data_tests:
-          - dbt_expectations.expect_column_values_to_be_between:
-              min_value: 0
-              strictly: false
-              where: "not is_incident_missing_details"

       - name: third_party_claim_currency
         data_type: text
         description: "Currency of the claim amount if the host is seeking
           compensation from another platform."
-        data_tests:
-          - not_null:
-              where: "third_party_claim_amount_in_txn_currency > 0 and not is_incident_missing_details"

       - name: cosmos_db_timestamp_utc
         data_type: timestamp
```

```diff
@@ -8,7 +8,7 @@
 {% set tests_or_cancelled_incidents = "ARCHIVED" %}

 -- Some incidents have insufficient details which might create data quality issues.
-{% set insufficient_details_incidents = "INSUFFICIENT DETAILS" %}
+{% set insufficient_details_incidents = ("INSUFFICIENT DETAILS", "INCOMPLETE") %}

 with
 raw_incident as (select * from {{ source("resolutions", "incident") }}),
```

```diff
@@ -21,7 +21,7 @@ select
     {{ adapter.quote("documents") }} ->> 'VerificationId' as id_verification,
     {{ adapter.quote("documents") }} ->> 'CurrentStatusName' as current_status_name,
     upper({{ adapter.quote("documents") }} ->> 'CurrentStatusName')
-    = '{{ insufficient_details_incidents }}' as is_incident_missing_details,
+    in {{ insufficient_details_incidents }} as is_incident_missing_details,
     ({{ adapter.quote("documents") }} ->> 'IsSubmissionComplete')::boolean
     as is_submission_complete,
     {{ adapter.quote("documents") }} ->> 'CurrentAgentName' as current_agent_name,
```

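For context on the two hunks above: when dbt renders a Jinja tuple, Python's tuple formatting produces a parenthesized, single-quoted list, so the comparison compiles to a standard SQL `IN`. Roughly (illustrative compiled output, not taken from the repo):

```sql
-- approximate compiled form of the changed expression
upper("documents" ->> 'CurrentStatusName')
    in ('INSUFFICIENT DETAILS', 'INCOMPLETE') as is_incident_missing_details,
```
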
run_tests.sh (77 lines changed)

```diff
@@ -1,17 +1,11 @@
 #!/bin/bash

-# === Logging setup ===
-TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
-LOG_FILE="/home/azureuser/dbt_test_logs/dbt_tests_${TIMESTAMP}.log"
-exec >> "$LOG_FILE" 2>&1
-
-echo "=== dbt test run started at $TIMESTAMP ==="
+exec >> /home/azureuser/dbt_tests.log 2>&1

-# === Slack webhook setup ===
+# Define the Slack webhook URL
 script_dir=$(dirname "$0")
 webhooks_file="slack_webhook_urls.txt"
 env_file="$script_dir/$webhooks_file"

 if [ -f "$env_file" ]; then
     export $(grep -v '^#' "$env_file" | xargs)
 else
```

```diff
@@ -19,79 +13,34 @@ else
     exit 1
 fi

+# Messages to be sent to Slack
 slack_failure_message=":rotating_light::rotating_light::rotating_light: One or more failures in dbt tests in production. :rotating_light::rotating_light::rotating_light:"
 slack_success_message=":white_check_mark::white_check_mark::white_check_mark: dbt tests executed successfully in production. :white_check_mark::white_check_mark::white_check_mark:"

+# Initialize the failure flag
 has_any_step_failed=0

-# === Navigate to project ===
-cd /home/azureuser/data-dwh-dbt-project
+cd /home/azureuser/data-dwh-dbt-project || exit 1

-# === Update from Git ===
+# Update from git
 echo "Updating dbt project from git."
 git checkout master
 git pull

-# === Activate virtual environment ===
+# Activate venv
 source venv/bin/activate

-# === Run dbt tests ===
+# Run tests
 echo "Triggering dbt test"
 dbt test
 if [ $? -ne 0 ]; then
     has_any_step_failed=1
 fi

-# === Handle success ===
+# Check if any step failed and send a Slack message
+if [ $has_any_step_failed -eq 1 ]; then
+    curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"$slack_failure_message\"}" $SLACK_ALERT_WEBHOOK_URL
+fi
 if [ $has_any_step_failed -eq 0 ]; then
-    curl -X POST -H 'Content-type: application/json' \
-        --data "{\"text\":\"$slack_success_message\"}" \
-        "$SLACK_RECEIPT_WEBHOOK_URL"
-    exit 0
+    curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"$slack_success_message\"}" $SLACK_RECEIPT_WEBHOOK_URL
 fi
-
-# === Handle failures: parse log and send individual Slack messages ===
-echo "Parsing log file for test failures..."
-
-grep -E "Failure in test|Got [0-9]+ result|compiled code at" "$LOG_FILE" | while read -r line; do
-    if [[ "$line" =~ Failure\ in\ test\ ([^[:space:]]+)\ \((.*)\) ]]; then
-        TEST_NAME="${BASH_REMATCH[1]}"
-        echo "==> Detected failure: $TEST_NAME"
-    fi
-
-    if [[ "$line" =~ Got\ ([0-9]+)\ result ]]; then
-        FAILED_ROWS="${BASH_REMATCH[1]}"
-    fi
-
-    if [[ "$line" =~ compiled\ code\ at\ (.*) ]]; then
-        RELATIVE_PATH="${BASH_REMATCH[1]}"
-        COMPILED_SQL_FILE="/home/azureuser/data-dwh-dbt-project/${RELATIVE_PATH}"
-        # Check sqlfluff availability
-        if ! command -v sqlfluff >/dev/null 2>&1; then
-            echo "ERROR: sqlfluff is not installed or not in PATH"
-            SQL_QUERY="sqlfluff not found on system"
-        elif [ -f "$COMPILED_SQL_FILE" ]; then
-            echo "File exists, attempting to format with sqlfluff..."
-            FORMATTED_SQL=$(sqlfluff render "$COMPILED_SQL_FILE" --dialect postgres 2>&1)
-            if [ -n "$FORMATTED_SQL" ]; then
-                echo "We have formatted SQL"
-                SQL_QUERY=$(echo "$FORMATTED_SQL" | sed 's/"/\\"/g')
-            else
-                echo "sqlfluff returned empty result, falling back to raw file content"
-                SQL_QUERY=$(<"$COMPILED_SQL_FILE" sed 's/"/\\"/g')
-            fi
-        else
-            echo "ERROR: File not found: $COMPILED_SQL_FILE"
-            SQL_QUERY="Could not find compiled SQL file: $COMPILED_SQL_FILE"
-        fi
-
-        # === Send Slack message for this failed test ===
-        echo "Sending message for failed test $TEST_NAME"
-        SLACK_MESSAGE=":rotating_light: *Test Failure Detected!* :rotating_light:\n\n*Test:* \`$TEST_NAME\`\n*Failed Rows:* $FAILED_ROWS\n*Query:*\n\`\`\`\n$SQL_QUERY\n\`\`\`"
-
-        curl -X POST -H 'Content-type: application/json' \
-            --data "{\"text\":\"$SLACK_MESSAGE\"}" \
-            "$SLACK_ALERT_WEBHOOK_URL"
-    fi
-done
```

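For reference on the parsing block removed above: bash's `=~` operator fills the `BASH_REMATCH` array with capture groups. A standalone sketch of how the `Got N result` line was matched (the sample line is invented):

```bash
line="Got 3 results, configured to fail if != 0"
if [[ "$line" =~ Got\ ([0-9]+)\ result ]]; then
    echo "failed rows: ${BASH_REMATCH[1]}"  # prints: failed rows: 3
fi
```
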
```diff
@@ -427,3 +427,48 @@ seeds:
           Name of the hubspot account owner.
         data_tests:
           - not_null
+
+  - name: stg_seed__guesty_resolutions_snapshot_20250701
+    description: |
+      A snapshot of Guesty Resolutions data as of 2025-07-01.
+      This is a static snapshot and we currently have no intent of keeping it up to date.
+      The data was shared by Chloe from Resolutions in a static file.
+
+      The fields described are those that are used in downstream models.
+
+    columns:
+      - name: id_booking
+        data_type: character varying
+        description: |
+          The internal ID of this booking in Guesty. Matches the booking ID
+          in the Guesty verifications table.
+          It can contain duplicated bookings, and this is out of our scope.
+          It cannot be null.
+        data_tests:
+          - not_null
+
+      - name: claim_date
+        data_type: character varying
+        description: |
+          When the claim was received by Truvi, in dd/mm/yyyy format.
+          It cannot be null.
+        data_tests:
+          - not_null
+
+      - name: claim_amount
+        data_type: character varying
+        description: |
+          The amount of the claim, in the currency specified in claim_currency.
+          It's text by default since it might contain data quality issues.
+          The conversion to decimal is done in dependent models.
+          It cannot be null.
+        data_tests:
+          - not_null
+
+      - name: claim_currency
+        data_type: character varying
+        description: |
+          The currency specified in the claim amount.
+          It cannot be null.
+        data_tests:
+          - not_null
```

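For completeness: a seed like this is materialized with `dbt seed`, and can be targeted individually; the flags below are standard dbt CLI options:

```bash
# Load (or rebuild) just this snapshot seed
dbt seed --select stg_seed__guesty_resolutions_snapshot_20250701 --full-refresh
```
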
seeds/stg_seed__guesty_resolutions_snapshot_20250701.csv (new file, 1639 lines)

File diff suppressed because it is too large

```diff
@@ -371,18 +371,18 @@ id_metric,metric_name,target_date,target_eom_value,target_ytd_value,target_eofy_
 16,Revenue Churn Rate,2025-01-31,0.0262,0.0231,0.0234
 16,Revenue Churn Rate,2025-02-28,0.0189,0.0227,0.0234
 16,Revenue Churn Rate,2025-03-31,0.0300,0.0234,0.0234
-16,Revenue Churn Rate,2025-04-30,0.0300,0.0300,0.0300
-16,Revenue Churn Rate,2025-05-31,0.0300,0.0300,0.0300
-16,Revenue Churn Rate,2025-06-30,0.0300,0.0300,0.0300
-16,Revenue Churn Rate,2025-07-31,0.0300,0.0300,0.0300
-16,Revenue Churn Rate,2025-08-31,0.0300,0.0300,0.0300
-16,Revenue Churn Rate,2025-09-30,0.0300,0.0300,0.0300
-16,Revenue Churn Rate,2025-10-31,0.0300,0.0300,0.0300
-16,Revenue Churn Rate,2025-11-30,0.0300,0.0300,0.0300
-16,Revenue Churn Rate,2025-12-31,0.0300,0.0300,0.0300
-16,Revenue Churn Rate,2026-01-31,0.0300,0.0300,0.0300
-16,Revenue Churn Rate,2026-02-28,0.0300,0.0300,0.0300
-16,Revenue Churn Rate,2026-03-31,0.0300,0.0300,0.0300
+16,Revenue Churn Rate,2025-04-30,0.0100,0.0100,0.0100
+16,Revenue Churn Rate,2025-05-31,0.0100,0.0100,0.0100
+16,Revenue Churn Rate,2025-06-30,0.0100,0.0100,0.0100
+16,Revenue Churn Rate,2025-07-31,0.0100,0.0100,0.0100
+16,Revenue Churn Rate,2025-08-31,0.0100,0.0100,0.0100
+16,Revenue Churn Rate,2025-09-30,0.0100,0.0100,0.0100
+16,Revenue Churn Rate,2025-10-31,0.0100,0.0100,0.0100
+16,Revenue Churn Rate,2025-11-30,0.0100,0.0100,0.0100
+16,Revenue Churn Rate,2025-12-31,0.0100,0.0100,0.0100
+16,Revenue Churn Rate,2026-01-31,0.0100,0.0100,0.0100
+16,Revenue Churn Rate,2026-02-28,0.0100,0.0100,0.0100
+16,Revenue Churn Rate,2026-03-31,0.0100,0.0100,0.0100
 17,Revenue Churn,2024-04-30,3762,3762,121454
 17,Revenue Churn,2024-05-31,3029,6791,121454
 17,Revenue Churn,2024-06-30,4583,11374,121454
```

```diff
@@ -395,18 +395,18 @@ id_metric,metric_name,target_date,target_eom_value,target_ytd_value,target_eofy_
 17,Revenue Churn,2025-01-31,11880,97326,121454
 17,Revenue Churn,2025-02-28,8961,106287,121454
 17,Revenue Churn,2025-03-31,15167,121454,121454
-17,Revenue Churn,2025-04-30,15781,15781,229212
-17,Revenue Churn,2025-05-31,16358,32139,229212
-17,Revenue Churn,2025-06-30,17005,49144,229212
-17,Revenue Churn,2025-07-31,17679,66823,229212
-17,Revenue Churn,2025-08-31,18397,85221,229212
-17,Revenue Churn,2025-09-30,18998,104219,229212
-17,Revenue Churn,2025-10-31,19566,123785,229212
-17,Revenue Churn,2025-11-30,20199,143985,229212
-17,Revenue Churn,2025-12-31,20761,164746,229212
-17,Revenue Churn,2026-01-31,21087,185833,229212
-17,Revenue Churn,2026-02-28,21690,207522,229212
-17,Revenue Churn,2026-03-31,21690,229212,229212
+17,Revenue Churn,2025-04-30,5260,5260,76404
+17,Revenue Churn,2025-05-31,5453,10713,76404
+17,Revenue Churn,2025-06-30,5668,16381,76404
+17,Revenue Churn,2025-07-31,5893,22274,76404
+17,Revenue Churn,2025-08-31,6132,28407,76404
+17,Revenue Churn,2025-09-30,6333,34740,76404
+17,Revenue Churn,2025-10-31,6522,41262,76404
+17,Revenue Churn,2025-11-30,6733,47995,76404
+17,Revenue Churn,2025-12-31,6920,54915,76404
+17,Revenue Churn,2026-01-31,7029,61944,76404
+17,Revenue Churn,2026-02-28,7230,69174,76404
+17,Revenue Churn,2026-03-31,7230,76404,76404
 18,Booking Fee per Billable Booking,2024-04-30,4.410,4.410,3.769
 18,Booking Fee per Billable Booking,2024-05-31,4.687,4.553,3.769
 18,Booking Fee per Billable Booking,2024-06-30,3.825,4.310,3.769
```
