Merged PR 5677: Athena/Guesty high risk clients

# Description

* Adds the new snapshot for Guesty Claims, up to 1st July 2025.
* Creates a model named int_athena__high_risk_client_detector that handles the following logic:

1. The User has been using the agreed services for at least (3) months
2. The aggregated number of claims filed by the User exceeds a total of £2300
3. The User has filed at least (5) claims
4. The User has a claim ration of (7%) or higher throughout their entire use of agreed services, including any claim that has received a guarantee payment

It's heavily opinionated due to lack of clear requirements and lack of data quality, both in athena verifications and guesty claims. Please, check the inline comments for more info.

With these model and conditions, only 2 users would be tagged as high risk.

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #31687
This commit is contained in:
Oriol Roqué Paniagua 2025-07-11 10:28:24 +00:00
parent ddc0a6a3f4
commit bc3a364891
4 changed files with 1810 additions and 0 deletions

View file

@ -0,0 +1,101 @@
/*
Dear DWH modeller.
Be aware that this model is heavily opinionated due to many data quality issues, affecting both
Athena Verifications and Guesty Claims.
We will consider a User to be a property manager email.
If a Booking is duplicated at PM email, then we will dedup it.
If a Booking is duplicated among several PM emails, then it will be considered as different Bookings.
If a Booking has several Claims, all of them will be considered, and the claim amount will be aggregated.
Keep in mind that the model uses a snapshot of Guesty Resolutions from 1st of July 2025.
This also means that the conditions for the User to be considered a high-risk client are hardcoded.
*/
with
stg_athena__verifications as (
select
-- Be aware that the same id booking can happen for more than one PM...
property_manager_email,
id_booking,
-- In case of booking duplicates per PM email, just retrieve the first
-- creation
min(created_date_utc) as created_date_utc
from {{ ref("stg_athena__verifications") }}
where id_booking is not null
group by 1, 2
),
stg_seed__guesty_resolutions as (
select
id_booking,
to_date(claim_date, 'DD/MM/YYYY') as claim_date,
case
when claim_amount ~ '^[0-9]+(\.[0-9]+)?$'
then cast(claim_amount as decimal)
else null
end as claim_amount,
claim_currency
from {{ ref("stg_seed__guesty_resolutions_snapshot_20250701") }}
),
int_daily_currency_exchange_rates as (
select * from {{ ref("int_daily_currency_exchange_rates") }}
),
users_3_months_activity as (
-- 1. The User has been using the agreed services for at least (3) months
-- (considered as 1st of July 2025)
select
property_manager_email,
min(created_date_utc) as first_verification_created_per_pm,
count(distinct id_booking) as total_count_of_bookings_per_pm
from stg_athena__verifications sav
group by 1
),
users_with_claims as (
select
u.property_manager_email,
u.first_verification_created_per_pm,
u.total_count_of_bookings_per_pm,
count(r.id_booking) as count_of_claims,
round(sum(r.claim_amount * er.rate), 0) as total_claim_amount_in_gbp,
1.0 * count(r.id_booking) / u.total_count_of_bookings_per_pm as claim_rate
from users_3_months_activity u
inner join
stg_athena__verifications v
on u.property_manager_email = v.property_manager_email
left join stg_seed__guesty_resolutions r on v.id_booking = r.id_booking
left join
int_daily_currency_exchange_rates er
on r.claim_currency = er.from_currency
and er.to_currency = 'GBP'
and r.claim_date = er.rate_date_utc
group by 1, 2, 3
),
rule_logic as (
select
*,
case
when first_verification_created_per_pm < '2025-04-01'
then true
else false
end as has_been_using_services_for_at_least_3_months,
case
when total_claim_amount_in_gbp > 2300 then true else false
end as exceeds_claim_amount_in_gbp,
case
when count_of_claims >= 5 then true else false
end as exceeds_claim_count,
case when claim_rate >= 0.07 then true else false end as exceeds_claim_rate
from users_with_claims
)
select
*,
case
when
has_been_using_services_for_at_least_3_months
and exceeds_claim_amount_in_gbp
and exceeds_claim_count
and exceeds_claim_rate
then true
else false
end as user_exceeds_all_indicators
from rule_logic

View file

@ -259,3 +259,28 @@ models:
description: "Date of checkout for the booking"
data_tests:
- not_null
- name: int_athena__high_risk_client_detector
description: |
This model is used to detect high-risk clients based on their booking and claim history for
Guesty (Athena).
This is based on some business rules that might change in the future.
This is also based on a snapshot that might require updates in the future.
Current rules, based on the Data Request on July 1st 2025 by Chloe from Resolutions, are:
A User is considered a high-risk client if they fall into the below criteria:
1. The User has been using the agreed services for at least (3) months
2. The aggregated number of claims filed by the User exceeds a total of £2300
3. The User has filed at least (5) claims
4. The User has a claim ration of (7%) or higher throughout their entire use of agreed services, including any claim that has received a guarantee payment
columns:
- name: property_manager_email
data_type: character varying
description: |
Email of the property manager.
This is used to identify the property manager for the booking.
It is used to group bookings and claims by property manager.
It is unique and not null.
data_tests:
- not_null
- unique

View file

@ -427,3 +427,48 @@ seeds:
Name of the hubspot account owner.
data_tests:
- not_null
- name: stg_seed__guesty_resolutions_snapshot_20250701
description: |
A snapshot of Guesty Resolutions data as of 2025-07-01.
This is a static snapshot and we currently have no intent of maintaining up to date.
The data was shared by Chloe from Resolutions in a static file.
The fields described are those that are used in following models.
columns:
- name: id_booking
data_type: character varying
description: |
The internal ID of this booking in Guesty. Matches with the booking ID
in the Guesty verifications table.
It can contain duplicated bookings, and this is out of our scope.
It cannot be null.
data_tests:
- not_null
- name: claim_date
data_type: character varying
description: |
When was the claim received by Truvi, in format dd/mm/yyyy.
It cannot be null.
data_tests:
- not_null
- name: claim_amount
data_type: character varying
description: |
The amount of the claim in the currency specified in claim_currency.
It's text by default since it might contain data quality issues.
The conversion to decimal is done in dependant models.
It cannot be null.
data_tests:
- not_null
- name: claim_currency
data_type: character varying
description: |
The currency specified in the claim amount.
It cannot be null.
data_tests:
- not_null

File diff suppressed because it is too large Load diff