Merged PR 5677: Athena/Guesty high risk clients
# Description * Adds the new snapshot for Guesty Claims, up to 1st July 2025. * Creates a model named int_athena__high_risk_client_detector that handles the following logic: 1. The User has been using the agreed services for at least (3) months 2. The aggregated number of claims filed by the User exceeds a total of £2300 3. The User has filed at least (5) claims 4. The User has a claim ration of (7%) or higher throughout their entire use of agreed services, including any claim that has received a guarantee payment It's heavily opinionated due to lack of clear requirements and lack of data quality, both in athena verifications and guesty claims. Please, check the inline comments for more info. With these model and conditions, only 2 users would be tagged as high risk. # Checklist - [X] The edited models and dependants run properly with production data. - [X] The edited models are sufficiently documented. - [X] The edited models contain PK tests, and I've ran and passed them. - [X] I have checked for DRY opportunities with other models and docs. - [X] I've picked the right materialization for the affected models. # Other - [ ] Check if a full-refresh is required after this PR is merged. Related work items: #31687
This commit is contained in:
parent
ddc0a6a3f4
commit
bc3a364891
4 changed files with 1810 additions and 0 deletions
|
|
@ -0,0 +1,101 @@
|
||||||
|
/*
|
||||||
|
Dear DWH modeller.
|
||||||
|
Be aware that this model is heavily opinionated due to many data quality issues, affecting both
|
||||||
|
Athena Verifications and Guesty Claims.
|
||||||
|
|
||||||
|
We will consider a User to be a property manager email.
|
||||||
|
If a Booking is duplicated at PM email, then we will dedup it.
|
||||||
|
If a Booking is duplicated among several PM emails, then it will be considered as different Bookings.
|
||||||
|
If a Booking has several Claims, all of them will be considered, and the claim amount will be aggregated.
|
||||||
|
|
||||||
|
Keep in mind that the model uses a snapshot of Guesty Resolutions from 1st of July 2025.
|
||||||
|
This also means that the conditions for the User to be considered a high-risk client are hardcoded.
|
||||||
|
|
||||||
|
*/
|
||||||
|
with
|
||||||
|
stg_athena__verifications as (
|
||||||
|
select
|
||||||
|
-- Be aware that the same id booking can happen for more than one PM...
|
||||||
|
property_manager_email,
|
||||||
|
id_booking,
|
||||||
|
-- In case of booking duplicates per PM email, just retrieve the first
|
||||||
|
-- creation
|
||||||
|
min(created_date_utc) as created_date_utc
|
||||||
|
from {{ ref("stg_athena__verifications") }}
|
||||||
|
where id_booking is not null
|
||||||
|
group by 1, 2
|
||||||
|
),
|
||||||
|
stg_seed__guesty_resolutions as (
|
||||||
|
select
|
||||||
|
id_booking,
|
||||||
|
to_date(claim_date, 'DD/MM/YYYY') as claim_date,
|
||||||
|
case
|
||||||
|
when claim_amount ~ '^[0-9]+(\.[0-9]+)?$'
|
||||||
|
then cast(claim_amount as decimal)
|
||||||
|
else null
|
||||||
|
end as claim_amount,
|
||||||
|
claim_currency
|
||||||
|
from {{ ref("stg_seed__guesty_resolutions_snapshot_20250701") }}
|
||||||
|
),
|
||||||
|
int_daily_currency_exchange_rates as (
|
||||||
|
select * from {{ ref("int_daily_currency_exchange_rates") }}
|
||||||
|
),
|
||||||
|
users_3_months_activity as (
|
||||||
|
-- 1. The User has been using the agreed services for at least (3) months
|
||||||
|
-- (considered as 1st of July 2025)
|
||||||
|
select
|
||||||
|
property_manager_email,
|
||||||
|
min(created_date_utc) as first_verification_created_per_pm,
|
||||||
|
count(distinct id_booking) as total_count_of_bookings_per_pm
|
||||||
|
from stg_athena__verifications sav
|
||||||
|
group by 1
|
||||||
|
),
|
||||||
|
users_with_claims as (
|
||||||
|
select
|
||||||
|
u.property_manager_email,
|
||||||
|
u.first_verification_created_per_pm,
|
||||||
|
u.total_count_of_bookings_per_pm,
|
||||||
|
count(r.id_booking) as count_of_claims,
|
||||||
|
round(sum(r.claim_amount * er.rate), 0) as total_claim_amount_in_gbp,
|
||||||
|
1.0 * count(r.id_booking) / u.total_count_of_bookings_per_pm as claim_rate
|
||||||
|
from users_3_months_activity u
|
||||||
|
inner join
|
||||||
|
stg_athena__verifications v
|
||||||
|
on u.property_manager_email = v.property_manager_email
|
||||||
|
left join stg_seed__guesty_resolutions r on v.id_booking = r.id_booking
|
||||||
|
left join
|
||||||
|
int_daily_currency_exchange_rates er
|
||||||
|
on r.claim_currency = er.from_currency
|
||||||
|
and er.to_currency = 'GBP'
|
||||||
|
and r.claim_date = er.rate_date_utc
|
||||||
|
group by 1, 2, 3
|
||||||
|
),
|
||||||
|
rule_logic as (
|
||||||
|
select
|
||||||
|
*,
|
||||||
|
case
|
||||||
|
when first_verification_created_per_pm < '2025-04-01'
|
||||||
|
then true
|
||||||
|
else false
|
||||||
|
end as has_been_using_services_for_at_least_3_months,
|
||||||
|
case
|
||||||
|
when total_claim_amount_in_gbp > 2300 then true else false
|
||||||
|
end as exceeds_claim_amount_in_gbp,
|
||||||
|
case
|
||||||
|
when count_of_claims >= 5 then true else false
|
||||||
|
end as exceeds_claim_count,
|
||||||
|
case when claim_rate >= 0.07 then true else false end as exceeds_claim_rate
|
||||||
|
from users_with_claims
|
||||||
|
)
|
||||||
|
select
|
||||||
|
*,
|
||||||
|
case
|
||||||
|
when
|
||||||
|
has_been_using_services_for_at_least_3_months
|
||||||
|
and exceeds_claim_amount_in_gbp
|
||||||
|
and exceeds_claim_count
|
||||||
|
and exceeds_claim_rate
|
||||||
|
then true
|
||||||
|
else false
|
||||||
|
end as user_exceeds_all_indicators
|
||||||
|
from rule_logic
|
||||||
|
|
@ -259,3 +259,28 @@ models:
|
||||||
description: "Date of checkout for the booking"
|
description: "Date of checkout for the booking"
|
||||||
data_tests:
|
data_tests:
|
||||||
- not_null
|
- not_null
|
||||||
|
|
||||||
|
- name: int_athena__high_risk_client_detector
|
||||||
|
description: |
|
||||||
|
This model is used to detect high-risk clients based on their booking and claim history for
|
||||||
|
Guesty (Athena).
|
||||||
|
This is based on some business rules that might change in the future.
|
||||||
|
This is also based on a snapshot that might require updates in the future.
|
||||||
|
|
||||||
|
Current rules, based on the Data Request on July 1st 2025 by Chloe from Resolutions, are:
|
||||||
|
A User is considered a high-risk client if they fall into the below criteria:
|
||||||
|
1. The User has been using the agreed services for at least (3) months
|
||||||
|
2. The aggregated number of claims filed by the User exceeds a total of £2300
|
||||||
|
3. The User has filed at least (5) claims
|
||||||
|
4. The User has a claim ration of (7%) or higher throughout their entire use of agreed services, including any claim that has received a guarantee payment
|
||||||
|
columns:
|
||||||
|
- name: property_manager_email
|
||||||
|
data_type: character varying
|
||||||
|
description: |
|
||||||
|
Email of the property manager.
|
||||||
|
This is used to identify the property manager for the booking.
|
||||||
|
It is used to group bookings and claims by property manager.
|
||||||
|
It is unique and not null.
|
||||||
|
data_tests:
|
||||||
|
- not_null
|
||||||
|
- unique
|
||||||
|
|
|
||||||
|
|
@ -427,3 +427,48 @@ seeds:
|
||||||
Name of the hubspot account owner.
|
Name of the hubspot account owner.
|
||||||
data_tests:
|
data_tests:
|
||||||
- not_null
|
- not_null
|
||||||
|
|
||||||
|
- name: stg_seed__guesty_resolutions_snapshot_20250701
|
||||||
|
description: |
|
||||||
|
A snapshot of Guesty Resolutions data as of 2025-07-01.
|
||||||
|
This is a static snapshot and we currently have no intent of maintaining up to date.
|
||||||
|
The data was shared by Chloe from Resolutions in a static file.
|
||||||
|
|
||||||
|
The fields described are those that are used in following models.
|
||||||
|
|
||||||
|
columns:
|
||||||
|
- name: id_booking
|
||||||
|
data_type: character varying
|
||||||
|
description: |
|
||||||
|
The internal ID of this booking in Guesty. Matches with the booking ID
|
||||||
|
in the Guesty verifications table.
|
||||||
|
It can contain duplicated bookings, and this is out of our scope.
|
||||||
|
It cannot be null.
|
||||||
|
data_tests:
|
||||||
|
- not_null
|
||||||
|
|
||||||
|
- name: claim_date
|
||||||
|
data_type: character varying
|
||||||
|
description: |
|
||||||
|
When was the claim received by Truvi, in format dd/mm/yyyy.
|
||||||
|
It cannot be null.
|
||||||
|
data_tests:
|
||||||
|
- not_null
|
||||||
|
|
||||||
|
- name: claim_amount
|
||||||
|
data_type: character varying
|
||||||
|
description: |
|
||||||
|
The amount of the claim in the currency specified in claim_currency.
|
||||||
|
It's text by default since it might contain data quality issues.
|
||||||
|
The conversion to decimal is done in dependant models.
|
||||||
|
It cannot be null.
|
||||||
|
data_tests:
|
||||||
|
- not_null
|
||||||
|
|
||||||
|
- name: claim_currency
|
||||||
|
data_type: character varying
|
||||||
|
description: |
|
||||||
|
The currency specified in the claim amount.
|
||||||
|
It cannot be null.
|
||||||
|
data_tests:
|
||||||
|
- not_null
|
||||||
|
|
|
||||||
1639
seeds/stg_seed__guesty_resolutions_snapshot_20250701.csv
Normal file
1639
seeds/stg_seed__guesty_resolutions_snapshot_20250701.csv
Normal file
File diff suppressed because it is too large
Load diff
Loading…
Add table
Add a link
Reference in a new issue