data-dwh-dbt-project/models/intermediate/cross/int_flagging_booking_categorisation.sql

279 lines
10 KiB
MySQL
Raw Normal View History

Merged PR 4996: First tracking of flagging performance # Description Creates 2 new models in the scope of flagging: how good are we at identifying "at risk" bookings vs. 1) the number of claims generated and 2) the number of submitted payouts? This only applies for Protected Bookings in New Dash that have been completed (14 days after the check-out) with potential resolutions appearing in Resolutions Center. The first table `int_flagging_booking_categorisation` contains all the heavy logic to categorise the bookings. The second view `int_flagging_performance_analysis` computes standard binary classification scores, for the 2 possible ways of tracking. Tables are already in prod to help you understand while reviewing. You'll see that the figures are still quite low, specially due to small amount of claims/submitted payouts. This makes the true positives being just... 1. There's heavy test and documentation coverage to ensure there's no mistakes on the computation. # Checklist - [X] The edited models and dependants run properly with production data. - [X] The edited models are sufficiently documented. - [X] The edited models contain PK tests, and I've ran and passed them. - [X] I have checked for DRY opportunities with other models and docs. - [X] I've picked the right materialization for the affected models. **Materialising as table the first model despite being just 1 record since otherwise tests takes ages** # Other - [ ] Check if a full-refresh is required after this PR is merged. Related work items: #29284
2025-04-15 10:14:02 +00:00
{% set risk_booking_status = ("NOTAPPROVED", "FLAGGED") %}
{% set no_risk_booking_status = ("APPROVED", "NOFLAGS") %}
{% set incident_duplicated_status = "CLOSED - DUPLICATE" %}
{% set incident_finished_status = (
"RESOLVED",
"RESOLVED EXCEPTION",
"CLOSED - NO REPLY",
"CLOSED - OTHER",
"CLOSED - WAIVER CR",
"CLOSED - LATE REPORT",
"CLOSED - NOT COVERED",
"CLOSED - NOT LIABLE",
"CLOSED HOST REQUEST",
"CLOSED - NOT APPROVED",
"CLOSED - THIRD PARTY",
) %}
{% set days_from_checkout_to_completion = 14 %}
{{ config(materialized="table") }}
with
int_booking_summary as (select * from {{ ref("int_booking_summary") }}),
Merged PR 4996: First tracking of flagging performance # Description Creates 2 new models in the scope of flagging: how good are we at identifying "at risk" bookings vs. 1) the number of claims generated and 2) the number of submitted payouts? This only applies for Protected Bookings in New Dash that have been completed (14 days after the check-out) with potential resolutions appearing in Resolutions Center. The first table `int_flagging_booking_categorisation` contains all the heavy logic to categorise the bookings. The second view `int_flagging_performance_analysis` computes standard binary classification scores, for the 2 possible ways of tracking. Tables are already in prod to help you understand while reviewing. You'll see that the figures are still quite low, specially due to small amount of claims/submitted payouts. This makes the true positives being just... 1. There's heavy test and documentation coverage to ensure there's no mistakes on the computation. # Checklist - [X] The edited models and dependants run properly with production data. - [X] The edited models are sufficiently documented. - [X] The edited models contain PK tests, and I've ran and passed them. - [X] I have checked for DRY opportunities with other models and docs. - [X] I've picked the right materialization for the affected models. **Materialising as table the first model despite being just 1 record since otherwise tests takes ages** # Other - [ ] Check if a full-refresh is required after this PR is merged. Related work items: #29284
2025-04-15 10:14:02 +00:00
int_resolutions__incidents as (
select * from {{ ref("int_resolutions__incidents") }}
),
-- The same booking can have multiple Incidents
deduplicated_incidents as (
select
id_booking,
sum(submitted_payout_amount_in_gbp) as submitted_payout_amount_in_gbp,
Merged PR 4996: First tracking of flagging performance # Description Creates 2 new models in the scope of flagging: how good are we at identifying "at risk" bookings vs. 1) the number of claims generated and 2) the number of submitted payouts? This only applies for Protected Bookings in New Dash that have been completed (14 days after the check-out) with potential resolutions appearing in Resolutions Center. The first table `int_flagging_booking_categorisation` contains all the heavy logic to categorise the bookings. The second view `int_flagging_performance_analysis` computes standard binary classification scores, for the 2 possible ways of tracking. Tables are already in prod to help you understand while reviewing. You'll see that the figures are still quite low, specially due to small amount of claims/submitted payouts. This makes the true positives being just... 1. There's heavy test and documentation coverage to ensure there's no mistakes on the computation. # Checklist - [X] The edited models and dependants run properly with production data. - [X] The edited models are sufficiently documented. - [X] The edited models contain PK tests, and I've ran and passed them. - [X] I have checked for DRY opportunities with other models and docs. - [X] I've picked the right materialization for the affected models. **Materialising as table the first model despite being just 1 record since otherwise tests takes ages** # Other - [ ] Check if a full-refresh is required after this PR is merged. Related work items: #29284
2025-04-15 10:14:02 +00:00
case
when sum(submitted_payout_amount_in_gbp) > 0 then true else false
end as has_submitted_payout,
case
when
sum(
case
when
upper(current_status_name)
in {{ incident_finished_status }}
then 1
else 0
end
)
> 0
then true
else false
end as is_incident_finished
from int_resolutions__incidents
where upper(current_status_name) != '{{ incident_duplicated_status }}'
group by 1
),
new_dash_protected_bookings as (
select
id_booking,
case
when
(current_date - booking_check_out_date_utc)
> {{ days_from_checkout_to_completion }}
then true
else false
end as is_booking_completed,
case
when upper(booking_status) in {{ risk_booking_status }}
then true
when upper(booking_status) in {{ no_risk_booking_status }}
then false
else null
end as is_booking_flagged_as_risk
from int_booking_summary
Merged PR 4996: First tracking of flagging performance # Description Creates 2 new models in the scope of flagging: how good are we at identifying "at risk" bookings vs. 1) the number of claims generated and 2) the number of submitted payouts? This only applies for Protected Bookings in New Dash that have been completed (14 days after the check-out) with potential resolutions appearing in Resolutions Center. The first table `int_flagging_booking_categorisation` contains all the heavy logic to categorise the bookings. The second view `int_flagging_performance_analysis` computes standard binary classification scores, for the 2 possible ways of tracking. Tables are already in prod to help you understand while reviewing. You'll see that the figures are still quite low, specially due to small amount of claims/submitted payouts. This makes the true positives being just... 1. There's heavy test and documentation coverage to ensure there's no mistakes on the computation. # Checklist - [X] The edited models and dependants run properly with production data. - [X] The edited models are sufficiently documented. - [X] The edited models contain PK tests, and I've ran and passed them. - [X] I have checked for DRY opportunities with other models and docs. - [X] I've picked the right materialization for the affected models. **Materialising as table the first model despite being just 1 record since otherwise tests takes ages** # Other - [ ] Check if a full-refresh is required after this PR is merged. Related work items: #29284
2025-04-15 10:14:02 +00:00
where
-- Bookings from New Dash users with Id Deal
is_user_in_new_dash = true
and is_missing_id_deal = false
-- Protected Bookings with a Protection or a Deposit Management service
and (
has_protection_service_business_type
or has_deposit_management_service_business_type
)
-- Bookings with relevant status (i.e. not cancelled, not pending)
and (
upper(booking_status) in {{ risk_booking_status }}
or upper(booking_status) in {{ no_risk_booking_status }}
)
),
bookings_with_incidents as (
select
ndpb.id_booking,
ndpb.is_booking_completed,
ndpb.is_booking_flagged_as_risk,
case when di.id_booking is not null then true else false end as has_claim,
coalesce(di.has_submitted_payout, false) as has_submitted_payout,
coalesce(di.is_incident_finished, false) as is_incident_finished,
coalesce(
submitted_payout_amount_in_gbp, 0
) as submitted_payout_amount_in_gbp
Merged PR 4996: First tracking of flagging performance # Description Creates 2 new models in the scope of flagging: how good are we at identifying "at risk" bookings vs. 1) the number of claims generated and 2) the number of submitted payouts? This only applies for Protected Bookings in New Dash that have been completed (14 days after the check-out) with potential resolutions appearing in Resolutions Center. The first table `int_flagging_booking_categorisation` contains all the heavy logic to categorise the bookings. The second view `int_flagging_performance_analysis` computes standard binary classification scores, for the 2 possible ways of tracking. Tables are already in prod to help you understand while reviewing. You'll see that the figures are still quite low, specially due to small amount of claims/submitted payouts. This makes the true positives being just... 1. There's heavy test and documentation coverage to ensure there's no mistakes on the computation. # Checklist - [X] The edited models and dependants run properly with production data. - [X] The edited models are sufficiently documented. - [X] The edited models contain PK tests, and I've ran and passed them. - [X] I have checked for DRY opportunities with other models and docs. - [X] I've picked the right materialization for the affected models. **Materialising as table the first model despite being just 1 record since otherwise tests takes ages** # Other - [ ] Check if a full-refresh is required after this PR is merged. Related work items: #29284
2025-04-15 10:14:02 +00:00
from new_dash_protected_bookings ndpb
left join deduplicated_incidents di on ndpb.id_booking = di.id_booking
)
select
-- High Level Bookings --
count(id_booking) as total_bookings,
count(id_booking) filter (where is_booking_completed) as completed_bookings,
count(id_booking) filter (where not is_booking_completed) as not_completed_bookings,
-- High Level Claims --
count(id_booking) filter (where has_claim) as total_with_claim_bookings,
-- Completed with Claim --
count(id_booking) filter (
where is_booking_completed and has_claim
) as completed_with_claim_bookings,
-- Not Completed with Claim --
count(id_booking) filter (
where not is_booking_completed and has_claim
) as not_completed_with_claim_bookings,
-- Completed without Claim --
count(id_booking) filter (
where is_booking_completed and not has_claim
) as completed_without_claim_bookings,
-- Completed with Risk
count(id_booking) filter (
where is_booking_completed and is_booking_flagged_as_risk
) as completed_risk_bookings,
-- Completed without Risk
count(id_booking) filter (
where is_booking_completed and not is_booking_flagged_as_risk
) as completed_no_risk_bookings,
-- Completed awaiting resolution
count(id_booking) filter (
where is_booking_completed and has_claim and not is_incident_finished
) as completed_awaiting_resolution_bookings,
-- Completed not awaiting resolution
count(id_booking) filter (
where
is_booking_completed
and ((has_claim and is_incident_finished) or (not has_claim))
) as completed_not_awaiting_resolution_bookings,
-- Completed with Submitted Payout
count(id_booking) filter (
where
is_booking_completed
and has_submitted_payout
and is_incident_finished
and has_claim
) as completed_with_submitted_payout_bookings,
-- Completed without Submitted Payout
count(id_booking) filter (
where
is_booking_completed
and (
(has_claim and not has_submitted_payout and is_incident_finished)
or (not has_claim)
)
) as completed_without_submitted_payout_bookings,
-- CONFUSION MATRIX 1: RISK vs. CLAIM --
-- Completed with Risk and with Claim -- True Positives
count(id_booking) filter (
where is_booking_completed and is_booking_flagged_as_risk and has_claim
) as completed_risk_with_claim_bookings,
-- Completed without Risk and without Claim -- True Negatives
count(id_booking) filter (
where is_booking_completed and not is_booking_flagged_as_risk and not has_claim
) as completed_no_risk_without_claim_bookings,
-- Completed with Risk and without Claim -- False Positive
count(id_booking) filter (
where is_booking_completed and is_booking_flagged_as_risk and not has_claim
) as completed_risk_without_claim_bookings,
-- Completed without Risk and with Claim -- False Negatives
count(id_booking) filter (
where is_booking_completed and not is_booking_flagged_as_risk and has_claim
) as completed_no_risk_with_claim_bookings,
-- CONFUSION MATRIX 2: RISK vs. PAYOUT --
-- Completed with Risk, Incident is finished and with Submitted Payout --
-- True Positives
count(id_booking) filter (
where
is_booking_completed
and is_booking_flagged_as_risk
and has_claim
and has_submitted_payout
and is_incident_finished
) as completed_risk_with_submitted_payout_bookings,
-- Completed without Risk and without Submitted Payout -- True Negatives
count(id_booking) filter (
where
is_booking_completed
and not is_booking_flagged_as_risk
and (
(has_claim and not has_submitted_payout and is_incident_finished)
or (not has_claim)
)
) as completed_no_risk_without_submitted_payout_bookings,
-- Completed with Risk and without Submitted Payout -- False Positive
count(id_booking) filter (
where
is_booking_completed
and is_booking_flagged_as_risk
and (
(has_claim and not has_submitted_payout and is_incident_finished)
or (not has_claim)
)
) as completed_risk_without_submitted_payout_bookings,
-- Completed without Risk and with Submitted Payout -- False Negative
count(id_booking) filter (
where
is_booking_completed
and not is_booking_flagged_as_risk
and has_submitted_payout
and has_claim
and is_incident_finished
) as completed_no_risk_with_submitted_payout_bookings,
-- ADDITIONAL PAYOUT INPUT - FOR CONFUSION MATRIX 2: RISK vs. PAYOUT --
-- Total amount paid for all bookings
sum(submitted_payout_amount_in_gbp) as total_amount_paid_in_gbp,
-- Amount paid from completed bookings
sum(submitted_payout_amount_in_gbp) filter (
where is_booking_completed
) as completed_amount_paid_in_gbp,
-- Amount paid from not completed bookings
sum(submitted_payout_amount_in_gbp) filter (
where not is_booking_completed
) as not_completed_amount_paid_in_gbp,
-- Amount paid from completed bookings with finished incidents
sum(submitted_payout_amount_in_gbp) filter (
where
is_booking_completed
and has_claim
and is_incident_finished
and has_submitted_payout
) as completed_finished_incidents_amount_paid_in_gbp,
-- Amount paid from completed bookings with awaiting finish incidents
sum(submitted_payout_amount_in_gbp) filter (
where
is_booking_completed
and has_claim
and not is_incident_finished
and has_submitted_payout
) as completed_awaiting_finish_incidents_amount_paid_in_gbp,
-- Completed with Risk, Incident is finished and with Submitted Payout --
-- True Positives - How much did we pay?
sum(submitted_payout_amount_in_gbp) filter (
where
is_booking_completed
and is_booking_flagged_as_risk
and has_claim
and has_submitted_payout
and is_incident_finished
) as completed_risk_with_submitted_payout_amount_paid_in_gbp,
-- Completed without Risk and with Submitted Payout
-- False Negative - How much did we pay?
sum(submitted_payout_amount_in_gbp) filter (
where
is_booking_completed
and not is_booking_flagged_as_risk
and has_submitted_payout
and has_claim
and is_incident_finished
) as completed_no_risk_with_submitted_payout_amount_paid_in_gbp
Merged PR 4996: First tracking of flagging performance # Description Creates 2 new models in the scope of flagging: how good are we at identifying "at risk" bookings vs. 1) the number of claims generated and 2) the number of submitted payouts? This only applies for Protected Bookings in New Dash that have been completed (14 days after the check-out) with potential resolutions appearing in Resolutions Center. The first table `int_flagging_booking_categorisation` contains all the heavy logic to categorise the bookings. The second view `int_flagging_performance_analysis` computes standard binary classification scores, for the 2 possible ways of tracking. Tables are already in prod to help you understand while reviewing. You'll see that the figures are still quite low, specially due to small amount of claims/submitted payouts. This makes the true positives being just... 1. There's heavy test and documentation coverage to ensure there's no mistakes on the computation. # Checklist - [X] The edited models and dependants run properly with production data. - [X] The edited models are sufficiently documented. - [X] The edited models contain PK tests, and I've ran and passed them. - [X] I have checked for DRY opportunities with other models and docs. - [X] I've picked the right materialization for the affected models. **Materialising as table the first model despite being just 1 record since otherwise tests takes ages** # Other - [ ] Check if a full-refresh is required after this PR is merged. Related work items: #29284
2025-04-15 10:14:02 +00:00
from bookings_with_incidents