Merged PR 4996: First tracking of flagging performance

# Description

Creates 2 new models in the scope of flagging: how good are we at identifying "at risk" bookings vs. 1) the number of claims generated and 2) the number of submitted payouts?
This only applies to Protected Bookings in New Dash that have been completed (more than 14 days after check-out), with potential resolutions appearing in Resolutions Center.

The first table `int_flagging_booking_categorisation` contains all the heavy logic to categorise the bookings.
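
As a rough illustration of the categorisation, the sketch below places a single booking into a confusion-matrix cell for the claim-based analysis. The `Booking` fields and the function name are hypothetical and are not the model's actual columns. Only the 14-day completion rule comes from this PR, and treating a check-out exactly 14 days ago as not completed is an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

COMPLETION_DAYS = 14  # from the PR: completed = checked out more than 14 days ago


@dataclass
class Booking:
    # Hypothetical fields, for illustration only.
    check_out: date
    flagged_at_risk: bool
    has_claim: bool


def claim_based_category(booking: Booking, today: date) -> Optional[str]:
    """Return the confusion-matrix cell, or None if the booking is not completed."""
    # Assumption: a check-out exactly 14 days ago still counts as not completed.
    if booking.check_out >= today - timedelta(days=COMPLETION_DAYS):
        return None  # not completed yet, so excluded from the analysis
    if booking.flagged_at_risk:
        return "true_positive" if booking.has_claim else "false_positive"
    return "false_negative" if booking.has_claim else "true_negative"
```

The submitted payout-based analysis would follow the same shape, with the extra step of excluding bookings whose claim is still awaiting resolution.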

The second view `int_flagging_performance_analysis` computes standard binary classification scores, for the 2 possible ways of tracking.
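
To make the scores concrete, here is a minimal Python sketch of the same formulas that the data tests on `int_flagging_performance_analysis` check. The function names are hypothetical, and returning `None` for a zero denominator is an assumption: this description does not say how the view handles that case.

```python
from typing import Optional


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def classification_scores(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute binary classification scores from the four confusion-matrix counts."""
    return {
        "count_total": tp + tn + fp + fn,
        "recall_score": safe_ratio(tp, tp + fn),
        "precision_score": safe_ratio(tp, tp + fp),
        "false_positive_rate_score": safe_ratio(fp, fp + tn),
        "f1_score": safe_ratio(2 * tp, 2 * tp + fn + fp),
        "f2_score": safe_ratio(5 * tp, 5 * tp + 4 * fn + fp),
    }
```

With made-up counts `tp=1, tn=90, fp=5, fn=4`, this gives a recall of 0.2 and a precision of about 0.167.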

Tables are already in prod to help you understand while reviewing. You'll see that the figures are still quite low, especially because of the small number of claims/submitted payouts. As a result, there is currently just... 1 true positive.

There's heavy test and documentation coverage to ensure there are no mistakes in the computation.
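
Most of the tests on `int_flagging_booking_categorisation` are partition checks: a parent count must equal the sum of its sub-counts. The sketch below mirrors three of them in Python using the column names from the schema. The helper name and the sample numbers are made up for illustration.

```python
def failing_partition_checks(counts: dict) -> list:
    """Return the descriptions of the partition checks that do not hold."""
    checks = {
        "total = completed + not completed": (
            counts["total_bookings"],
            counts["completed_bookings"] + counts["not_completed_bookings"],
        ),
        "completed = risk + no risk": (
            counts["completed_bookings"],
            counts["completed_risk_bookings"] + counts["completed_no_risk_bookings"],
        ),
        "completed = sum of the four claim-based cells": (
            counts["completed_bookings"],
            counts["completed_risk_with_claim_bookings"]
            + counts["completed_no_risk_without_claim_bookings"]
            + counts["completed_risk_without_claim_bookings"]
            + counts["completed_no_risk_with_claim_bookings"],
        ),
    }
    return [name for name, (left, right) in checks.items() if left != right]


# Made-up counts, for illustration only.
sample_counts = {
    "total_bookings": 100,
    "completed_bookings": 80,
    "not_completed_bookings": 20,
    "completed_risk_bookings": 10,
    "completed_no_risk_bookings": 70,
    "completed_risk_with_claim_bookings": 1,
    "completed_risk_without_claim_bookings": 9,
    "completed_no_risk_with_claim_bookings": 4,
    "completed_no_risk_without_claim_bookings": 66,
}
```

An empty list from `failing_partition_checks(sample_counts)` means all three partitions are consistent.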

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've run and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models. **Materialising the first model as a table despite it being just 1 record, since otherwise the tests take ages**

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #29284
Oriol Roqué Paniagua 2025-04-15 10:14:02 +00:00
parent 587661f818
commit a2cad661dd
3 changed files with 692 additions and 0 deletions


@@ -2800,3 +2800,369 @@ models:
- NONE
- INVOICING
- ONGOING_MONTH
- name: int_flagging_booking_categorisation
description: |
A model that computes different Booking counts depending on whether these
had claims or not, whether these were categorised as at risk or not, and
whether there was a submitted payout or not.
This only applies for Bookings:
- that come from New Dash users
- that are protected, either by a protection or a deposit management service
Additionally, we track Completed Bookings as those Bookings which, as of today,
have been checked out for more than 14 natural days.
From these Bookings, we check if these had a related incident in Resolution
Center:
- that is linked to a Booking
- that is not in a duplicated status
Since Bookings can be duplicated in the incidents data, we effectively consider:
- Bookings with "any" claim
- Bookings with a finished claim, either with a payout or not
- Bookings with a finished claim and a submitted amount for payout
data_tests:
- dbt_expectations.expect_table_row_count_to_equal:
value: 1
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: total_bookings
column_B: completed_bookings + not_completed_bookings
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: total_with_claim_bookings
column_B: completed_with_claim_bookings + not_completed_with_claim_bookings
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: completed_bookings
column_B: completed_with_claim_bookings + completed_without_claim_bookings
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: completed_bookings
column_B: completed_risk_bookings + completed_no_risk_bookings
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: completed_risk_bookings
column_B: completed_risk_with_claim_bookings + completed_risk_without_claim_bookings
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: completed_with_claim_bookings
column_B: completed_risk_with_claim_bookings + completed_no_risk_with_claim_bookings
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: completed_no_risk_bookings
column_B: completed_no_risk_with_claim_bookings + completed_no_risk_without_claim_bookings
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: completed_without_claim_bookings
column_B: completed_risk_without_claim_bookings + completed_no_risk_without_claim_bookings
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: completed_bookings
column_B: completed_awaiting_resolution_bookings + completed_not_awaiting_resolution_bookings
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: completed_not_awaiting_resolution_bookings
column_B: completed_with_submitted_payout_bookings + completed_without_submitted_payout_bookings
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: completed_with_submitted_payout_bookings
column_B: completed_risk_with_submitted_payout_bookings + completed_no_risk_with_submitted_payout_bookings
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: completed_without_submitted_payout_bookings
column_B: completed_risk_without_submitted_payout_bookings + completed_no_risk_without_submitted_payout_bookings
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: completed_bookings
column_B: completed_risk_with_claim_bookings + completed_no_risk_without_claim_bookings + completed_risk_without_claim_bookings + completed_no_risk_with_claim_bookings
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: completed_not_awaiting_resolution_bookings
column_B: completed_risk_with_submitted_payout_bookings + completed_no_risk_without_submitted_payout_bookings + completed_risk_without_submitted_payout_bookings + completed_no_risk_with_submitted_payout_bookings
columns:
- name: total_bookings
data_type: integer
description: |
Current count of New Dash Protected Bookings, either a Protection Service
or a Deposit Management service, for reference.
- name: completed_bookings
data_type: integer
description: |
Current count of New Dash Protected Bookings with a Checkout happening
more than 14 days ago.
- name: not_completed_bookings
data_type: integer
description: |
Current count of New Dash Protected Bookings with a Checkout happening
between 14 days ago and today, or in the future.
- name: total_with_claim_bookings
data_type: integer
description: |
Current count of New Dash Protected Bookings that have had a claim,
regardless of whether these bookings are considered completed or not.
- name: completed_with_claim_bookings
data_type: integer
description: |
Current count of New Dash Protected and Completed Bookings that have
had a claim.
- name: not_completed_with_claim_bookings
data_type: integer
description: |
Current count of New Dash Protected, NOT Completed Bookings that have
had a claim.
- name: completed_without_claim_bookings
data_type: integer
description: |
Current count of New Dash Protected and Completed Bookings that have
NOT had a claim.
- name: completed_risk_bookings
data_type: integer
description: |
Current count of New Dash Protected and Completed Bookings that have
been flagged as at Risk.
- name: completed_no_risk_bookings
data_type: integer
description: |
Current count of New Dash Protected and Completed Bookings that have
NOT been flagged as at Risk.
- name: completed_awaiting_resolution_bookings
data_type: integer
description: |
Current count of New Dash Protected and Completed Bookings that have
a claim and are in a resolution status that is not finished. These
Bookings are excluded for the submitted payout-based performance
analysis, as we don't know if the claim will be paid out or not.
- name: completed_not_awaiting_resolution_bookings
data_type: integer
description: |
Current count of New Dash Protected and Completed Bookings that are
not awaiting resolution, either because they have a claim in a finished
status or because they don't have a claim at all.
- name: completed_with_submitted_payout_bookings
data_type: integer
description: |
Current count of New Dash Protected and Completed Bookings that have
had a submitted payout, with the claim being in a finished status.
- name: completed_without_submitted_payout_bookings
data_type: integer
description: |
Current count of New Dash Protected and Completed Bookings that have
NOT had a submitted payout, either because there's a claim being in
a finished status without a payout or because there's no claim at all.
- name: completed_risk_with_claim_bookings
data_type: integer
description: |
Current count of New Dash Protected and Completed Bookings that have
been flagged as at Risk AND that have had a claim.
For the claim-based performance analysis, this would be the true positive.
- name: completed_no_risk_without_claim_bookings
data_type: integer
description: |
Current count of New Dash Protected and Completed Bookings that have
NOT been flagged as at Risk AND that have NOT had a claim.
For the claim-based performance analysis, this would be the true negative.
- name: completed_risk_without_claim_bookings
data_type: integer
description: |
Current count of New Dash Protected and Completed Bookings that have
been flagged as at Risk AND that have NOT had a claim.
For the claim-based performance analysis, this would be the false positive.
- name: completed_no_risk_with_claim_bookings
data_type: integer
description: |
Current count of New Dash Protected and Completed Bookings that have
NOT been flagged as at Risk AND that have had a claim.
For the claim-based performance analysis, this would be the false negative.
- name: completed_risk_with_submitted_payout_bookings
data_type: integer
description: |
Current count of New Dash Protected and Completed Bookings that have
been flagged as at Risk AND that have had a submitted payout, with
the claim being in a finished status.
For the submitted payout-based performance analysis, this would be
the true positive.
- name: completed_no_risk_without_submitted_payout_bookings
data_type: integer
description: |
Current count of New Dash Protected and Completed Bookings that have
NOT been flagged as at Risk AND that have NOT had a submitted payout,
either because there's a claim being in a finished status without a
payout or because there's no claim at all.
For the submitted payout-based performance analysis, this would be
the true negative.
- name: completed_risk_without_submitted_payout_bookings
data_type: integer
description: |
Current count of New Dash Protected and Completed Bookings that have
been flagged as at Risk AND that have NOT had a submitted payout,
either because there's a claim being in a finished status without a
payout or because there's no claim at all.
For the submitted payout-based performance analysis, this would be
the false positive.
- name: completed_no_risk_with_submitted_payout_bookings
data_type: integer
description: |
Current count of New Dash Protected and Completed Bookings that have
NOT been flagged as at Risk AND that have had a submitted payout, with
the claim being in a finished status.
For the submitted payout-based performance analysis, this would be
the false negative.
- name: int_flagging_performance_analysis
description: |
Provides a basic statistical analysis with binary classification metrics
on the flagging performance for New Dash Protected bookings, in the scope
of claims raised or submitted payouts.
data_tests:
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: count_total
column_B: count_true_positive + count_true_negative + count_false_positive + count_false_negative
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: recall_score
column_B: 1.0 * count_true_positive / (count_true_positive + count_false_negative)
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: precision_score
column_B: 1.0 * count_true_positive / (count_true_positive + count_false_positive)
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: false_positive_rate_score
column_B: 1.0 * count_false_positive / (count_false_positive + count_true_negative)
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: f1_score
column_B: 2.0 * count_true_positive / (2 * count_true_positive + count_false_negative + count_false_positive)
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: f2_score
column_B: 5.0 * count_true_positive / (5 * count_true_positive + 4 * count_false_negative + count_false_positive)
columns:
- name: flagging_analysis_type
data_type: string
description: |
Type of the analysis conducted, i.e., what do we consider as a
positive - predicted (flagged) vs. actual (claim, payout).
data_tests:
- not_null
- unique
- accepted_values:
values:
- RISK_VS_CLAIM
- RISK_VS_SUBMITTED_PAYOUT
- name: count_total
data_type: integer
description: |
Total count of bookings considered for the flagging performance analysis.
- name: count_true_positive
data_type: integer
description: |
Count of True Positives: predicted positives that are also an actual positive.
- name: count_true_negative
data_type: integer
description: |
Count of True Negatives: predicted negatives that are also an actual negative.
- name: count_false_positive
data_type: integer
description: |
Count of False Positives: predicted positives that are not an actual positive.
- name: count_false_negative
data_type: integer
description: |
Count of False Negatives: predicted negatives that are not an actual negative.
- name: true_positive_score
data_type: decimal
description: |
True Positives, as a ratio over 1. This is the count of true positives divided
by the total count of bookings considered for the flagging performance analysis.
- name: true_negative_score
data_type: decimal
description: |
True Negatives, as a ratio over 1. This is the count of true negatives divided
by the total count of bookings considered for the flagging performance analysis.
- name: false_positive_score
data_type: decimal
description: |
False Positives, as a ratio over 1. This is the count of false positives divided
by the total count of bookings considered for the flagging performance analysis.
- name: false_negative_score
data_type: decimal
description: |
False Negatives, as a ratio over 1. This is the count of false negatives divided
by the total count of bookings considered for the flagging performance analysis.
- name: recall_score
data_type: decimal
description: |
Recall score, or true positive rate. This corresponds to the proportion of all
actual positives that were classified correctly as a positive. It can be seen
as a probability of detection: in our case, it answers the question "what
fraction of claims/payouts were flagged as at risk?".
This is the count of true positives divided by the sum of true positives and
false negatives. Recall improves when false negatives decrease.
A hypothetical perfect model would have zero false negatives, and thus a
recall of 1.0, or 100% detection rate.
- name: precision_score
data_type: decimal
description: |
Precision score, or positive predictive value. This corresponds to the
proportion of all predicted positives that were classified correctly as a
positive. In our case, it answers the question "what fraction of
bookings flagged as at risk actually had a claim/payout?".
This is the count of true positives divided by the sum of true positives and
false positives. Precision improves when false positives decrease.
A hypothetical perfect model would have zero false positives, and thus a
precision of 1.0, or 100% precision rate.
- name: false_positive_rate_score
data_type: decimal
description: |
False positive rate, or fall-out. This corresponds to the proportion of all
actual negatives that were classified incorrectly as a positive. It can be seen
as a probability of false alarm: in our case, it answers the question "what
fraction of bookings without a claim/payout were flagged as at risk?".
This is the count of false positives divided by the sum of false positives and
true negatives.
A hypothetical perfect model would have zero false positives, and thus a
false positive rate of 0.0, or 0% false alarm rate.
- name: f1_score
data_type: decimal
description: |
F1 score, which computes the harmonic mean of precision and recall.
This metric balances the trade-off between precision and recall, and is useful
when we want to find an optimal balance between the two.
It is defined as 2 * (precision * recall) / (precision + recall).
A hypothetical perfect model would have an F1 score of 1.0, or 100%.
When precision and recall are far apart, the F1 score will be closer to the
lower of the two.
- name: f2_score
data_type: decimal
description: |
F2 score, which computes a weighted harmonic mean of precision and recall,
treating recall as twice as important as precision. In our case, it effectively means
that we want to reduce the number of false negatives, meaning reducing
the number of claims/payouts that are not flagged as at risk, while still
keeping a good precision.
This metric is useful when we want to prioritize recall over precision,
and is defined as 5 * (precision * recall) / (4 * precision + recall).
A hypothetical perfect model would have an F2 score of 1.0, or 100%.
Because recall carries more weight, the F2 score sits closer to recall than
the F1 score does.