data-dwh-dbt-project/tests/kpis_global_metrics_outlier_detection.sql

141 lines
5 KiB
MySQL
Raw Normal View History

/*
This test is applied in the reporting layer for Main KPIs,
specifically on reporting.mtd_aggregated_metrics.
It's supposed to run every day for the latest upate of KPIs.
There's chances that false positives are risen by these test. If at some
point it becomes too sensitive, just adapt the following parameters.
*/
-- Add here additive metrics that you would like to check
-- Recommended to exclude metrics that represent new products,
-- since there will be no history to check against.
-- Do NOT include rates/percentages/ratios.
{% set metric_names = (
"Cancelled Check Out Bookings",
"Cancelled Created Bookings",
"Check Out Bookings (Excl. Cancelled)",
"Created Bookings (Excl. Cancelled)",
"Damage Host-Waiver Payments",
Merged PR 3124: 1/3 - Revenue renaming Main KPIs - MTD scope # Description Adapts revenue figures in Main KPIs - MTD scope or global view. This includes MTD, Monthly Overview, Global Evolution over Time, Detail by Category. In essence, everything that is not by deal. The changes are mainly 2: * Remove the line that deducts the `Waiver Amount Paid Back to Hosts` in all metrics except the `Waiver Net Fees`. This effectively means that the previous `Guest Revenue` = `Guest Payments`, thus I dropped all 3 `Guest Payments` metrics. * Do a renaming at metric display level, but not in the code. This means that I remove the computation of `guest_revenue_in_gbp` for instance and keep `guest_payments_in_gbp`, and apply the renaming later on, since the modelisation already accounts for defining metric names differently from those of the fields. For the rest of metrics, I revised all metrics name and did changes based on the [whiteboard](https://whiteboard.office.com/me/whiteboards/p/c3BvOmh0dHBzOi8vZ3VhcmRob2ctbXkuc2hhcmVwb2ludC5jb20vcGVyc29uYWwvcGFibG9fbWFydGluX3N1cGVyaG9nX2NvbQ%3d%3d/b!T2D3opQuBECSDnhuFZrUacFu3TxvSvdIsnI4Dxsh2IuaB1AigbciRqkqte61I4wz/01H5SI4J4L7HTPJGUT7JGYKTOSQYYWACXU). I also changed the dedicated data tests in Main KPIs to ensure it's working. I also changed the exclusion logic in reporting based on the name of the metric to not display metrics that depend on the invoicing cycle unless it's 2 months ago or before. To keep in mind: * Merging this will automatically display the new figures/naming in production. Might be wise to communicate to stakeholders since some key metrics (namely, Guest Revenue / Total Revenue) will change the meaning. * We also need to do these changes in the metrics by deal part of the computation. I'd do first the removal of these fields in the PBI report (and take the opportunity to change the Data Catalogue) and then do the PR in DWH to change the logic. Before that though let's check that the names included in this PR are the correct ones :) # Checklist - [X] The edited models and dependants run properly with production data. - [NA] The edited models are sufficiently documented. - [X] The edited models contain PK tests, and I've ran and passed them. - [NA] I have checked for DRY opportunities with other models and docs. - [NA] I've picked the right materialization for the affected models. # Other - [ ] Check if a full-refresh is required after this PR is merged. Related work items: #22688
2024-10-10 13:46:59 +00:00
"Deposit Fees Revenue",
"Est. Billable Bookings",
"First Time Booked Deals",
"First Time Booked Listings",
"Guest Journey Completed",
"Guest Journey Created",
"Guest Journey Started",
"Guest Journey with Payment",
"Guest Revenue",
"Host Resolutions Amount Paid",
"Host Resolutions Payment Count",
"Invoiced APIs Revenue",
"Invoiced Athena Revenue",
Merged PR 4727: Bookings fees are now Old Dashboard Booking Fees in Main KPIs # Description Booking fees is widely used with different meanings, for old dash, for new dash, for both, etc. This is painful. First step to align on a proper naming is ensure that what we report in Main KPIs is clearly stated, which in this case, Booking Fees are now called Old Dashboard Booking Fees. Changes: * Modify `stg_seed__accounting_aggregations` seed to rename Booking Fees to Old Dashboard Booking Fees. This is for us to clarify. This is only applied for KPIs compute. I also added an empty space that I mistakenly forgot in the past for `financial_l3_aggregation`. * Modify KPIs source, i.e., `int_kpis__metric_daily_invoiced_revenue`. Here I forcefully modify the name of the field to `xero_old_dashboard_booking_net_fees_in_gbp`. * Propagate changes of downstream usages of `xero_booking_net_fees_in_gbp` to `xero_old_dashboard_booking_net_fees_in_gbp`. This affects all models, including the reporting model. On this one we still have both names to avoid breaking it. I will need to modify the data glossary in PBI anyway so I'll do this change as well. * Modify displayed metric name from Booking Fees Revenue to Old Dashboard Booking Fees Revenue. * Modify schema so it reflects the proper names, descriptions, and tests. * Ensure outlier and completion tests still work after this change. I confirm the field `xero_booking_net_fees_in_gbp` does not exist anymore in the rest of DWH after these changes, except for the abovementioned comment on the reporting line. # Checklist - [X] The edited models and dependants run properly with production data. - [X] The edited models are sufficiently documented. - [X] The edited models contain PK tests, and I've ran and passed them. - [X] I have checked for DRY opportunities with other models and docs. - [X] I've picked the right materialization for the affected models. # Other - [ ] Check if a full-refresh is required after this PR is merged. Related work items: #28560
2025-03-18 14:55:32 +00:00
"Invoiced Old Dashboard Booking Fees Revenue",
"Invoiced Total Booking Fees Revenue",
Merged PR 3124: 1/3 - Revenue renaming Main KPIs - MTD scope # Description Adapts revenue figures in Main KPIs - MTD scope or global view. This includes MTD, Monthly Overview, Global Evolution over Time, Detail by Category. In essence, everything that is not by deal. The changes are mainly 2: * Remove the line that deducts the `Waiver Amount Paid Back to Hosts` in all metrics except the `Waiver Net Fees`. This effectively means that the previous `Guest Revenue` = `Guest Payments`, thus I dropped all 3 `Guest Payments` metrics. * Do a renaming at metric display level, but not in the code. This means that I remove the computation of `guest_revenue_in_gbp` for instance and keep `guest_payments_in_gbp`, and apply the renaming later on, since the modelisation already accounts for defining metric names differently from those of the fields. For the rest of metrics, I revised all metrics name and did changes based on the [whiteboard](https://whiteboard.office.com/me/whiteboards/p/c3BvOmh0dHBzOi8vZ3VhcmRob2ctbXkuc2hhcmVwb2ludC5jb20vcGVyc29uYWwvcGFibG9fbWFydGluX3N1cGVyaG9nX2NvbQ%3d%3d/b!T2D3opQuBECSDnhuFZrUacFu3TxvSvdIsnI4Dxsh2IuaB1AigbciRqkqte61I4wz/01H5SI4J4L7HTPJGUT7JGYKTOSQYYWACXU). I also changed the dedicated data tests in Main KPIs to ensure it's working. I also changed the exclusion logic in reporting based on the name of the metric to not display metrics that depend on the invoicing cycle unless it's 2 months ago or before. To keep in mind: * Merging this will automatically display the new figures/naming in production. Might be wise to communicate to stakeholders since some key metrics (namely, Guest Revenue / Total Revenue) will change the meaning. * We also need to do these changes in the metrics by deal part of the computation. I'd do first the removal of these fields in the PBI report (and take the opportunity to change the Data Catalogue) and then do the PR in DWH to change the logic. Before that though let's check that the names included in this PR are the correct ones :) # Checklist - [X] The edited models and dependants run properly with production data. - [NA] The edited models are sufficiently documented. - [X] The edited models contain PK tests, and I've ran and passed them. - [NA] I have checked for DRY opportunities with other models and docs. - [NA] I've picked the right materialization for the affected models. # Other - [ ] Check if a full-refresh is required after this PR is merged. Related work items: #22688
2024-10-10 13:46:59 +00:00
"Invoiced E-Deposit Revenue",
"Invoiced Listing Fees Revenue",
"Invoiced Operator Revenue",
Merged PR 3124: 1/3 - Revenue renaming Main KPIs - MTD scope # Description Adapts revenue figures in Main KPIs - MTD scope or global view. This includes MTD, Monthly Overview, Global Evolution over Time, Detail by Category. In essence, everything that is not by deal. The changes are mainly 2: * Remove the line that deducts the `Waiver Amount Paid Back to Hosts` in all metrics except the `Waiver Net Fees`. This effectively means that the previous `Guest Revenue` = `Guest Payments`, thus I dropped all 3 `Guest Payments` metrics. * Do a renaming at metric display level, but not in the code. This means that I remove the computation of `guest_revenue_in_gbp` for instance and keep `guest_payments_in_gbp`, and apply the renaming later on, since the modelisation already accounts for defining metric names differently from those of the fields. For the rest of metrics, I revised all metrics name and did changes based on the [whiteboard](https://whiteboard.office.com/me/whiteboards/p/c3BvOmh0dHBzOi8vZ3VhcmRob2ctbXkuc2hhcmVwb2ludC5jb20vcGVyc29uYWwvcGFibG9fbWFydGluX3N1cGVyaG9nX2NvbQ%3d%3d/b!T2D3opQuBECSDnhuFZrUacFu3TxvSvdIsnI4Dxsh2IuaB1AigbciRqkqte61I4wz/01H5SI4J4L7HTPJGUT7JGYKTOSQYYWACXU). I also changed the dedicated data tests in Main KPIs to ensure it's working. I also changed the exclusion logic in reporting based on the name of the metric to not display metrics that depend on the invoicing cycle unless it's 2 months ago or before. To keep in mind: * Merging this will automatically display the new figures/naming in production. Might be wise to communicate to stakeholders since some key metrics (namely, Guest Revenue / Total Revenue) will change the meaning. * We also need to do these changes in the metrics by deal part of the computation. I'd do first the removal of these fields in the PBI report (and take the opportunity to change the Data Catalogue) and then do the PR in DWH to change the logic. Before that though let's check that the names included in this PR are the correct ones :) # Checklist - [X] The edited models and dependants run properly with production data. - [NA] The edited models are sufficiently documented. - [X] The edited models contain PK tests, and I've ran and passed them. - [NA] I have checked for DRY opportunities with other models and docs. - [NA] I've picked the right materialization for the affected models. # Other - [ ] Check if a full-refresh is required after this PR is merged. Related work items: #22688
2024-10-10 13:46:59 +00:00
"Invoiced Verification Fees Revenue",
"New Deals",
"New Listings",
"Revenue Retained",
"Revenue Retained Post-Resolutions",
"Total Check Out Bookings",
"Total Created Bookings",
"Total Revenue",
Merged PR 3124: 1/3 - Revenue renaming Main KPIs - MTD scope # Description Adapts revenue figures in Main KPIs - MTD scope or global view. This includes MTD, Monthly Overview, Global Evolution over Time, Detail by Category. In essence, everything that is not by deal. The changes are mainly 2: * Remove the line that deducts the `Waiver Amount Paid Back to Hosts` in all metrics except the `Waiver Net Fees`. This effectively means that the previous `Guest Revenue` = `Guest Payments`, thus I dropped all 3 `Guest Payments` metrics. * Do a renaming at metric display level, but not in the code. This means that I remove the computation of `guest_revenue_in_gbp` for instance and keep `guest_payments_in_gbp`, and apply the renaming later on, since the modelisation already accounts for defining metric names differently from those of the fields. For the rest of metrics, I revised all metrics name and did changes based on the [whiteboard](https://whiteboard.office.com/me/whiteboards/p/c3BvOmh0dHBzOi8vZ3VhcmRob2ctbXkuc2hhcmVwb2ludC5jb20vcGVyc29uYWwvcGFibG9fbWFydGluX3N1cGVyaG9nX2NvbQ%3d%3d/b!T2D3opQuBECSDnhuFZrUacFu3TxvSvdIsnI4Dxsh2IuaB1AigbciRqkqte61I4wz/01H5SI4J4L7HTPJGUT7JGYKTOSQYYWACXU). I also changed the dedicated data tests in Main KPIs to ensure it's working. I also changed the exclusion logic in reporting based on the name of the metric to not display metrics that depend on the invoicing cycle unless it's 2 months ago or before. To keep in mind: * Merging this will automatically display the new figures/naming in production. Might be wise to communicate to stakeholders since some key metrics (namely, Guest Revenue / Total Revenue) will change the meaning. * We also need to do these changes in the metrics by deal part of the computation. I'd do first the removal of these fields in the PBI report (and take the opportunity to change the Data Catalogue) and then do the PR in DWH to change the logic. Before that though let's check that the names included in this PR are the correct ones :) # Checklist - [X] The edited models and dependants run properly with production data. - [NA] The edited models are sufficiently documented. - [X] The edited models contain PK tests, and I've ran and passed them. - [NA] I have checked for DRY opportunities with other models and docs. - [NA] I've picked the right materialization for the affected models. # Other - [ ] Check if a full-refresh is required after this PR is merged. Related work items: #22688
2024-10-10 13:46:59 +00:00
"Waiver Revenue",
"Waiver Retained",
) %}
-- Specify here the day of the month that will start to be considered
-- for outlier detection. Keep in mind that 1st of every month is quite
-- unreliable thus false positives could appear. Recommended minimum 2.
{% set start_validating_on_this_day_month = 2 %}
-- Specify here the strength of the detector. A higher value
-- means that this test will allow for more variance to be accepted,
-- thus it will be more tolerant.
-- A lower value means that the chances of detecting outliers
-- and false positives will be higher. Recommended around 10.
{% set detector_tolerance = 5 %}
-- Specify here the number of days in the past that will be used
-- to compare against. Keep in mind that we only keep the daily
-- information for the current month, thus having 180 days here
-- it means that we will take 1) all values of the current month
-- except the latest update and 2) the end of month figures for the
-- past 6 months max.
{% set timeline_to_compare_against = 180 %}
with
max_date as (
select max(date) as max_date
from {{ ref("mtd_aggregated_metrics") }}
-- First days of the month is usually not estable to run
where day >= {{ start_validating_on_this_day_month }}
),
metric_data as (
select
date,
metric,
value,
coalesce(abs(value), 0) / day as abs_daily_value,
case when date = max_date then 1 else 0 end as is_max_date
from {{ ref("mtd_aggregated_metrics") }}
cross join max_date
where
day >= {{ start_validating_on_this_day_month }}
and dimension = 'Global'
and date between max_date -{{ timeline_to_compare_against }} and max_date
and metric in {{ metric_names }}
),
metrics_to_validate as (
select date, metric, value, abs_daily_value
from metric_data
where is_max_date = 1
),
metrics_to_compare_against as (
select
metric,
avg(abs_daily_value) as avg_daily_value_previous_dates,
stddev(abs_daily_value) as std_daily_value_previous_dates,
greatest(
avg(abs_daily_value)
- {{ detector_tolerance }} * stddev(abs_daily_value),
0
) as lower_bound,
avg(abs_daily_value)
+ {{ detector_tolerance }} * stddev(abs_daily_value) as upper_bound
from metric_data
where is_max_date = 0
group by 1
),
metrics_comparison as (
select
mtv.date,
mtv.metric,
mtv.value,
mtv.abs_daily_value,
mtca.avg_daily_value_previous_dates,
mtca.std_daily_value_previous_dates,
mtca.lower_bound,
mtca.upper_bound,
case
when
mtv.abs_daily_value >= mtca.lower_bound
and mtv.abs_daily_value <= mtca.upper_bound
then true
else false
end as is_abs_daily_value_accepted,
abs(mtv.abs_daily_value - mtca.avg_daily_value_previous_dates)
/ mtca.std_daily_value_previous_dates as signal_to_noise_factor
from metrics_to_validate mtv
inner join metrics_to_compare_against mtca on mtv.metric = mtca.metric
)
select *
from metrics_comparison
where is_abs_daily_value_accepted = false
order by signal_to_noise_factor desc