From 3ba65e8ed512a5ae99c4fa281ea3ed9a7d9f7cff Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Oriol=20Roqu=C3=A9=20Paniagua?= <oriol.roque@superhog.com>
Date: Thu, 3 Oct 2024 10:43:15 +0000
Subject: [PATCH] Merged PR 3041: Adapts outlier detection

# Description

This PR adapts the outlier detection test for KPIs. Specifically:

1) It removes non-additive lifecycle metrics, which are:
- Churning Listings/Deals
- Listings/Deals Booked in Month/6 Months/12 Months

These are removed because the test converts each metric to a daily level by simply dividing the value by the number of days. For the booked metrics, Listing/Deal bookings are counted **uniquely over a month**, i.e., a listing that is booked 100 times in a single month only counts once. The per-day value is therefore inflated on the early days of the month, which makes the test fail. Churn behaves similarly: at the beginning of the month we have the maximum number of listings/deals that are expected to churn if nothing happens, and this can decrease slightly over time as some of them get reactivated.
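
To illustrate the point above, here is a minimal Python sketch (not the SQL used by the test). The booking events and the helper `month_to_date_daily_averages` are hypothetical and exist only for this example. It compares an additive count (every booking) with a unique-per-month count (distinct listings), both divided by the number of days elapsed.

```python
# Hypothetical booking events as (day_of_month, listing_id). Not real data.
bookings = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "A"), (2, "B"), (2, "D"),
    (3, "A"), (3, "C"), (3, "D"),
    (4, "A"), (4, "B"), (4, "C"),
]


def month_to_date_daily_averages(events, day):
    """Return (bookings per day, unique listings booked per day) up to `day`."""
    listings = [listing for d, listing in events if d <= day]
    additive = len(listings) / day           # every booking counts
    non_additive = len(set(listings)) / day  # each listing counts once per month
    return additive, non_additive


for day in range(1, 5):
    print(day, month_to_date_daily_averages(bookings, day))
```

In this toy data the additive metric stays at 3.0 per day, while the unique count drops from 3.0 on day 1 to 1.0 on day 4. Early-month values of a non-additive metric therefore look abnormally high once divided by the number of days.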

2) It reduces the variance threshold (`detector_tolerance`) from 10 to 8, meaning alerts will now fire more often. Since we're removing some wrongly assessed metrics from the computation, I feel we can live with finer-grained detection. It could be reduced even further (8 is still a very high tolerance), since today's maximum signal-to-noise ratio was less than 4, on Checkout Bookings. I'd propose to see how it goes over the following days and then assess whether a further reduction is necessary.
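
As a rough illustration of what lowering the tolerance does, here is a minimal Python sketch. It assumes the signal-to-noise ratio is the absolute deviation of today's value from the historical mean, divided by the historical standard deviation. That definition, the function `is_outlier`, and the sample values are assumptions made for this example; the actual formula lives in the test's SQL, which is not part of this diff.

```python
from statistics import mean, stdev


def is_outlier(history, today, tolerance):
    """Flag `today` when its assumed signal-to-noise ratio exceeds `tolerance`."""
    noise = stdev(history)  # sample standard deviation of past daily values
    if noise == 0:
        # Flat history: any change at all is treated as an outlier.
        return today != mean(history)
    signal_to_noise = abs(today - mean(history)) / noise
    return signal_to_noise > tolerance


# Hypothetical daily values for one metric. Not real data.
history = [100, 102, 98, 101, 99]

print(is_outlier(history, 114, tolerance=10))  # ratio is roughly 8.9
print(is_outlier(history, 114, tolerance=8))
print(is_outlier(history, 106, tolerance=8))   # ratio is roughly 3.8
```

Under these assumptions, a value with a ratio between 8 and 10 passes with the old tolerance and raises an alert with the new one, while a value with a ratio below 4 stays quiet in both cases.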

# Checklist

- [X] The edited models and dependants run properly with production data.
- [ ] The edited models are sufficiently documented.
- [ ] The edited models contain PK tests, and I've run and passed them.
- [ ] I have checked for DRY opportunities with other models and docs.
- [ ] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.
---
 tests/kpis_global_metrics_outlier_detection.sql | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/tests/kpis_global_metrics_outlier_detection.sql b/tests/kpis_global_metrics_outlier_detection.sql
index ce80f8d..0ac47c1 100644
--- a/tests/kpis_global_metrics_outlier_detection.sql
+++ b/tests/kpis_global_metrics_outlier_detection.sql
@@ -8,7 +8,6 @@ There's chances that false positives are risen by these test. If at some
 point it becomes too sensitive, just adapt the following parameters.
 
 */
-
 -- Add here additive metrics that you would like to check
 -- Recommended to exclude metrics that represent new products,
 -- since there will be no history to check against. 
@@ -16,12 +15,7 @@ point it becomes too sensitive, just adapt the following parameters.
 {% set metric_names = (
     "Cancelled Bookings",
     "Checkout Bookings",
-    "Churning Deals",
-    "Churning Listings",
     "Created Bookings",
-    "Deals Booked in 12 Months",
-    "Deals Booked in 6 Months",
-    "Deals Booked in Month",
     "Deposit Fees",
     "Est. Billable Bookings",
     "First Time Booked Deals",
@@ -41,9 +35,6 @@ point it becomes too sensitive, just adapt the following parameters.
     "Invoiced Listing Fees",
     "Invoiced Operator Revenue",
     "Invoiced Verification Fees",
-    "Listings Booked in 12 Months",
-    "Listings Booked in 6 Months",
-    "Listings Booked in Month",
     "New Deals",
     "New Listings",
     "Total Revenue",
@@ -62,7 +53,7 @@ point it becomes too sensitive, just adapt the following parameters.
 -- thus it will be more tolerant.
 -- A lower value means that the chances of detecting outliers 
 -- and false positives will be higher. Recommended around 10.
-{% set detector_tolerance = 10 %}
+{% set detector_tolerance = 8 %}
 
 -- Specify here the number of days in the past that will be used
 -- to compare against. Keep in mind that we only keep the daily