sh-notion/notion_data_team_no_files/Refactoring Business KPIs - 2024-07-05 5deb6aadddb34884ae90339402ac16e3.md

149 lines
8.8 KiB
Markdown
Raw Normal View History

2025-07-11 16:15:17 +02:00
# Refactoring Business KPIs - 2024-07-05
**2024-07-05**
# Current status
In over a month, the amount of KPIs that were providing as Data Team has grown considerably. This goes together with an increase of the dbt modelisation to support the final display in the power bi report.
The current approach is the following:
- Each metric defined in [Reporting Needs ](https://www.notion.so/Reporting-Needs-afaf4d5384764023a246d6cf7de201b4?pvs=21), gets grouped into a model with the same typology. For example, `Created Bookings`, `Cancelled Bookings` and `Checkout Bookings` are metrics associated to `Bookings`. Theres some more advanced weighted measures, such as `Guest Journey Start Rate`, that in this case it is included in the `Guest Journey`.
- For each eligible metric, we currently have **2 views**:
- A **global** view, which means is not granularized. It contains the prefix `mtd`. This one is presented in 2 formats:
- For historical months, we retrieve the metric in a monthly basis. This is effectively the same as considering the end of the month in a month-to-date basis.
- For the current month, we retrieve the information using a month-to-date approach. This means, if today is 4th of July 2024, we get the information from the 1st of July 2024 to the 3rd of July 2024 and compare with the same time period from the previous year, meaning 1st of July 2023 to 3rd July 2023. During the current month, we are able to retrieve this comparison for all days that have already of the current month. This means that apart from the 3rd of July 2024 comparison, we will also have the 1st and the 2nd. This gets cleaned up every month.
- Structure:
- The **global** structure is the following:
```sql
-- Retrieval computation
some_metrics as (
select
d.date,
count/sum/etc as aggregated_metric
from int_dates_mtd d
inner join some_table
on something
where condition
group by 1
),
--
-- Repeat for each different date logic computation
--
),
main_kpis as (
-- Final aggregation of subqueries --
select
d.year,
d.month,
d.day,
d.date,
d.is_end_of_month,
d.is_current_month,
some_metric.aggregated_metric
-- Repeat for other metrics
from int_dates_mtd d
left join some_metrics on some_metrics.date = d.date
-- Repeat for other CTEs
)
-- **Pivoting** to get previous year for each line & computing relative increment
-- (rel_incr) --
select
a.year,
a.month,
a.day,
a.is_end_of_month,
a.is_current_month,
a.date,
b.date as previous_year_date,
a.aggregated_metric,
b.aggregated_metric as previous_year_aggregated_metric,
cast(a.aggregated_metric as decimal) / b.aggregated_metric
- 1 as relative_increment_aggregated_metric,
-- Repeat triple structure for any other metric
from main_kpi a
left join main_kpi b on a.month = b.month and a.year = b.year + 1
where (a.is_end_of_month = 1 or (a.is_current_month = 1 and a.day = b.day))
```
- A **by deal** view, which extracts the information for each deal id as long as the deal is informed in the users table, and as long as we can trace that metric to a specific deal. This means, for instance, if we get some Guest Journey Completed but the Host User is not set, we wont be able to attribute it. These model contain a `monthly` and `by_deal` prefixes and suffixes respectively.
- Structure:
- The **by deal** structure is the following:
```sql
-- Retrieval computation
some_metrics as (
select
date_trunc('month', a_date_field)::date as first_day_month,
**id_deal,**
count/sum/etc as aggregated_metric
from a
inner join b
on something
where condition
group by 1**, 2**
-- Note that here we do NOT join with int_dates_by_deal because
-- 1) we don't compute MTD so it's not needed
-- 2) it heavily increases performance
),
--
-- Repeat for each different date logic computation
--
)
-- Final aggregation of subqueries --
select
d.year,
d.month,
d.day,
d.date,
**d.id_deal**,
some_metric.aggregated_metric
-- Repeat for other metrics
from int_dates_**by_deal** d
left join some_metrics
on some_metrics.**first_day_month** = d.**first_day_month**
**and some_metrics.id_deal = d.id_deal**
-- Repeat for other CTEs
-- Note that here all data is at end of month or yesterday
-- There's no relative increase vs. last year computation!
```
- Since we were only sourcing core models so far, all KPIs models are placed into core, with the usual `int_core__` convention for intermediate and `core__` for reporting.
# Limitations
- The global view directly computes the `Value`, `Previous Year Value` and `Relative Increment Value` within the same typology model. This means that, the `Bookings Created`, `Previous Year Created Bookings` and `Relative Increment Bookings Created` are handled within the `Booking` model.
- This worked well so far, because all metrics that correspond to rates were self-contained in the same group. For example, `Guest Journey Completion Rate` uses `Guest Journey Completed` and `Guest Journey Started`, that are within the `Guest Journey` model.
- But it does not work well for the new metrics we aim to implement, for instance, `Revenue per Booking`, since `Revenue` will probably be an aggregation of a `Guest Revenue` model, an `Operator Revenue` model, and an already existing `Booking` model…
- **Solution** → The pivoting logic to retrieve previous year should be abstracted into a superior layer
- The power bi report is sourcing directly a `core__mtd_aggregated_metrics` and a `core__monthly_aggregated_metrics_history_by_deal` models.
- This worked well so far, because all metrics were coming from Core.
- But it does not work well for the new metrics we aim to implement, for instance, Host Resolution Amount Paid, since this will come from Xero. The Resolution model could still be called int_xero__ABC but the aggregation would not only contain information from Core
- **Solution** → The aggregated metrics models and the exposures should migrate to Cross
# Strategy
- Keep the reporting always **alive** by creating many but small changes to the code
- Track progress and details in this Notion page
**Step-by-step details:**
1. Remove previous year comparison logic from `int_core__mtd_booking_metrics` and create an abstraction layer in `int_mtd_vs_previous_year_metrics` model that includes the bookings. Make `int_core__aggregated_metrics` read Bookings metrics from this table
2. Remove previous year comparison logic from `int_core__mtd_accommodation_metrics` and add it into the `int_mtd_vs_previous_year_metrics` model. Make `int_core__aggregated_metrics` read Accommodation metrics from this table
3. Remove previous year comparison logic from `int_core__mtd_deal_metrics` and add it into the `int_mtd_vs_previous_year_metrics` model. Make `int_core__aggregated_metrics` read Deal metrics from this table
4. Remove previous year comparison logic from `int_core__mtd_guest_journey_metrics` and add it into the `int_mtd_vs_previous_year_metrics` model. Make `int_core__aggregated_metrics` read Guest Journey metrics from this table
5. Remove previous year comparison logic from `int_mtd_guest_revenue_metrics` and add it into the `int_mtd_vs_previous_year_metrics` model. Make `int_core__aggregated_metrics` read Guest Revenue metrics from this table
6. Transition `int_core__mtd_aggregated_metrics` and `int_core__monthly_aggregated_metrics_history_by_deal` into `int_mtd_aggregated_metrics` and `int_monthly_aggregated_metrics_history_by_deal` respectively. Make the respective reporting models read from the new cross models.
7. Create the reporting cross models and re-do the reporting
8. Once validated, deploy the Power BI reading from cross reporting.
9. Once `core__monthly_aggregated_metrics_history_by_deal` and `core__mtd_aggregated_metrics` have no downstream dependants at all, remove them.