sh-notion/notion_data_team_no_files/Refactoring Business KPIs - 2024-07-05 5deb6aadddb34884ae90339402ac16e3.md
Pablo Martin a256b48b01 pages
2025-07-11 16:15:17 +02:00

8.8 KiB
Raw Permalink Blame History

Refactoring Business KPIs - 2024-07-05

2024-07-05

Current status

In over a month, the amount of KPIs that were providing as Data Team has grown considerably. This goes together with an increase of the dbt modelisation to support the final display in the power bi report.

The current approach is the following:

  • Each metric defined in Reporting Needs , gets grouped into a model with the same typology. For example, Created Bookings, Cancelled Bookings and Checkout Bookings are metrics associated to Bookings. Theres some more advanced weighted measures, such as Guest Journey Start Rate, that in this case it is included in the Guest Journey.

  • For each eligible metric, we currently have 2 views:

    • A global view, which means is not granularized. It contains the prefix mtd. This one is presented in 2 formats:

      • For historical months, we retrieve the metric in a monthly basis. This is effectively the same as considering the end of the month in a month-to-date basis.

      • For the current month, we retrieve the information using a month-to-date approach. This means, if today is 4th of July 2024, we get the information from the 1st of July 2024 to the 3rd of July 2024 and compare with the same time period from the previous year, meaning 1st of July 2023 to 3rd July 2023. During the current month, we are able to retrieve this comparison for all days that have already of the current month. This means that apart from the 3rd of July 2024 comparison, we will also have the 1st and the 2nd. This gets cleaned up every month.

      • Structure:

        • The global structure is the following:
        -- Retrieval computation
        some_metrics as (
        select 
        	d.date, 
        	count/sum/etc as aggregated_metric
        from int_dates_mtd d
        inner join some_table
        on something
        where condition
        group by 1
        ),
        --
        -- Repeat for each different date logic computation
        -- 
        ),
        main_kpis as (
        -- Final aggregation of subqueries --
          select
              d.year,
              d.month,
              d.day,
              d.date,
              d.is_end_of_month,
              d.is_current_month,
              some_metric.aggregated_metric
              -- Repeat for other metrics
          from int_dates_mtd d
          left join some_metrics on some_metrics.date = d.date
        	-- Repeat for other CTEs
        
        )
        -- **Pivoting** to get previous year for each line & computing relative increment
        -- (rel_incr) --
        select
            a.year,
            a.month,
            a.day,
            a.is_end_of_month,
            a.is_current_month,
            a.date,
            b.date as previous_year_date,
        
            a.aggregated_metric,
        
            b.aggregated_metric as previous_year_aggregated_metric,
        
            cast(a.aggregated_metric as decimal) / b.aggregated_metric
            - 1 as relative_increment_aggregated_metric,
        
            -- Repeat triple structure for any other metric
        
        from main_kpi a
        left join main_kpi b on a.month = b.month and a.year = b.year + 1
        where (a.is_end_of_month = 1 or (a.is_current_month = 1 and a.day = b.day))
        
    • A by deal view, which extracts the information for each deal id as long as the deal is informed in the users table, and as long as we can trace that metric to a specific deal. This means, for instance, if we get some Guest Journey Completed but the Host User is not set, we wont be able to attribute it. These model contain a monthly and by_deal prefixes and suffixes respectively.

      • Structure:
        • The by deal structure is the following:
      -- Retrieval computation
      some_metrics as (
      select 
      	date_trunc('month', a_date_field)::date as first_day_month, 
      	**id_deal,**
      	count/sum/etc as aggregated_metric
      from a
      inner join b
      on something
      where condition 
      group by 1**, 2**
      -- Note that here we do NOT join with int_dates_by_deal because
      -- 1) we don't compute MTD so it's not needed
      -- 2) it heavily increases performance
      ),
      --
      -- Repeat for each different date logic computation
      -- 
      )
      -- Final aggregation of subqueries --
        select
            d.year,
            d.month,
            d.day,
            d.date,
            **d.id_deal**,
            some_metric.aggregated_metric
            -- Repeat for other metrics
        from int_dates_**by_deal** d
        left join some_metrics 
      	  on some_metrics.**first_day_month** = d.**first_day_month**
      	  **and some_metrics.id_deal = d.id_deal**
      	-- Repeat for other CTEs
      
      -- Note that here all data is at end of month or yesterday
      -- There's no relative increase vs. last year computation!
      
  • Since we were only sourcing core models so far, all KPIs models are placed into core, with the usual int_core__ convention for intermediate and core__ for reporting.

Limitations

  • The global view directly computes the Value, Previous Year Value and Relative Increment Value within the same typology model. This means that, the Bookings Created, Previous Year Created Bookings and Relative Increment Bookings Created are handled within the Booking model.
    • This worked well so far, because all metrics that correspond to rates were self-contained in the same group. For example, Guest Journey Completion Rate uses Guest Journey Completed and Guest Journey Started, that are within the Guest Journey model.
    • But it does not work well for the new metrics we aim to implement, for instance, Revenue per Booking, since Revenue will probably be an aggregation of a Guest Revenue model, an Operator Revenue model, and an already existing Booking model…
    • Solution → The pivoting logic to retrieve previous year should be abstracted into a superior layer
  • The power bi report is sourcing directly a core__mtd_aggregated_metrics and a core__monthly_aggregated_metrics_history_by_deal models.
    • This worked well so far, because all metrics were coming from Core.
    • But it does not work well for the new metrics we aim to implement, for instance, Host Resolution Amount Paid, since this will come from Xero. The Resolution model could still be called int_xero__ABC but the aggregation would not only contain information from Core
    • Solution → The aggregated metrics models and the exposures should migrate to Cross

Strategy

  • Keep the reporting always alive by creating many but small changes to the code
  • Track progress and details in this Notion page

Step-by-step details:

  1. Remove previous year comparison logic from int_core__mtd_booking_metrics and create an abstraction layer in int_mtd_vs_previous_year_metrics model that includes the bookings. Make int_core__aggregated_metrics read Bookings metrics from this table
  2. Remove previous year comparison logic from int_core__mtd_accommodation_metrics and add it into the int_mtd_vs_previous_year_metrics model. Make int_core__aggregated_metrics read Accommodation metrics from this table
  3. Remove previous year comparison logic from int_core__mtd_deal_metrics and add it into the int_mtd_vs_previous_year_metrics model. Make int_core__aggregated_metrics read Deal metrics from this table
  4. Remove previous year comparison logic from int_core__mtd_guest_journey_metrics and add it into the int_mtd_vs_previous_year_metrics model. Make int_core__aggregated_metrics read Guest Journey metrics from this table
  5. Remove previous year comparison logic from int_mtd_guest_revenue_metrics and add it into the int_mtd_vs_previous_year_metrics model. Make int_core__aggregated_metrics read Guest Revenue metrics from this table
  6. Transition int_core__mtd_aggregated_metrics and int_core__monthly_aggregated_metrics_history_by_deal into int_mtd_aggregated_metrics and int_monthly_aggregated_metrics_history_by_deal respectively. Make the respective reporting models read from the new cross models.
  7. Create the reporting cross models and re-do the reporting
  8. Once validated, deploy the Power BI reading from cross reporting.
  9. Once core__monthly_aggregated_metrics_history_by_deal and core__mtd_aggregated_metrics have no downstream dependants at all, remove them.