sh-notion/notion_data_team_no_files/(Legacy) Technical Documentation - 2024-08-05 aa7e1cf16b6e410b86ee0787a195be48.md
Pablo Martin a256b48b01 pages
2025-07-11 16:15:17 +02:00

15 KiB
Raw Permalink Blame History

(Legacy) Technical Documentation - 2024-08-05

This documentation follows a top-down approach. We start with what is visible to the users through PBI and we go backwards to the details of how things are structured and computed within DWH. This way we keep the overall image of the project before jumping into the details of it.

Table of contents

Power BI Reporting

Overview

We have a single report for Business KPIs at this stage. Its Main KPIs and its published in Business Overview. Link to the repository here.

The reporting contains 2 manners of seeing KPIs: Global KPIs and KPIs by Deal. The mapping of the KPIs per report page is the following:

  • Global: MTD, Monthly Overview, Evolution over Time
  • by Deal: Detail by Deal, Deal Comparison

Additionally, the reporting contains a Readme page with detailed explanation of each tab. Lastly, the report contains a Data Glossary that specifies how metrics are computed and if theres any data quality issue around some metrics.

Data Sources

Since theres 2 ways of visualising KPIs, Global and by Deal, this report contains 2 sources. These are, in Reporting:

  • Global: mtd_aggregated_metrics
  • by Deal: monthly_aggregated_metrics_history_by_deal

Untitled

Note the convention that follows. Both contain the aggregated_metrics, meaning at this stage metrics from different sources are aggregated within these 2 models. The main differences between these 2 are the fact that the KPIs by Deal are stated to be considered at monthly_history_by_deal level, while Global KPIs are mtd (month to date). This is on purpose and has consequences on how the KPIs are computed.

Lets take a look at how these models look like:

For Global KPIs, mtd_aggregated_metrics:

Untitled

For each date and each metric, we have the value, previous year value and the relative increment between value and previous year value. Some other fields that are important are the number format, that will impact on how the metric is formatted within Power BI and order by, that will impact on how it is ordered within the visualisation of the KPIs, specially in the MTD tab. Lastly, the dates that are displayed are either the last day of historical months OR any day of the current month, for MTD purposes.

For KPIs by deal, monthly_aggregated_metrics_history_by_deal

Untitled

For each date and each id_deal, we have only the values of each metric in separated columns. Note that this is not aggregated at metric level as the MTD part, and theres also not any previous year value or relative increment. This impacts on how the intermediate aggregations are handled.

Global vs. By Deal KPIs computation

Global KPIs schema

Untitled

KPIs by Deal schema

Untitled

Heres the main goals of each stage, similarities and differences to be taken into account:

  • Reporting:
    • Goal: materialise and expose the data that is going to be available for users.
    • Similarities
      • Both flows have a table in reporting that exposes the information for PBI usage.
    • Differences
      • The by Deal part is a replica of what is available in intermediate. However, for Global is not exactly the case, since in mtd_aggregated_metrics we force the exclusion of Xero-based metrics for the current month and the previous one. This is to 1) avoid displaying partial invoicing data thus affecting figures such as revenue while 2) ensure within DWH all data is up-to-date, even if the invoicing cycle has not finalised. You can find the exclusion condition here.
      • The naming convention, as explained before, because of how KPIs are computed and how the information is displayed in these 2 models (see Data Sources of previous paragraph)
  • Aggregation:
  • Sources:
    • Goal: Handle all specific logic for retrieving each metric from intermediate master tables.
    • Similarities
      • All metrics depending on the same sources are encapsulated within each source model.
      • All follow a strategy of logic computation within each CTE (here, here) with a final aggregation of a date model with left join on the different CTEs (here). See links for some example.
    • Differences:
      • Global models need to force a join with int_dates_mtd in each CTE to allow for the aggregation of the metric up to a certain day in the past, for MTD purposes. This is highly consuming in resources, thus since its not needed in the By Deal models, you dont actually need to join with the int_dates_by_deal in the CTEs, but only in the final aggregation.
      • By Deal models need to have a Deal. This means that sometimes, since Deal is not available in a source model (ex: in Guest Journeys - verification_requests table theres no deal), theres additional joins to retrieve the id deal. This is not needed for Global models thus simplifying the computation.
  • Dates:
    • Goal: Provide an empty date framework that serves as the skeleton of the needed dates/granularity for each KPI type.
    • Similarities:
      • Each KPI visualisation type, Global and by Deal, have a unique dependency on a Date model.
    • Differences:
      • The int_dates_mtd only contains dates and allows for the MTD aggregation (here) while the int_dates_by_deal contains the Deal aggregation - by deal suffix - while does not allow for the MTD aggregation - does not contain a mtd prefix (here).

How to create a new metric?

Follow these steps:

  1. Identify if the metric is Global, by Deal or both. Likely its both, except if youre doing some Deal-based metric by Deal that might not make sense. This will clarify if you need to modify 1 of the branches or both of them.
  2. Identify the source of your metric. From here we can have different possibilities:
    1. If for instance, the metric is related to Bookings, you might want to add it in the int_core__mtd_booking_metrics and int_core__monthly_bookings_history_by_deal. Similar rationality can apply for Guest Journeys, Invoicing, Guest Payments, Listings, etc.
    2. If the metric “type” does not exist yet, such as implementing a Hubspot-based client onboarding opportunities metrics, ideally youd create a standalone model by replicating the structure of an already existing source model. Copy-paste and adapt 🙂
    3. If your metric is a combination of two or more different sources, such as Total Revenue by Booking Cancelled, you will need to understand if the submetrics are already available or not. If yes, you can skip this part, if not, go to point a) or b). If its a derived metrics within the same source, such as Guest Journey with Payment per Guest Journey Created, you can directly add it in int_core__mtd_guest_journey_metrics and int_core__monthly_guest_journey_history_by_deal.
  3. Propagate to intermediate aggregations. Lets split Global and Deal based:
    1. Global KPIs:
      1. Reference your newly created metric in the plain combination of sources in the int_mtd_vs_previous_year_metrics. If you need to do a combination with multiple metrics from different sources, this is the place to go. Keep in mind to apply similar nullif(coalesce(x,0)+colaesce(y,0),0) structures for combined metrics to ensure that metrics get combined if theres null but theres no division by zero error at the final aggregation 🙂. Example here.
      2. Use the macro calculate_safe_relative_increment to compute the value, previous_year_value and relative_increment in the final query (here).
    2. KPIs by Deal:
      1. Reference your newly created metric in the plain combination of sources in the int_monthly_aggregated_metrics_history_by_deal. If you need to do a combination with multiple metrics from different sources, this is the place to go. Keep in mind to apply similar nullif(coalesce(x,0)+colaesce(y,0),0) structures for combined metrics to ensure that metrics get combined if theres null but theres no division by zero error at the final aggregation 🙂. Example here.
  4. Exposure of metrics. Lets split Global and Deal based:
    1. Global KPIs:
      1. Add the configuration of your new metric in int_mtd_aggregated_metrics. Youll need to parametrise the order, metric (name tag that will be displayed in the reporting), the number format (for formatting in the reporting) and which values is going to use. Order by is informative so you can actually replicate an existing one, although I recommend to choose a value not being used so its clearer how we want to order the KPIs. Important: keep in mind that merging and refreshing this will directly make this metric available and visible in the dashboard.
      2. If your metric is or uses an invoicing metric that should not be displayed in the current month or the previous month, validate that the condition applied in the reporting file of mtd_aggregated_metrics works well.
      3. Modify Data Glossary to include the description of your new metric. Note that theres no additional need to change anything else on the Power BI for Global metrics.
    2. Deal KPIs:
      1. Propagate the new metric from int_monthly_aggregated_metrics_history_by_deal to monthly_aggregated_metrics_history_by_deal. If this metric is or uses an invoicing metric, please use the macro is_date_before_previous_month. Example here.
      2. In Power BI, once the model in reporting has been refreshed, you will need to manually add the new metrics in the tabs: Detail by Deal and Deal Comparison. For each new metric, in PBI, you will need to manually specify the number format, the order of display and the name of the metric.

Additional notes

  1. Youve seen that the two ways of displaying data at this stage are not consistent - beyond the fact of having the granularity of Deal or not. It has some pros and cons and this changes the way of how to create a new metric. Global is much more DWH dependant, while By Deal needs more PBI modifications.
  2. At this stage, we want to implement metrics by different dimensions, and this is actually complicated to generalise within the current setup. Were investigating a more scalable solution called MetricFlow that could potentially modify completely this structured that has been presented in this Notion page.