sh-notion/notion_data_team_no_files/(Legacy) Technical Documentation - 2024-08-05 aa7e1cf16b6e410b86ee0787a195be48.md
Pablo Martin a256b48b01 pages
2025-07-11 16:15:17 +02:00

116 lines
No EOL
15 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# (Legacy) Technical Documentation - 2024-08-05
This documentation follows a top-down approach. We start with what is visible to the users through PBI and we go backwards to the details of how things are structured and computed within DWH. This way we keep the overall image of the project before jumping into the details of it.
**Table of contents**
# Power BI Reporting
## Overview
We have a single report for Business KPIs at this stage. Its Main KPIs and its published in Business Overview. [Link to the repository here](https://guardhog.visualstudio.com/Data/_git/data-pbi-reports?path=/reports/business_overview_main_kpi).
The reporting contains 2 manners of seeing KPIs: Global KPIs and KPIs by Deal. The mapping of the KPIs per report page is the following:
- Global: MTD, Monthly Overview, Evolution over Time
- by Deal: Detail by Deal, Deal Comparison
Additionally, the reporting contains a Readme page with detailed explanation of each tab. Lastly, the report contains a Data Glossary that specifies how metrics are computed and if theres any data quality issue around some metrics.
## Data Sources
Since theres 2 ways of visualising KPIs, Global and by Deal, this report contains 2 sources. These are, in Reporting:
- Global: `mtd_aggregated_metrics`
- by Deal: `monthly_aggregated_metrics_history_by_deal`
![Untitled](Untitled%203.png)
Note the convention that follows. Both contain the `aggregated_metrics`, meaning at this stage metrics from different sources are aggregated within these 2 models. The main differences between these 2 are the fact that the KPIs by Deal are stated to be considered at `monthly_history_by_deal` level, while Global KPIs are `mtd` (month to date). This is on purpose and has consequences on how the KPIs are computed.
Lets take a look at how these models look like:
For Global KPIs, `mtd_aggregated_metrics`:
![Untitled](Untitled%2010.png)
**For each date and each metric**, we have the `value`, `previous year value` and the `relative increment` between value and previous year value. Some other fields that are important are the number format, that will impact on how the metric is formatted within Power BI and order by, that will impact on how it is ordered within the visualisation of the KPIs, specially in the MTD tab. Lastly, the dates that are displayed are either the last day of historical months OR any day of the current month, for MTD purposes.
For KPIs by deal, `monthly_aggregated_metrics_history_by_deal`
![Untitled](Untitled%2011.png)
**For each date and each id_deal**, we have only the **values of each metric in separated columns**. Note that this is not aggregated at metric level as the MTD part, and theres also not any previous year value or relative increment. This impacts on how the intermediate aggregations are handled.
# Global vs. By Deal KPIs computation
## Global KPIs schema
![Untitled](Untitled%2012.png)
## KPIs by Deal schema
![Untitled](Untitled%2013.png)
Heres the main goals of each stage, similarities and differences to be taken into account:
- **Reporting**:
- **Goal**: materialise and expose the data that is going to be available for users.
- **Similarities**
- Both flows have a table in reporting that exposes the information for PBI usage.
- **Differences**
- The by Deal part is a replica of what is available in intermediate. However, for Global is not exactly the case, since in `mtd_aggregated_metrics` we force the exclusion of Xero-based metrics for the current month and the previous one. This is to 1) avoid displaying partial invoicing data thus affecting figures such as revenue while 2) ensure within DWH all data is up-to-date, even if the invoicing cycle has not finalised. You can find the exclusion condition [here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/reporting/general/mtd_aggregated_metrics.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=22&lineEnd=23&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents).
- The naming convention, as explained before, because of how KPIs are computed and how the information is displayed in these 2 models (see Data Sources of previous paragraph)
- **Aggregation**:
- **Goal**: aggregates different sources of metrics data into a single model before exposing it.
- **Similarities**
- Both flows have a previous step in intermediate, before reporting, that contains the final computation of KPIs, namely `int_mtd_aggregated_metrics` and `monthly_aggregated_metrics_by_deal`.
- **Differences**
- The Global KPIs have two steps:
- `int_mtd_vs_previous_year_metrics`: ensures the [plain combination of the sources + the computation of derives metrics](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_mtd_vs_previous_year_metrics.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=27&lineEnd=28&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents) AND [the computation vs. previous year by auto-joining the combined CTE](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_mtd_vs_previous_year_metrics.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=187&lineEnd=188&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents).
- `int_mtd_aggregated_metrics`: ensures the unpivot display i.e., all different metrics are aggregated into a metrics column. [Here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_mtd_aggregated_metrics.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=1&lineEnd=2&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents) we also specify the fields of the number format, order by and which name tag (metric) corresponds to each value, previous year value and relative increment.
- The KPIs by Deal have just one step:
- `int_monthly_aggregated_metrics_history_by_deal` only handles the [plain combination of the soruces + the computation of derived metrics](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_monthly_aggregated_metrics_history_by_deal.sql) on the By Deal basis.
- **Sources**:
- **Goal**: Handle all specific logic for retrieving each metric from intermediate master tables.
- **Similarities**
- All metrics depending on the same sources are encapsulated within each source model.
- All follow a strategy of logic computation within each CTE ([here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/core/int_core__mtd_guest_journey_metrics.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=26&lineEnd=27&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents), [here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/core/int_core__mtd_guest_payments_metrics.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=17&lineEnd=18&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents)) with a final aggregation of a date model with left join on the different CTEs ([here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/core/int_core__monthly_guest_payments_history_by_deal.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=80&lineEnd=81&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents)). See links for some example.
- **Differences**:
- Global models need to force a join with `int_dates_mtd` in each CTE to allow for the aggregation of the metric up to a certain day in the past, for MTD purposes. This is highly consuming in resources, thus since its not needed in the By Deal models, you dont actually need to join with the `int_dates_by_deal` in the CTEs, but only in the final aggregation.
- By Deal models need to have a Deal. This means that sometimes, since Deal is not available in a source model (ex: in Guest Journeys - verification_requests table theres no deal), theres additional joins to retrieve the id deal. This is not needed for Global models thus simplifying the computation.
- **Dates**:
- **Goal**: Provide an empty date framework that serves as the skeleton of the needed dates/granularity for each KPI type.
- **Similarities**:
- Each KPI visualisation type, Global and by Deal, have a unique dependency on a Date model.
- **Differences**:
- The `int_dates_mtd` only contains dates and allows for the MTD aggregation ([here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_dates_mtd.sql)) while the `int_dates_by_deal` contains the Deal aggregation - by deal suffix - while does not allow for the MTD aggregation - does not contain a mtd prefix ([here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_dates_by_deal.sql)).
# How to create a new metric?
Follow these steps:
1. Identify if the metric is Global, by Deal or both. Likely its both, except if youre doing some Deal-based metric by Deal that might not make sense. This will clarify if you need to modify 1 of the branches or both of them.
2. Identify the source of your metric. From here we can have different possibilities:
1. If for instance, the metric is related to Bookings, you might want to add it in the `int_core__mtd_booking_metrics` and `int_core__monthly_bookings_history_by_deal`. Similar rationality can apply for Guest Journeys, Invoicing, Guest Payments, Listings, etc.
2. If the metric “type” does not exist yet, such as implementing a Hubspot-based client onboarding opportunities metrics, ideally youd create a standalone model by replicating the structure of an already existing source model. Copy-paste and adapt 🙂
3. If your metric is a combination of two or more different sources, such as Total Revenue by Booking Cancelled, you will need to understand if the submetrics are already available or not. If yes, you can skip this part, if not, go to point a) or b). If its a derived metrics within the same source, such as Guest Journey with Payment per Guest Journey Created, you can directly add it in `int_core__mtd_guest_journey_metrics` and `int_core__monthly_guest_journey_history_by_deal`.
3. Propagate to intermediate aggregations. Lets split Global and Deal based:
1. Global KPIs:
1. Reference your newly created metric in the plain combination of sources in the `int_mtd_vs_previous_year_metrics`. If you need to do a combination with multiple metrics from different sources, this is the place to go. Keep in mind to apply similar `nullif(coalesce(x,0)+colaesce(y,0),0)` structures for combined metrics to ensure that metrics get combined if theres null but theres no division by zero error at the final aggregation 🙂. Example [here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_mtd_vs_previous_year_metrics.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=110&lineEnd=111&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents).
2. Use the macro `calculate_safe_relative_increment` to compute the value, previous_year_value and relative_increment in the final query ([here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_mtd_vs_previous_year_metrics.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=187&lineEnd=188&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents)).
2. KPIs by Deal:
1. Reference your newly created metric in the plain combination of sources in the `int_monthly_aggregated_metrics_history_by_deal`. If you need to do a combination with multiple metrics from different sources, this is the place to go. Keep in mind to apply similar `nullif(coalesce(x,0)+colaesce(y,0),0)` structures for combined metrics to ensure that metrics get combined if theres null but theres no division by zero error at the final aggregation 🙂. Example [here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_monthly_aggregated_metrics_history_by_deal.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=95&lineEnd=96&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents).
4. Exposure of metrics. Lets split Global and Deal based:
1. Global KPIs:
1. Add the configuration of your new metric in `int_mtd_aggregated_metrics`. Youll need to parametrise the order, metric (name tag that will be displayed in the reporting), the number format (for formatting in the reporting) and which values is going to use. Order by is informative so you can actually replicate an existing one, although I recommend to choose a value not being used so its clearer how we want to order the KPIs. **Important: keep in mind that merging and refreshing this will directly make this metric available and visible in the dashboard.**
2. If your metric is or uses an invoicing metric that should not be displayed in the current month or the previous month, validate that the [condition applied](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/reporting/general/mtd_aggregated_metrics.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=38&lineEnd=39&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents) in the reporting file of `mtd_aggregated_metrics` works well.
3. Modify Data Glossary to include the description of your new metric. Note that theres no additional need to change anything else on the Power BI for Global metrics.
2. Deal KPIs:
1. Propagate the new metric from `int_monthly_aggregated_metrics_history_by_deal` to `monthly_aggregated_metrics_history_by_deal`. If this metric is or uses an invoicing metric, please use the macro `is_date_before_previous_month`. Example [here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/reporting/general/monthly_aggregated_metrics_history_by_deal.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=31&lineEnd=32&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents).
2. In Power BI, once the model in reporting has been refreshed, you will need to manually add the new metrics in the tabs: Detail by Deal and Deal Comparison. For each new metric, in PBI, you will need to manually specify the number format, the order of display and the name of the metric.
# Additional notes
1. Youve seen that the two ways of displaying data at this stage are not consistent - beyond the fact of having the granularity of Deal or not. It has some pros and cons and this changes the way of how to create a new metric. Global is much more DWH dependant, while By Deal needs more PBI modifications.
2. At this stage, we want to implement metrics by different dimensions, and this is actually complicated to generalise within the current setup. Were investigating a more scalable solution called MetricFlow that could potentially modify completely this structured that has been presented in this Notion page.