Technical Documentation - 2024-11-12
Purpose
The intermediate/kpis folder is dedicated to KPI modelling, which includes almost any relevant dimension, measure and time aggregation needed to transform data into business metrics. As the Data Team, we should deliver KPIs of the highest possible quality.
Convention
Model names
- Any model within the folder intermediate/kpis needs to follow this convention: int_kpis__{structure_type}_{time_dimension}_{relevant_entity_name}.
- Structure types can be the following:
  - lifecycle: any modelling that classifies a certain behaviour of a given entity that can vary over time. For instance, the listing lifecycle in terms of booking creation could categorise a listing as new, active, never booked, inactive, etc.
  - dimension: any modelling that segments or categorises data, so it can provide descriptive context for the measures. Segments resulting from lifecycles will likely have an equivalent dimension model.
  - metric: any model that computes a given metric per different dimensions without aggregating it. This means that each dimension has a dedicated column within the model.
  - agg: a model that aggregates the data for any given metric into 1) a date range, 2) a dimension and 3) a dimension value. These always depend on metric models.
- Time dimension can be the following:
  - daily: if the time granularity is daily.
  - monthly: if the time granularity is monthly, meaning metrics are aggregated to the month.
  - mtd: if the time granularity is month-to-date, meaning metrics are cumulative up to a certain date of the current month, and likewise for the same days of the month in previous periods.
  - others.
- Relevant entity name needs to easily and uniquely identify the entity being modelled, such as Created Bookings.
- The only exception is int_kpis__dimension_dates: even though it is granular at daily level, its name is simplified on purpose to avoid calling the model int_kpis__dimension_daily_dates.
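The naming convention above can be checked mechanically. The following is a minimal Python sketch, not part of the project: the function name is_valid_kpi_model_name and the regular expression are assumptions made for illustration, and the set of time dimensions only covers the three named explicitly (the convention also allows others).

```python
import re

# Structure types and time dimensions named in the convention.
# The convention also allows "others" as time dimensions; extend this set as needed.
STRUCTURE_TYPES = {"lifecycle", "dimension", "metric", "agg"}
TIME_DIMENSIONS = {"daily", "monthly", "mtd"}

# Documented exception: this model carries no time dimension in its name.
EXCEPTIONS = {"int_kpis__dimension_dates"}

# int_kpis__{structure_type}_{time_dimension}_{relevant_entity_name}
NAME_PATTERN = re.compile(
    r"^int_kpis__(?P<structure_type>[a-z]+)_(?P<time_dimension>[a-z]+)_(?P<entity>[a-z0-9_]+)$"
)


def is_valid_kpi_model_name(name: str) -> bool:
    """Return True if the name follows the convention or is a documented exception."""
    if name in EXCEPTIONS:
        return True
    match = NAME_PATTERN.match(name)
    if match is None:
        return False
    return (
        match["structure_type"] in STRUCTURE_TYPES
        and match["time_dimension"] in TIME_DIMENSIONS
    )
```

A check like this could run in CI over the file names of the folder, although whether that is worthwhile depends on how often new models are added.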
Logic
- The model that contains the deepest granularity for each entity should be the one handling the data gathering to compute raw metrics and dimensions. Likely, this model will be in the form of int_kpis__metric_daily_{relevant_entity_name}. In this case, joins outside of the kpis folder are accepted and expected in order to gather dimensions and metrics.
- Downstream models within the kpis folder, regardless of whether they are metric or aggregated models, should not join with other models outside of the kpis folder. Further enrichment can be done with outside models as long as the resulting models are located outside the kpis folder, namely in the cross/general folders.
- Downstream models within the kpis folder may, where needed, join with other models within the kpis folder in order to create weighted or converted metrics.
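The join rules above can be expressed as a small dependency check. This is only an illustrative Python sketch under stated assumptions: the dependency map is written by hand (in practice it would have to be derived from the project), membership of the kpis folder is inferred from the int_kpis__ prefix, the deepest-granularity models are passed in explicitly, and all model names in the example are hypothetical.

```python
KPIS_PREFIX = "int_kpis__"


def find_outside_joins(dependencies, entry_models):
    """Return {model: [outside parents]} for downstream kpis models that break the rule.

    dependencies: maps each model name to the list of models it selects from.
    entry_models: deepest-granularity kpis models, which are allowed to join outside models.
    """
    violations = {}
    for model, parents in dependencies.items():
        # Models outside the kpis folder and entry models are not restricted.
        if not model.startswith(KPIS_PREFIX) or model in entry_models:
            continue
        outside = sorted(p for p in parents if not p.startswith(KPIS_PREFIX))
        if outside:
            violations[model] = outside
    return violations


# Hypothetical dependency map, for illustration only.
dependencies = {
    "int_kpis__metric_daily_created_bookings": ["some_upstream_bookings_model"],
    "int_kpis__agg_daily_created_bookings": ["int_kpis__metric_daily_created_bookings"],
    "int_kpis__agg_monthly_created_bookings": [
        "int_kpis__metric_daily_created_bookings",
        "some_upstream_bookings_model",  # breaks the rule: outside join in a downstream model
    ],
}
entry_models = {"int_kpis__metric_daily_created_bookings"}

violations = find_outside_joins(dependencies, entry_models)
```

Here only the monthly aggregate is reported, because it selects from a model outside the kpis folder while not being the data-gathering model of its entity.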
Dimension aggregation
Models that are dimension aggregates, namely aggregated or agg models, follow a common pattern of date, dimension and dimension_value.
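To make the date, dimension and dimension_value pattern concrete, here is a minimal Python sketch of the reshaping an agg model performs on a metric model. The sample rows, the column names billing_country and created_bookings, and the use of the literal "global" as the dimension_value of the global dimension are assumptions for illustration; the real models are SQL and may differ.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows from a daily metric model: one column per dimension.
metric_rows = [
    {"date": date(2024, 11, 1), "billing_country": "ES", "created_bookings": 3},
    {"date": date(2024, 11, 1), "billing_country": "FR", "created_bookings": 2},
    {"date": date(2024, 11, 2), "billing_country": "ES", "created_bookings": 4},
]

# Dimension name -> function that extracts the dimension_value from a metric row.
DIMENSIONS = {
    "global": lambda row: "global",  # assumed placeholder value
    "by_billing_country": lambda row: row["billing_country"],
}


def aggregate(rows, metric):
    """Sum a metric per (date, dimension, dimension_value)."""
    totals = defaultdict(int)
    for row in rows:
        for dimension, get_value in DIMENSIONS.items():
            totals[(row["date"], dimension, get_value(row))] += row[metric]
    return [
        {"date": d, "dimension": dim, "dimension_value": value, metric: total}
        for (d, dim, value), total in sorted(totals.items())
    ]


agg_rows = aggregate(metric_rows, "created_bookings")
```

Each input row contributes once per dimension, so the output holds one row per date, dimension and dimension value rather than one column per dimension.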
For models that are not daily, such as monthly or mtd, date is substituted by a time range defined within start_date and end_date. Generally, end_date is part of the primary key alongside dimension and dimension_value, while start_date is only displayed for information purposes.
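As an illustration of the start_date and end_date ranges, the sketch below computes them in Python for the monthly and mtd cases. The function names are hypothetical, and it assumes that a monthly range spans the full calendar month and that an mtd range runs from the first day of the month to the given date; the actual models may define their ranges differently.

```python
import calendar
from datetime import date


def monthly_range(any_day: date) -> tuple[date, date]:
    """(start_date, end_date) of the full calendar month containing any_day."""
    last_day = calendar.monthrange(any_day.year, any_day.month)[1]
    return any_day.replace(day=1), any_day.replace(day=last_day)


def mtd_range(as_of: date) -> tuple[date, date]:
    """(start_date, end_date) for month-to-date: first day of the month up to as_of."""
    return as_of.replace(day=1), as_of
```

Under these assumptions start_date can always be derived from end_date, which matches it being kept for information only while end_date forms part of the primary key.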
In order to specify which dimensions are considered to be retrieved for each aggregate model, we use the get_kpi_dimensions_per_model macro. This macro only takes as argument the name of the entity that we’re modelling, such as CREATED_BOOKINGS.
By default, the macro will consider the following base dimensions as the expected ones:
- global
- by_billing_country
- by_number_of_listings
Generally, any model will also receive the by_deal dimension unless it is explicitly removed in the macro configuration. Additional entity-specific dimensions can be configured for the aggregation. For instance, GUEST_PAYMENTS receives the 4 abovementioned dimension aggregations as well as by_has_id_check, as it’s required for other purposes.
Lastly, be aware that when creating a new dimension, you need to create a dedicated macro entry named dim_{name_of_your_dimension}, which should provide 1) the dimension name to be used and 2) the field that contains the dimension_value used to compute the aggregation.
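The selection logic described in this section can be sketched in Python as follows. This is not the get_kpi_dimensions_per_model macro itself, which is a dbt macro whose implementation is not shown here. The function dimensions_for, the ENTITY_CONFIG structure and the entity SOME_ENTITY_WITHOUT_DEAL are hypothetical; only the base dimensions, by_deal and the GUEST_PAYMENTS example come from the text.

```python
BASE_DIMENSIONS = ["global", "by_billing_country", "by_number_of_listings"]

# Hypothetical per-entity configuration. Only the GUEST_PAYMENTS entry reflects the text.
ENTITY_CONFIG = {
    "GUEST_PAYMENTS": {"additional": ["by_has_id_check"]},
    "SOME_ENTITY_WITHOUT_DEAL": {"exclude": ["by_deal"]},  # made-up entity
}


def dimensions_for(entity_name: str) -> list[str]:
    """Base dimensions, plus by_deal unless excluded, plus entity-specific ones."""
    config = ENTITY_CONFIG.get(entity_name, {})
    dimensions = BASE_DIMENSIONS + ["by_deal"] + config.get("additional", [])
    excluded = set(config.get("exclude", []))
    return [d for d in dimensions if d not in excluded]
```

For example, dimensions_for("CREATED_BOOKINGS") yields the three base dimensions plus by_deal, while dimensions_for("GUEST_PAYMENTS") additionally includes by_has_id_check.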
KPIs Products
This is a summary of the Data Products that depend on the KPIs.
Main KPIs
Reporting: Main KPIs
Data Product page: Business Overview Reporting Suite
Computation flows:
→ Note that these flows are shared within the kpis folder and only get split at the cross level.
- Name: MTD + Monthly per category
  - Downstream tables:
    - cross/int_mtd_vs_previous_year_metrics
      - In turn, this depends on cross/int_monthly_aggregated_metrics_history_by_deal due to the computation of Churn Rate metrics, which are deal-dependent.
    - cross/int_mtd_aggregated_metrics
    - general/mtd_aggregated_metrics
  - Time dimensions used:
    - Monthly (depends on daily)
    - MTD (depends on daily)
  - Dimensions used:
    - global
    - by_billing_country
    - by_number_of_listings
  - Entities used:
    - Created Bookings
    - Check Out Bookings
    - Cancelled Bookings
    - Billable Bookings
    - Created Guest Journeys
    - Started Guest Journeys
    - Completed Guest Journeys
    - Guest Journeys with Payment
    - Guest Payments
    - Invoiced Revenue
    - Host Resolutions
    - Listings
    - Deals
  - Depends on:
    - Flow: Monthly by Deal
      - Table: cross/int_monthly_aggregated_metrics_history_by_deal
- Name: Monthly by Deal
  - Downstream tables:
    - cross/int_monthly_aggregated_metrics_history_by_deal
    - general/monthly_aggregated_metrics_history_by_deal
  - Time dimensions used:
    - Monthly (depends on daily)
  - Dimensions used:
    - by_deal
  - Entities used:
    - Created Bookings
    - Check Out Bookings
    - Cancelled Bookings
    - Billable Bookings
    - Created Guest Journeys
    - Started Guest Journeys
    - Completed Guest Journeys
    - Guest Journeys with Payment
    - Guest Payments
    - Invoiced Revenue
    - Host Resolutions
    - Listings
Account Managers Reporting
Reporting: Account Managers Reporting
Data Product page: Account Management Reporting Suite
Computation flows:
- Name: growth score by deal
  - Downstream tables:
    - cross/int_monthly_growth_score_by_deal
    - general/monthly_growth_score_by_deal
  - Time dimensions used:
    - Monthly (depends on daily)
  - Dimensions used:
    - by_deal
  - Entities used:
    → At this stage, it uses the same entities as Monthly by Deal from Main KPIs. In pure business terms, it would only use:
    - Created Bookings
    - Guest Payments
    - Invoiced Revenue
    - Listings
  - Depends on:
    - Flow: Monthly by Deal (Main KPIs)
      - Table: cross/int_monthly_aggregated_metrics_history_by_deal
- Name: monthly aggregated metrics history by deal by time window
  - Downstream tables:
    - cross/int_monthly_aggregated_metrics_history_by_deal_by_time_window
    - general/monthly_aggregated_metrics_history_by_deal_by_time_window
  - Time dimensions used:
    - Monthly (depends on daily; it combines several months to generate larger aggregations, e.g. Previous 6 months)
  - Dimensions used:
    - by_deal
  - Entities used:
    → At this stage, it uses the same entities as Monthly by Deal from Main KPIs. In pure business terms, it would only use:
    - Created Bookings
    - Guest Payments
    - Invoiced Revenue
    - Host Resolutions
    - Listings
  - Depends on:
    - Flow: Monthly by Deal (Main KPIs)
      - Table: cross/int_monthly_aggregated_metrics_history_by_deal