# Description Creates skeleton for new KPIs data flow for created_bookings metric. Details are accessible [here](https://www.notion.so/knowyourguest-superhog/KPIs-Refactor-Let-s-go-daily-2024-10-23-1280446ff9c980dc87a3dc7453e95f06?pvs=4#12a0446ff9c98085bf4dfc77f6fc22f7) In essence: * Models are created in intermediate in a kpis folder. * Models have a daily segmentation. This includes `created_bookings` models, but also the daily lifecycle per listing and the segmentation. It also adds a `dimension_dates` model specific for KPIs. These have all the dimensions already in place and handle all the crazy logic. * Other time aggregation models simply read from existing daily models which are much easier (`int_kpis__metric_mtd_created_bookings` and `int_kpis__metric_monthly_created_bookings`). * Dimensionality aggregation can be easily added within a given timeframe (daily, mtd, monthly). For instance, I do it for mtd in the `int_kpis__aggregated_mtd_created_bookings` and for monthly in `int_kpis__aggregated_monthly_created_bookings` * Macro configuration for dimensions: Allows to set any specific dimension for `aggregated` models. By default, the subset of global, by billing country, by number of listings and by deal apply - since these are needed for Main KPIs. I added an example with Dash Source, that currently does not exist and it's currently configured as only appearing for created bookings. * Testing `aggregated` models completeness. A new macro called `assert_dimension_completeness` is available that ensures additive metrics are consistent vs. the global result, configurable at schema level. * Testing refactor impact. I'm aware that changing the lifecycle model to daily impacts the volumes for listing segments. For the rest, I added a `tmp` test that checks that the dimension and dimension value per date exactly match comparing new vs. old computation. Latest edits: * Changed naming convention * Split of MTD and Monthly. Now these are 2 different entities, as stated in `int_kpis__dimension_dates`. * Added start_date and end_date for models that contemplate a range (mtd, monthly). * Added a small readme entry in the kpis folders. Mostly it states nomenclature and some first conventions. Dbt docs:  # Checklist - [X] The edited models and dependants run properly with production data. - [X] The edited models are sufficiently documented. - [X] The edited models contain PK tests, and I've ran and passed them. - [ ] I have checked for DRY opportunities with other models and docs. **Likely we'll be able to add macros for mtd and dim_agg models. We will see later on.** - [ ] I've picked the right materialization for the affected models. **Models run ok except for the daily lifecycle of listings, which lasts several minutes in the first run. Model curr...
2.5 KiB
2.5 KiB
KPIs Readme
Purpose
The \kpis folder is dedicated to KPIs modelisation, which include mostly any relevant dimension and measure and time aggregation needed for transforming data into business metrics.
Convention
Model names
-
Any model within the folder
intermediate\kpisneeds to follow this convention:int_kpis__{structure_type}_{time_dimension}_{relevant_entity_name}. -
Structure types can be the following:
lifecycle: any modelisation that classifies certain behavior on a given entity that can vary over time. For instance, the listing lifecycle in terms of booking creation could categorise the lifecycle of the listing based on whether a listing being new, active, never booked, inactive, etc.dimension: any modelisation that allows to segment or categorise data, so it can provide descriptive context for the measures. Segments resulting from lifecycles would likely have an equivalent dimension model.metric: any model that computes a given metric per different dimensions that is not aggregated. This means that each dimension will have a dedicated column within the model.aggregated: a model that aggregates the data into a 1) date range, 2) a dimension and 3) a dimension value for any given metric. These will always depend on metrics models.
-
Time dimension can be the following:
daily: if the time granularity is dailymonthly: if the time granularity is monthly, meaning metrics are aggregated to the monthmtd: if the time granularity is month-to-date, meaning metrics are cumulative to a certain date of the current month and so it's the case for the same days on the month of the previous days.- others.
-
Relevant entitity needs to easily and uniquely identify the entity being modelled, such as Created Bookings.
-
The only exception is
int_kpis__dimension_dates, that even though is granular at daily level, it's simplified on purpose.
Logic
- The model that contains the deepest granularity for each entity should be the one handling the data gathering to compute raw metrics and dimensions. Likely, this model will be in the form of
int_kpis__metric_daily_{relevant_entity_name}. In this case, joins outside of thekpisfolder are accepted and expected. - Downstream models, indistinctly of these being
metricoraggregatedmodels, should not join with other models outside of thekpisfolder. - Downstream models could eventually join with other models within the
kpisfolder in order to create weighted or converted metrics.