From 875f91be2669c03f532ec5fe04170296c02e08e2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Oriol=20Roqu=C3=A9=20Paniagua?= Date: Wed, 30 Oct 2024 08:55:19 +0000 Subject: [PATCH] Merged PR 3329: First version of KPIs refactored - created bookings # Description Creates skeleton for new KPIs data flow for created_bookings metric. Details are accessible [here](https://www.notion.so/knowyourguest-superhog/KPIs-Refactor-Let-s-go-daily-2024-10-23-1280446ff9c980dc87a3dc7453e95f06?pvs=4#12a0446ff9c98085bf4dfc77f6fc22f7) In essence: * Models are created in intermediate in a kpis folder. * Models have a daily segmentation. This includes `created_bookings` models, but also the daily lifecycle per listing and the segmentation. It also adds a `dimension_dates` model specific for KPIs. These have all the dimensions already in place and handle all the crazy logic. * Other time aggregation models simply read from existing daily models which are much easier (`int_kpis__metric_mtd_created_bookings` and `int_kpis__metric_monthly_created_bookings`). * Dimensionality aggregation can be easily added within a given timeframe (daily, mtd, monthly). For instance, I do it for mtd in the `int_kpis__aggregated_mtd_created_bookings` and for monthly in `int_kpis__aggregated_monthly_created_bookings` * Macro configuration for dimensions: Allows to set any specific dimension for `aggregated` models. By default, the subset of global, by billing country, by number of listings and by deal apply - since these are needed for Main KPIs. I added an example with Dash Source, that currently does not exist and it's currently configured as only appearing for created bookings. * Testing `aggregated` models completeness. A new macro called `assert_dimension_completeness` is available that ensures additive metrics are consistent vs. the global result, configurable at schema level. * Testing refactor impact. I'm aware that changing the lifecycle model to daily impacts the volumes for listing segments. For the rest, I added a `tmp` test that checks that the dimension and dimension value per date exactly match comparing new vs. old computation. Latest edits: * Changed naming convention * Split of MTD and Monthly. Now these are 2 different entities, as stated in `int_kpis__dimension_dates`. * Added start_date and end_date for models that contemplate a range (mtd, monthly). * Added a small readme entry in the kpis folders. Mostly it states nomenclature and some first conventions. Dbt docs: ![image (5).png](https://guardhog.visualstudio.com/4148d95f-4b6d-4205-bcff-e9c8e0d2ca65/_apis/git/repositories/54ac356f-aad7-46d2-b62c-e8c5b3bb8ebf/pullRequests/3329/attachments/image%20%285%29.png) # Checklist - [X] The edited models and dependants run properly with production data. - [X] The edited models are sufficiently documented. - [X] The edited models contain PK tests, and I've ran and passed them. - [ ] I have checked for DRY opportunities with other models and docs. **Likely we'll be able to add macros for mtd and dim_agg models. We will see later on.** - [ ] I've picked the right materialization for the affected models. **Models run ok except for the daily lifecycle of listings, which lasts several minutes in the first run. Model curr... --- macros/business_kpis_configuration.sql | 91 ++- .../tests/assert_dimension_completeness.sql | 21 + models/intermediate/kpis/_kpis_readme.md | 29 + ...s__aggregated_monthly_created_bookings.sql | 24 + ..._kpis__aggregated_mtd_created_bookings.sql | 24 + ...nt_kpis__dimension_daily_accommodation.sql | 42 ++ .../kpis/int_kpis__dimension_dates.sql | 44 ++ ...nt_kpis__lifecycle_daily_accommodation.sql | 187 ++++++ ...nt_kpis__metric_daily_created_bookings.sql | 29 + ..._kpis__metric_monthly_created_bookings.sql | 29 + .../int_kpis__metric_mtd_created_bookings.sql | 30 + models/intermediate/kpis/schema.yml | 554 ++++++++++++++++++ ...s_refactor_equivalent_created_bookings.sql | 52 ++ 13 files changed, 1149 insertions(+), 7 deletions(-) create mode 100644 macros/tests/assert_dimension_completeness.sql create mode 100644 models/intermediate/kpis/_kpis_readme.md create mode 100644 models/intermediate/kpis/int_kpis__aggregated_monthly_created_bookings.sql create mode 100644 models/intermediate/kpis/int_kpis__aggregated_mtd_created_bookings.sql create mode 100644 models/intermediate/kpis/int_kpis__dimension_daily_accommodation.sql create mode 100644 models/intermediate/kpis/int_kpis__dimension_dates.sql create mode 100644 models/intermediate/kpis/int_kpis__lifecycle_daily_accommodation.sql create mode 100644 models/intermediate/kpis/int_kpis__metric_daily_created_bookings.sql create mode 100644 models/intermediate/kpis/int_kpis__metric_monthly_created_bookings.sql create mode 100644 models/intermediate/kpis/int_kpis__metric_mtd_created_bookings.sql create mode 100644 models/intermediate/kpis/schema.yml create mode 100644 tests/tmp_kpis_refactor_equivalent_created_bookings.sql diff --git a/macros/business_kpis_configuration.sql b/macros/business_kpis_configuration.sql index fe3956c..e41f317 100644 --- a/macros/business_kpis_configuration.sql +++ b/macros/business_kpis_configuration.sql @@ -5,12 +5,17 @@ Provides a general configuration for the Dimensions available for the KPIs. Please note that strings should be encoded with " ' your_value_here ' ", while fields from tables should be specified like " your_field_here " */ - {% macro get_kpi_dimensions() %} {% set dimensions = [ {"dimension": "'global'", "dimension_value": "'global'"}, - {"dimension": "'by_number_of_listings'", "dimension_value": "active_accommodations_per_deal_segmentation"}, - {"dimension": "'by_billing_country'", "dimension_value": "main_billing_country_iso_3_per_deal"} + { + "dimension": "'by_number_of_listings'", + "dimension_value": "active_accommodations_per_deal_segmentation", + }, + { + "dimension": "'by_billing_country'", + "dimension_value": "main_billing_country_iso_3_per_deal", + }, ] %} {{ return(dimensions) }} {% endmacro %} @@ -25,12 +30,84 @@ This configuration ensures that working with new dimensions won't affect the dis Additionally, it provides a proper display name for reporting purposes. */ - {% macro get_kpi_dimensions_for_production() %} {% set dimensions = [ {"dimension": "'global'", "dimension_display": "'Global'"}, - {"dimension": "'by_number_of_listings'", "dimension_display": "'By # of Listings Booked in 12 Months'"}, - {"dimension": "'by_billing_country'", "dimension_display": "'By Billing Country'"} + { + "dimension": "'by_number_of_listings'", + "dimension_display": "'By # of Listings Booked in 12 Months'", + }, + { + "dimension": "'by_billing_country'", + "dimension_display": "'By Billing Country'", + }, ] %} {{ return(dimensions) }} -{% endmacro %} \ No newline at end of file +{% endmacro %} + +/* +The following lines specify for each dimension the field to be used in a +standalone macro. +Please note that strings should be encoded with " ' your_value_here ' ", + while fields from tables should be specified like " your_field_here " +*/ +{% macro dim_global() %} + {{ return({"dimension": "'global'", "dimension_value": "'global'"}) }} +{% endmacro %} +{% macro dim_billing_country() %} + {{ + return( + { + "dimension": "'by_billing_country'", + "dimension_value": "main_billing_country_iso_3_per_deal", + } + ) + }} +{% endmacro %} +{% macro dim_number_of_listings() %} + {{ + return( + { + "dimension": "'by_number_of_listings'", + "dimension_value": "active_accommodations_per_deal_segmentation", + } + ) + }} +{% endmacro %} +{% macro dim_deal() %} + {{ return({"dimension": "'by_deal'", "dimension_value": "id_deal"}) }} +{% endmacro %} +{% macro dim_dash() %} + {{ return({"dimension": "'by_dash_source'", "dimension_value": "dash_source"}) }} +{% endmacro %} + +/* +Macro: get_kpi_dimensions_per_model + +Provides a general assignemnt for the Dimensions available for each KPI + model. Keep in mind that these assignations need to be previously + declared. + +*/ +{% macro get_kpi_dimensions_per_model(entity_name) %} + + {# Base dimensions shared by all models #} + {% set base_dimensions = [ + dim_global(), + dim_number_of_listings(), + dim_billing_country(), + dim_deal(), + ] %} + + {# Initialize a list to hold any model-specific dimensions #} + {% set additional_dimensions = [] %} + + {# Add entity-specific dimensions #} + {% if entity_name == "CREATED_BOOKINGS" %} + {% set additional_dimensions = [dim_dash()] %} + {% endif %} + + {# Combine base dimensions with additional dimensions for the specific model #} + {% set dimensions = base_dimensions + additional_dimensions %} + {{ return(dimensions) }} +{% endmacro %} diff --git a/macros/tests/assert_dimension_completeness.sql b/macros/tests/assert_dimension_completeness.sql new file mode 100644 index 0000000..967bc28 --- /dev/null +++ b/macros/tests/assert_dimension_completeness.sql @@ -0,0 +1,21 @@ +{% test assert_dimension_completeness( + model, column_name, metric_column_name, global_dimension_name="global" +) %} + + with + sum_by_dimension as ( + select {{ column_name }}, sum({{ metric_column_name }}) as sum_metric + from {{ model }} + group by {{ column_name }} + ), + global_sum as ( + select sum({{ metric_column_name }}) as total_metric + from {{ model }} + where {{ column_name }} = '{{ global_dimension_name }}' + ) + + select * + from sum_by_dimension, global_sum + where sum_by_dimension.sum_metric != global_sum.total_metric + +{% endtest %} diff --git a/models/intermediate/kpis/_kpis_readme.md b/models/intermediate/kpis/_kpis_readme.md new file mode 100644 index 0000000..d3a8ee2 --- /dev/null +++ b/models/intermediate/kpis/_kpis_readme.md @@ -0,0 +1,29 @@ +# KPIs Readme + +## Purpose + +The `\kpis` folder is dedicated to KPIs modelisation, which include mostly any relevant dimension and measure and time aggregation needed for transforming data into business metrics. + +## Convention + +### Model names + +- Any model within the folder `intermediate\kpis` needs to follow this convention: `int_kpis__{structure_type}_{time_dimension}_{relevant_entity_name}`. +- Structure types can be the following: + - `lifecycle`: any modelisation that classifies certain behavior on a given entity that can vary over time. For instance, the listing lifecycle in terms of booking creation could categorise the lifecycle of the listing based on whether a listing being new, active, never booked, inactive, etc. + - `dimension`: any modelisation that allows to segment or categorise data, so it can provide descriptive context for the measures. Segments resulting from lifecycles would likely have an equivalent dimension model. + - `metric`: any model that computes a given metric per different dimensions that is not aggregated. This means that each dimension will have a dedicated column within the model. + - `aggregated`: a model that aggregates the data into a 1) date range, 2) a dimension and 3) a dimension value for any given metric. These will always depend on metrics models. + +- Time dimension can be the following: + - `daily`: if the time granularity is daily + - `monthly`: if the time granularity is monthly, meaning metrics are aggregated to the month + - `mtd`: if the time granularity is month-to-date, meaning metrics are cumulative to a certain date of the current month and so it's the case for the same days on the month of the previous days. + - others. +- Relevant entitity needs to easily and uniquely identify the entity being modelled, such as Created Bookings. +- The only exception is `int_kpis__dimension_dates`, that even though is granular at daily level, it's simplified on purpose. + +### Logic +- The model that contains the deepest granularity for each entity should be the one handling the data gathering to compute raw metrics and dimensions. Likely, this model will be in the form of `int_kpis__metric_daily_{relevant_entity_name}`. In this case, joins outside of the `kpis` folder are accepted and expected. +- Downstream models, indistinctly of these being `metric` or `aggregated` models, should not join with other models outside of the `kpis` folder. +- Downstream models could eventually join with other models within the `kpis` folder in order to create weighted or converted metrics. diff --git a/models/intermediate/kpis/int_kpis__aggregated_monthly_created_bookings.sql b/models/intermediate/kpis/int_kpis__aggregated_monthly_created_bookings.sql new file mode 100644 index 0000000..de2ddbd --- /dev/null +++ b/models/intermediate/kpis/int_kpis__aggregated_monthly_created_bookings.sql @@ -0,0 +1,24 @@ +{% set dimensions = get_kpi_dimensions_per_model("CREATED_BOOKINGS") %} + +{{ + config( + materialized="table", unique_key=["end_date", "dimension", "dimension_value"] + ) +}} + + +{% for dimension in dimensions %} + select + -- Unique Key -- + start_date, + end_date, + {{ dimension.dimension }} as dimension, + {{ dimension.dimension_value }} as dimension_value, + -- Metrics -- + sum(created_bookings) as created_bookings + from {{ ref("int_kpis__metric_monthly_created_bookings") }} + group by 1, 2, 3, 4 + {% if not loop.last %} + union all + {% endif %} +{% endfor %} diff --git a/models/intermediate/kpis/int_kpis__aggregated_mtd_created_bookings.sql b/models/intermediate/kpis/int_kpis__aggregated_mtd_created_bookings.sql new file mode 100644 index 0000000..0a347d7 --- /dev/null +++ b/models/intermediate/kpis/int_kpis__aggregated_mtd_created_bookings.sql @@ -0,0 +1,24 @@ +{% set dimensions = get_kpi_dimensions_per_model("CREATED_BOOKINGS") %} + +{{ + config( + materialized="table", unique_key=["end_date", "dimension", "dimension_value"] + ) +}} + + +{% for dimension in dimensions %} + select + -- Unique Key -- + start_date, + end_date, + {{ dimension.dimension }} as dimension, + {{ dimension.dimension_value }} as dimension_value, + -- Metrics -- + sum(created_bookings) as created_bookings + from {{ ref("int_kpis__metric_mtd_created_bookings") }} + group by 1, 2, 3, 4 + {% if not loop.last %} + union all + {% endif %} +{% endfor %} diff --git a/models/intermediate/kpis/int_kpis__dimension_daily_accommodation.sql b/models/intermediate/kpis/int_kpis__dimension_daily_accommodation.sql new file mode 100644 index 0000000..843b2a8 --- /dev/null +++ b/models/intermediate/kpis/int_kpis__dimension_daily_accommodation.sql @@ -0,0 +1,42 @@ +{{ config(materialized="table", unique_key=["date", "id_deal"]) }} +with + int_kpis__lifecycle_daily_accommodation as ( + select * from {{ ref("int_kpis__lifecycle_daily_accommodation") }} + ), + int_core__unique_accommodation_to_user as ( + select * from {{ ref("int_core__unique_accommodation_to_user") }} + ), + int_core__user_host as (select * from {{ ref("int_core__user_host") }}), + + active_accommodations_per_deal as ( + select + al.date, + uu.id_deal, + sum( + case when al.has_been_booked_within_last_12_months then 1 else 0 end + ) as accommodations_booked_in_12_months + from int_kpis__lifecycle_daily_accommodation al + inner join + int_core__unique_accommodation_to_user atu + on al.id_accommodation = atu.id_accommodation + inner join int_core__user_host uu on uu.id_user_host = atu.id_user_owner + where uu.id_deal is not null + group by 1, 2 + ) +select + date, + id_deal, + case + when accommodations_booked_in_12_months = 0 + then '0' + when accommodations_booked_in_12_months between 1 and 5 + then '01-05' + when accommodations_booked_in_12_months between 6 and 20 + then '06-20' + when accommodations_booked_in_12_months between 21 and 60 + then '21-60' + when accommodations_booked_in_12_months >= 61 + then '61+' + end as active_accommodations_per_deal_segmentation, + accommodations_booked_in_12_months +from active_accommodations_per_deal diff --git a/models/intermediate/kpis/int_kpis__dimension_dates.sql b/models/intermediate/kpis/int_kpis__dimension_dates.sql new file mode 100644 index 0000000..1e236cf --- /dev/null +++ b/models/intermediate/kpis/int_kpis__dimension_dates.sql @@ -0,0 +1,44 @@ +{{ config(materialized="table", unique_key="date") }} + +with + int_dates as ( + select * from {{ ref("int_dates") }} where date_day >= {{ var("start_date") }} + ), + raw_dates as ( + select + id.year_number as year, + id.month_of_year as month, + id.day_of_month as day, + id.date_day as date, + id.month_start_date as first_day_month, + id.month_end_date as last_day_month, + now()::date as today + from int_dates id + ) +select distinct + rd.year, + rd.month, + rd.day, + rd.date, + rd.first_day_month, + rd.last_day_month, + case when rd.date = rd.last_day_month then true else false end as is_end_of_month, + case + when date_trunc('month', rd.date) = date_trunc('month', rd.today) + then true + else false + end as is_current_month, + case + when date_trunc('month', rd.date) = date_trunc('month', rd.today) + then true + when + rd.year = extract(year from rd.today) - 1 + and rd.month = extract(month from rd.today) + and rd.day < extract(day from rd.today) + then true + else false + end as is_month_to_date +from raw_dates rd +where + -- include only up-to yesterday + rd.today > rd.date diff --git a/models/intermediate/kpis/int_kpis__lifecycle_daily_accommodation.sql b/models/intermediate/kpis/int_kpis__lifecycle_daily_accommodation.sql new file mode 100644 index 0000000..e111ddb --- /dev/null +++ b/models/intermediate/kpis/int_kpis__lifecycle_daily_accommodation.sql @@ -0,0 +1,187 @@ +{{ + config( + materialized="incremental", + unique_key=["date", "id_accommodation"], + incremental_strategy="merge", + ) +}} +with + int_core__bookings as (select * from {{ ref("int_core__bookings") }}), + stg_core__accommodation as (select * from {{ ref("stg_core__accommodation") }}), + int_core__unique_accommodation_to_user as ( + select * from {{ ref("int_core__unique_accommodation_to_user") }} + ), + int_kpis__dimension_dates as (select * from {{ ref("int_kpis__dimension_dates") }}), + + booked_days_per_accommodation as ( + select + id_accommodation, + created_date_utc, + lag(created_date_utc, 1) over ( + partition by id_accommodation order by created_date_utc asc + ) as previous_booked_date + from int_core__bookings + group by id_accommodation, created_date_utc + + ), + accommodation_historic_booking_dates as ( + select + d.date, + a.id_accommodation, + date(a.created_at_utc) as creation_date_utc, + min(b.created_date_utc) as first_time_booked_date_utc, + max(b.created_date_utc) as last_time_booked_date_utc, + max(b.previous_booked_date) as second_to_last_time_booked_date_utc + from int_kpis__dimension_dates d + inner join stg_core__accommodation a on d.date >= date(a.created_at_utc) + -- Inclusion of unique accommodation to user to force that the + -- accommodation needs to have a host assigned into it. + inner join + int_core__unique_accommodation_to_user uatu + on uatu.id_accommodation = a.id_accommodation + left join + booked_days_per_accommodation b + on uatu.id_accommodation = b.id_accommodation + and d.date >= b.created_date_utc + + {% if is_incremental() %} + where d.date >= (select current_date - interval '7 days') + {% endif %} + + group by d.date, a.id_accommodation, date(a.created_at_utc) + ), + accommodation_historic_booking_features as ( + select + ahbd.date, + ahbd.id_accommodation, + ahbd.creation_date_utc, + ahbd.first_time_booked_date_utc, + ahbd.last_time_booked_date_utc, + ahbd.second_to_last_time_booked_date_utc, + case + when ahbd.first_time_booked_date_utc is not null then true else false + end as accommodation_has_at_least_one_booking, + case + when + date_trunc('month', ahbd.creation_date_utc) + = date_trunc('month', ahbd.date) + then true + else false + end as accommodation_was_created_this_month, + case + when + date_trunc('month', ahbd.first_time_booked_date_utc) + = date_trunc('month', ahbd.last_time_booked_date_utc) + then true + else false + end as accommodation_has_first_booking, + case + when + date_trunc('month', ahbd.last_time_booked_date_utc) + = date_trunc('month', ahbd.date) + then true + else false + end as has_been_booked_within_current_month, + case + when + date_trunc('month', ahbd.last_time_booked_date_utc) + + interval '6 months' + > date_trunc('month', ahbd.date) + then true + else false + end as has_been_booked_within_last_6_months, + case + when + date_trunc('month', ahbd.last_time_booked_date_utc) + + interval '12 months' + > date_trunc('month', ahbd.date) + then true + else false + end as has_been_booked_within_last_12_months, + case + when + date_trunc('month', ahbd.last_time_booked_date_utc) + + interval '12 months' + = date_trunc('month', ahbd.date) + then true + else false + end as last_booking_was_12_months_ago, + case + when + date_trunc('month', ahbd.second_to_last_time_booked_date_utc) + + interval '12 months' + < date_trunc('month', ahbd.last_time_booked_date_utc) + then true + else false + end as had_previous_booking_more_than_12_months_before_the_last + from accommodation_historic_booking_dates ahbd + ) +select + date, + id_accommodation, + creation_date_utc, + first_time_booked_date_utc, + last_time_booked_date_utc, + second_to_last_time_booked_date_utc, + case + -- 01-New: The listing has been created this month and has not had any + -- booking + when + not accommodation_has_at_least_one_booking + and accommodation_was_created_this_month + then '01-New' + -- 02-Never Booked: The listing has been created before this month and + -- has not + -- had any booking + when + not accommodation_has_at_least_one_booking + and not accommodation_was_created_this_month + then '02-Never Booked' + -- 03-First Time Booked: The listing has been booked for the first + -- time and it + -- has been created this month + when accommodation_has_first_booking and has_been_booked_within_current_month + then '03-First Time Booked' + -- 04-Active: The listing has had at least 1 booking in its history + -- and it's + -- been less than 12 months since the last booking + -- and is not reactivated and is not FTB + when + accommodation_has_at_least_one_booking + and has_been_booked_within_last_12_months + -- not reactivated + and not ( + had_previous_booking_more_than_12_months_before_the_last + and has_been_booked_within_current_month + ) + -- not FTB + and not ( + accommodation_has_first_booking and has_been_booked_within_current_month + ) + then '04-Active' + -- 05-Churning: The listing has been booked at least once and it's + -- been 12 + -- months since the last booking + when accommodation_has_at_least_one_booking and last_booking_was_12_months_ago + then '05-Churning' + -- 06-Inactive: The listing has been booked at least once and it's + -- been more + -- than 12 months since the last booking + when + accommodation_has_at_least_one_booking + and not has_been_booked_within_last_12_months + and not last_booking_was_12_months_ago + then '06-Inactive' + -- 07-Reactivated: The listing was churned/inactive but has had a new + -- booking + -- this month + when + had_previous_booking_more_than_12_months_before_the_last + and has_been_booked_within_current_month + then '07-Reactivated' + else null + end as accommodation_lifecycle_state, + has_been_booked_within_current_month, + has_been_booked_within_last_6_months, + has_been_booked_within_last_12_months +from accommodation_historic_booking_features diff --git a/models/intermediate/kpis/int_kpis__metric_daily_created_bookings.sql b/models/intermediate/kpis/int_kpis__metric_daily_created_bookings.sql new file mode 100644 index 0000000..a14256e --- /dev/null +++ b/models/intermediate/kpis/int_kpis__metric_daily_created_bookings.sql @@ -0,0 +1,29 @@ +{{ config(materialized="table", unique_key=["date", "id_deal", "dash_source"]) }} +select + -- Unique Key -- + icb.created_date_utc as date, + coalesce(icuh.id_deal, 'UNSET') as id_deal, + case + when icbtpb.id_booking is not null then 'New Dash' else 'Old Dash' + end as dash_source, + -- Dimensions -- + coalesce( + icd.main_billing_country_iso_3_per_deal, 'UNSET' + ) as main_billing_country_iso_3_per_deal, + coalesce( + icmas.active_accommodations_per_deal_segmentation, 'UNSET' + ) as active_accommodations_per_deal_segmentation, + -- Metrics -- + count(distinct icb.id_booking) as created_bookings +from {{ ref("int_core__bookings") }} as icb +left join + {{ ref("int_core__user_host") }} as icuh on icb.id_user_host = icuh.id_user_host +left join {{ ref("int_core__deal") }} as icd on icuh.id_deal = icd.id_deal +left join + {{ ref("int_kpis__dimension_daily_accommodation") }} as icmas + on icuh.id_deal = icmas.id_deal + and icb.created_date_utc = icmas.date +left join + {{ ref("int_core__booking_to_product_bundle") }} as icbtpb + on icb.id_booking = icbtpb.id_booking +group by 1, 2, 3, 4, 5 diff --git a/models/intermediate/kpis/int_kpis__metric_monthly_created_bookings.sql b/models/intermediate/kpis/int_kpis__metric_monthly_created_bookings.sql new file mode 100644 index 0000000..faa8c15 --- /dev/null +++ b/models/intermediate/kpis/int_kpis__metric_monthly_created_bookings.sql @@ -0,0 +1,29 @@ +{{ + config( + materialized="view", + unique_key=[ + "end_date", + "id_deal", + "dash_source", + "active_accommodations_per_deal_segmentation", + ], + ) +}} + +select + -- Unique Key -- + d.first_day_month as start_date, + d.date as end_date, + b.id_deal, + b.dash_source, + b.active_accommodations_per_deal_segmentation, + -- Dimensions -- + b.main_billing_country_iso_3_per_deal, + -- Metrics -- + sum(b.created_bookings) as created_bookings +from {{ ref("int_kpis__dimension_dates") }} d +left join + {{ ref("int_kpis__metric_daily_created_bookings") }} b + on date_trunc('month', b.date)::date = d.first_day_month +where d.is_end_of_month = true and b.id_deal is not null +group by 1, 2, 3, 4, 5, 6 diff --git a/models/intermediate/kpis/int_kpis__metric_mtd_created_bookings.sql b/models/intermediate/kpis/int_kpis__metric_mtd_created_bookings.sql new file mode 100644 index 0000000..19f793e --- /dev/null +++ b/models/intermediate/kpis/int_kpis__metric_mtd_created_bookings.sql @@ -0,0 +1,30 @@ +{{ + config( + materialized="view", + unique_key=[ + "end_date", + "id_deal", + "dash_source", + "active_accommodations_per_deal_segmentation", + ], + ) +}} + +select + -- Unique Key -- + d.first_day_month as start_date, + d.date as end_date, + b.id_deal, + b.dash_source, + b.active_accommodations_per_deal_segmentation, + -- Dimensions -- + b.main_billing_country_iso_3_per_deal, + -- Metrics -- + sum(b.created_bookings) as created_bookings +from {{ ref("int_kpis__dimension_dates") }} d +left join + {{ ref("int_kpis__metric_daily_created_bookings") }} b + on date_trunc('month', b.date)::date = d.first_day_month + and extract(day from b.date) <= d.day +where d.is_month_to_date = true and b.id_deal is not null +group by 1, 2, 3, 4, 5, 6 diff --git a/models/intermediate/kpis/schema.yml b/models/intermediate/kpis/schema.yml new file mode 100644 index 0000000..eea8957 --- /dev/null +++ b/models/intermediate/kpis/schema.yml @@ -0,0 +1,554 @@ +version: 2 + +models: + - name: int_kpis__dimension_dates + description: | + This model provides the daily time dimensionality needed for KPIs. + It only considers dates up-to-yesterday. + + columns: + - name: date + data_type: date + description: Specific date. It's the primary key of this model. + tests: + - unique + - not_null + + - name: year + data_type: int + description: Year number of the given date. + tests: + - not_null + + - name: month + data_type: int + description: Month number of the given date. + tests: + - not_null + + - name: day + data_type: int + description: Day monthly number of the given date. + tests: + - not_null + + - name: first_day_month + data_type: date + description: | + First day of the month correspoding to the date field. + tests: + - not_null + + - name: last_day_month + data_type: date + description: | + Last day of the month correspoding to the date field. + tests: + - not_null + + - name: is_end_of_month + data_type: boolean + description: True if it's end of month, false otherwise. + tests: + - not_null + + - name: is_current_month + data_type: boolean + description: | + True if the date is within the current month, false otherwise. + tests: + - not_null + + - name: is_month_to_date + data_type: boolean + description: | + True if the date is within the scope of month-to-date, false otherwise. + The scope of month-to-date takes into account both 1) a date being in + the current month or 2) a date corresponding to the same month of the + previous year, which day number cannot be higher than yesterday's day + number. + tests: + - not_null + + - name: int_kpis__lifecycle_daily_accommodation + description: | + This model computes the daily lifecycle segment for each accommodation, also known as + listings. + The information regarding the booking-related time allows for the current status of any listing + regarding its activity. This information is encapsulated in the following columns: + + accommodation_lifecycle_state: contains one of the following states + - 01-New: Listings that have been created in the current month, without bookings + - 02-Never Booked: Listings that have been created before the current month, without bookings. + - 03-First Time Booked: Listings that have been booked for the first time in the current month. + - 04-Active: Listings that have booking activity in the past 12 months (that are not FTB nor reactivated) + - 05-Churning: Listings that are becoming inactive because of lack of bookings in the past 12 months + - 06-Inactive: Listings that have not had a booking for more than 12 months. + - 07-Reactivated: Listings that have had a booking in the current month that were inactive or churning before. + - Finally, if none of the logic applies, which should not happen, null will be set and a dbt alert will raise. + + Since the states of Active, First Time Booked and Reactivated indicate certain booking activity and are + mutually exclusive, the model also provides information of the recency of the bookings by the following + booleans: + - has_been_booked_within_current_month: If a listing has had a booking created in the current month + - has_been_booked_within_last_6_months: If a listing has had a booking created in the past 6 months + - has_been_booked_within_last_12_months: If a listing has had a booking created in the past 12 months + Note that if a listing has had a booking created in a given month, all 3 columns will be true. Similarly, + if the last booking created to a listing was 5 months ago, only the column has_been_booked_in_1_month + will be false; while the other 2 will be true. + tests: + - dbt_utils.unique_combination_of_columns: + combination_of_columns: + - date + - id_accommodation + + columns: + - name: date + data_type: date + description: Date in which a Listing has a given lifecycle state. + tests: + - not_null + + - name: id_accommodation + data_type: bigint + description: Id of the accommodation or listing. + tests: + - not_null + + - name: creation_date_utc + data_type: date + description: Date of when the listing was created. + + - name: first_time_booked_date_utc + data_type: date + description: | + Date of the first booking created for a given listing. Can be null if the listing + has never had a booking associated with it. + + - name: last_time_booked_date_utc + data_type: date + description: | + Date of the last booking created for a given listing. Can be null if the listing + has never had a booking associated with it. Can be the same as first_time_booked_date_utc + if the listing only had 1 booking in its history. + + - name: second_to_last_time_booked_date_utc + data_type: date + description: | + Date of the second-to-last booking created for a given listing, meaning the creation + date of the booking that precedes the last one. It's relevant for the reactivation computation + on the lifecycle. Can be null if the listing has never had a booking associated with it or if + the listing only had 1 booking in its history. + + - name: accommodation_lifecycle_state + data_type: character varying + description: | + Contains the lifecycle state of a Listing. The accepted values are: + 01-New, 02-Never Booked, 03-First Time Booked, 04-Active, 05-Churning, 06-Inactive, + 07-Reactivated. Failing to implement the logic will result in alert. + tests: + - not_null + - accepted_values: + values: + - 01-New + - 02-Never Booked + - 03-First Time Booked + - 04-Active + - 05-Churning + - 06-Inactive + - 07-Reactivated + + - name: has_been_booked_within_current_month + data_type: boolean + description: If the listing has had a booking created in the current month. + + - name: has_been_booked_within_last_6_months + data_type: boolean + description: If the listing has had a booking created in the past 6 months. + + - name: has_been_booked_within_last_12_months + data_type: boolean + description: If the listing has had a booking created in the past 12 months. + + - name: int_kpis__dimension_daily_accommodation + description: | + This model computes the deal segmentation per number of + listings in a daily manner. + + tests: + - dbt_utils.unique_combination_of_columns: + combination_of_columns: + - date + - id_deal + + columns: + - name: date + data_type: date + description: Specific date in which the segmentation applies. + tests: + - not_null + + - name: id_deal + data_type: string + description: Unique identifier of an account. + tests: + - not_null + + - name: active_accommodations_per_deal_segmentation + data_type: string + description: | + Segment value based on the number of listings booked in 12 months + for a given deal and date. + tests: + - accepted_values: + values: + - "0" + - "01-05" + - "06-20" + - "21-60" + - "61+" + - name: accommodations_booked_in_12_months + data_type: bigint + description: + Actual volume of listings that have been booked in the past 12 months + for a given deal and date. + + - name: int_kpis__metric_daily_created_bookings + description: | + This model computes the Daily Created Bookings at the deepest granularity. + + The unique key corresponds to the deepest granularity of the model, + in this case: + - date, + - id_deal, + - dash_source. + + tests: + - dbt_utils.unique_combination_of_columns: + combination_of_columns: + - date + - id_deal + - dash_source + + columns: + - name: date + data_type: date + description: Date of when Bookings have been created. + tests: + - not_null + + - name: id_deal + data_type: string + description: Unique identifier of an account. + tests: + - not_null + + - name: dash_source + data_type: string + description: Dashboard source, either old or new. + tests: + - not_null + - accepted_values: + values: + - "New Dash" + - "Old Dash" + + - name: active_accommodations_per_deal_segmentation + data_type: string + description: | + Segment value based on the number of listings booked in 12 months + for a given deal and date. + tests: + - not_null + - accepted_values: + values: + - "0" + - "01-05" + - "06-20" + - "21-60" + - "61+" + - "UNSET" + + - name: main_billing_country_iso_3_per_deal + data_type: string + description: | + Main billing country of the host aggregated at Deal level. + tests: + - not_null + + - name: created_bookings + data_type: bigint + description: | + Count of daily bookings created in a given date and per specified dimension. + + - name: int_kpis__metric_monthly_created_bookings + description: | + This model computes the Monthly Created Bookings at the + deepest granularity. + Be aware that any dimension that can change over the monthly period, + such as daily segmentations, are included in the primary key of the + model. + + The unique key corresponds to: + - end_date, + - id_deal, + - dash_source, + - active_accommodations_per_deal_segmentation. + + tests: + - dbt_utils.unique_combination_of_columns: + combination_of_columns: + - end_date + - id_deal + - dash_source + - active_accommodations_per_deal_segmentation + + columns: + - name: start_date + data_type: date + description: | + The start date of the time range considered for the metrics in this record. + tests: + - not_null + + - name: end_date + data_type: date + description: | + The end date of the time range considered for the metrics in this record. + tests: + - not_null + + - name: id_deal + data_type: string + description: Unique identifier of an account. + tests: + - not_null + + - name: dash_source + data_type: string + description: Dashboard source, either old or new. + tests: + - not_null + - accepted_values: + values: + - "New Dash" + - "Old Dash" + + - name: active_accommodations_per_deal_segmentation + data_type: string + description: | + Segment value based on the number of listings booked in 12 months + for a given deal and date. + tests: + - not_null + - accepted_values: + values: + - "0" + - "01-05" + - "06-20" + - "21-60" + - "61+" + - "UNSET" + + - name: main_billing_country_iso_3_per_deal + data_type: string + description: | + Main billing country of the host aggregated at Deal level. + tests: + - not_null + + - name: created_bookings + data_type: bigint + description: | + Count of accummulated bookings created in a given month up to the + given date and per specified dimension. + + - name: int_kpis__metric_mtd_created_bookings + description: | + This model computes the Month-To-Date Created Bookings at the + deepest granularity. + Be aware that any dimension that can change over the monthly period, + such as daily segmentations, are included in the primary key of the + model. + + The unique key corresponds to: + - end_date, + - id_deal, + - dash_source, + - active_accommodations_per_deal_segmentation. + + tests: + - dbt_utils.unique_combination_of_columns: + combination_of_columns: + - end_date + - id_deal + - dash_source + - active_accommodations_per_deal_segmentation + + columns: + - name: start_date + data_type: date + description: | + The start date of the time range considered for the metrics in this record. + tests: + - not_null + + - name: end_date + data_type: date + description: | + The end date of the time range considered for the metrics in this record. + tests: + - not_null + + - name: id_deal + data_type: string + description: Unique identifier of an account. + tests: + - not_null + + - name: dash_source + data_type: string + description: Dashboard source, either old or new. + tests: + - not_null + - accepted_values: + values: + - "New Dash" + - "Old Dash" + + - name: active_accommodations_per_deal_segmentation + data_type: string + description: | + Segment value based on the number of listings booked in 12 months + for a given deal and date. + tests: + - not_null + - accepted_values: + values: + - "0" + - "01-05" + - "06-20" + - "21-60" + - "61+" + - "UNSET" + + - name: main_billing_country_iso_3_per_deal + data_type: string + description: | + Main billing country of the host aggregated at Deal level. + tests: + - not_null + + - name: created_bookings + data_type: bigint + description: | + Count of accummulated bookings created in a given month up to the + given date and per specified dimension. + + - name: int_kpis__aggregated_monthly_created_bookings + description: | + This model computes the dimension aggregation for + Monthly Created Bookings. + + The primary key of this model is end_date, dimension + and dimension_value. + + tests: + - dbt_utils.unique_combination_of_columns: + combination_of_columns: + - end_date + - dimension + - dimension_value + + columns: + - name: start_date + data_type: date + description: | + The start date of the time range considered for the metrics in this record. + tests: + - not_null + + - name: end_date + data_type: date + description: | + The end date of the time range considered for the metrics in this record. + tests: + - not_null + + - name: dimension + data_type: string + description: The dimension or granularity of the metrics. + tests: + - assert_dimension_completeness: + metric_column_name: created_bookings + - accepted_values: + values: + - global + - by_number_of_listings + - by_billing_country + - by_dash_source + - by_deal + + - name: dimension_value + data_type: string + description: The value or segment available for the selected dimension. + tests: + - not_null + + - name: created_bookings + data_type: bigint + description: The month-to-date created bookings for a given date, dimension and value. + + - name: int_kpis__aggregated_mtd_created_bookings + description: | + This model computes the dimension aggregation for + Month-To-Date Created Bookings. + + The primary key of this model is end_date, dimension + and dimension_value. + + tests: + - dbt_utils.unique_combination_of_columns: + combination_of_columns: + - end_date + - dimension + - dimension_value + + columns: + - name: start_date + data_type: date + description: | + The start date of the time range considered for the metrics in this record. + tests: + - not_null + + - name: end_date + data_type: date + description: | + The end date of the time range considered for the metrics in this record. + tests: + - not_null + + - name: dimension + data_type: string + description: The dimension or granularity of the metrics. + tests: + - assert_dimension_completeness: + metric_column_name: created_bookings + - accepted_values: + values: + - global + - by_number_of_listings + - by_billing_country + - by_dash_source + - by_deal + + - name: dimension_value + data_type: string + description: The value or segment available for the selected dimension. + tests: + - not_null + + - name: created_bookings + data_type: bigint + description: The month-to-date created bookings for a given date, dimension and value. diff --git a/tests/tmp_kpis_refactor_equivalent_created_bookings.sql b/tests/tmp_kpis_refactor_equivalent_created_bookings.sql new file mode 100644 index 0000000..a08c476 --- /dev/null +++ b/tests/tmp_kpis_refactor_equivalent_created_bookings.sql @@ -0,0 +1,52 @@ +{% set min_date = "2022-01-01" %} +{% set dimensions = ("global", "by_billing_country") %} +-- "by_number_of_listings" excluded on purpose - there's differences because of daily +-- segmentation +with + new_mtd_created_bookings as ( + select end_date as date, dimension, dimension_value, created_bookings + from {{ ref("int_kpis__aggregated_mtd_created_bookings") }} + where + end_date >= '{{ min_date }}' + and dimension in {{ dimensions }} + and dimension_value <> 'UNSET' + ), + new_monthly_created_bookings as ( + select end_date as date, dimension, dimension_value, created_bookings + from {{ ref("int_kpis__aggregated_monthly_created_bookings") }} + where + end_date >= '{{ min_date }}' + and dimension in {{ dimensions }} + and dimension_value <> 'UNSET' + ), + new_created_bookings as ( + select * + from new_mtd_created_bookings + union all + select * + from new_monthly_created_bookings + ), + old_created_bookings as ( + select date, dimension, dimension_value, created_bookings + from {{ ref("int_core__mtd_created_bookings_metric") }} + where date >= '{{ min_date }}' and dimension in {{ dimensions }} + ), + comparison as ( + select + coalesce(o.date, n.date) as date, + coalesce(o.dimension, n.dimension) as dimension, + coalesce(o.dimension_value, n.dimension_value) as dimension_value, + o.created_bookings as old_created_bookings, + n.created_bookings as new_created_bookings, + coalesce(o.created_bookings, 0) - coalesce(n.created_bookings, 0) as diff + from old_created_bookings o + full outer join + new_created_bookings n + on o.date = n.date + and o.dimension = n.dimension + and o.dimension_value = n.dimension_value + ) +select * +from comparison +where diff <> 0 +order by date desc, abs(diff) desc