data-dwh-dbt-project/models/intermediate/kpis/schema.yml
Oriol Roqué Paniagua 875f91be26 Merged PR 3329: First version of KPIs refactored - created bookings
# Description

Creates skeleton for new KPIs data flow for created_bookings metric. Details are accessible [here](https://www.notion.so/knowyourguest-superhog/KPIs-Refactor-Let-s-go-daily-2024-10-23-1280446ff9c980dc87a3dc7453e95f06?pvs=4#12a0446ff9c98085bf4dfc77f6fc22f7)

In essence:
* Models are created in intermediate in a kpis folder.
* Models have a daily segmentation. This includes `created_bookings` models, but also the daily lifecycle per listing and the segmentation. It also adds a `dimension_dates` model specific for KPIs. These have all the dimensions already in place and handle all the crazy logic.
* Other time aggregation models simply read from existing daily models which are much easier (`int_kpis__metric_mtd_created_bookings` and `int_kpis__metric_monthly_created_bookings`).
* Dimensionality aggregation can be easily added within a given timeframe (daily, mtd, monthly). For instance, I do it for mtd in the `int_kpis__aggregated_mtd_created_bookings` and for monthly in `int_kpis__aggregated_monthly_created_bookings`
* Macro configuration for dimensions: Allows to set any specific dimension for `aggregated` models. By default, the subset of global, by billing country, by number of listings and by deal apply - since these are needed for Main KPIs. I added an example with Dash Source, that currently does not exist and it's currently configured as only appearing for created bookings.
* Testing `aggregated` models completeness. A new macro called `assert_dimension_completeness` is available that ensures additive metrics are consistent vs. the global result, configurable at schema level.
* Testing refactor impact. I'm aware that changing the lifecycle model to daily impacts the volumes for listing segments. For the rest, I added a `tmp` test that checks that the dimension and dimension value per date exactly match comparing new vs. old computation.

Latest edits:
* Changed naming convention
* Split of MTD and Monthly. Now these are 2 different entities, as stated in `int_kpis__dimension_dates`.
* Added start_date and end_date for models that contemplate a range (mtd, monthly).
* Added a small readme entry in the kpis folders. Mostly it states nomenclature and some first conventions.

Dbt docs:
![image (5).png](https://guardhog.visualstudio.com/4148d95f-4b6d-4205-bcff-e9c8e0d2ca65/_apis/git/repositories/54ac356f-aad7-46d2-b62c-e8c5b3bb8ebf/pullRequests/3329/attachments/image%20%285%29.png)

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [ ] I have checked for DRY opportunities with other models and docs. **Likely we'll be able to add macros for mtd and dim_agg models. We will see later on.**
- [ ] I've picked the right materialization for the affected models. **Models run ok except for the daily lifecycle of listings, which lasts several minutes in the first run. Model curr...
2024-10-30 08:55:19 +00:00

554 lines
17 KiB
YAML

version: 2
models:
- name: int_kpis__dimension_dates
description: |
This model provides the daily time dimensionality needed for KPIs.
It only considers dates up-to-yesterday.
columns:
- name: date
data_type: date
description: Specific date. It's the primary key of this model.
tests:
- unique
- not_null
- name: year
data_type: int
description: Year number of the given date.
tests:
- not_null
- name: month
data_type: int
description: Month number of the given date.
tests:
- not_null
- name: day
data_type: int
description: Day monthly number of the given date.
tests:
- not_null
- name: first_day_month
data_type: date
description: |
First day of the month correspoding to the date field.
tests:
- not_null
- name: last_day_month
data_type: date
description: |
Last day of the month correspoding to the date field.
tests:
- not_null
- name: is_end_of_month
data_type: boolean
description: True if it's end of month, false otherwise.
tests:
- not_null
- name: is_current_month
data_type: boolean
description: |
True if the date is within the current month, false otherwise.
tests:
- not_null
- name: is_month_to_date
data_type: boolean
description: |
True if the date is within the scope of month-to-date, false otherwise.
The scope of month-to-date takes into account both 1) a date being in
the current month or 2) a date corresponding to the same month of the
previous year, which day number cannot be higher than yesterday's day
number.
tests:
- not_null
- name: int_kpis__lifecycle_daily_accommodation
description: |
This model computes the daily lifecycle segment for each accommodation, also known as
listings.
The information regarding the booking-related time allows for the current status of any listing
regarding its activity. This information is encapsulated in the following columns:
accommodation_lifecycle_state: contains one of the following states
- 01-New: Listings that have been created in the current month, without bookings
- 02-Never Booked: Listings that have been created before the current month, without bookings.
- 03-First Time Booked: Listings that have been booked for the first time in the current month.
- 04-Active: Listings that have booking activity in the past 12 months (that are not FTB nor reactivated)
- 05-Churning: Listings that are becoming inactive because of lack of bookings in the past 12 months
- 06-Inactive: Listings that have not had a booking for more than 12 months.
- 07-Reactivated: Listings that have had a booking in the current month that were inactive or churning before.
- Finally, if none of the logic applies, which should not happen, null will be set and a dbt alert will raise.
Since the states of Active, First Time Booked and Reactivated indicate certain booking activity and are
mutually exclusive, the model also provides information of the recency of the bookings by the following
booleans:
- has_been_booked_within_current_month: If a listing has had a booking created in the current month
- has_been_booked_within_last_6_months: If a listing has had a booking created in the past 6 months
- has_been_booked_within_last_12_months: If a listing has had a booking created in the past 12 months
Note that if a listing has had a booking created in a given month, all 3 columns will be true. Similarly,
if the last booking created to a listing was 5 months ago, only the column has_been_booked_in_1_month
will be false; while the other 2 will be true.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- date
- id_accommodation
columns:
- name: date
data_type: date
description: Date in which a Listing has a given lifecycle state.
tests:
- not_null
- name: id_accommodation
data_type: bigint
description: Id of the accommodation or listing.
tests:
- not_null
- name: creation_date_utc
data_type: date
description: Date of when the listing was created.
- name: first_time_booked_date_utc
data_type: date
description: |
Date of the first booking created for a given listing. Can be null if the listing
has never had a booking associated with it.
- name: last_time_booked_date_utc
data_type: date
description: |
Date of the last booking created for a given listing. Can be null if the listing
has never had a booking associated with it. Can be the same as first_time_booked_date_utc
if the listing only had 1 booking in its history.
- name: second_to_last_time_booked_date_utc
data_type: date
description: |
Date of the second-to-last booking created for a given listing, meaning the creation
date of the booking that precedes the last one. It's relevant for the reactivation computation
on the lifecycle. Can be null if the listing has never had a booking associated with it or if
the listing only had 1 booking in its history.
- name: accommodation_lifecycle_state
data_type: character varying
description: |
Contains the lifecycle state of a Listing. The accepted values are:
01-New, 02-Never Booked, 03-First Time Booked, 04-Active, 05-Churning, 06-Inactive,
07-Reactivated. Failing to implement the logic will result in alert.
tests:
- not_null
- accepted_values:
values:
- 01-New
- 02-Never Booked
- 03-First Time Booked
- 04-Active
- 05-Churning
- 06-Inactive
- 07-Reactivated
- name: has_been_booked_within_current_month
data_type: boolean
description: If the listing has had a booking created in the current month.
- name: has_been_booked_within_last_6_months
data_type: boolean
description: If the listing has had a booking created in the past 6 months.
- name: has_been_booked_within_last_12_months
data_type: boolean
description: If the listing has had a booking created in the past 12 months.
- name: int_kpis__dimension_daily_accommodation
description: |
This model computes the deal segmentation per number of
listings in a daily manner.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- date
- id_deal
columns:
- name: date
data_type: date
description: Specific date in which the segmentation applies.
tests:
- not_null
- name: id_deal
data_type: string
description: Unique identifier of an account.
tests:
- not_null
- name: active_accommodations_per_deal_segmentation
data_type: string
description: |
Segment value based on the number of listings booked in 12 months
for a given deal and date.
tests:
- accepted_values:
values:
- "0"
- "01-05"
- "06-20"
- "21-60"
- "61+"
- name: accommodations_booked_in_12_months
data_type: bigint
description:
Actual volume of listings that have been booked in the past 12 months
for a given deal and date.
- name: int_kpis__metric_daily_created_bookings
description: |
This model computes the Daily Created Bookings at the deepest granularity.
The unique key corresponds to the deepest granularity of the model,
in this case:
- date,
- id_deal,
- dash_source.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- date
- id_deal
- dash_source
columns:
- name: date
data_type: date
description: Date of when Bookings have been created.
tests:
- not_null
- name: id_deal
data_type: string
description: Unique identifier of an account.
tests:
- not_null
- name: dash_source
data_type: string
description: Dashboard source, either old or new.
tests:
- not_null
- accepted_values:
values:
- "New Dash"
- "Old Dash"
- name: active_accommodations_per_deal_segmentation
data_type: string
description: |
Segment value based on the number of listings booked in 12 months
for a given deal and date.
tests:
- not_null
- accepted_values:
values:
- "0"
- "01-05"
- "06-20"
- "21-60"
- "61+"
- "UNSET"
- name: main_billing_country_iso_3_per_deal
data_type: string
description: |
Main billing country of the host aggregated at Deal level.
tests:
- not_null
- name: created_bookings
data_type: bigint
description: |
Count of daily bookings created in a given date and per specified dimension.
- name: int_kpis__metric_monthly_created_bookings
description: |
This model computes the Monthly Created Bookings at the
deepest granularity.
Be aware that any dimension that can change over the monthly period,
such as daily segmentations, are included in the primary key of the
model.
The unique key corresponds to:
- end_date,
- id_deal,
- dash_source,
- active_accommodations_per_deal_segmentation.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- end_date
- id_deal
- dash_source
- active_accommodations_per_deal_segmentation
columns:
- name: start_date
data_type: date
description: |
The start date of the time range considered for the metrics in this record.
tests:
- not_null
- name: end_date
data_type: date
description: |
The end date of the time range considered for the metrics in this record.
tests:
- not_null
- name: id_deal
data_type: string
description: Unique identifier of an account.
tests:
- not_null
- name: dash_source
data_type: string
description: Dashboard source, either old or new.
tests:
- not_null
- accepted_values:
values:
- "New Dash"
- "Old Dash"
- name: active_accommodations_per_deal_segmentation
data_type: string
description: |
Segment value based on the number of listings booked in 12 months
for a given deal and date.
tests:
- not_null
- accepted_values:
values:
- "0"
- "01-05"
- "06-20"
- "21-60"
- "61+"
- "UNSET"
- name: main_billing_country_iso_3_per_deal
data_type: string
description: |
Main billing country of the host aggregated at Deal level.
tests:
- not_null
- name: created_bookings
data_type: bigint
description: |
Count of accummulated bookings created in a given month up to the
given date and per specified dimension.
- name: int_kpis__metric_mtd_created_bookings
description: |
This model computes the Month-To-Date Created Bookings at the
deepest granularity.
Be aware that any dimension that can change over the monthly period,
such as daily segmentations, are included in the primary key of the
model.
The unique key corresponds to:
- end_date,
- id_deal,
- dash_source,
- active_accommodations_per_deal_segmentation.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- end_date
- id_deal
- dash_source
- active_accommodations_per_deal_segmentation
columns:
- name: start_date
data_type: date
description: |
The start date of the time range considered for the metrics in this record.
tests:
- not_null
- name: end_date
data_type: date
description: |
The end date of the time range considered for the metrics in this record.
tests:
- not_null
- name: id_deal
data_type: string
description: Unique identifier of an account.
tests:
- not_null
- name: dash_source
data_type: string
description: Dashboard source, either old or new.
tests:
- not_null
- accepted_values:
values:
- "New Dash"
- "Old Dash"
- name: active_accommodations_per_deal_segmentation
data_type: string
description: |
Segment value based on the number of listings booked in 12 months
for a given deal and date.
tests:
- not_null
- accepted_values:
values:
- "0"
- "01-05"
- "06-20"
- "21-60"
- "61+"
- "UNSET"
- name: main_billing_country_iso_3_per_deal
data_type: string
description: |
Main billing country of the host aggregated at Deal level.
tests:
- not_null
- name: created_bookings
data_type: bigint
description: |
Count of accummulated bookings created in a given month up to the
given date and per specified dimension.
- name: int_kpis__aggregated_monthly_created_bookings
description: |
This model computes the dimension aggregation for
Monthly Created Bookings.
The primary key of this model is end_date, dimension
and dimension_value.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- end_date
- dimension
- dimension_value
columns:
- name: start_date
data_type: date
description: |
The start date of the time range considered for the metrics in this record.
tests:
- not_null
- name: end_date
data_type: date
description: |
The end date of the time range considered for the metrics in this record.
tests:
- not_null
- name: dimension
data_type: string
description: The dimension or granularity of the metrics.
tests:
- assert_dimension_completeness:
metric_column_name: created_bookings
- accepted_values:
values:
- global
- by_number_of_listings
- by_billing_country
- by_dash_source
- by_deal
- name: dimension_value
data_type: string
description: The value or segment available for the selected dimension.
tests:
- not_null
- name: created_bookings
data_type: bigint
description: The month-to-date created bookings for a given date, dimension and value.
- name: int_kpis__aggregated_mtd_created_bookings
description: |
This model computes the dimension aggregation for
Month-To-Date Created Bookings.
The primary key of this model is end_date, dimension
and dimension_value.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- end_date
- dimension
- dimension_value
columns:
- name: start_date
data_type: date
description: |
The start date of the time range considered for the metrics in this record.
tests:
- not_null
- name: end_date
data_type: date
description: |
The end date of the time range considered for the metrics in this record.
tests:
- not_null
- name: dimension
data_type: string
description: The dimension or granularity of the metrics.
tests:
- assert_dimension_completeness:
metric_column_name: created_bookings
- accepted_values:
values:
- global
- by_number_of_listings
- by_billing_country
- by_dash_source
- by_deal
- name: dimension_value
data_type: string
description: The value or segment available for the selected dimension.
tests:
- not_null
- name: created_bookings
data_type: bigint
description: The month-to-date created bookings for a given date, dimension and value.