Merged PR 3329: First version of KPIs refactored - created bookings

# Description

Creates skeleton for new KPIs data flow for created_bookings metric. Details are accessible [here](https://www.notion.so/knowyourguest-superhog/KPIs-Refactor-Let-s-go-daily-2024-10-23-1280446ff9c980dc87a3dc7453e95f06?pvs=4#12a0446ff9c98085bf4dfc77f6fc22f7)

In essence:
* Models are created in intermediate in a kpis folder.
* Models have a daily segmentation. This includes `created_bookings` models, but also the daily lifecycle per listing and the segmentation. It also adds a `dimension_dates` model specific for KPIs. These have all the dimensions already in place and handle all the crazy logic.
* Other time aggregation models simply read from existing daily models which are much easier (`int_kpis__metric_mtd_created_bookings` and `int_kpis__metric_monthly_created_bookings`).
* Dimensionality aggregation can be easily added within a given timeframe (daily, mtd, monthly). For instance, I do it for mtd in the `int_kpis__aggregated_mtd_created_bookings` and for monthly in `int_kpis__aggregated_monthly_created_bookings`
* Macro configuration for dimensions: Allows to set any specific dimension for `aggregated` models. By default, the subset of global, by billing country, by number of listings and by deal apply - since these are needed for Main KPIs. I added an example with Dash Source, that currently does not exist and it's currently configured as only appearing for created bookings.
* Testing `aggregated` models completeness. A new macro called `assert_dimension_completeness` is available that ensures additive metrics are consistent vs. the global result, configurable at schema level.
* Testing refactor impact. I'm aware that changing the lifecycle model to daily impacts the volumes for listing segments. For the rest, I added a `tmp` test that checks that the dimension and dimension value per date exactly match comparing new vs. old computation.

Latest edits:
* Changed naming convention
* Split of MTD and Monthly. Now these are 2 different entities, as stated in `int_kpis__dimension_dates`.
* Added start_date and end_date for models that contemplate a range (mtd, monthly).
* Added a small readme entry in the kpis folders. Mostly it states nomenclature and some first conventions.

Dbt docs:
![image (5).png](https://guardhog.visualstudio.com/4148d95f-4b6d-4205-bcff-e9c8e0d2ca65/_apis/git/repositories/54ac356f-aad7-46d2-b62c-e8c5b3bb8ebf/pullRequests/3329/attachments/image%20%285%29.png)

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [ ] I have checked for DRY opportunities with other models and docs. **Likely we'll be able to add macros for mtd and dim_agg models. We will see later on.**
- [ ] I've picked the right materialization for the affected models. **Models run ok except for the daily lifecycle of listings, which lasts several minutes in the first run. Model curr...
This commit is contained in:
Oriol Roqué Paniagua 2024-10-30 08:55:19 +00:00
parent 450975301a
commit 875f91be26
13 changed files with 1149 additions and 7 deletions

View file

@ -5,12 +5,17 @@ Provides a general configuration for the Dimensions available for the KPIs.
Please note that strings should be encoded with " ' your_value_here ' ",
while fields from tables should be specified like " your_field_here "
*/
{% macro get_kpi_dimensions() %}
{% set dimensions = [
{"dimension": "'global'", "dimension_value": "'global'"},
{"dimension": "'by_number_of_listings'", "dimension_value": "active_accommodations_per_deal_segmentation"},
{"dimension": "'by_billing_country'", "dimension_value": "main_billing_country_iso_3_per_deal"}
{
"dimension": "'by_number_of_listings'",
"dimension_value": "active_accommodations_per_deal_segmentation",
},
{
"dimension": "'by_billing_country'",
"dimension_value": "main_billing_country_iso_3_per_deal",
},
] %}
{{ return(dimensions) }}
{% endmacro %}
@ -25,12 +30,84 @@ This configuration ensures that working with new dimensions won't affect the dis
Additionally, it provides a proper display name for reporting purposes.
*/
{% macro get_kpi_dimensions_for_production() %}
{% set dimensions = [
{"dimension": "'global'", "dimension_display": "'Global'"},
{"dimension": "'by_number_of_listings'", "dimension_display": "'By # of Listings Booked in 12 Months'"},
{"dimension": "'by_billing_country'", "dimension_display": "'By Billing Country'"}
{
"dimension": "'by_number_of_listings'",
"dimension_display": "'By # of Listings Booked in 12 Months'",
},
{
"dimension": "'by_billing_country'",
"dimension_display": "'By Billing Country'",
},
] %}
{{ return(dimensions) }}
{% endmacro %}
{% endmacro %}
/*
The following lines specify for each dimension the field to be used in a
standalone macro.
Please note that strings should be encoded with " ' your_value_here ' ",
while fields from tables should be specified like " your_field_here "
*/
{% macro dim_global() %}
{{ return({"dimension": "'global'", "dimension_value": "'global'"}) }}
{% endmacro %}
{% macro dim_billing_country() %}
{{
return(
{
"dimension": "'by_billing_country'",
"dimension_value": "main_billing_country_iso_3_per_deal",
}
)
}}
{% endmacro %}
{% macro dim_number_of_listings() %}
{{
return(
{
"dimension": "'by_number_of_listings'",
"dimension_value": "active_accommodations_per_deal_segmentation",
}
)
}}
{% endmacro %}
{% macro dim_deal() %}
{{ return({"dimension": "'by_deal'", "dimension_value": "id_deal"}) }}
{% endmacro %}
{% macro dim_dash() %}
{{ return({"dimension": "'by_dash_source'", "dimension_value": "dash_source"}) }}
{% endmacro %}
/*
Macro: get_kpi_dimensions_per_model
Provides a general assignemnt for the Dimensions available for each KPI
model. Keep in mind that these assignations need to be previously
declared.
*/
{% macro get_kpi_dimensions_per_model(entity_name) %}
{# Base dimensions shared by all models #}
{% set base_dimensions = [
dim_global(),
dim_number_of_listings(),
dim_billing_country(),
dim_deal(),
] %}
{# Initialize a list to hold any model-specific dimensions #}
{% set additional_dimensions = [] %}
{# Add entity-specific dimensions #}
{% if entity_name == "CREATED_BOOKINGS" %}
{% set additional_dimensions = [dim_dash()] %}
{% endif %}
{# Combine base dimensions with additional dimensions for the specific model #}
{% set dimensions = base_dimensions + additional_dimensions %}
{{ return(dimensions) }}
{% endmacro %}

View file

@ -0,0 +1,21 @@
{% test assert_dimension_completeness(
model, column_name, metric_column_name, global_dimension_name="global"
) %}
with
sum_by_dimension as (
select {{ column_name }}, sum({{ metric_column_name }}) as sum_metric
from {{ model }}
group by {{ column_name }}
),
global_sum as (
select sum({{ metric_column_name }}) as total_metric
from {{ model }}
where {{ column_name }} = '{{ global_dimension_name }}'
)
select *
from sum_by_dimension, global_sum
where sum_by_dimension.sum_metric != global_sum.total_metric
{% endtest %}

View file

@ -0,0 +1,29 @@
# KPIs Readme
## Purpose
The `\kpis` folder is dedicated to KPIs modelisation, which include mostly any relevant dimension and measure and time aggregation needed for transforming data into business metrics.
## Convention
### Model names
- Any model within the folder `intermediate\kpis` needs to follow this convention: `int_kpis__{structure_type}_{time_dimension}_{relevant_entity_name}`.
- Structure types can be the following:
- `lifecycle`: any modelisation that classifies certain behavior on a given entity that can vary over time. For instance, the listing lifecycle in terms of booking creation could categorise the lifecycle of the listing based on whether a listing being new, active, never booked, inactive, etc.
- `dimension`: any modelisation that allows to segment or categorise data, so it can provide descriptive context for the measures. Segments resulting from lifecycles would likely have an equivalent dimension model.
- `metric`: any model that computes a given metric per different dimensions that is not aggregated. This means that each dimension will have a dedicated column within the model.
- `aggregated`: a model that aggregates the data into a 1) date range, 2) a dimension and 3) a dimension value for any given metric. These will always depend on metrics models.
- Time dimension can be the following:
- `daily`: if the time granularity is daily
- `monthly`: if the time granularity is monthly, meaning metrics are aggregated to the month
- `mtd`: if the time granularity is month-to-date, meaning metrics are cumulative to a certain date of the current month and so it's the case for the same days on the month of the previous days.
- others.
- Relevant entitity needs to easily and uniquely identify the entity being modelled, such as Created Bookings.
- The only exception is `int_kpis__dimension_dates`, that even though is granular at daily level, it's simplified on purpose.
### Logic
- The model that contains the deepest granularity for each entity should be the one handling the data gathering to compute raw metrics and dimensions. Likely, this model will be in the form of `int_kpis__metric_daily_{relevant_entity_name}`. In this case, joins outside of the `kpis` folder are accepted and expected.
- Downstream models, indistinctly of these being `metric` or `aggregated` models, should not join with other models outside of the `kpis` folder.
- Downstream models could eventually join with other models within the `kpis` folder in order to create weighted or converted metrics.

View file

@ -0,0 +1,24 @@
{% set dimensions = get_kpi_dimensions_per_model("CREATED_BOOKINGS") %}
{{
config(
materialized="table", unique_key=["end_date", "dimension", "dimension_value"]
)
}}
{% for dimension in dimensions %}
select
-- Unique Key --
start_date,
end_date,
{{ dimension.dimension }} as dimension,
{{ dimension.dimension_value }} as dimension_value,
-- Metrics --
sum(created_bookings) as created_bookings
from {{ ref("int_kpis__metric_monthly_created_bookings") }}
group by 1, 2, 3, 4
{% if not loop.last %}
union all
{% endif %}
{% endfor %}

View file

@ -0,0 +1,24 @@
{% set dimensions = get_kpi_dimensions_per_model("CREATED_BOOKINGS") %}
{{
config(
materialized="table", unique_key=["end_date", "dimension", "dimension_value"]
)
}}
{% for dimension in dimensions %}
select
-- Unique Key --
start_date,
end_date,
{{ dimension.dimension }} as dimension,
{{ dimension.dimension_value }} as dimension_value,
-- Metrics --
sum(created_bookings) as created_bookings
from {{ ref("int_kpis__metric_mtd_created_bookings") }}
group by 1, 2, 3, 4
{% if not loop.last %}
union all
{% endif %}
{% endfor %}

View file

@ -0,0 +1,42 @@
{{ config(materialized="table", unique_key=["date", "id_deal"]) }}
with
int_kpis__lifecycle_daily_accommodation as (
select * from {{ ref("int_kpis__lifecycle_daily_accommodation") }}
),
int_core__unique_accommodation_to_user as (
select * from {{ ref("int_core__unique_accommodation_to_user") }}
),
int_core__user_host as (select * from {{ ref("int_core__user_host") }}),
active_accommodations_per_deal as (
select
al.date,
uu.id_deal,
sum(
case when al.has_been_booked_within_last_12_months then 1 else 0 end
) as accommodations_booked_in_12_months
from int_kpis__lifecycle_daily_accommodation al
inner join
int_core__unique_accommodation_to_user atu
on al.id_accommodation = atu.id_accommodation
inner join int_core__user_host uu on uu.id_user_host = atu.id_user_owner
where uu.id_deal is not null
group by 1, 2
)
select
date,
id_deal,
case
when accommodations_booked_in_12_months = 0
then '0'
when accommodations_booked_in_12_months between 1 and 5
then '01-05'
when accommodations_booked_in_12_months between 6 and 20
then '06-20'
when accommodations_booked_in_12_months between 21 and 60
then '21-60'
when accommodations_booked_in_12_months >= 61
then '61+'
end as active_accommodations_per_deal_segmentation,
accommodations_booked_in_12_months
from active_accommodations_per_deal

View file

@ -0,0 +1,44 @@
{{ config(materialized="table", unique_key="date") }}
with
int_dates as (
select * from {{ ref("int_dates") }} where date_day >= {{ var("start_date") }}
),
raw_dates as (
select
id.year_number as year,
id.month_of_year as month,
id.day_of_month as day,
id.date_day as date,
id.month_start_date as first_day_month,
id.month_end_date as last_day_month,
now()::date as today
from int_dates id
)
select distinct
rd.year,
rd.month,
rd.day,
rd.date,
rd.first_day_month,
rd.last_day_month,
case when rd.date = rd.last_day_month then true else false end as is_end_of_month,
case
when date_trunc('month', rd.date) = date_trunc('month', rd.today)
then true
else false
end as is_current_month,
case
when date_trunc('month', rd.date) = date_trunc('month', rd.today)
then true
when
rd.year = extract(year from rd.today) - 1
and rd.month = extract(month from rd.today)
and rd.day < extract(day from rd.today)
then true
else false
end as is_month_to_date
from raw_dates rd
where
-- include only up-to yesterday
rd.today > rd.date

View file

@ -0,0 +1,187 @@
{{
config(
materialized="incremental",
unique_key=["date", "id_accommodation"],
incremental_strategy="merge",
)
}}
with
int_core__bookings as (select * from {{ ref("int_core__bookings") }}),
stg_core__accommodation as (select * from {{ ref("stg_core__accommodation") }}),
int_core__unique_accommodation_to_user as (
select * from {{ ref("int_core__unique_accommodation_to_user") }}
),
int_kpis__dimension_dates as (select * from {{ ref("int_kpis__dimension_dates") }}),
booked_days_per_accommodation as (
select
id_accommodation,
created_date_utc,
lag(created_date_utc, 1) over (
partition by id_accommodation order by created_date_utc asc
) as previous_booked_date
from int_core__bookings
group by id_accommodation, created_date_utc
),
accommodation_historic_booking_dates as (
select
d.date,
a.id_accommodation,
date(a.created_at_utc) as creation_date_utc,
min(b.created_date_utc) as first_time_booked_date_utc,
max(b.created_date_utc) as last_time_booked_date_utc,
max(b.previous_booked_date) as second_to_last_time_booked_date_utc
from int_kpis__dimension_dates d
inner join stg_core__accommodation a on d.date >= date(a.created_at_utc)
-- Inclusion of unique accommodation to user to force that the
-- accommodation needs to have a host assigned into it.
inner join
int_core__unique_accommodation_to_user uatu
on uatu.id_accommodation = a.id_accommodation
left join
booked_days_per_accommodation b
on uatu.id_accommodation = b.id_accommodation
and d.date >= b.created_date_utc
{% if is_incremental() %}
where d.date >= (select current_date - interval '7 days')
{% endif %}
group by d.date, a.id_accommodation, date(a.created_at_utc)
),
accommodation_historic_booking_features as (
select
ahbd.date,
ahbd.id_accommodation,
ahbd.creation_date_utc,
ahbd.first_time_booked_date_utc,
ahbd.last_time_booked_date_utc,
ahbd.second_to_last_time_booked_date_utc,
case
when ahbd.first_time_booked_date_utc is not null then true else false
end as accommodation_has_at_least_one_booking,
case
when
date_trunc('month', ahbd.creation_date_utc)
= date_trunc('month', ahbd.date)
then true
else false
end as accommodation_was_created_this_month,
case
when
date_trunc('month', ahbd.first_time_booked_date_utc)
= date_trunc('month', ahbd.last_time_booked_date_utc)
then true
else false
end as accommodation_has_first_booking,
case
when
date_trunc('month', ahbd.last_time_booked_date_utc)
= date_trunc('month', ahbd.date)
then true
else false
end as has_been_booked_within_current_month,
case
when
date_trunc('month', ahbd.last_time_booked_date_utc)
+ interval '6 months'
> date_trunc('month', ahbd.date)
then true
else false
end as has_been_booked_within_last_6_months,
case
when
date_trunc('month', ahbd.last_time_booked_date_utc)
+ interval '12 months'
> date_trunc('month', ahbd.date)
then true
else false
end as has_been_booked_within_last_12_months,
case
when
date_trunc('month', ahbd.last_time_booked_date_utc)
+ interval '12 months'
= date_trunc('month', ahbd.date)
then true
else false
end as last_booking_was_12_months_ago,
case
when
date_trunc('month', ahbd.second_to_last_time_booked_date_utc)
+ interval '12 months'
< date_trunc('month', ahbd.last_time_booked_date_utc)
then true
else false
end as had_previous_booking_more_than_12_months_before_the_last
from accommodation_historic_booking_dates ahbd
)
select
date,
id_accommodation,
creation_date_utc,
first_time_booked_date_utc,
last_time_booked_date_utc,
second_to_last_time_booked_date_utc,
case
-- 01-New: The listing has been created this month and has not had any
-- booking
when
not accommodation_has_at_least_one_booking
and accommodation_was_created_this_month
then '01-New'
-- 02-Never Booked: The listing has been created before this month and
-- has not
-- had any booking
when
not accommodation_has_at_least_one_booking
and not accommodation_was_created_this_month
then '02-Never Booked'
-- 03-First Time Booked: The listing has been booked for the first
-- time and it
-- has been created this month
when accommodation_has_first_booking and has_been_booked_within_current_month
then '03-First Time Booked'
-- 04-Active: The listing has had at least 1 booking in its history
-- and it's
-- been less than 12 months since the last booking
-- and is not reactivated and is not FTB
when
accommodation_has_at_least_one_booking
and has_been_booked_within_last_12_months
-- not reactivated
and not (
had_previous_booking_more_than_12_months_before_the_last
and has_been_booked_within_current_month
)
-- not FTB
and not (
accommodation_has_first_booking and has_been_booked_within_current_month
)
then '04-Active'
-- 05-Churning: The listing has been booked at least once and it's
-- been 12
-- months since the last booking
when accommodation_has_at_least_one_booking and last_booking_was_12_months_ago
then '05-Churning'
-- 06-Inactive: The listing has been booked at least once and it's
-- been more
-- than 12 months since the last booking
when
accommodation_has_at_least_one_booking
and not has_been_booked_within_last_12_months
and not last_booking_was_12_months_ago
then '06-Inactive'
-- 07-Reactivated: The listing was churned/inactive but has had a new
-- booking
-- this month
when
had_previous_booking_more_than_12_months_before_the_last
and has_been_booked_within_current_month
then '07-Reactivated'
else null
end as accommodation_lifecycle_state,
has_been_booked_within_current_month,
has_been_booked_within_last_6_months,
has_been_booked_within_last_12_months
from accommodation_historic_booking_features

View file

@ -0,0 +1,29 @@
{{ config(materialized="table", unique_key=["date", "id_deal", "dash_source"]) }}
select
-- Unique Key --
icb.created_date_utc as date,
coalesce(icuh.id_deal, 'UNSET') as id_deal,
case
when icbtpb.id_booking is not null then 'New Dash' else 'Old Dash'
end as dash_source,
-- Dimensions --
coalesce(
icd.main_billing_country_iso_3_per_deal, 'UNSET'
) as main_billing_country_iso_3_per_deal,
coalesce(
icmas.active_accommodations_per_deal_segmentation, 'UNSET'
) as active_accommodations_per_deal_segmentation,
-- Metrics --
count(distinct icb.id_booking) as created_bookings
from {{ ref("int_core__bookings") }} as icb
left join
{{ ref("int_core__user_host") }} as icuh on icb.id_user_host = icuh.id_user_host
left join {{ ref("int_core__deal") }} as icd on icuh.id_deal = icd.id_deal
left join
{{ ref("int_kpis__dimension_daily_accommodation") }} as icmas
on icuh.id_deal = icmas.id_deal
and icb.created_date_utc = icmas.date
left join
{{ ref("int_core__booking_to_product_bundle") }} as icbtpb
on icb.id_booking = icbtpb.id_booking
group by 1, 2, 3, 4, 5

View file

@ -0,0 +1,29 @@
{{
config(
materialized="view",
unique_key=[
"end_date",
"id_deal",
"dash_source",
"active_accommodations_per_deal_segmentation",
],
)
}}
select
-- Unique Key --
d.first_day_month as start_date,
d.date as end_date,
b.id_deal,
b.dash_source,
b.active_accommodations_per_deal_segmentation,
-- Dimensions --
b.main_billing_country_iso_3_per_deal,
-- Metrics --
sum(b.created_bookings) as created_bookings
from {{ ref("int_kpis__dimension_dates") }} d
left join
{{ ref("int_kpis__metric_daily_created_bookings") }} b
on date_trunc('month', b.date)::date = d.first_day_month
where d.is_end_of_month = true and b.id_deal is not null
group by 1, 2, 3, 4, 5, 6

View file

@ -0,0 +1,30 @@
{{
config(
materialized="view",
unique_key=[
"end_date",
"id_deal",
"dash_source",
"active_accommodations_per_deal_segmentation",
],
)
}}
select
-- Unique Key --
d.first_day_month as start_date,
d.date as end_date,
b.id_deal,
b.dash_source,
b.active_accommodations_per_deal_segmentation,
-- Dimensions --
b.main_billing_country_iso_3_per_deal,
-- Metrics --
sum(b.created_bookings) as created_bookings
from {{ ref("int_kpis__dimension_dates") }} d
left join
{{ ref("int_kpis__metric_daily_created_bookings") }} b
on date_trunc('month', b.date)::date = d.first_day_month
and extract(day from b.date) <= d.day
where d.is_month_to_date = true and b.id_deal is not null
group by 1, 2, 3, 4, 5, 6

View file

@ -0,0 +1,554 @@
version: 2
models:
- name: int_kpis__dimension_dates
description: |
This model provides the daily time dimensionality needed for KPIs.
It only considers dates up-to-yesterday.
columns:
- name: date
data_type: date
description: Specific date. It's the primary key of this model.
tests:
- unique
- not_null
- name: year
data_type: int
description: Year number of the given date.
tests:
- not_null
- name: month
data_type: int
description: Month number of the given date.
tests:
- not_null
- name: day
data_type: int
description: Day monthly number of the given date.
tests:
- not_null
- name: first_day_month
data_type: date
description: |
First day of the month correspoding to the date field.
tests:
- not_null
- name: last_day_month
data_type: date
description: |
Last day of the month correspoding to the date field.
tests:
- not_null
- name: is_end_of_month
data_type: boolean
description: True if it's end of month, false otherwise.
tests:
- not_null
- name: is_current_month
data_type: boolean
description: |
True if the date is within the current month, false otherwise.
tests:
- not_null
- name: is_month_to_date
data_type: boolean
description: |
True if the date is within the scope of month-to-date, false otherwise.
The scope of month-to-date takes into account both 1) a date being in
the current month or 2) a date corresponding to the same month of the
previous year, which day number cannot be higher than yesterday's day
number.
tests:
- not_null
- name: int_kpis__lifecycle_daily_accommodation
description: |
This model computes the daily lifecycle segment for each accommodation, also known as
listings.
The information regarding the booking-related time allows for the current status of any listing
regarding its activity. This information is encapsulated in the following columns:
accommodation_lifecycle_state: contains one of the following states
- 01-New: Listings that have been created in the current month, without bookings
- 02-Never Booked: Listings that have been created before the current month, without bookings.
- 03-First Time Booked: Listings that have been booked for the first time in the current month.
- 04-Active: Listings that have booking activity in the past 12 months (that are not FTB nor reactivated)
- 05-Churning: Listings that are becoming inactive because of lack of bookings in the past 12 months
- 06-Inactive: Listings that have not had a booking for more than 12 months.
- 07-Reactivated: Listings that have had a booking in the current month that were inactive or churning before.
- Finally, if none of the logic applies, which should not happen, null will be set and a dbt alert will raise.
Since the states of Active, First Time Booked and Reactivated indicate certain booking activity and are
mutually exclusive, the model also provides information of the recency of the bookings by the following
booleans:
- has_been_booked_within_current_month: If a listing has had a booking created in the current month
- has_been_booked_within_last_6_months: If a listing has had a booking created in the past 6 months
- has_been_booked_within_last_12_months: If a listing has had a booking created in the past 12 months
Note that if a listing has had a booking created in a given month, all 3 columns will be true. Similarly,
if the last booking created to a listing was 5 months ago, only the column has_been_booked_in_1_month
will be false; while the other 2 will be true.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- date
- id_accommodation
columns:
- name: date
data_type: date
description: Date in which a Listing has a given lifecycle state.
tests:
- not_null
- name: id_accommodation
data_type: bigint
description: Id of the accommodation or listing.
tests:
- not_null
- name: creation_date_utc
data_type: date
description: Date of when the listing was created.
- name: first_time_booked_date_utc
data_type: date
description: |
Date of the first booking created for a given listing. Can be null if the listing
has never had a booking associated with it.
- name: last_time_booked_date_utc
data_type: date
description: |
Date of the last booking created for a given listing. Can be null if the listing
has never had a booking associated with it. Can be the same as first_time_booked_date_utc
if the listing only had 1 booking in its history.
- name: second_to_last_time_booked_date_utc
data_type: date
description: |
Date of the second-to-last booking created for a given listing, meaning the creation
date of the booking that precedes the last one. It's relevant for the reactivation computation
on the lifecycle. Can be null if the listing has never had a booking associated with it or if
the listing only had 1 booking in its history.
- name: accommodation_lifecycle_state
data_type: character varying
description: |
Contains the lifecycle state of a Listing. The accepted values are:
01-New, 02-Never Booked, 03-First Time Booked, 04-Active, 05-Churning, 06-Inactive,
07-Reactivated. Failing to implement the logic will result in alert.
tests:
- not_null
- accepted_values:
values:
- 01-New
- 02-Never Booked
- 03-First Time Booked
- 04-Active
- 05-Churning
- 06-Inactive
- 07-Reactivated
- name: has_been_booked_within_current_month
data_type: boolean
description: If the listing has had a booking created in the current month.
- name: has_been_booked_within_last_6_months
data_type: boolean
description: If the listing has had a booking created in the past 6 months.
- name: has_been_booked_within_last_12_months
data_type: boolean
description: If the listing has had a booking created in the past 12 months.
- name: int_kpis__dimension_daily_accommodation
description: |
This model computes the deal segmentation per number of
listings in a daily manner.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- date
- id_deal
columns:
- name: date
data_type: date
description: Specific date in which the segmentation applies.
tests:
- not_null
- name: id_deal
data_type: string
description: Unique identifier of an account.
tests:
- not_null
- name: active_accommodations_per_deal_segmentation
data_type: string
description: |
Segment value based on the number of listings booked in 12 months
for a given deal and date.
tests:
- accepted_values:
values:
- "0"
- "01-05"
- "06-20"
- "21-60"
- "61+"
- name: accommodations_booked_in_12_months
data_type: bigint
description:
Actual volume of listings that have been booked in the past 12 months
for a given deal and date.
- name: int_kpis__metric_daily_created_bookings
description: |
This model computes the Daily Created Bookings at the deepest granularity.
The unique key corresponds to the deepest granularity of the model,
in this case:
- date,
- id_deal,
- dash_source.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- date
- id_deal
- dash_source
columns:
- name: date
data_type: date
description: Date of when Bookings have been created.
tests:
- not_null
- name: id_deal
data_type: string
description: Unique identifier of an account.
tests:
- not_null
- name: dash_source
data_type: string
description: Dashboard source, either old or new.
tests:
- not_null
- accepted_values:
values:
- "New Dash"
- "Old Dash"
- name: active_accommodations_per_deal_segmentation
data_type: string
description: |
Segment value based on the number of listings booked in 12 months
for a given deal and date.
tests:
- not_null
- accepted_values:
values:
- "0"
- "01-05"
- "06-20"
- "21-60"
- "61+"
- "UNSET"
- name: main_billing_country_iso_3_per_deal
data_type: string
description: |
Main billing country of the host aggregated at Deal level.
tests:
- not_null
- name: created_bookings
data_type: bigint
description: |
Count of daily bookings created in a given date and per specified dimension.
- name: int_kpis__metric_monthly_created_bookings
description: |
This model computes the Monthly Created Bookings at the
deepest granularity.
Be aware that any dimension that can change over the monthly period,
such as daily segmentations, are included in the primary key of the
model.
The unique key corresponds to:
- end_date,
- id_deal,
- dash_source,
- active_accommodations_per_deal_segmentation.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- end_date
- id_deal
- dash_source
- active_accommodations_per_deal_segmentation
columns:
- name: start_date
data_type: date
description: |
The start date of the time range considered for the metrics in this record.
tests:
- not_null
- name: end_date
data_type: date
description: |
The end date of the time range considered for the metrics in this record.
tests:
- not_null
- name: id_deal
data_type: string
description: Unique identifier of an account.
tests:
- not_null
- name: dash_source
data_type: string
description: Dashboard source, either old or new.
tests:
- not_null
- accepted_values:
values:
- "New Dash"
- "Old Dash"
- name: active_accommodations_per_deal_segmentation
data_type: string
description: |
Segment value based on the number of listings booked in 12 months
for a given deal and date.
tests:
- not_null
- accepted_values:
values:
- "0"
- "01-05"
- "06-20"
- "21-60"
- "61+"
- "UNSET"
- name: main_billing_country_iso_3_per_deal
data_type: string
description: |
Main billing country of the host aggregated at Deal level.
tests:
- not_null
- name: created_bookings
data_type: bigint
description: |
Count of accummulated bookings created in a given month up to the
given date and per specified dimension.
- name: int_kpis__metric_mtd_created_bookings
description: |
This model computes the Month-To-Date Created Bookings at the
deepest granularity.
Be aware that any dimension that can change over the monthly period,
such as daily segmentations, are included in the primary key of the
model.
The unique key corresponds to:
- end_date,
- id_deal,
- dash_source,
- active_accommodations_per_deal_segmentation.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- end_date
- id_deal
- dash_source
- active_accommodations_per_deal_segmentation
columns:
- name: start_date
data_type: date
description: |
The start date of the time range considered for the metrics in this record.
tests:
- not_null
- name: end_date
data_type: date
description: |
The end date of the time range considered for the metrics in this record.
tests:
- not_null
- name: id_deal
data_type: string
description: Unique identifier of an account.
tests:
- not_null
- name: dash_source
data_type: string
description: Dashboard source, either old or new.
tests:
- not_null
- accepted_values:
values:
- "New Dash"
- "Old Dash"
- name: active_accommodations_per_deal_segmentation
data_type: string
description: |
Segment value based on the number of listings booked in 12 months
for a given deal and date.
tests:
- not_null
- accepted_values:
values:
- "0"
- "01-05"
- "06-20"
- "21-60"
- "61+"
- "UNSET"
- name: main_billing_country_iso_3_per_deal
data_type: string
description: |
Main billing country of the host aggregated at Deal level.
tests:
- not_null
- name: created_bookings
data_type: bigint
description: |
Count of accummulated bookings created in a given month up to the
given date and per specified dimension.
- name: int_kpis__aggregated_monthly_created_bookings
description: |
This model computes the dimension aggregation for
Monthly Created Bookings.
The primary key of this model is end_date, dimension
and dimension_value.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- end_date
- dimension
- dimension_value
columns:
- name: start_date
data_type: date
description: |
The start date of the time range considered for the metrics in this record.
tests:
- not_null
- name: end_date
data_type: date
description: |
The end date of the time range considered for the metrics in this record.
tests:
- not_null
- name: dimension
data_type: string
description: The dimension or granularity of the metrics.
tests:
- assert_dimension_completeness:
metric_column_name: created_bookings
- accepted_values:
values:
- global
- by_number_of_listings
- by_billing_country
- by_dash_source
- by_deal
- name: dimension_value
data_type: string
description: The value or segment available for the selected dimension.
tests:
- not_null
- name: created_bookings
data_type: bigint
description: The month-to-date created bookings for a given date, dimension and value.
- name: int_kpis__aggregated_mtd_created_bookings
description: |
This model computes the dimension aggregation for
Month-To-Date Created Bookings.
The primary key of this model is end_date, dimension
and dimension_value.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- end_date
- dimension
- dimension_value
columns:
- name: start_date
data_type: date
description: |
The start date of the time range considered for the metrics in this record.
tests:
- not_null
- name: end_date
data_type: date
description: |
The end date of the time range considered for the metrics in this record.
tests:
- not_null
- name: dimension
data_type: string
description: The dimension or granularity of the metrics.
tests:
- assert_dimension_completeness:
metric_column_name: created_bookings
- accepted_values:
values:
- global
- by_number_of_listings
- by_billing_country
- by_dash_source
- by_deal
- name: dimension_value
data_type: string
description: The value or segment available for the selected dimension.
tests:
- not_null
- name: created_bookings
data_type: bigint
description: The month-to-date created bookings for a given date, dimension and value.

View file

@ -0,0 +1,52 @@
{% set min_date = "2022-01-01" %}
{% set dimensions = ("global", "by_billing_country") %}
-- "by_number_of_listings" excluded on purpose - there's differences because of daily
-- segmentation
with
new_mtd_created_bookings as (
select end_date as date, dimension, dimension_value, created_bookings
from {{ ref("int_kpis__aggregated_mtd_created_bookings") }}
where
end_date >= '{{ min_date }}'
and dimension in {{ dimensions }}
and dimension_value <> 'UNSET'
),
new_monthly_created_bookings as (
select end_date as date, dimension, dimension_value, created_bookings
from {{ ref("int_kpis__aggregated_monthly_created_bookings") }}
where
end_date >= '{{ min_date }}'
and dimension in {{ dimensions }}
and dimension_value <> 'UNSET'
),
new_created_bookings as (
select *
from new_mtd_created_bookings
union all
select *
from new_monthly_created_bookings
),
old_created_bookings as (
select date, dimension, dimension_value, created_bookings
from {{ ref("int_core__mtd_created_bookings_metric") }}
where date >= '{{ min_date }}' and dimension in {{ dimensions }}
),
comparison as (
select
coalesce(o.date, n.date) as date,
coalesce(o.dimension, n.dimension) as dimension,
coalesce(o.dimension_value, n.dimension_value) as dimension_value,
o.created_bookings as old_created_bookings,
n.created_bookings as new_created_bookings,
coalesce(o.created_bookings, 0) - coalesce(n.created_bookings, 0) as diff
from old_created_bookings o
full outer join
new_created_bookings n
on o.date = n.date
and o.dimension = n.dimension
and o.dimension_value = n.dimension_value
)
select *
from comparison
where diff <> 0
order by date desc, abs(diff) desc