Merged PR 4665: Projected Bookings to the end of the current month

# Description

Computes projected Bookings to the end of the current month per dimension and dimension value

KPIs changes:
* Adds a int_kpis__agg_daily_created_bookings model. This is a standard KPIs model that computes created bookings at a daily level per dimension and dimension value.

Projected KPIs (NEW):
* Adds a new folder named projected within KPIs. This includes a simple Readme.
* Adds a default dimension dates model that ranges from the past 3 complete months to the end of the current month, that is in the future.
* Adds 2 projected models for created bookings: The daily one, that handles the logic of the projection (same month + last 7 days) and the monthly one, that retrieves the end of month projected value as well as some indicators of how good or bad this was for previous months.

How good is the projection at 11th of March (so 10 days in the month)?

`select * from intermediate.int_kpis_projected__agg_monthly_created_bookings order by actual_created_bookings desc`

![image.png](https://guardhog.visualstudio.com/4148d95f-4b6d-4205-bcff-e9c8e0d2ca65/_apis/git/repositories/54ac356f-aad7-46d2-b62c-e8c5b3bb8ebf/pullRequests/4665/attachments/image.png)

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #28251
This commit is contained in:
Oriol Roqué Paniagua 2025-03-11 14:34:30 +00:00
parent a0b2f9b657
commit 9ca9c827fe
7 changed files with 899 additions and 0 deletions

View file

@ -0,0 +1,23 @@
# Projected KPIs Readme
*(!) If you are reading this, be warned that any model within this folder contains **ESTIMATES** in the form of projections.*
## Purpose
The `\kpis\projected` folder is dedicated to projected KPIs modelisation, which include mostly any relevant dimension and measure and time aggregation needed for transforming data into business estimates.
These are likely depending on the actual KPIs.
## Convention
- Any model within the folder `intermediate\kpis\projected` needs to follow this convention: `int_kpis_projected__{structure_type}_{time_dimension}_{relevant_entity_name}`.
For the structure type, time dimension and relevant entity name, please refer to the documentation within KPIs - it remains the same.
## Contributing guidelines
**Always KISS - Keep It Simple Stupid**: An easier model is better for debugging. If we're in need to do advanced predictions, then likely we'll need to do Machine Learning rather than just business rules.
**Always report the error**: There's multiple ways to check how good or bad is an estimate, but at least provide a simple measure for each estimate. Make sure that this estimate is based on a recent period from which we are able to retrieve the actual data (the source of truth). Make sure that this estimate can be computed in the past without any data leakage - otherwise Uri will be very, very mad.
Have fun!

View file

@ -0,0 +1,199 @@
{{
config(
materialized="table",
unique_key=["date", "dimension", "dimension_value"],
)
}}
with
int_kpis_projected__dimension_dates as (
select * from {{ ref("int_kpis_projected__dimension_dates") }}
),
int_kpis__agg_daily_created_bookings as (
select * from {{ ref("int_kpis__agg_daily_created_bookings") }}
),
same_month_trend as (
-- Computes the daily bookings based on the same month trend.
-- If the latest available date is the 20th of the month, then the daily
-- bookings will be computed from the 1st to the 20th of the month (so 20 days
-- in total).
select
dd.last_day_month,
mm.dimension,
mm.dimension_value,
coalesce(
sum(
case
when dd.is_available_for_same_month_projection
then coalesce(mm.created_bookings, 0)
else 0
end
),
0
) as same_month_trend_total_created_bookings,
coalesce(
1.0 * sum(
case
when dd.is_available_for_same_month_projection
then coalesce(mm.created_bookings, 0)
else 0
end
)
/ nullif(
max(
case
when dd.is_available_for_same_month_projection
then extract(day from dd.date)
else null
end
),
0
),
0
) as same_month_trend_daily_created_bookings,
max(
case
when dd.is_available_for_same_month_projection
then extract(day from dd.date)
else null
end
) as same_month_trend_total_available_days
from int_kpis_projected__dimension_dates dd
inner join int_kpis__agg_daily_created_bookings mm on dd.date = mm.date
group by 1, 2, 3
),
last_7_days_trend as (
-- Computes the daily bookings based on the last 7 days trend.
-- If the latest available date is the 20th of the month, then the daily
-- bookings will be computed from the 13th to the 20th of the month (so 7 days
-- in total).
select
dd.last_day_month,
mm.dimension,
mm.dimension_value,
coalesce(
sum(
case
when dd.is_available_for_same_month_projection
then coalesce(mm.created_bookings, 0)
else 0
end
),
0
) as last_7_days_trend_total_created_bookings,
coalesce(
1.0 * sum(
case
when dd.is_available_for_same_month_projection
then coalesce(mm.created_bookings, 0)
else 0
end
)
/ 7,
0
) as last_7_days_trend_daily_created_bookings,
7 as last_7_days_trend_total_available_days
from int_kpis_projected__dimension_dates dd
inner join
int_kpis__agg_daily_created_bookings mm
on mm.date between dd.previous_6_days and dd.date
where is_available_for_last_7_days_projection = true
group by 1, 2, 3
),
combination_of_trends as (
select
coalesce(smt.last_day_month, l7dt.last_day_month) as last_day_month,
coalesce(smt.dimension, l7dt.dimension) as dimension,
coalesce(smt.dimension_value, l7dt.dimension_value) as dimension_value,
-- Combination of two sources of trends: same month and last 7 days.
coalesce(
smt.same_month_trend_daily_created_bookings, 0
) as same_month_trend_daily_created_bookings,
coalesce(
l7dt.last_7_days_trend_daily_created_bookings, 0
) as last_7_days_trend_daily_created_bookings,
round(
(
coalesce(smt.same_month_trend_daily_created_bookings, 0)
+ coalesce(l7dt.last_7_days_trend_daily_created_bookings, 0)
)
/ 2,
0
) as projected_daily_created_bookings,
-- Total created bookings for the same month and last 7 days
-- for information purposes
smt.same_month_trend_total_created_bookings,
l7dt.last_7_days_trend_total_created_bookings,
-- Total available days for the same month and last 7 days
-- for information purposes
smt.same_month_trend_total_available_days,
l7dt.last_7_days_trend_total_available_days
from same_month_trend smt
full outer join
last_7_days_trend l7dt
on smt.last_day_month = l7dt.last_day_month
and smt.dimension = l7dt.dimension
and smt.dimension_value = l7dt.dimension_value
)
select
-- Primary Key --
dd.date,
cot.dimension,
cot.dimension_value,
-- Date Attributes --
dd.first_day_month,
dd.last_day_month,
dd.is_end_of_month,
dd.is_current_month,
dd.is_in_the_future,
dd.is_latest_historical_date,
-- Reporting source: actual or projected depending on whether the date is in the
-- future. This accounts for all the real historical data.
case
when not dd.is_in_the_future then 'ACTUAL' else 'PROJECTED'
end as daily_created_bookings_for_reporting_source,
case
when not dd.is_in_the_future
then coalesce(mm.created_bookings, 0)
else coalesce(cot.projected_daily_created_bookings, 0)
end as daily_created_bookings_for_reporting,
-- Evaluation source: actual or projected depending on whether the date is
-- available for the same month projection. This accounts only for the historical
-- data that is available for the same month projection. This is purely to evaluate
-- the performance of the projection.
case
when dd.is_available_for_same_month_projection then 'ACTUAL' else 'PROJECTED'
end as daily_created_bookings_for_evaluation_source,
case
when dd.is_available_for_same_month_projection
then coalesce(mm.created_bookings, 0)
else coalesce(cot.projected_daily_created_bookings, 0)
end as daily_created_bookings_for_evaluation,
-- Specific daily created bookings trends
coalesce(
cot.projected_daily_created_bookings, 0
) as projected_daily_created_bookings,
coalesce(mm.created_bookings, 0) as actual_daily_created_bookings,
-- For information purposes to debug the trends
cot.same_month_trend_daily_created_bookings,
cot.last_7_days_trend_daily_created_bookings,
cot.same_month_trend_total_created_bookings,
cot.last_7_days_trend_total_created_bookings,
cot.same_month_trend_total_available_days,
cot.last_7_days_trend_total_available_days
from int_kpis_projected__dimension_dates dd
left join combination_of_trends cot on dd.last_day_month = cot.last_day_month
left join
int_kpis__agg_daily_created_bookings mm
on dd.date = mm.date
and cot.dimension = mm.dimension
and cot.dimension_value = mm.dimension_value

View file

@ -0,0 +1,86 @@
{{
config(
materialized="table", unique_key=["end_date", "dimension", "dimension_value"]
)
}}
with
monthly_aggregation as (
select
first_day_month,
last_day_month,
dimension,
dimension_value,
is_current_month,
-- Metrics --
sum(
daily_created_bookings_for_reporting
) as current_month_projected_created_bookings,
sum(
daily_created_bookings_for_evaluation
) as historical_projected_created_bookings,
sum(actual_daily_created_bookings) as actual_created_bookings
from {{ ref("int_kpis_projected__agg_daily_created_bookings") }}
group by 1, 2, 3, 4, 5
),
monthly_error_computation as (
select
first_day_month,
last_day_month,
dimension,
dimension_value,
current_month_projected_created_bookings,
historical_projected_created_bookings,
actual_created_bookings,
case
when not is_current_month
then
abs(historical_projected_created_bookings - actual_created_bookings)
else null
end as monthly_absolute_error,
case
when not is_current_month
then
1.0 * abs(
historical_projected_created_bookings - actual_created_bookings
)
/ nullif(historical_projected_created_bookings, 0)
else null
end as monthly_absolute_percentage_error
from monthly_aggregation
where not is_current_month
),
historical_errors as (
select
dimension,
dimension_value,
avg(monthly_absolute_error) as historical_monthly_mean_absolute_error,
avg(
monthly_absolute_percentage_error
) as historical_monthly_mean_absolute_percentage_error
from monthly_error_computation
group by 1, 2
)
select
-- Date Information --
current_month.first_day_month as start_date,
current_month.last_day_month as end_date,
-- Primary Key --
current_month.dimension,
current_month.dimension_value,
-- Metrics --
current_month.current_month_projected_created_bookings,
current_month.actual_created_bookings,
-- Error Attribution --
historical_errors.historical_monthly_mean_absolute_error,
historical_errors.historical_monthly_mean_absolute_percentage_error
from monthly_aggregation as current_month
left join
historical_errors
on current_month.dimension = historical_errors.dimension
and current_month.dimension_value = historical_errors.dimension_value
where current_month.is_current_month

View file

@ -0,0 +1,66 @@
with
int_kpis__dimension_dates as (select * from {{ ref("int_kpis__dimension_dates") }}),
last_available_date as (
select max(date) as last_date from int_kpis__dimension_dates
),
dates_to_project as (
select
generate_series(
-- Start from 3 months before the last available date
(
select date_trunc('month', last_date)::date - interval '3 months'
from last_available_date
),
-- End at the last date of the current month
(
select
date_trunc('month', last_date)::date
+ interval '1 month'
- interval '1 day'
from last_available_date
),
'1 day'::interval
)::date as date
)
select
dtp.date,
date_trunc('month', dtp.date)::date as first_day_month,
(date_trunc('month', dtp.date)::date + interval '1 month' - interval '1 day')::date
as last_day_month,
(dtp.date - interval '6 days')::date as previous_6_days,
case
when
dtp.date
= date_trunc('month', dtp.date)::date
+ interval '1 month'
- interval '1 day'
then true
else false
end as is_end_of_month,
case
when
date_trunc('month', dtp.date)::date
= date_trunc('month', lad.last_date)::date
then true
else false
end as is_current_month,
case when dtp.date > lad.last_date then true else false end as is_in_the_future,
case
when dtp.date = lad.last_date then true else false
end as is_latest_historical_date,
case
when
extract('day' from dtp.date) <= extract('day' from lad.last_date)
and dtp.date <= lad.last_date
then true
else false
end as is_available_for_same_month_projection,
case
when
extract('day' from dtp.date) = extract('day' from lad.last_date)
and dtp.date <= lad.last_date
then true
else false
end as is_available_for_last_7_days_projection
from dates_to_project dtp
cross join last_available_date lad

View file

@ -0,0 +1,436 @@
version: 2
models:
- name: int_kpis_projected__dimension_dates
description: |
This model provides the daily time dimensionality needed
for the projection of KPIs.
It considers:
- Dates from the current month, up to the end of it,
indistinctly if it's in the future or not.
- Dates from the 3 past complete months.
This model is intended to be used as a dimension table for
the projection of KPIs, thus it's expected to be joined with
the fact table containing the KPIs to be projected.
columns:
- name: date
data_type: date
description: Specific date. It's the primary key of this model.
data_tests:
- unique
- not_null
- name: first_day_month
data_type: date
description: |
First day of the month corresponding to the date field.
data_tests:
- not_null
- name: last_day_month
data_type: date
description: |
Last day of the month corresponding to the date field.
data_tests:
- not_null
- name: previous_6_days
data_type: date
description: Date of the previous 6 days with respect to the date field.
data_tests:
- unique
- not_null
- name: is_end_of_month
data_type: boolean
description: True if it's end of month, false otherwise.
data_tests:
- not_null
- name: is_current_month
data_type: boolean
description: |
True if the date is within the current month, false otherwise.
data_tests:
- not_null
- name: is_in_the_future
data_type: boolean
description: |
True if the date is in the future, false otherwise. Note that is in the
future will also include the current day, as we can only consider full-closed
data up to yesterday.
data_tests:
- not_null
- name: is_latest_historical_date
data_type: boolean
description: |
True if the date is the latest historical date, false otherwise.
The latest historical date is the latest date from which we have full information.
This should be yesterday.
data_tests:
- not_null
- name: is_available_for_same_month_projection
data_type: boolean
description: |
True if the date is available for the projection of KPIs for the same month,
false otherwise. This will be true for all days contained within the first day
of the month up to yesterday. This will apply for both the current month and
the 3 months.
If today is 10th of May, and data is available up to 9th of May, then all days
from 1st to 9th of May, April, March and February will be available for projection.
If today is 1st of May, then no dates will be available for projection.
data_tests:
- not_null
- name: is_available_for_last_7_days_projection
data_type: boolean
description: |
True if the date is available for the projection of KPIs for the past 7 days,
false otherwise. This will be true only for yesterday and the equivalent day for
the previous 3 months.
If today is 10th of May, and data is available up to 9th of May, then all days
only 9th of May, 9th of April, 9th of March and 9th of February will be available
for projection.
If today is 1st of May, then 30th of April, 31st of March, 28th of February and
31st of January will be available for projection.
data_tests:
- not_null
- name: int_kpis_projected__agg_daily_created_bookings
description: |
This model provides the projected daily created bookings.
It considers 2 computations:
- The daily created bookings for the current month,
- The daily created bookings in the past 7 days,
and the final value is an arithmetic mean of both.
This model also retrieves the actual created bookings to be able
to compare the projected values with the actual ones.
data_tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- date
- dimension
- dimension_value
columns:
- name: date
data_type: date
description: |
The start and end date of the time range considered for the
metrics in this record.
data_tests:
- not_null
- name: dimension
data_type: string
description: The dimension or granularity of the metrics.
data_tests:
- assert_dimension_completeness:
metric_column_names:
- actual_daily_created_bookings
- accepted_values:
values:
- global
- by_number_of_listings
- by_billing_country
- by_business_scope
- by_deal
- name: dimension_value
data_type: string
description: The value or segment available for the selected dimension.
data_tests:
- not_null
- name: first_day_month
data_type: date
description: |
First day of the month corresponding to the date field.
data_tests:
- not_null
- name: last_day_month
data_type: date
description: |
Last day of the month corresponding to the date field.
data_tests:
- not_null
- name: is_end_of_month
data_type: boolean
description: True if it's end of month, false otherwise.
data_tests:
- not_null
- name: is_current_month
data_type: boolean
description: |
True if the date is within the current month, false otherwise.
data_tests:
- not_null
- name: is_in_the_future
data_type: boolean
description: |
True if the date is in the future, false otherwise. Note that is in the
future will also include the current day, as we can only consider full-closed
data up to yesterday.
data_tests:
- not_null
- name: is_latest_historical_date
data_type: boolean
description: |
True if the date is the latest historical date, false otherwise.
The latest historical date is the latest date from which we have full information.
This should be yesterday.
data_tests:
- not_null
- name: daily_created_bookings_for_reporting_source
data_type: string
description: |
The source of the daily created bookings for reporting.
This field is used to identify the source of the data displayed
in daily_created_bookings_for_reporting to differentiate between
the actual and projected values.
It's aimed for reforting purposes as any historical month will
contain the actual figures.
data_tests:
- not_null
- accepted_values:
values:
- ACTUAL
- PROJECTED
- name: daily_created_bookings_for_reporting
data_type: integer
description: |
The daily created bookings for reporting purposes.
This field contains both the actual and projected values.
Any date in the future will contain projected values, while
any date in the past will contain actual values.
data_tests:
- not_null
- name: daily_created_bookings_for_evaluation_source
data_type: string
description: |
Important: This field is used to evaluate the performance
of the projections!
The source of the daily created bookings for evaluation.
This field is used to identify the source of the data displayed
in daily_created_bookings_for_evaluation to differentiate between
the actual and projected values.
It's aimed for evaluation purposes as any historical month can
contain projected figures.
data_tests:
- not_null
- accepted_values:
values:
- ACTUAL
- PROJECTED
- name: daily_created_bookings_for_evaluation
data_type: integer
description: |
Important: This field is used to evaluate the performance
of the projections!
The daily created bookings for evaluation purposes.
This field contains both the actual and projected values.
Any date in the future will contain projected values. Any date
in the past which day is after the yesterday day will also contain
projected values. Any date in the past which day is before the
yesterday day will contain actual values.
data_tests:
- not_null
- name: projected_daily_created_bookings
data_type: integer
description: |
The projected daily created bookings. This field is the result
of the projection of the daily created bookings for the current month
and the past 7 days.
data_tests:
- not_null
- name: actual_daily_created_bookings
description: |
The actual created bookings for the same period as the projected ones.
This comes from the standard KPIs.
data_tests:
- not_null
- name: same_month_trend_daily_created_bookings
data_type: float
description: |
The average daily created bookings for the current month.
This field is the result of the division of the actual daily created
bookings to date by the number of days available within the current month
to date, and contains decimals.
This is just for information purposes.
data_tests:
- not_null
- name: last_7_days_trend_daily_created_bookings
data_type: float
description: |
The average daily created bookings for the past 7 days.
This field is the result of the division of the actual daily created
bookings for the past 7 days by 7 days, and contains decimals.
This is just for information purposes.
data_tests:
- not_null
- name: same_month_trend_total_created_bookings
data_type: integer
description: |
The total created bookings for the current month.
This field is the result of the sum of the actual daily created
bookings to date.
This is just for information purposes.
- name: last_7_days_trend_total_created_bookings
data_type: integer
description: |
The total created bookings for the past 7 days.
This field is the result of the sum of the actual daily created
bookings for the past 7 days.
This is just for information purposes.
- name: same_month_trend_total_available_days
data_type: integer
description: |
The total available days for the current month.
This field is the result of the count of the days available within
the current month to date.
This is just for information purposes.
- name: last_7_days_trend_total_available_days
data_type: integer
description: |
The total available days for the past 7 days.
This field is the result of the count of the days available for the
past 7 days.
This is just for information purposes.
- name: int_kpis_projected__agg_monthly_created_bookings
description: |
This model provides the projected monthly created bookings per dimension
and dimension value. It only considers the current month.
Historical data is considered only to assess the performance of the
projections.
The projection logic is handled on the equivalent daily model,
please refer to it for more information.
data_tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- dimension
- dimension_value
columns:
- name: start_date
data_type: date
description: |
The start date of the time range considered for the metrics in this record.
data_tests:
- not_null
- name: end_date
data_type: date
description: |
The end date of the time range considered for the metrics in this record.
data_tests:
- not_null
- name: dimension
data_type: string
description: The dimension or granularity of the metrics.
data_tests:
- assert_dimension_completeness:
metric_column_names:
- actual_created_bookings
- accepted_values:
values:
- global
- by_number_of_listings
- by_billing_country
- by_business_scope
- by_deal
- name: dimension_value
data_type: string
description: The value or segment available for the selected dimension.
data_tests:
- not_null
- name: current_month_projected_created_bookings
data_type: integer
description: |
The projected monthly created bookings for the current month.
This field is the result of the sum of the actual daily created bookings
for the current month to date and the projected daily created bookings
for the rest of the days in the month that are in the future.
The closest we are to the end of the month, the more accurate this value will be.
In order to check how good or bad the projection is, please refer to the
historical_monthly_mean_absolute_error and
historical_monthly_mean_absolute_percentage_error fields.
data_tests:
- not_null
- name: actual_created_bookings
data_type: integer
description: |
The sum of the actual daily created bookings for the current month to date.
This comes from the standard KPIs.
data_tests:
- not_null
- name: historical_monthly_mean_absolute_error
data_type: float
description: |
The mean absolute error for this dimension and dimension value.
This field is used to assess the performance of the projections.
This is based on the absolute differences between the projected
monthly created bookings for each previous month vs the actual value.
In order to be consistent, it uses the same number of days available
for the current month to date as the actual value, and the rest of the days
are projected.
This computation is applied for the past 3 months, and the value displayed
here is an average of the absolute differences.
The closest to 0, the better the projection.
- name: historical_monthly_mean_absolute_percentage_error
data_type: float
description: |
The mean absolute percentage error for this dimension and dimension value.
This field is used to assess the performance of the projections.
This is based on the absolute percentage differences between the projected
monthly created bookings for each previous month vs the actual value.
In order to be consistent, it uses the same number of days available
for the current month to date as the actual value, and the rest of the days
are projected.
This computation is applied for the past 3 months, and the value displayed
here is an average of the absolute percentage differences.
The closest to 0, the better the projection.