data-dwh-dbt-project/models/intermediate/cross/schema.yml

models:
- name: int_daily_currency_exchange_rates
description: >-
This model holds a lot of data on currency exchange rates. The time
granularity is daily. Each record holds a currency pair for a specific
day, source and version.
Actual rates are sourced from xe.com data. The `guess` and `forecast`
versions are built by simply 'pushing' the first/last exchange rate on
record. Basically, wherever we don't have data for a date, we pick the
closest actual data point that comes from xe.com. Bear in mind this means
that `forecast` version records will change on a daily basis as actual
data moves forward, so you shouldn't assume that money amounts converted
at future dates will stay put.
Note that, given the dimensionality, getting a simple time series for a
currency pair will require a bit of filtering.
Reverse rates are explicit. This means that, for any given day and any
given currency pair, you will find two records with opposite from/to
positions. So, for 2024-01-01, you will find both a EUR->USD record and a
USD->EUR record with the inverse rate (1/rate).
columns:
- name: id_exchange_rate
data_type: text
description: A unique ID for the record, derived from concatenating the
currencies, date, source and version. Currency order is relevant
(EURUSD != USDEUR).
tests:
- not_null
- unique
- name: from_currency
data_type: character
description: The source currency, represented as an ISO 4217 code.
tests:
- not_null
- name: to_currency
data_type: character
description: The target currency, represented as an ISO 4217 code.
tests:
- not_null
- name: rate
data_type: numeric
description: >-
The exchange rate, represented as the units of the target currency
that one unit of source currency gets you. So, from_currency=USD to
to_currency=PLN with rate=4.2 should be read as '1 US Dollar buys me
4.2 Polish Zlotys'.
For same-currency pairs (EUR to EUR, USD to USD, etc.), the rate will
always be one.
The rate can be smaller than one, but must be greater than zero.
tests:
- not_negative_or_zero
- not_null
- name: rate_date_utc
data_type: date
description: The date on which the rate applies.
tests:
- not_null
- name: source
data_type: text
description:
Where the data comes from. Records that are derived by making
assumptions on real data will contain `_inferred`.
- name: rate_version
data_type: text
description:
The version of the rate. This can be one of `actual` (the rate is an
observed fact), `forecast` (the rate sits in the future and is a guess
in nature) or `guess` (the rate sits in the past and is a guess in
nature). Note that one currency pair can have multiple rate versions
on the same date.
tests:
- accepted_values:
values:
- guess
- actual
- forecast
- not_null
- name: updated_at_utc
data_type: timestamp with time zone
description:
For external sources, this will be the point in time when the
information was obtained from them. For stuff we make up here in the
DWH, this will be the point in time when we made the assumption.
tests:
- not_null
- name: int_simple_exchange_rates
description: >-
A simplified view of exchange rates, derived from
`int_daily_currency_exchange_rates`. Come here if you don't want to
understand nuances and complexities and just want to convert currencies.
The time granularity is daily. Each record holds a currency pair for a
specific day. You will only find one conversion rate per currency pair and
date.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- from_currency
- to_currency
- rate_date_utc
columns:
- name: from_currency
data_type: character
description: The source currency, represented as an ISO 4217 code.
tests:
- not_null
- name: to_currency
data_type: character
description: The target currency, represented as an ISO 4217 code.
tests:
- not_null
- name: rate
data_type: numeric
description: The exchange rate, represented as the units of the target currency that one unit of source currency gets you.
tests:
- not_null
- name: rate_date_utc
data_type: date
description: The date on which the rate applies.
tests:
- not_null
- name: updated_at_utc
data_type: timestamp with time zone
description:
For external sources, this will be the point in time when the
information was obtained from them. For stuff we make up here in the
DWH, this will be the point in time when we made the assumption.
tests:
- not_null
- name: int_mtd_vs_previous_year_metrics
description: |
This model is used for global KPIs.
It aggregates all the mtd models with the different metrics per source
and computes any necessary weighted metric across different sources.
Each metric has a date, dimension and dimension value that together
define the primary key of this model.
Finally, it displays each metric on the current date and on the previous
year date, and computes the relative increment using the macro:
- calculate_safe_relative_increment
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- date
- dimension
- dimension_value
columns:
- name: date
data_type: date
description: The date for the month-to-date metrics.
tests:
- not_null
- name: dimension
data_type: string
description: The dimension or granularity of the metrics.
tests:
- accepted_values:
values:
- global
- by_number_of_listings
- by_billing_country
- name: dimension_value
data_type: string
description: The value or segment available for the selected dimension.
tests:
- not_null
- name: int_dates_mtd
description: |
This model provides Month-To-Date (MTD) necessary dates for MTD-based models to work.
- For month-to-month complete information, it retrieves all end month dates that have elapsed since 2020.
- For month-to-date information, it retrieves the days of the current month of this year up to yesterday.
Additionally, it also gets the days of the equivalent month from last year that fall before today's day of the month.
Example:
Imagine we are at 4th June 2024.
- We will get the dates for 1st, 2nd, 3rd of June 2024.
- We will also get the dates for 1st, 2nd, 3rd of June 2023.
- We will get all end of months from 2020 to yesterday,
i.e., 31st January 2020, 29th February 2020, ..., 30th April 2024, 31st May 2024.
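As a rough illustration of the logic above, the following Python sketch reproduces the example for 4th June 2024. It is not the model's SQL: the function name `mtd_dates`, the `start_year` parameter and the behaviour on the 1st of a month (no current-month days returned) are assumptions made for illustration only.

```python
import calendar
from datetime import date, timedelta


def mtd_dates(today: date, start_year: int = 2020) -> list[date]:
    """Return the dates described above, sorted ascending (hypothetical helper)."""
    yesterday = today - timedelta(days=1)
    dates = set()

    # Days of the current month up to yesterday, plus the same days
    # of the equivalent month in the previous year.
    for day in range(1, today.day):
        dates.add(date(today.year, today.month, day))
        dates.add(date(today.year - 1, today.month, day))

    # Every end-of-month date from start_year that has already elapsed.
    for year in range(start_year, yesterday.year + 1):
        for month in range(1, 13):
            last_day = calendar.monthrange(year, month)[1]
            end_of_month = date(year, month, last_day)
            if end_of_month <= yesterday:
                dates.add(end_of_month)

    return sorted(dates)


example = mtd_dates(date(2024, 6, 4))
```

For 4th June 2024, `example` contains 1st to 3rd June of 2024 and 2023, plus every month end from 31st January 2020 to 31st May 2024.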
columns:
- name: year
data_type: int
description: Year number of the given date.
tests:
- not_null
- name: month
data_type: int
description: Month number of the given date.
tests:
- not_null
- name: day
data_type: int
description: Day of the month of the given date.
tests:
- not_null
- name: is_end_of_month
data_type: boolean
description: Is end of month, 1 for yes, 0 for no.
tests:
- not_null
- name: is_current_month
data_type: boolean
description: |
Checks if the date is within the current executed month,
1 for yes, 0 for no.
tests:
- not_null
- name: first_day_month
data_type: date
description: |
First day of the month corresponding to the date field.
It comes from int_dates_mtd logic.
tests:
- not_null
- name: date
data_type: date
description: |
Main date for the computation, that is used for filters.
It's the primary key for this model.
tests:
- not_null
- unique
- name: int_dates_by_deal
description: |
This model provides the necessary dates for each deal for deal-based KPIs models to work.
It only considers those dates starting from when the host user of the deal was first available.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- date
- id_deal
columns:
- name: year
data_type: int
description: Year number of the given date.
tests:
- not_null
- name: month
data_type: int
description: Month number of the given date.
tests:
- not_null
- name: day
data_type: int
description: Day of the month of the given date.
tests:
- not_null
- name: last_day_month
data_type: date
description: |
Last day of the month corresponding to the date field.
It comes from int_dates_mtd logic.
tests:
- not_null
- name: first_day_month
data_type: date
description: |
First day of the month corresponding to the date field.
It comes from int_dates_mtd logic.
tests:
- not_null
- name: date
data_type: date
description: |
Main date for the computation, that is used for filters.
Together with id_deal, it forms the primary key of this model.
tests:
- not_null
- name: id_deal
data_type: string
description: |
Main identifier of the B2B clients. A deal can have multiple hosts.
A host should usually have a deal, but this is not true in all cases.
In this KPI reporting we enforce that the deal is not null to avoid
potential data quality issues.
tests:
- not_null
- name: main_deal_name
data_type: string
description: |
Main name for this ID deal.
tests:
- not_null
- name: main_billing_country_iso_3_per_deal
data_type: string
description: |
ISO 3166-1 alpha-3 main country code in which the Deal is billed.
In some cases it's null.
- name: int_mtd_aggregated_metrics
description: |
The `int_mtd_aggregated_metrics` model aggregates multiple metrics on a year, month, and day basis.
The primary source of data is the `int_mtd_vs_previous_year_metrics` model, which contains the combination
of metrics data per source. This model just changes the display format to unpivot the information into
a set of metric, value, previous_year_value and relative_increment at a given date. It uses Jinja
code to avoid code replication.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- date
- metric
- dimension
- dimension_value
columns:
- name: year
data_type: int
description: year number of the given date.
tests:
- not_null
- name: month
data_type: int
description: month number of the given date.
tests:
- not_null
- name: day
data_type: int
description: day of the month of the given date.
tests:
- not_null
- name: is_end_of_month
data_type: boolean
description: is end of month, 1 for yes, 0 for no.
tests:
- not_null
- name: is_current_month
data_type: boolean
description: |
checks if the date is within the current executed month,
1 for yes, 0 for no.
tests:
- not_null
- name: first_day_month
data_type: date
description: |
first day of the month corresponding to the date field.
It comes from int_dates_mtd logic.
tests:
- not_null
- name: date
data_type: date
description: |
main date for the computation, that is used for filters.
It comes from int_dates_mtd logic.
tests:
- not_null
- name: dimension
data_type: string
description: The dimension or granularity of the metrics.
tests:
- accepted_values:
values:
- global
- by_number_of_listings
- by_billing_country
- name: dimension_value
data_type: string
description: The value or segment available for the selected dimension.
tests:
- not_null
- name: previous_year_date
data_type: date
description: |
corresponds to the date of the previous year, with respect to the field date.
It comes from int_dates_mtd logic. It's only displayed for information purposes,
should not be needed for reporting.
- name: metric
data_type: text
description: name of the business metric.
tests:
- not_null
- name: order_by
data_type: integer
description: |
order for displaying purposes. Null values are accepted, but keep
in mind that then there's no default controlled display order.
- name: number_format
data_type: text
description: allows for grouping and formatting for displaying purposes.
tests:
- accepted_values:
values:
[
"integer",
"percentage",
"currency_gbp",
"converted_metric_currency_gbp",
]
- name: value
data_type: numeric
description: |
numeric value (integer or decimal) that corresponds to the MTD computation of the metric
at a given date.
- name: previous_year_value
data_type: numeric
description: |
numeric value (integer or decimal) that corresponds to the MTD computation of the metric
on the previous year at a given date.
- name: relative_increment
data_type: numeric
description: |
numeric value that corresponds to the relative increment between value and previous year value,
following the computation: value / previous_year_value - 1.
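As an illustration only, here is a minimal Python sketch of this formula. It is not the model's SQL: the function name and the handling of a null or zero previous-year value are assumptions.

```python
from typing import Optional


def relative_increment(
    value: Optional[float], previous_year_value: Optional[float]
) -> Optional[float]:
    """Return value / previous_year_value - 1, or None when it cannot be computed."""
    # Assumption: a missing or zero previous-year value yields no increment.
    if value is None or previous_year_value in (None, 0):
        return None
    return value / previous_year_value - 1


growth = relative_increment(120, 100)  # roughly 0.2, i.e. +20% vs. previous year
```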
- name: relative_increment_with_sign_format
data_type: numeric
description: |
relative_increment value multiplied by -1 in case this metric's growth doesn't have a
positive impact for Superhog; otherwise it is equal to relative_increment.
This value is created specifically for formatting in PBI.
- name: int_monthly_aggregated_metrics_history_by_deal
description: |
This model aggregates the monthly historic information regarding the different metrics computed
at deal level. The primary sources of data are the `int_yyy__monthly_XXXXX_history_by_deal`
models which contain the raw metrics data per source.
Unlike the int_mtd_aggregated_metrics, this model does not abstract each metric, since
no comparison versus last year is performed. In short, it just gathers the information stored
in the abovementioned models.
To keep in mind: aggregating the information of this model will not necessarily result in
the int_mtd_aggregated_metrics values because 1) the mtd version contains more computing dates
than the by deal version, the latter being a subset of the former, and 2) the deal based model
enforces that a booking/guest journey/listing/etc has a host with a deal assigned, which is
not necessarily the case.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- date
- id_deal
columns:
- name: date
data_type: date
description: The last day of the month or yesterday for historic metrics.
tests:
- not_null
- name: id_deal
data_type: character varying
description: Id of the deal associated to the host.
tests:
- not_null
- name: main_deal_name
data_type: string
description: |
Main name for this ID deal.
tests:
- not_null
- name: main_billing_country_iso_3_per_deal
data_type: string
description: |
ISO 3166-1 alpha-3 main country code in which the Deal is billed.
In some cases it's null.
- name: int_dates_mtd_by_dimension
description: |
This model provides Month-To-Date (MTD) necessary dates, dimension and dimension_values
for MTD-based models to work.
It provides the basic "empty" structure upon which metrics will be built. That is, on
top of the date that characterises int_dates_mtd, it includes the dimensions and their
respective values that should appear in any mtd metric model.
Example:
- For the "global" dimension, we will only have the "global" dimension value.
- For the "by_number_of_listings" dimension, we will have different values
according to the segments defined, e.g. 0, 1-5, 6-20, etc.
... and so on for any available dimension. These combinations should appear
for each date of the MTD models.
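As an illustration only, the structure described above can be sketched in Python as a cross join of dates with dimension values. The dates and segment values below are hypothetical examples, not the production lists, and the model itself is built in SQL.

```python
from datetime import date
from itertools import product

# Hypothetical inputs: two dates and the example segments named above.
dates = [date(2024, 9, 29), date(2024, 9, 30)]
dimension_values = {
    "global": ["global"],
    "by_number_of_listings": ["0", "1-5", "6-20"],
}

# One row per (date, dimension, dimension_value) combination.
scaffold = [
    (d, dimension, value)
    for d, (dimension, values) in product(dates, dimension_values.items())
    for value in values
]

# Every combination appears exactly once, which is what the
# unique_combination_of_columns test on this model checks.
assert len(scaffold) == len(set(scaffold))
```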
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- date
- dimension
- dimension_value
columns:
- name: year
data_type: int
description: Year number of the given date.
tests:
- not_null
- name: month
data_type: int
description: Month number of the given date.
tests:
- not_null
- name: day
data_type: int
description: Day of the month of the given date.
tests:
- not_null
- name: is_end_of_month
data_type: boolean
description: Is end of month, 1 for yes, 0 for no.
tests:
- not_null
- name: is_current_month
data_type: boolean
description: |
Checks if the date is within the current executed month,
1 for yes, 0 for no.
tests:
- not_null
- name: first_day_month
data_type: date
description: |
First day of the month corresponding to the date field.
It comes from int_dates_mtd logic.
tests:
- not_null
- name: date
data_type: date
description: |
Main date for the computation, metrics include monthly information
until this date.
tests:
- not_null
- name: dimension
data_type: string
description: The dimension or granularity of the metrics.
tests:
- accepted_values:
values:
- global
- by_number_of_listings
- by_billing_country
- name: dimension_value
data_type: string
description: The value or segment available for the selected dimension.
tests:
- not_null
- name: int_monthly_growth_score_by_deal
description: |
The main goal of this model is to provide a growth score by deal and month.
The idea behind it is that each deal will have some business performance
associated to it over the months, and that comparing how it is currently
performing vs. historical data we can determine whether the tendency is to
grow or to decay. This is especially useful for AMs to focus their effort
towards the clients that have a negative tendency.
The computation of the growth score is based on 3 main indicators:
- Created bookings
- Listings booked in month
- Total revenue (in gbp)
The main idea is, for each deal, to compare each of these metrics by
checking the latest monthly value vs. 1) the monthly value of the equivalent
month on the previous year and 2) the monthly value of the previous month
- in other words, a year-on-year (YoY) and month-on-month (MoM) comparison.
We do this comparison by computing a relative increment.
The growth score is then computed by averaging the outcome of the 6 scores.
Lastly, in order to provide a sense of prioritisation, we have a weighted growth
score that results from multiplying the growth score by the revenue
weight a specific deal has provided in the previous 12 months.
However, this is not strictly true for Revenue because 1) we have an invoicing
delay and 2) in some cases, monthly revenue per deal can be negative. In these
specific cases, the YoY comparison is shifted by one month, and an effective
revenue value that cannot be lower than 0 is computed for the revenue share.
In order to keep both a properly set up score and revenue consistency, both
a real revenue value and effective revenue value are present in this model,
while no MoM or YoY value is computed if negative revenue is found.
Lastly, this model provides informative date fields, deal attributes, absolute
metric values and MoM & YoY relative increments to enrich reporting.
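As an illustration only, the scoring logic can be sketched in Python as follows. This is not the model's SQL. The function names, the skipping of null increments, the fallback score of 0 and all the figures are assumptions.

```python
from typing import Optional, Sequence


def avg_growth_score(increments: Sequence[Optional[float]]) -> float:
    """Average the MoM and YoY relative increments of the three indicators."""
    # Assumption: null increments are skipped; with none available the score is 0.
    available = [x for x in increments if x is not None]
    return sum(available) / len(available) if available else 0.0


def effective_revenue(monthly_revenue: Sequence[float]) -> float:
    """Sum monthly revenue, counting negative months as 0."""
    return sum(max(r, 0.0) for r in monthly_revenue)


# Hypothetical deal: six increments (one unavailable) and three months of revenue.
increments = [0.10, -0.20, None, 0.30, 0.00, 0.40]
score = avg_growth_score(increments)

# Hypothetical global effective revenue of 1000.0 for the same window.
share = effective_revenue([100.0, -50.0, 25.0]) / 1000.0
weighted_score = score * share
```

The weighted score keeps the sign of the growth score, so a large deal with a negative tendency ends up further from zero than a small one with the same tendency.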
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- date
- id_deal
columns:
- name: date
data_type: date
description: |
Date corresponding to the last day of the month. Given month
metrics are inclusive to this date. Together with id_deal, it
acts as the primary key of this model.
tests:
- not_null
- name: id_deal
data_type: string
description: |
Unique identifier of a Deal. Together with date, it acts as
the primary key of this model.
tests:
- not_null
- name: main_deal_name
data_type: string
description: |
Main name for a Deal, representing the client.
tests:
- not_null
- name: main_billing_country_iso_3_per_deal
data_type: string
description: |
Main billing country for this client. In some cases
it can be null.
- name: deal_lifecycle_state
data_type: string
description: |
Identifier of the lifecycle state of a given deal
in a given month.
- name: deal_hubspot_stage
data_type: string
description: |
Current hubspot stage for a given deal.
- name: account_manager
data_type: string
description: |
Current Account Manager in charge of a given deal, according
to Hubspot.
- name: live_date_utc
data_type: date
description: |
Date in which the account has gone live, according to Hubspot.
- name: cancellation_date_utc
data_type: date
description: |
Date in which the account has been offboarded, according to
Hubspot.
- name: given_month_first_day_month
data_type: date
description: |
Informative field. It indicates the first day of the
month corresponding to date.
If date = 2024-09-30, this field will be 2024-09-01.
tests:
- not_null
- name: previous_1_month_first_day_month
data_type: date
description: |
Informative field. It indicates the first day of the
previous month with respect to date.
If date = 2024-09-30, this field will be 2024-08-01.
It can be null if no previous history for that
deal is found.
- name: previous_2_month_first_day_month
data_type: date
description: |
Informative field. It indicates the first day of the
month 2 months before with respect to date.
If date = 2024-09-30, this field will be 2024-07-01.
It can be null if no previous history for that
deal is found.
- name: previous_12_month_first_day_month
data_type: date
description: |
Informative field. It indicates the first day of the
month with respect to date, but on the previous year.
If date = 2024-09-30, this field will be 2023-09-01.
It can be null if no previous history for that
deal is found.
- name: previous_13_month_first_day_month
data_type: date
description: |
Informative field. It indicates the first day of the
previous month with respect to date, but on the previous year.
If date = 2024-09-30, this field will be 2023-08-01.
It can be null if no previous history for that
deal is found.
- name: aggregated_revenue_from_first_day_month
data_type: date
description: |
Informative field. It indicates the first day of the
month from the lower bound range in which the revenue
aggregation is computed.
The aggregation uses the previous 12 months in which we
know the revenue, thus:
If date = 2024-09-30, this field will be 2023-09-01.
It can be null if no previous history for that
deal is found.
- name: aggregated_revenue_to_first_day_month
data_type: date
description: |
Informative field. It indicates the first day of the
month from the upper bound range in which the revenue
aggregation is computed.
The aggregation uses the previous 12 months in which we
know the revenue, thus:
If date = 2024-09-30, this field will be 2024-08-01.
It can be null if no previous history for that
deal is found.
- name: given_month_revenue_in_gbp
data_type: decimal
description: |
Monthly value representing revenue in GBP
for a specific deal. This value corresponds to
the given month. This value can be negative,
but not null.
tests:
- not_null
- name: previous_1_month_revenue_in_gbp
data_type: decimal
description: |
Monthly value representing revenue in GBP
for a specific deal. This value corresponds to
the previous month.
This value can be negative.
This value can be null, thus indicating that no
history is available.
- name: previous_2_month_revenue_in_gbp
data_type: decimal
description: |
Monthly value representing revenue in GBP
for a specific deal. This value corresponds to
the monthly amount generated 2 months ago.
This value can be negative.
This value can be null, thus indicating that no
history is available.
- name: previous_12_month_revenue_in_gbp
data_type: decimal
description: |
Monthly value representing revenue in GBP
for a specific deal. This value corresponds to
the monthly amount generated 12 months ago.
This value can be negative.
This value can be null, thus indicating that no
history is available.
- name: previous_13_month_revenue_in_gbp
data_type: decimal
description: |
Monthly value representing revenue in GBP
for a specific deal. This value corresponds to
the monthly amount generated 13 months ago.
This value can be negative.
This value can be null, thus indicating that no
history is available.
- name: mom_revenue_growth
data_type: decimal
description: |
Relative increment of the revenue generated in the
current month with respect to the one generated in
the previous month.
It can be null if any revenue used in the computation
is null or it's negative.
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: -1
strictly: false
- name: mom_1_month_shift_revenue_growth
data_type: decimal
description: |
Relative increment of the revenue generated in the
previous month with respect to the one generated 2
months ago.
It can be null if any revenue used in the computation
is null or it's negative.
This field is used for the growth score computation.
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: -1
strictly: false
- name: yoy_revenue_growth
data_type: decimal
description: |
Relative increment of the revenue generated in the
current month with respect to the one generated 12
months ago.
It can be null if any revenue used in the computation
is null or it's negative.
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: -1
strictly: false
- name: yoy_1_month_shift_revenue_growth
data_type: decimal
description: |
Relative increment of the revenue generated in the
previous month with respect to the one generated 13
months ago.
It can be null if any revenue used in the computation
is null or it's negative.
This field is used for the growth score computation.
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: -1
strictly: false
- name: given_month_created_bookings
data_type: integer
description: |
Monthly value representing created bookings
for a specific deal. This value corresponds to
the given month. This value cannot be null.
tests:
- not_null
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
- name: previous_1_month_created_bookings
data_type: integer
description: |
Monthly value representing created bookings
for a specific deal. This value corresponds to
the previous month.
This value can be null, thus indicating that no
history is available.
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
- name: previous_12_month_created_bookings
data_type: integer
description: |
Monthly value representing created bookings
for a specific deal. This value corresponds to
monthly amount generated 12 months ago.
This value can be null, thus indicating that no
history is available.
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
- name: mom_created_bookings_growth
data_type: decimal
description: |
Relative increment of the bookings created in the
current month with respect to the ones created in
the previous month.
It can be null if the bookings created in the
previous month are null.
This field is used for the growth score computation.
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: -1
strictly: false
- name: yoy_created_bookings_growth
data_type: decimal
description: |
Relative increment of the bookings created in the
current month with respect to the ones created 12
months ago.
It can be null if the bookings created 12 months
ago are null.
This field is used for the growth score computation.
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: -1
strictly: false
- name: given_month_listings_booked_in_month
data_type: integer
description: |
Monthly value representing the listings booked in month
for a specific deal. This value corresponds to
the given month. This value cannot be null.
tests:
- not_null
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
- name: previous_1_month_listings_booked_in_month
data_type: integer
description: |
Monthly value representing the listings booked in month
for a specific deal. This value corresponds to
the previous month.
This value can be null, thus indicating that no
history is available.
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
- name: previous_12_month_listings_booked_in_month
data_type: integer
description: |
Monthly value representing the listings booked in month
for a specific deal. This value corresponds to
monthly amount generated 12 months ago.
This value can be null, thus indicating that no
history is available.
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
- name: mom_listings_booked_in_month_growth
data_type: decimal
description: |
Relative increment of the listings booked in month
in the current month with respect to the ones of
the previous month.
It can be null if the listings booked in month in the
previous month are null.
This field is used for the growth score computation.
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: -1
strictly: false
- name: yoy_listings_booked_in_month_growth
data_type: decimal
description: |
Relative increment of the listings booked in month
in the current month with respect to the ones of 12
months ago.
It can be null if the listings booked in month of 12
months ago are null.
This field is used for the growth score computation.
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: -1
strictly: false
- name: deal_revenue_12_months_window
data_type: decimal
description: |
Total aggregated revenue in GBP generated by a deal
in the months from the period ranging from the
aggregated_revenue_from_first_day_month to
aggregated_revenue_to_first_day_month.
It can be negative if the sum is negative.
It cannot be null.
tests:
- not_null
- name: effective_deal_revenue_12_months_window
data_type: decimal
description: |
Effective aggregated revenue in GBP generated by a deal
in the months from the period ranging from the
aggregated_revenue_from_first_day_month to
aggregated_revenue_to_first_day_month.
All negative monthly revenue values are set to 0,
thus this value should not be reported.
It is used for the deal contribution share with respect
to the global revenue. It cannot be null.
tests:
- not_null
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
- name: effective_global_revenue_12_months_window
data_type: decimal
description: |
Effective aggregated revenue in GBP generated by all deals
in the months from the period ranging from the
aggregated_revenue_from_first_day_month to
aggregated_revenue_to_first_day_month.
All negative monthly revenue values are set to 0,
thus this value should not be reported.
It is used for the deal contribution share with respect
to the global revenue. It cannot be null.
tests:
- not_null
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
- name: deal_contribution_share_to_global_revenue
data_type: decimal
description: |
Represents the size of the deal in terms of revenue. In
other words, what's the percentage of the global revenue
that can be attributed to this deal. It cannot be null.
tests:
- not_null
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
- name: deal_contribution_rank_to_global_revenue
data_type: integer
description: |
Represents the ordered list of deals by descending size
in terms of revenue.
If several deals have the same share, their relative order is
not guaranteed.
It cannot be null.
tests:
- not_null
- name: deal_created_bookings_12_months_window
data_type: integer
description: |
Total created bookings generated by a deal
in the months from the period ranging from the
aggregated_revenue_from_first_day_month to
aggregated_revenue_to_first_day_month.
It cannot be null.
tests:
- not_null
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
- name: global_created_bookings_12_months_window
data_type: integer
description: |
Total created bookings generated by any deal
in the months from the period ranging from the
aggregated_revenue_from_first_day_month to
aggregated_revenue_to_first_day_month.
It is used for the deal contribution share with respect
to the global created bookings. It cannot be null.
tests:
- not_null
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
- name: deal_contribution_share_to_global_created_bookings
data_type: decimal
description: |
Represents the size of the deal in terms of created bookings.
In other words, what's the percentage of the global created
bookings that can be attributed to this deal.
It cannot be null.
tests:
- not_null
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
- name: deal_contribution_rank_to_global_created_bookings
data_type: integer
description: |
Represents the ordered list of deals by descending size
in terms of created bookings.
If several deals have the same share, their relative order is
not guaranteed.
It cannot be null.
tests:
- not_null
- name: deal_avg_listings_booked_in_month_12_months_window
data_type: decimal
description: |
Average listings booked in month by a deal
in the months from the period ranging from the
aggregated_revenue_from_first_day_month to
aggregated_revenue_to_first_day_month.
It cannot be null.
tests:
- not_null
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
- name: global_avg_listings_booked_in_month_12_months_window
data_type: decimal
description: |
Sum of the average listings booked in month by
any deal in the months from the period ranging from the
aggregated_revenue_from_first_day_month to
aggregated_revenue_to_first_day_month.
It is used for the deal contribution share with respect
to the global average listings booked in month.
It cannot be null.
tests:
- not_null
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
- name: deal_contribution_share_to_global_avg_listings_booked_in_month
data_type: decimal
description: |
Represents the size of the deal in terms of average listings
booked in month.
In other words, what's the percentage of the global average listings
booked in month that can be attributed to this deal.
It cannot be null.
tests:
- not_null
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
- name: deal_contribution_rank_to_global_avg_listings_booked_in_month
data_type: decimal
description: |
Represents the ordered list of deals by descending size
in terms of average listings booked in month.
If several deals have the same share, their relative order is
not guaranteed.
It cannot be null.
tests:
- not_null
- name: avg_mom_growth_score
data_type: decimal
description: |
Represents the average score of MoM growth of created
bookings, MoM growth of listings booked in month and
MoM shifted by one month of revenue.
It indicates the tendency of growth of the deal without
taking into account its revenue size. It cannot be null.
tests:
- not_null
- name: avg_yoy_growth_score
data_type: decimal
description: |
Represents the average score of YoY growth of created
bookings, YoY growth of listings booked in month and
YoY shifted by one month of revenue.
It indicates the tendency of growth of the deal without
taking into account its revenue size. It cannot be null.
tests:
- not_null
- name: avg_growth_score
data_type: decimal
description: |
Represents the average score of YoY and MoM growth of created
bookings, YoY and MoM growth of listings booked in month and
YoY and MoM shifted by one month of revenue.
It indicates the tendency of growth of the deal without
taking into account its revenue size. It cannot be null.
tests:
- not_null
- name: weighted_avg_growth_score
data_type: decimal
description: |
It's the weighted version of avg_growth_score that
takes into account the client size by using the revenue
contribution share of that deal to the global amount.
It's the main indicator towards measuring both growth
(if positive) or decay (if negative) while weighting
the financial impact this deal tendency can have.
tests:
- not_null
- name: categorisation_weighted_avg_growth_score
data_type: string
description: |
Discrete categorisation of weighted_avg_growth_score.
It helps to easily identify which accounts are top losers,
losers, flat, winners and top winners.
Currently the categorisation is based on the score itself
rather than selecting a top up/down.
tests:
- not_null
- accepted_values:
values:
- MAJOR DECLINE
- DECLINE
- FLAT
- GAIN
- MAJOR GAIN
- UNSET
- name: int_monthly_12m_window_contribution_by_deal
description: |
The main goal of this model is to provide how much a deal
contributes to a given metric on the global amount over a
period of 12 months.
At the moment, this is only done for 3 metrics:
- total_revenue_in_gbp
- created_bookings
- listings_booked_in_month
The contribution is based on an Average approach:
Over a period of 12 months, sum the value of a given metric
for each deal, and divide it by the number of months we're considering
for that deal. Sum all the average amounts per deal to get a global value.
Divide the per-deal average by the global sum of averages.
The average approach "boosts" the contribution of those accounts
that have been active for less than 12 months.
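As an illustration only, here is a minimal Python sketch of the average approach. The deal names and monthly values are hypothetical, the function name is an assumption, and the model itself is built in SQL.

```python
from typing import Dict, List

# Hypothetical monthly values of one metric within the 12-month window.
# deal_b has only been active for 3 months, so its list is shorter.
monthly_values: Dict[str, List[float]] = {
    "deal_a": [100.0] * 12,
    "deal_b": [100.0] * 3,
}


def average_contribution(values_by_deal: Dict[str, List[float]]) -> Dict[str, float]:
    """Share of each deal's monthly average over the global sum of averages."""
    # Average per deal over the months considered for that deal.
    averages = {
        deal: sum(values) / len(values)
        for deal, values in values_by_deal.items()
        if values
    }
    global_sum = sum(averages.values())
    # Assumption: the global sum of averages is not zero.
    return {deal: avg / global_sum for deal, avg in averages.items()}


shares = average_contribution(monthly_values)
```

Here deal_b gets the same share (0.5) as deal_a, whereas a purely additive share would give it 300 / 1500 = 0.2. This is the "boost" mentioned above.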
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- date
- id_deal
columns:
- name: date
data_type: date
description: |
Date corresponding to the last day of the month.
Metrics are inclusive of this date. Together with id_deal, it
acts as the primary key of this model.
tests:
- not_null
- name: id_deal
data_type: string
description: |
Unique identifier of a Deal. Together with date, it acts as
the primary key of this model.
tests:
- not_null
- name: deal_lifecycle_state
data_type: string
description: |
Identifier of the lifecycle state of a given deal
in a given month.
- name: preceding_months_count_by_deal
data_type: integer
description: |
Number of months preceding the one given by date
that are used for the historic metric retrieval for
a given deal. In essence, it states the number of
months a given deal has been active before the month
given by date, capped at 12 months.
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
max_value: 12
strictly: false
- name: has_deal_been_created_less_than_12_months_ago
data_type: boolean
description: |
Flag to identify if a given deal was created less
than 12 months ago (true) or not (false). It's based on
preceding_months_count_by_deal, and will be true during the first
year of deal activity.
- name: total_revenue_12m_average_contribution
data_type: numeric
description: |
Share of the global total revenue contributed by the
deal over the 12 months preceding date.
It uses the average approach.
It can be negative.
tests:
- not_null
- name: created_bookings_12m_average_contribution
data_type: numeric
description: |
Share of the global created bookings contributed by the
deal over the 12 months preceding date.
It uses the average approach.
tests:
- not_null
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
max_value: 1
strictly: false
- name: listings_booked_in_month_12m_average_contribution
data_type: numeric
description: |
Share of the global listings booked in month contributed by the
deal over the 12 months preceding date.
It uses the average approach.
tests:
- not_null
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
max_value: 1
strictly: false
- name: int_monthly_churn_metrics
description: |
This model is used for global KPIs.
It computes the churn contribution by dimension, dimension value
and date, on a monthly basis. This model differs from the
usual mtd ones since it strictly depends on the monthly computation
of metrics by deal, which is done on a monthly basis rather than mtd.
In essence, this means we won't have data for the current month.
This model retrieves the 12 month contribution to global metrics
by deal and aggregates it to dimension and dimension value for those
deals that are tagged as '05-Churning' in that month. Thus, it provides
a total of 3 churn related metrics, represented as ratios over the total:
- Total Revenue (in GBP)
- Created Bookings
- Listings Booked in Month
by using the Average contribution method. For further
information, please refer to the documentation of the model:
- int_monthly_12m_window_contribution_by_deal
Lastly, when checking data at any dimension other than Global, these
values currently represent the additive contribution of churn with respect
to the global amount. This means that, for instance, 10% of churn
in a month can be split into 9% USA and 1% GBR, since 9% + 1% = 10%.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- date
- dimension
- dimension_value
columns:
- name: date
data_type: date
description: The date for the monthly metrics.
tests:
- not_null
- name: dimension
data_type: string
description: The dimension or granularity of the metrics.
tests:
- accepted_values:
values:
- global
- by_number_of_listings
- by_billing_country
- name: dimension_value
data_type: string
description: The value or segment available for the selected dimension.
tests:
- not_null
- name: total_revenue_churn_average_contribution
data_type: numeric
description: Total Revenue churn rate (average approach).
- name: created_bookings_churn_average_contribution
data_type: numeric
description: Created Bookings churn rate (average approach).
- name: listings_booked_in_month_churn_average_contribution
data_type: numeric
description: Listings Booked in Month churn rate (average approach).
- name: int_mtd_deal_lifecycle
description: |
This model contains the historic information regarding the lifecycle of hosts, at deal level.
The booking-related dates make it possible to determine the current activity status of any
deal. This information is encapsulated in the following columns:
deal_lifecycle_state: contains one of the following states
- 01-New: Deals that have been created in the current month, without bookings, that are not offboarded.
- 02-Never Booked: Deals that have been created before the current month, without bookings, that are not offboarded.
- 03-First Time Booked: Deals that have been booked for the first time in the current month, that are not offboarded.
- 04-Active: Deals that have booking activity in the past 12 months (that are not FTB nor reactivated), that are not offboarded.
- 05-Churning: Either Deals that are offboarded in that month or Deals that are becoming inactive because of lack of bookings in the past 12 months
- 06-Inactive: Either Deals that have been previously offboarded or Deals that have not had a booking for more than 12 months.
- 07-Reactivated: Deals that have had a booking in the current month that were inactive or churning before, that are not offboarded.
- Finally, if none of the logic applies, which should not happen, null will be set and a dbt alert will be raised.
Since the states of Active, First Time Booked and Reactivated indicate certain booking activity and are
mutually exclusive, the model also provides information of the recency of the bookings by the following
booleans:
- has_been_booked_within_current_month: If a deal has had a booking created in the current month
- has_been_booked_within_last_6_months: If a deal has had a booking created in the past 6 months
- has_been_booked_within_last_12_months: If a deal has had a booking created in the past 12 months
Note that if a deal has had a booking created in a given month, all 3 columns will be true. Similarly,
if the last booking created for a deal was 5 months ago, only the column has_been_booked_within_current_month
will be false, while the other 2 will be true.
Some final considerations:
- It's possible but not common that a Deal gets offboarded in the same month in which it has had some bookings created.
- It shouldn't happen that a Deal that is Inactive has some bookings created. However, there are a few cases in which
this happens, likely because of misconfiguration between Hubspot and Core. These should be reported to improve
data quality.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- date
- id_deal
columns:
- name: date
data_type: date
description: The date for the month-to-date computation. Information is inclusive of the date displayed.
tests:
- not_null
- name: id_deal
data_type: character varying
description: Id of the deal associated to the host.
tests:
- not_null
- name: creation_date_utc
data_type: date
description: Date of when the first host associated to that deal was created.
- name: first_time_booked_date_utc
data_type: date
description: |
Date of the first booking created for a given deal. Can be null if the deal
has never had a booking associated with it.
- name: last_time_booked_date_utc
data_type: date
description: |
Date of the last booking created for a given deal. Can be null if the deal
has never had a booking associated with it. Can be the same as first_time_booked_date_utc
if the deal only had 1 booking in its history.
- name: second_to_last_time_booked_date_utc
data_type: date
description: |
Date of the second-to-last booking created for a given deal, meaning the creation
date of the booking that precedes the last one. It's relevant for the reactivation computation
on the lifecycle. Can be null if the deal has never had a booking associated with it or if
the deal only had 1 booking in its history.
- name: cancellation_date_utc
data_type: date
description: |
Date of when the deal was cancelled, according to Hubspot. This is the date we're considering
for hard offboarding. It can be null, meaning the account has not been offboarded.
- name: deal_lifecycle_state
data_type: character varying
description: |
Contains the lifecycle state of a deal. The accepted values are:
01-New, 02-Never Booked, 03-First Time Booked, 04-Active, 05-Churning, 06-Inactive,
07-Reactivated. If none of the states applies, the value will be null and an alert will be raised.
tests:
- not_null
- accepted_values:
values:
- 01-New
- 02-Never Booked
- 03-First Time Booked
- 04-Active
- 05-Churning
- 06-Inactive
- 07-Reactivated
- name: has_been_booked_within_current_month
data_type: boolean
description: If the deal has had a booking created in the current month.
- name: has_been_booked_within_last_6_months
data_type: boolean
description: If the deal has had a booking created in the past 6 months.
- name: has_been_booked_within_last_12_months
data_type: boolean
description: If the deal has had a booking created in the past 12 months.
- name: has_been_offboarded
data_type: boolean
description: If the deal has been cancelled or not.
- name: int_mtd_deal_metrics
description: |
This model contains the historic information regarding the deals in an aggregated manner.
It's used for the business KPIs. Data is aggregated at the last day of the month and on the
days necessary for the Month-to-Date computation of the current month.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- date
- dimension
- dimension_value
columns:
- name: date
data_type: date
description: The date for the month-to-date deal-related metrics.
tests:
- not_null
- name: dimension
data_type: string
description: The dimension or granularity of the metrics.
tests:
- accepted_values:
values:
- global
- by_number_of_listings
- by_billing_country
- name: dimension_value
data_type: string
description: The value or segment available for the selected dimension.
tests:
- not_null