# Description

Adds the Billing Country dimension to KPIs, but does not expose it to reporting yet.

Silly thing: because of how the macros I built work, I cannot make incremental changes without changing all models. This will need to be adapted; happy to hear your thoughts on how we do it.

Additionally, the model `mtd_guest_payments_metrics` performs poorly. It takes around 5 min to execute, but technically the end-to-end run completes in one shot without breaking.

It's a complex PR because it changes many files, but you will see that:
* It mostly changes the join conditions for the dimensions or the schema tests,
* I tried to be very careful and add things step-by-step in the commits.

The goal is NOT to complete the PR until we see how we can improve performance. I can say, though, that the data looks OK to me end-to-end, but it would benefit from checking with production data for the new dimension.

Update 30th Aug
* Added a new commit that includes `id_user_host` in `int_core__verification_payments`. Happy to discuss whether it makes sense or not, but it changes the execution time from ~600 sec to ~6 sec because it avoids a massive repeated join with `verification_requests`.

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've run and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [ ] I've picked the right materialization for the affected models. **To check because of performance issues**

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #19082
560 lines
No EOL
19 KiB
YAML
models:
  - name: int_daily_currency_exchange_rates
    description: >-
      This model holds a lot of data on currency exchange rates. The time
      granularity is daily. Each record holds a currency pair for a specific
      day, source and version.

      Actual rates are sourced from xe.com data. The `guess` and `forecast`
      versions are built by simply 'pushing' the first/last exchange rate on
      record. Basically, wherever we don't have data for a date, we pick the
      closest actual data point that comes from xe.com. Bear in mind this means
      that `forecast` version records will change on a daily basis as actual
      data moves forwards, meaning you shouldn't assume that money amounts
      converted at future dates will always stay put.

      Note that, given the dimensionality, getting a simple time series for a
      currency pair will require a bit of filtering.

      Reverse rates are explicit. This means that, for any given day and any
      given currency pair, you will find two records with opposite from/to
      positions. So, for 2024-01-01, you will find both a EUR->USD record and a
      USD->EUR record with the reciprocal rate (1/rate).
    columns:
      - name: id_exchange_rate
        data_type: text
        description: A unique ID for the record, derived from concatenating the
          currencies, date, source and version. Currency order is relevant
          (EURUSD != USDEUR).
        tests:
          - not_null
          - unique
      - name: from_currency
        data_type: character
        description: The source currency, represented as an ISO 4217 code.
        tests:
          - not_null
      - name: to_currency
        data_type: character
        description: The target currency, represented as an ISO 4217 code.
        tests:
          - not_null
      - name: rate
        data_type: numeric
        description: >-
          The exchange rate, represented as the units of the target currency
          that one unit of source currency gets you. So, from_currency=USD to
          to_currency=PLN with rate=4.2 should be read as '1 US Dollar buys me
          4.2 Polish Zlotys'.

          For same-currency pairs (EUR to EUR, USD to USD, etc.), the rate will
          always be one.

          The rate can be smaller than one, but can't be zero or negative.
        tests:
          - not_negative_or_zero
          - not_null
      - name: rate_date_utc
        data_type: date
        description: The date for which the rate record applies.
        tests:
          - not_null
      - name: source
        data_type: text
        description:
          Where the data comes from. Records that are built by making
          assumptions on real data will contain `_inferred`.
      - name: rate_version
        data_type: text
        description:
          The version of the rate. This can be one of `actual` (the rate is an
          observed fact), `forecast` (the rate sits in the future and is a guess
          in nature) or `guess` (the rate sits in the past and is a guess in
          nature). Note that one currency pair can have multiple rate versions
          on the same date.
        tests:
          - accepted_values:
              values:
                - guess
                - actual
                - forecast
          - not_null
      - name: updated_at_utc
        data_type: timestamp with time zone
        description:
          For external sources, this will be the point in time when the
          information was obtained from them. For stuff we make up here in the
          DWH, this will be the point in time when we made the assumption.
        tests:
          - not_null
  - name: int_simple_exchange_rates
    description: >-
      A simplified view of exchange rates, derived from
      `int_daily_currency_exchange_rates`. Come here if you don't want to
      understand nuances and complexities and just want to convert rates.

      The time granularity is daily. Each record holds a currency pair for a
      specific day. You will only find one conversion rate per currency pair and
      date.
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - from_currency
            - to_currency
            - rate_date_utc
    columns:
      - name: from_currency
        data_type: character
        description: The source currency, represented as an ISO 4217 code.
        tests:
          - not_null
      - name: to_currency
        data_type: character
        description: The target currency, represented as an ISO 4217 code.
        tests:
          - not_null
      - name: rate
        data_type: numeric
        description: The exchange rate, represented as the units of the target
          currency that one unit of source currency gets you.
        tests:
          - not_null
      - name: rate_date_utc
        data_type: date
        description: The date for which the rate record applies.
        tests:
          - not_null
      - name: updated_at_utc
        data_type: timestamp with time zone
        description:
          For external sources, this will be the point in time when the
          information was obtained from them. For stuff we make up here in the
          DWH, this will be the point in time when we made the assumption.
        tests:
          - not_null

  - name: int_mtd_vs_previous_year_metrics
    description: |
      This model is used for global KPIs.

      It aggregates all the mtd models with the different metrics per source
      and computes any necessary weighted metric across different sources.
      Each metric has a date, dimension and dimension value that defines
      the primary key of this model.

      Finally, it displays each metric on the current date and on the previous
      year date, and it computes the relative increment by using the macro:
      - calculate_safe_relative_increment

    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - date
            - dimension
            - dimension_value

    columns:
      - name: date
        data_type: date
        description: The date for the month-to-date metrics.
        tests:
          - not_null

      - name: dimension
        data_type: string
        description: The dimension or granularity of the metrics.
        tests:
          - accepted_values:
              values:
                - global
                - by_number_of_listings
                - by_billing_country

      - name: dimension_value
        data_type: string
        description: The value or segment available for the selected dimension.
        tests:
          - not_null

  - name: int_dates_mtd
    description: |
      This model provides the dates that Month-To-Date (MTD) based models need to work.
      - For month-to-month complete information, it retrieves all month-end dates that have elapsed since 2020.
      - For month-to-date information, it retrieves the days of the current month of this year up to yesterday.
        Additionally, it gets the days of the equivalent month from last year that fall before today's day of the month.

      Example:
      Imagine we are at 4th June 2024.
      - We will get the dates for 1st, 2nd, 3rd of June 2024.
      - We will also get the dates for 1st, 2nd, 3rd of June 2023.
      - We will get all end of months from 2020 to yesterday,
        i.e., 31st January 2020, 29th February 2020, ..., 30th April 2024, 31st May 2024.

    columns:
      - name: year
        data_type: int
        description: Year number of the given date.
        tests:
          - not_null

      - name: month
        data_type: int
        description: Month number of the given date.
        tests:
          - not_null

      - name: day
        data_type: int
        description: Day of the month of the given date.
        tests:
          - not_null

      - name: is_end_of_month
        data_type: boolean
        description: Is end of month, 1 for yes, 0 for no.
        tests:
          - not_null

      - name: is_current_month
        data_type: boolean
        description: |
          Checks if the date is within the current executed month,
          1 for yes, 0 for no.
        tests:
          - not_null

      - name: first_day_month
        data_type: date
        description: |
          First day of the month corresponding to the date field.
          It comes from int_dates_mtd logic.
        tests:
          - not_null

      - name: date
        data_type: date
        description: |
          Main date for the computation, which is used for filters.
          It's the primary key for this model.
        tests:
          - not_null
          - unique

  - name: int_dates_by_deal
    description: |
      This model provides the necessary dates for each deal for deal-based KPI models to work.
      It only considers those dates starting from when the host user of the deal was first available.

    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - date
            - id_deal

    columns:
      - name: year
        data_type: int
        description: Year number of the given date.
        tests:
          - not_null

      - name: month
        data_type: int
        description: Month number of the given date.
        tests:
          - not_null

      - name: day
        data_type: int
        description: Day of the month of the given date.
        tests:
          - not_null

      - name: last_day_month
        data_type: date
        description: |
          Last day of the month corresponding to the date field.
          It comes from int_dates_mtd logic.
        tests:
          - not_null

      - name: first_day_month
        data_type: date
        description: |
          First day of the month corresponding to the date field.
          It comes from int_dates_mtd logic.
        tests:
          - not_null

      - name: date
        data_type: date
        description: |
          Main date for the computation, which is used for filters.
          Together with id_deal, it forms the primary key for this model.
        tests:
          - not_null

      - name: id_deal
        data_type: string
        description: |
          Main identifier of the B2B clients. A deal can have multiple hosts.
          A host should usually have a deal, but this is not always the case.
          In this KPI reporting we enforce that the deal is not null to avoid
          potential data quality issues.
        tests:
          - not_null

  - name: int_mtd_aggregated_metrics
    description: |
      The `int_mtd_aggregated_metrics` model aggregates multiple metrics on a year, month, and day basis.
      The primary source of data is the `int_mtd_vs_previous_year_metrics` model, which contains the combination
      of metrics data per source. This model just changes the display format to unpivot the information into
      a set of metric, value, previous_year_value and relative_increment at a given date. It uses Jinja
      code to avoid code duplication.

    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - date
            - metric
            - dimension
            - dimension_value

    columns:
      - name: year
        data_type: int
        description: Year number of the given date.
        tests:
          - not_null

      - name: month
        data_type: int
        description: Month number of the given date.
        tests:
          - not_null

      - name: day
        data_type: int
        description: Day of the month of the given date.
        tests:
          - not_null

      - name: is_end_of_month
        data_type: boolean
        description: Is end of month, 1 for yes, 0 for no.
        tests:
          - not_null

      - name: is_current_month
        data_type: boolean
        description: |
          Checks if the date is within the current executed month,
          1 for yes, 0 for no.
        tests:
          - not_null

      - name: first_day_month
        data_type: date
        description: |
          First day of the month corresponding to the date field.
          It comes from int_dates_mtd logic.
        tests:
          - not_null

      - name: date
        data_type: date
        description: |
          Main date for the computation, which is used for filters.
          It comes from int_dates_mtd logic.
        tests:
          - not_null

      - name: dimension
        data_type: string
        description: The dimension or granularity of the metrics.
        tests:
          - accepted_values:
              values:
                - global
                - by_number_of_listings
                - by_billing_country

      - name: dimension_value
        data_type: string
        description: The value or segment available for the selected dimension.
        tests:
          - not_null

      - name: previous_year_date
        data_type: date
        description: |
          Corresponds to the date of the previous year, with respect to the field date.
          It comes from int_dates_mtd logic. It's only displayed for information purposes
          and should not be needed for reporting.

      - name: metric
        data_type: text
        description: Name of the business metric.
        tests:
          - not_null

      - name: order_by
        data_type: integer
        description: |
          Order for display purposes. Null values are accepted, but keep
          in mind that then there's no default controlled display order.

      - name: number_format
        data_type: text
        description: Allows for grouping and formatting for display purposes.
        tests:
          - accepted_values:
              values: ['integer', 'percentage', 'currency_gbp']

      - name: value
        data_type: numeric
        description: |
          Numeric value (integer or decimal) that corresponds to the MTD computation of the metric
          at a given date.

      - name: previous_year_value
        data_type: numeric
        description: |
          Numeric value (integer or decimal) that corresponds to the MTD computation of the metric
          on the previous year at a given date.

      - name: relative_increment
        data_type: numeric
        description: |
          Numeric value that corresponds to the relative increment between value and previous year value,
          following the computation: value / previous_year_value - 1.

      - name: relative_increment_with_sign_format
        data_type: numeric
        description: |
          relative_increment value multiplied by -1 in case this metric's growth doesn't have a
          positive impact for Superhog; otherwise it is equal to relative_increment.
          This value is created specifically for formatting in PBI.

  - name: int_monthly_aggregated_metrics_history_by_deal
    description: |
      This model aggregates the monthly historic information regarding the different metrics computed
      at deal level. The primary sources of data are the `int_yyy__monthly_XXXXX_history_by_deal`
      models which contain the raw metrics data per source.

      Unlike the int_mtd_aggregated_metrics, this model does not abstract each metric, since
      no comparison versus last year is performed. In short, it just gathers the information stored
      in the above-mentioned models.

      To keep in mind: aggregating the information of this model will not necessarily reproduce
      the int_mtd_aggregated_metrics values because 1) the mtd version contains more computing dates
      than the by deal version, the latter being a subset of the former, and 2) the deal based model
      enforces that a booking/guest journey/listing/etc has a host with a deal assigned, which is
      not necessarily the case.

    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - date
            - id_deal

    columns:
      - name: date
        data_type: date
        description: The last day of the month or yesterday for historic metrics.
        tests:
          - not_null

      - name: id_deal
        data_type: character varying
        description: Id of the deal associated to the host.
        tests:
          - not_null

  - name: int_dates_mtd_by_dimension
    description: |
      This model provides the Month-To-Date (MTD) dates, dimensions and dimension values
      that MTD-based models need to work.
      It provides the basic "empty" structure upon which metrics will be built. That is, on
      top of the date that characterises int_dates_mtd, it includes the dimensions and their
      respective values that should appear in any mtd metric model.

      Example:
      - For the "global" dimension, we will only have the "global" dimension value.
      - For the "by_number_of_listings" dimension, we will have different values
        according to the segments defined, e.g. 0, 1-5, 6-20, etc.

      ... and so on for any available dimension. These combinations should appear
      for each date of the MTD models.

    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - date
            - dimension
            - dimension_value

    columns:
      - name: year
        data_type: int
        description: Year number of the given date.
        tests:
          - not_null

      - name: month
        data_type: int
        description: Month number of the given date.
        tests:
          - not_null

      - name: day
        data_type: int
        description: Day of the month of the given date.
        tests:
          - not_null

      - name: is_end_of_month
        data_type: boolean
        description: Is end of month, 1 for yes, 0 for no.
        tests:
          - not_null

      - name: is_current_month
        data_type: boolean
        description: |
          Checks if the date is within the current executed month,
          1 for yes, 0 for no.
        tests:
          - not_null

      - name: first_day_month
        data_type: date
        description: |
          First day of the month corresponding to the date field.
          It comes from int_dates_mtd logic.
        tests:
          - not_null

      - name: date
        data_type: date
        description: |
          Main date for the computation, metrics include monthly information
          until this date.
        tests:
          - not_null

      - name: dimension
        data_type: string
        description: The dimension or granularity of the metrics.
        tests:
          - accepted_values:
              values:
                - global
                - by_number_of_listings
                - by_billing_country

      - name: dimension_value
        data_type: string
        description: The value or segment available for the selected dimension.
        tests:
          - not_null
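
For reviewers who want to sanity-check the date logic documented for `int_dates_mtd`, here is a minimal Python sketch of the selection rules in that description. It is not the model's SQL: the function name `mtd_dates`, the `first_year` parameter and the handling of 29th February are assumptions made for illustration only.

```python
from datetime import date, timedelta


def mtd_dates(today: date, first_year: int = 2020) -> list[date]:
    """Hypothetical helper: dates described in the int_dates_mtd docs for a run day."""
    yesterday = today - timedelta(days=1)
    dates = set()

    # 1) Every month-end that has elapsed since `first_year`, up to yesterday.
    year, month = first_year, 1
    while True:
        # The first day of the next month, minus one day, is this month's last day.
        next_month = date(year + month // 12, month % 12 + 1, 1)
        month_end = next_month - timedelta(days=1)
        if month_end > yesterday:
            break
        dates.add(month_end)
        year, month = next_month.year, next_month.month

    # 2) Days of the current month up to yesterday, plus the same days last year.
    for day in range(1, today.day):
        dates.add(date(today.year, today.month, day))
        try:
            dates.add(date(today.year - 1, today.month, day))
        except ValueError:
            # Assumption: 29th Feb is skipped when the previous year is not a leap year.
            pass

    return sorted(dates)


if __name__ == "__main__":
    # Reproduces the documented example for a run on 4th June 2024.
    for d in mtd_dates(date(2024, 6, 4))[-8:]:
        print(d.isoformat())
```

Running it for 4th June 2024 should return the 1st to 3rd of June for both 2024 and 2023, plus every month-end from 31st January 2020 to 31st May 2024, which matches the example in the model description.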