data-dwh-dbt-project/models/reporting/general/schema.yaml
Oriol Roqué Paniagua 85131985d8 Merged PR 2615: Beautification of KPIs dimensions
# Description

Changes:

* Separate 1) the internal naming of dimensions within the DWH from 2) the display name of the dimensions in the reporting. The main change is that "by_number_of_listings" is now displayed as "By # of Listings Booked in 12 Months". I edited the production macro because, to me, it's linked to when things become available for display.
* Add leading zeros to the segmentation so it's ordered correctly. Before, the segment 21-60 was displayed before 6-20.
* Also added some capital letters to the schema config of the reporting model :)
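
For reviewers wondering why the leading zeros matter: segment labels are sorted as text, so "21-60" sorts before "6-20" unless the numbers are padded. Below is a minimal sketch of the effect in Python. The `pad_segment` helper and the sample labels are hypothetical and only illustrate the idea; the actual change lives in the dbt models and macro.

```python
import re


def pad_segment(label: str, width: int = 2) -> str:
    """Left-pad every run of digits in a segment label with zeros (hypothetical helper)."""
    return re.sub(r"\d+", lambda m: m.group().zfill(width), label)


# Hypothetical segment labels, not taken from the actual model.
raw = ["21-60", "6-20", "1-5"]
padded = [pad_segment(label) for label in raw]

# Text sorting compares character by character, so "2" comes before "6".
print(sorted(raw))     # ['1-5', '21-60', '6-20']
print(sorted(padded))  # ['01-05', '06-20', '21-60']
```

A width of 2 is enough only while every bound has at most two digits; wider segments would need a larger width.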

I attach a screenshot of how it looks in PBI in my local development branch to show why this is "Beautification". Be aware that merging this also puts the dimensions in production.

![image.png](https://guardhog.visualstudio.com/4148d95f-4b6d-4205-bcff-e9c8e0d2ca65/_apis/git/repositories/54ac356f-aad7-46d2-b62c-e8c5b3bb8ebf/pullRequests/2615/attachments/image.png)

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've run and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #19325
2024-08-21 14:42:05 +00:00


version: 2
models:
- name: dates
description: |
A dates dimension. Each record represents one calendar day.
All other columns have handy representations of the date, its subcomponents, and other relative dates.
This table is generated with the dbt date package: https://hub.getdbt.com/calogica/dbt_date/latest/.
columns:
- name: date_day
data_type: date
description: The date this record represents. All relative dates are relative to this. All derived date components are derived from this.
- name: prior_date_day
data_type: date
description: The day before date day.
- name: next_date_day
data_type: date
description: The day after date day.
- name: prior_year_date_day
data_type: date
description: The same day of the same month, but in the previous year. If date day is Feb 29th, this col returns Feb 28th.
- name: prior_year_over_year_date_day
data_type: date
description: The day placed 365 days before the date day. Behaves a bit funny with leap years.
- name: day_of_week
data_type: integer
description: The day of the week as a number, where Monday is 1 and Sunday is 7.
- name: day_of_week_name
data_type: text
description: The full name of the day of the week.
- name: day_of_week_name_short
data_type: text
description: The day of the week as a 3-letter abbreviation.
- name: day_of_month
data_type: integer
description: The day of the month as a number.
- name: day_of_year
data_type: integer
description: The day of the year as a number, where January 1st is 1 and December 31st is 365/366.
- name: week_start_date
data_type: date
description: |
The full date for the first day of the week of date day.
It considers Sunday to be the first day of the week.
- name: week_end_date
data_type: date
description: |
The full date for the last day of the week of date day.
It considers Saturday to be the last day of the week.
- name: prior_year_week_start_date
data_type: date
description: Same as week_start_date, but for the same date day in the previous year.
- name: prior_year_week_end_date
data_type: date
description: Same as week_end_date, but for the same date day in the previous year.
- name: week_of_year
data_type: integer
description: The week of the year as a number, where the first week is 1 and the last week is 52/53.
- name: iso_week_start_date
data_type: date
description: |
The full date for the first day of the week of date day, according to ISO specs.
It considers Monday to be the first day of the week.
Read more here: https://en.wikipedia.org/wiki/ISO_week_date
- name: iso_week_end_date
data_type: date
description: |
The full date for the last day of the week of date day, according to ISO specs.
It considers Sunday to be the last day of the week.
Read more here: https://en.wikipedia.org/wiki/ISO_week_date
- name: prior_year_iso_week_start_date
data_type: date
description: "Read more here: https://en.wikipedia.org/wiki/ISO_week_date"
- name: prior_year_iso_week_end_date
data_type: date
description: "Read more here: https://en.wikipedia.org/wiki/ISO_week_date"
- name: iso_week_of_year
data_type: integer
description: "Read more here: https://en.wikipedia.org/wiki/ISO_week_date"
- name: prior_year_week_of_year
data_type: integer
description: ""
- name: prior_year_iso_week_of_year
data_type: integer
description: "Read more here: https://en.wikipedia.org/wiki/ISO_week_date"
- name: month_of_year
data_type: integer
description: The month date day belongs to as a number (1 for Jan, 12 for Dec).
- name: month_name
data_type: text
description: The month date day belongs to in English.
- name: month_name_short
data_type: text
description: The month date day belongs to as a 3-letter abbreviation.
- name: month_start_date
data_type: date
description: The full date for the first day of the month.
- name: month_end_date
data_type: date
description: The full date for the last day of the month.
- name: prior_year_month_start_date
data_type: date
description: The full date for the first day of the same month last year.
- name: prior_year_month_end_date
data_type: date
description: The full date for the last day of the same month last year.
- name: quarter_of_year
data_type: integer
description: The quarter date day belongs to as a number (1 for Q1, 4 for Q4).
- name: quarter_start_date
data_type: date
description: The full date for the first date of the quarter.
- name: quarter_end_date
data_type: date
description: The full date for the last date of the quarter.
- name: year_number
data_type: integer
description: The year date day belongs to as a number.
- name: year_start_date
data_type: date
description: The full date for the first day of the year.
- name: year_end_date
data_type: date
description: The full date for the last day of the year.
- name: daily_currency_exchange_rates
description:
This model holds a lot of data on currency exchange rates. The time
granularity is daily. Each record holds a currency pair for a specific
day, source and version.
Actual rates are sourced from xe.com data. The `guess` and `forecast`
versions are built by simply 'pushing' the first/last exchange rate on
record. Basically, wherever we don't have data for a date, we pick the
closest actual data point that comes from xe.com. Bear in mind this means
that `forecast` version records will change on a daily basis as actual
data moves forward, so you shouldn't assume that amounts converted at
future dates will stay put.
Note that, given the dimensionality, getting a simple time series for a
currency pair will require a bit of filtering.
Reverse rates are explicit. This means that, for any given day and any
given currency pair, you will find two records with opposite from/to
positions. So, for 2024-01-01, you will find both a EUR->USD record and a
USD->EUR record with the opposite rate (1/rate).
columns:
- name: id_exchange_rate
data_type: text
description: A unique ID for the record, derived from concatenating the
currencies, date, source and version. Currency order is relevant
(EURUSD != USDEUR).
tests:
- not_null
- unique
- name: from_currency
data_type: character
description: The source currency, represented as an ISO 4217 code.
tests:
- not_null
- name: to_currency
data_type: character
description: The target currency, represented as an ISO 4217 code.
tests:
- not_null
- name: rate
data_type: numeric
description: >-
The exchange rate, represented as the units of the target currency
that one unit of source currency gets you. So, from_currency=USD to
to_currency=PLN with rate=4.2 should be read as '1 US Dollar buys me
4.2 Polish Zlotys'.
For same-currency pairs (EUR to EUR, USD to USD, etc.), the rate will
always be one.
The rate can be smaller than one, but must be greater than zero.
tests:
- not_negative_or_zero
- not_null
- name: rate_date_utc
data_type: date
description: The date on which the rate record is relevant.
tests:
- not_null
- name: source
data_type: text
description:
Where the data comes from. Records that are built by making
assumptions on real data will contain `_inferred`.
- name: rate_version
data_type: text
description:
The version of the rate. This can be one of `actual` (the rate is a
reality fact), `forecast` (the rate sits in the future and is a guess
in nature) or `guess` (the rate sits in the past and is a guess in
nature). Note that one currency pair can have multiple rate versions
on the same date.
tests:
- accepted_values:
values:
- guess
- actual
- forecast
- not_null
- name: updated_at_utc
data_type: timestamp with time zone
description:
For external sources, this will be the point in time when the
information was obtained from them. For stuff we make up here in the
DWH, this will be the point in time when we made the assumption.
tests:
- not_null
- name: simple_exchange_rates
description: >-
A simplified vision of exchange rates, derived from
`int_daily_currency_exchange_rates`. Come here if you don't want to
understand nuances and complexities and just want to convert rates.
The time granularity is daily. Each record holds a currency pair for a
specific day. You will only find one conversion rate per currency pair and
date.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- from_currency
- to_currency
- rate_date_utc
columns:
- name: from_currency
data_type: character
description: The source currency, represented as an ISO 4217 code.
tests:
- not_null
- name: to_currency
data_type: character
description: The target currency, represented as an ISO 4217 code.
tests:
- not_null
- name: rate
data_type: numeric
description: The exchange rate, represented as the units of the target currency that one unit of source currency gets you.
tests:
- not_null
- name: rate_date_utc
data_type: date
description: The date on which the rate record is relevant.
tests:
- not_null
- name: updated_at_utc
data_type: timestamp with time zone
description:
For external sources, this will be the point in time when the
information was obtained from them. For stuff we make up here in the
DWH, this will be the point in time when we made the assumption.
tests:
- not_null
- name: mtd_aggregated_metrics
description: |
This model aggregates the historic information of our business by providing
different metrics computed at global and dimension level.
It's the main source of information for the Main KPIs reporting, specifically
on the MTD (Month To Date) and the Monthly Overview.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- date
- metric
- dimension
- dimension_value
columns:
- name: year
data_type: int
description: Year number of the given date.
tests:
- not_null
- name: month
data_type: int
description: Month number of the given date.
tests:
- not_null
- name: day
data_type: int
description: Day of the month of the given date.
tests:
- not_null
- name: is_end_of_month
data_type: boolean
description: Whether the date is the last day of the month, 1 for yes, 0 for no.
tests:
- not_null
- name: is_current_month
data_type: boolean
description: |
Checks if the date is within the current executed month,
1 for yes, 0 for no.
tests:
- not_null
- name: first_day_month
data_type: date
description: |
First day of the month corresponding to the date field.
It comes from int_dates_mtd logic.
tests:
- not_null
- name: date
data_type: date
description: |
Main date for the computation, that is used for filters.
It comes from int_dates_mtd logic.
tests:
- not_null
- name: dimension
data_type: string
description: |
The dimension or granularity of the metrics. Keep in mind that
in this reporting model this field corresponds to the
dimension_display; that is, the name of the dimension for
display purposes.
tests:
- not_null
- name: dimension_value
data_type: string
description: The value or segment available for the selected dimension.
tests:
- not_null
- name: previous_year_date
data_type: date
description: |
Corresponds to the date of the previous year, with respect to the field date.
It comes from int_dates_mtd logic. It's only displayed for information purposes,
should not be needed for reporting.
- name: metric
data_type: text
description: Name of the business metric.
tests:
- not_null
- name: order_by
data_type: integer
description: |
Order for displaying purposes. Null values are accepted, but keep
in mind that then there's no default controlled display order.
- name: number_format
data_type: text
description: Allows for grouping and formatting for displaying purposes.
tests:
- accepted_values:
values: ['integer', 'percentage', 'currency_gbp']
- name: value
data_type: numeric
description: |
Numeric value (integer or decimal) that corresponds to the MTD computation of the metric
at a given date. Note that if the month is not in progress, then this value corresponds
to the monthly figure.
- name: previous_year_value
data_type: numeric
description: |
Numeric value (integer or decimal) that corresponds to the MTD computation of the metric
on the previous year at a given date.
- name: relative_increment
data_type: numeric
description: |
Numeric value that corresponds to the relative increment between value and previous year value,
following the computation: value / previous_year_value - 1.
- name: relative_increment_with_sign_format
data_type: numeric
description: |
The relative_increment value multiplied by -1 when this metric's growth doesn't have a
positive impact for Superhog; otherwise it is equal to relative_increment.
This value is created specifically for formatting in PBI.
- name: monthly_aggregated_metrics_history_by_deal
description: |
This model aggregates the monthly historic information regarding the different metrics computed
at deal level. The primary sources of data are the `int_monthly_XXXXX_history_by_deal`
models, which contain the raw metrics data per source.
This table is used to provide "By Deal" metrics in the Business Overview reporting.
Unlike the mtd_aggregated_metrics, this model does not abstract each metric, since
no comparison versus last year is performed. In short, it just gathers the information stored
in the abovementioned models.
Keep in mind: aggregating the information of this model will not necessarily result in
the int_mtd_aggregated metrics because 1) the mtd version contains more computing dates
than the by deal version, the latter being a subset of the former, and 2) the deal based model
requires that a booking/guest journey/listing/etc. has a host with a deal assigned, which is
not necessarily the case.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- date
- id_deal
columns:
- name: date
data_type: date
description: The last day of the month or yesterday for historic metrics.
tests:
- not_null
- name: id_deal
data_type: character varying
description: Id of the deal associated to the host.
tests:
- not_null
- name: year
data_type: int
description: Year number of the given date.
tests:
- not_null
- name: month
data_type: int
description: Month number of the given date.
tests:
- not_null
- name: day
data_type: int
description: Day of the month of the given date.
tests:
- not_null