# data-dwh-dbt-project/models/intermediate/core/schema.yaml

# 2024-04-08 09:44:32 +02:00
version: 2
models:
- name: int_core__duplicate_bookings
description: |
A list of bookings which are considered duplicates of other bookings.
We currently consider two bookings to be duplicates if they have the same:
- Guest user id
- Accommodation id
- Check-in date
Bear in mind these bookings do have different booking ids.
Out of a tuple of 2 or more duplicated bookings:
- Our logic considers the oldest one to be the "original", non-duplicate booking.
- This table will contain only the duplicates, and not the original.
columns:
- name: id_booking
data_type: bigint
description: The unique, Superhog generated id for this booking.
- name: is_duplicate_booking
data_type: boolean
description: |
True if the booking is a duplicate.
If you are thinking that this is redundant, you are right. All
records in this table will be true. But we keep this field to
make your life easier when joining with other tables.
- name: is_duplicating_booking_with_id
data_type: bigint
description: |
Indicates which original booking is being duplicated.
If there is a tuple of duplicate bookings {A, B, C}, where A is the
original and the others are the duplicates:
- B and C will appear in this table, A will not.
- The value of this field for both B and C will be A's id.
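The duplicate rule described above can be sketched in Python as follows. This is a minimal illustration, not the model's SQL: the `Booking` record and its field names are hypothetical, "oldest" is assumed to mean the earliest `created_at_utc`, and `id_booking` is an assumed tie-breaker.

```python
from dataclasses import dataclass
from datetime import date, datetime
from itertools import groupby


@dataclass(frozen=True)
class Booking:
    # Hypothetical minimal booking record; field names are illustrative only.
    id_booking: int
    id_guest_user: str
    id_accommodation: int
    check_in_date: date
    created_at_utc: datetime


def find_duplicates(bookings: list[Booking]) -> list[tuple[int, int]]:
    """Return (id_booking, is_duplicating_booking_with_id) pairs, duplicates only."""

    def key(b: Booking):
        return (b.id_guest_user, b.id_accommodation, b.check_in_date)

    # groupby needs input sorted by the grouping key; within each group,
    # sort by creation time so the oldest ("original") booking comes first.
    ordered = sorted(bookings, key=lambda b: (key(b), b.created_at_utc, b.id_booking))
    result = []
    for _, group in groupby(ordered, key=key):
        original, *duplicates = group
        for dup in duplicates:
            result.append((dup.id_booking, original.id_booking))
    return result


t = datetime(2024, 1, 1)
bookings = [
    Booking(1, "g1", 10, date(2024, 5, 1), t),
    Booking(2, "g1", 10, date(2024, 5, 1), t.replace(hour=1)),
    Booking(3, "g1", 10, date(2024, 5, 1), t.replace(hour=2)),
    Booking(4, "g2", 10, date(2024, 5, 1), t),
]
print(find_duplicates(bookings))  # [(2, 1), (3, 1)]
```

As in the table, booking 1 (the original) never appears as a duplicate, and both 2 and 3 point to it.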
- name: int_core__booking_charge_events
description: |
Booking charge events is a fancy way of saying: a booking happened,
the related host had a booking fee set up at the right time, hence we
need to charge him.
The table contains one record per booking and shows the associated
booking fee, as well as the point in time at which the charge event was
considered.
Be wary of the booking fees: they don't have an associated currency.
Crazy, I know, but we currently don't store that information in the
backend.
As for the charge dates: the exact point in time at which we consider
that we should be charging a fee depends on the billing details of the host
customer. For some bookings, this will be the check-in. For others, it's
when the guest begins the verification process.
Not all bookings appear here since we don't charge a fee for all
bookings.
columns:
- name: id_booking
data_type: bigint
description: The unique, Superhog generated id for this booking.
- name: id_price_plan
data_type: bigint
description: The id of the price plan that relates to this booking.
- name: booking_fee_local
data_type: numeric
description: The fee to apply to the booking, in host currency.
- name: booking_fee_charge_at_utc
data_type: timestamp without time zone
description: |
The point in time at which the booking should be invoiced.
This could be the check-in date of the booking or the date on which the guest verification
started, depending on the billing settings of the host.
- name: booking_fee_charge_date_utc
data_type: date
description: |
The date on which the booking should be invoiced.
This could be the check-in date of the booking or the date on which the guest verification
started, depending on the billing settings of the host.
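A minimal sketch of how a charge event could be derived for a single booking. It assumes the host's billing setting reduces to one of two hypothetical triggers (`ChargeTrigger.CHECK_IN` or `ChargeTrigger.VERIFICATION_START`), that a missing fee (`None`) means no charge event, and that the date column is the date part of the charge timestamp. None of these names come from the backend.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal
from enum import Enum
from typing import Optional


class ChargeTrigger(Enum):
    # Hypothetical encoding of the host billing setting.
    CHECK_IN = "check_in"
    VERIFICATION_START = "verification_start"


@dataclass(frozen=True)
class ChargeEvent:
    id_booking: int
    booking_fee_local: Decimal  # no currency attached, as noted above
    booking_fee_charge_at_utc: datetime

    @property
    def booking_fee_charge_date_utc(self) -> date:
        # Assumed to be the date part of the charge timestamp.
        return self.booking_fee_charge_at_utc.date()


def build_charge_event(
    id_booking: int,
    booking_fee_local: Optional[Decimal],
    trigger: ChargeTrigger,
    check_in_at_utc: datetime,
    verification_started_at_utc: datetime,
) -> Optional[ChargeEvent]:
    # Bookings without a fee produce no event, so they do not appear in the table.
    if booking_fee_local is None:
        return None
    if trigger is ChargeTrigger.CHECK_IN:
        charge_at = check_in_at_utc
    else:
        charge_at = verification_started_at_utc
    return ChargeEvent(id_booking, booking_fee_local, charge_at)


event = build_charge_event(
    1,
    Decimal("5.00"),
    ChargeTrigger.VERIFICATION_START,
    check_in_at_utc=datetime(2024, 6, 10, 15, 0),
    verification_started_at_utc=datetime(2024, 6, 1, 9, 30),
)
print(event.booking_fee_charge_date_utc)  # 2024-06-01
```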
- name: int_core__check_in_cover_prices
description: |
This table shows the active price and cover for the Check-In Hero
product.
The prices are obtained through a gross `GROUP BY` thrown at the payment
validation sets table. It works this way because the price settings of
this product were done with a terrible backend data model design.
How the prices could be changed remains a mystery, and the current design
does not support any kind of history tracking. When the time comes to
adjust prices, we will have a lot of careful work to do to make sure that
we keep history and that no downstream dependencies of this model blow
up.
columns:
- name: local_currency_iso_4217
data_type: character varying
description: The ISO 4217 code of the local currency.
- name: checkin_cover_guest_fee_local_curr
data_type: numeric
description: |
The fee that the guest user must pay if he wants to purchase the
cover.
- name: checkin_cover_cover_amount_local_curr
data_type: numeric
description: |
The amount for which the guest user is covered if he faces problems
during check-in.
- name: int_core__unified_user
columns:
- name: id_user
data_type: character varying
description: The unique ID for the user.
tests:
- not_null
- unique
- name: int_core__vr_check_in_cover
columns:
- name: id_verification_request
data_type: character varying
description: The unique ID for the verification request.
tests:
- not_null
- unique
- name: int_core__mtd_booking_metrics
columns:
- name: date
data_type: date
description: The date for the month-to-date booking-related metrics.
tests:
- not_null
- unique
- name: int_core__mtd_aggregated_metrics
description: |
The `int_core__mtd_aggregated_metrics` model aggregates multiple metrics on a year, month, and day basis.
The primary sources of data are the `int_core__mtd_XXXXX_metrics` models, which contain the raw metrics data per source.
This model uses Jinja templating to dynamically generate SQL code, combining various metrics into a single table.
This approach reduces repetition and enhances maintainability.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- date
- metric
columns:
- name: year
data_type: int
description: year number of the given date.
tests:
- not_null
- name: month
data_type: int
description: month number of the given date.
tests:
- not_null
- name: day
data_type: int
description: day of the month of the given date.
tests:
- not_null
- name: is_end_of_month
data_type: boolean
description: whether the date is the last day of its month, true for yes, false for no.
tests:
- not_null
- name: is_current_month
data_type: boolean
description: |
whether the date falls within the month of the current execution,
true for yes, false for no.
tests:
- not_null
- name: date
data_type: date
description: |
main date for the computation, used for filtering.
It comes from int_dates_mtd logic.
tests:
- not_null
- name: previous_year_date
data_type: date
description: |
the corresponding date in the previous year, with respect to the `date` field.
It comes from int_dates_mtd logic. It's only displayed for information purposes
and should not be needed for reporting.
- name: metric
data_type: text
description: name of the business metric.
tests:
- not_null
- name: order_by
data_type: integer
description: |
display order of the metric. Null values are accepted, but keep in mind
that metrics with a null order_by have no controlled default display order.
- name: number_format
data_type: text
description: format of the value, used for grouping and formatting for display purposes.
tests:
- accepted_values:
values: ['integer', 'percentage']
- name: value
data_type: numeric
description: |
numeric value (integer or decimal) that corresponds to the MTD computation of the metric
at a given date.
- name: previous_year_value
data_type: numeric
description: |
numeric value (integer or decimal) that corresponds to the MTD computation of the metric
on the previous year at a given date.
- name: relative_increment
data_type: numeric
description: |
numeric value that corresponds to the relative increment between value and previous year value,
following the computation: value / previous_year_value - 1.
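The relative increment formula and the `(date, metric)` uniqueness test above can be illustrated with a small Python sketch. It is not the model's Jinja/SQL: returning `None` when the previous year value is missing or zero is an assumption, and the metric names `bookings` and `guests` are hypothetical.

```python
from datetime import date
from decimal import Decimal
from typing import Iterable, Mapping, Optional


def relative_increment(
    value: Optional[Decimal], previous_year_value: Optional[Decimal]
) -> Optional[Decimal]:
    # value / previous_year_value - 1, as described above.
    # Assumption: None when an input is missing or the previous year value
    # is zero, since the ratio is undefined there.
    if value is None or previous_year_value is None or previous_year_value == 0:
        return None
    return value / previous_year_value - 1


def is_unique_by_date_and_metric(rows: Iterable[Mapping]) -> bool:
    # Mirrors the intent of the unique_combination_of_columns test on (date, metric).
    keys = [(row["date"], row["metric"]) for row in rows]
    return len(keys) == len(set(keys))


rows = [
    {"date": date(2024, 6, 10), "metric": "bookings", "value": Decimal("120"), "previous_year_value": Decimal("100")},
    {"date": date(2024, 6, 10), "metric": "guests", "value": Decimal("90"), "previous_year_value": Decimal("100")},
]
for row in rows:
    row["relative_increment"] = relative_increment(row["value"], row["previous_year_value"])
print([str(row["relative_increment"]) for row in rows])  # ['0.2', '-0.1']
```

The long format (one row per date and metric) is what makes the combined uniqueness test meaningful.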
- name: int_core__verification_request_completeness
description: |
The `int_core__verification_request_completeness` model determines whether a verification request is
completed or not. To do so, it encapsulates the logic covering the different possibilities. Its main
output is the column is_verification_request_complete, but it also provides outputs of the intermediate logic
steps to be used for further modeling, such as determining the completion date.
columns:
- name: id_verification_request
data_type: bigint
description: id of the verification request. It's the unique key for this model.
tests:
- not_null
- unique
- name: expected_verification_count
data_type: int
description: count of verifications that are expected to be passed in order to complete the request.
- name: confirmed_from_same_verification_request_count
data_type: int
description: count of confirmed verifications whose logic is computed from the same verification request.
- name: confirmed_from_previous_verification_requests_count
data_type: int
description: count of confirmed verifications whose logic is computed from previous verification requests.
- name: confirmed_verification_count
data_type: int
description: |
total count of confirmed verifications. Essentially, it's the sum of the confirmed verifications
that come from the same verification request plus the ones that come from previous verification requests.
- name: is_verification_request_complete
data_type: boolean
description: whether the verification request can be considered completed or not.
- name: used_verification_from_same_verification_request
data_type: boolean
description: |
if the verification request can be considered as completed and has at least one confirmed verification
from the same verification request.
- name: used_verification_from_previous_verification_requests
data_type: boolean
description: |
if the verification request can be considered as completed and has at least one confirmed verification
from a previous verification request.
- name: is_complete_only_from_previous_verification_requests
data_type: boolean
description: |
if the verification request can be considered as completed and all confirmed verifications are from
previous verification requests.
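The relationships between the counts and flags above can be sketched in Python. This is a simplification under an explicit assumption: the completeness rule is reduced to "confirmed count is at least the expected count", whereas the real model encapsulates more possibilities. The `Completeness` class is hypothetical; the flag derivations follow the column descriptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Completeness:
    # Hypothetical container mirroring the count columns of the model.
    expected_verification_count: int
    confirmed_from_same_verification_request_count: int
    confirmed_from_previous_verification_requests_count: int

    @property
    def confirmed_verification_count(self) -> int:
        return (self.confirmed_from_same_verification_request_count
                + self.confirmed_from_previous_verification_requests_count)

    @property
    def is_verification_request_complete(self) -> bool:
        # Assumed rule: complete once enough confirmed verifications exist.
        return self.confirmed_verification_count >= self.expected_verification_count

    @property
    def used_verification_from_same_verification_request(self) -> bool:
        return (self.is_verification_request_complete
                and self.confirmed_from_same_verification_request_count > 0)

    @property
    def used_verification_from_previous_verification_requests(self) -> bool:
        return (self.is_verification_request_complete
                and self.confirmed_from_previous_verification_requests_count > 0)

    @property
    def is_complete_only_from_previous_verification_requests(self) -> bool:
        # Complete, uses previous requests, and nothing from the same request.
        return (self.used_verification_from_previous_verification_requests
                and self.confirmed_from_same_verification_request_count == 0)


examples = {
    "mixed": Completeness(3, 1, 2),
    "previous_only": Completeness(2, 0, 2),
    "incomplete": Completeness(3, 1, 0),
}
for name, c in examples.items():
    print(name, c.is_verification_request_complete,
          c.is_complete_only_from_previous_verification_requests)
```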
- name: int_core__verification_request_completed_date
description: |
The `int_core__verification_request_completed_date` model retrieves the time at which the guest
journey, or verification request, was completed. It only considers a guest journey as completed based
on the positive outcome of the is_verification_request_complete boolean coming from the
int_core__verification_request_completeness model.
The completion time is computed as follows:
- Only verification requests tagged as completed are considered. From these:
- If the verification request has at least one linked verification, the date is the creation date
of the last verification created and linked to that verification request.
Keep in mind: in some cases the last verification is updated after its creation, but these updates
generally happen very shortly after the creation date. However, there are some outliers, mostly
linked to admin overrides, that we're not considering here, since these might not necessarily be
linked to the Guest completing the Guest Journey.
- If the verification request has no linked verification, we assume an automatic completion.
In this case, we use the creation time of the verification request.
In some cases, this logic may still produce completion times that are earlier than the moment the
user used the link. For these cases, we override the completion time with used_link_at_utc. To
identify these cases, check the boolean column is_completed_at_overriden_with_used_link_at.
In summary, the guest journey completion time provided here is an estimation.
Finally, this model only contains requests that have been completed, so keep this in mind when joining this
table.
columns:
- name: id_verification_request
data_type: bigint
description: id of the completed verification request. It's the unique key for this model.
tests:
- not_null
- unique
- name: estimated_completed_at_utc
data_type: timestamp
description: estimated timestamp of when the verification request was completed.
- name: estimated_completed_date_utc
data_type: date
description: estimated date from the timestamp of when the verification request was completed.
- name: is_completed_at_overriden_with_used_link_at
data_type: boolean
description: >
boolean indicating whether the estimated dates have been overridden with the
used link time, because the initial computation produced an end (completion)
date before the starting (link usage) date.
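The completion time estimation described above can be sketched for a single request in Python. This is a minimal illustration under assumptions: the inputs (a list of creation timestamps of linked verifications and an optional `used_link_at_utc`) are hypothetical shapes, not the model's actual joins, and admin-override outliers are ignored as in the description.

```python
from datetime import datetime
from typing import Optional, Sequence, Tuple


def estimate_completed_at(
    is_verification_request_complete: bool,
    verification_request_created_at_utc: datetime,
    linked_verifications_created_at_utc: Sequence[datetime],
    used_link_at_utc: Optional[datetime],
) -> Optional[Tuple[datetime, bool]]:
    """Return (estimated_completed_at_utc, is_completed_at_overriden_with_used_link_at).

    Returns None for requests that are not complete, since the model excludes them.
    """
    if not is_verification_request_complete:
        return None
    if linked_verifications_created_at_utc:
        # Creation time of the last verification linked to the request.
        completed_at = max(linked_verifications_created_at_utc)
    else:
        # No linked verification: assume automatic completion at request creation.
        completed_at = verification_request_created_at_utc
    # Override estimates that fall before the guest used the link.
    if used_link_at_utc is not None and completed_at < used_link_at_utc:
        return used_link_at_utc, True
    return completed_at, False


with_verifications = estimate_completed_at(
    True,
    datetime(2024, 6, 1, 9, 0),
    [datetime(2024, 6, 1, 10, 0), datetime(2024, 6, 1, 11, 0)],
    used_link_at_utc=datetime(2024, 6, 1, 9, 30),
)
automatic = estimate_completed_at(
    True, datetime(2024, 6, 1, 9, 0), [], used_link_at_utc=datetime(2024, 6, 1, 9, 30)
)
print(with_verifications)  # (datetime.datetime(2024, 6, 1, 11, 0), False)
print(automatic)  # (datetime.datetime(2024, 6, 1, 9, 30), True)
```

Under the same assumptions, `estimated_completed_date_utc` would be the `.date()` of the returned timestamp.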
- name: int_core__verification_payments
columns:
- name: id_verification_to_payment
data_type: bigint
description: Unique ID for the relationship between the payment verification and the
payment at hand.
tests:
- unique
- not_null
- name: id_payment
data_type: bigint
description: Unique ID for the payment itself.
tests:
- unique
- not_null
- name: is_refundable
data_type: boolean
- name: created_at_utc
data_type: timestamp without time zone
- name: updated_at_utc
data_type: timestamp without time zone
- name: payment_due_at_utc
data_type: timestamp without time zone
tests:
- not_null
- name: payment_due_date_utc
data_type: date
tests:
- not_null
- name: payment_paid_at_utc
data_type: timestamp without time zone
- name: payment_paid_date_utc
data_type: date
- name: payment_reference
data_type: character varying
- name: refund_due_at_utc
data_type: timestamp without time zone
- name: refund_due_date_utc
data_type: date
- name: payment_refunded_at_utc
data_type: timestamp without time zone
- name: payment_refunded_date_utc
data_type: date
- name: refund_payment_reference
data_type: character varying
- name: id_guest_user
data_type: character varying
- name: id_verification
data_type: bigint
- name: id_verification_request
data_type: bigint
- name: verification_payment_type
data_type: character varying
- name: amount_in_txn_currency
data_type: numeric
tests:
- not_null
- name: currency
data_type: character varying
tests:
- not_null
- name: amount_in_gbp
data_type: numeric
tests:
- not_null
- name: payment_status
data_type: character varying
- name: notes
data_type: character varying
description: >-
A simplified table that holds guest journey payments with details around
when they happen, what service was being paid, what the related
verification request was, etc.
Amounts are converted to GBP with our simple exchange rates view.
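The GBP conversion can be pictured as a lookup plus a multiplication. This Python sketch is only illustrative: the rates in `EXAMPLE_RATES_TO_GBP` are made-up placeholders, and the real exchange rates view may depend on the payment date, which is not modelled here. Raising on an unknown currency is an assumed design choice, motivated by the `not_null` test on `amount_in_gbp`.

```python
from decimal import Decimal
from typing import Mapping

# Placeholder rates (GBP per one unit of the transaction currency), made up for
# illustration only. The real exchange rates view may also depend on the date.
EXAMPLE_RATES_TO_GBP: Mapping[str, Decimal] = {
    "GBP": Decimal("1"),
    "EUR": Decimal("0.85"),
    "USD": Decimal("0.80"),
}


def to_gbp(
    amount_in_txn_currency: Decimal,
    currency: str,
    rates: Mapping[str, Decimal] = EXAMPLE_RATES_TO_GBP,
) -> Decimal:
    # Fail loudly on an unknown currency, since amount_in_gbp is tested as not null.
    if currency not in rates:
        raise ValueError(f"No GBP rate for currency {currency!r}")
    return amount_in_txn_currency * rates[currency]


print(to_gbp(Decimal("100"), "EUR"))  # 85.00
```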