Merged PR 2642: Booking Charge Events to have a similar logic as invoicing

# Description

Based on the Notion page [here](https://www.notion.so/knowyourguest-superhog/Data-quality-assessment-Billable-Bookings-97008b7f1cbb4beb98295a22528acd03), this PR mainly adds:

- Charge at verification depends on when the Guest joined or the VR was updated (depending on if the verification request associated exists does not exists or it does, respectively)
- Add the logic to retrieve the last plan that is available at the beginning of each month.
- Additional where conditions, relatively documented, to imitate was is available in the invoicing process. This includes removal of duplicated bookings, guest verification and guest user existing.

Additional changes:
- Remove select star :)
- Added dbt tests that didn't exist before
- Add informative fields on 1) how many price plans were active in a given month, even though we just keep the last one and 2) cases in which bookings are created after the booking is supposed to be charged.

Data quality:´
- I have mixed feelings. This does not correspond 100% to the volumes provided by the exporter, though are quite close. For April, May and June 2024, this logic has more than 95% of accuracy. Still, the fact of using the guest joined, and especially the updated date, I feel like this will make past data "disappear" if the guest has another journey. I don't know for sure since we do not store incremental updates of user information.
I'd propose to move forward to have an estimated metric available anyway - with this or a similar logic, even the previous one based on the used link at but fixing the cases in which there's no VR associated.
Let's discuss it!

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #18111
This commit is contained in:
Oriol Roqué Paniagua 2024-09-03 13:15:40 +00:00
parent 0cb03e0808
commit 4cfc0dcd45
3 changed files with 257 additions and 22 deletions

View file

@ -1,43 +1,72 @@
{{ config(materialized="table", unique_key="id_booking") }}
with
stg_core__booking as (select * from {{ ref("stg_core__booking") }}),
int_core__verification_requests as (select * from {{ ref("int_core__verification_requests") }}),
int_core__price_plans as (select * from {{ ref("int_core__price_plans") }}),
int_core__verification_requests as (
select * from {{ ref("int_core__verification_requests") }}
),
int_core__invoicing_price_plans_per_month as (select * from {{ ref("int_core__invoicing_price_plans_per_month") }}),
int_core__unified_user as (select * from {{ ref("int_core__unified_user") }}),
int_core__user_host as (select * from {{ ref("int_core__user_host") }}),
int_simple_exchange_rates as (select * from {{ ref("int_simple_exchange_rates") }}),
int_core__duplicate_bookings as (select * from {{ ref("int_core__duplicate_bookings") }}),
booking_with_relevant_price_plans as (
select
*,
b.id_booking,
pp.id_price_plan,
uh.account_currency_iso4217,
pp2.booking_fee_local,
pp.price_plan_created_at_utc,
b.created_at_utc as booking_created_at_utc,
case
when pp.price_plan_charged_by_type = 'CheckInDate'
then b.check_in_at_utc
when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All')
then vr.verification_estimated_started_at_utc
when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All') and b.id_verification_request is not null
then vr.updated_at_utc
when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All') and b.id_verification_request is null
then uu.joined_at_utc
end as booking_fee_charge_at_utc
from stg_core__booking b
left join
int_core__verification_requests vr
on b.id_verification_request = vr.id_verification_request
left join int_core__price_plans pp on b.id_user_host = pp.id_user_host
left join int_core__user_host uh on b.id_user_host = uh.id_user_host
left join int_core__duplicate_bookings dupb on b.id_booking = dupb.id_booking
left join int_core__verification_requests vr on b.id_verification_request = vr.id_verification_request
left join int_core__invoicing_price_plans_per_month pp
on b.id_user_host = pp.id_user_host
left join int_core__price_plans pp2 on pp.id_price_plan = pp2.id_price_plan
left join int_core__unified_user uu on b.id_user_guest = uu.id_user
where
-- The dates that defines which price plan applies to the booking depends
-- on charged by type. With the below case, we evaluate if a certain price
-- plan relates to the booking
-- on charged by type. Additionally, having a verification request linked
-- or not to the booking can affect the verification or guest-related date to consider.
-- With the below case, we evaluate if a certain price plan relates to the booking
case
when pp.price_plan_charged_by_type = 'CheckInDate'
then b.check_in_at_utc between pp.start_at_utc and pp.end_at_utc
when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All')
then b.check_in_date_utc between pp.active_in_month_start_date_utc and pp.active_in_month_end_date_utc
when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All') and b.id_verification_request is not null
then
coalesce(
(
vr.verification_estimated_started_at_utc
between pp.start_at_utc and pp.end_at_utc
vr.updated_date_utc
between pp.active_in_month_start_date_utc and pp.active_in_month_end_date_utc
),
false
)
when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All') and b.id_verification_request is null
then
coalesce(
(
uu.joined_date_utc
between pp.active_in_month_start_date_utc and pp.active_in_month_end_date_utc
),
false
)
else false
end
= true
-- Guest verification status exists
and uu.id_user_verification_status is not null
-- Guest user exists
and uu.id_user is not null
-- Booking is not duplicated
and dupb.id_booking is null
),
untied_bookings as (
-- If a booking has two valid price plans, take the earliest
@ -45,15 +74,27 @@ with
from booking_with_relevant_price_plans brpp
group by id_booking
)
select
ub.id_booking,
ub.id_price_plan,
brpp.booking_fee_local,
brpp.account_currency_iso4217,
ser.rate * brpp.booking_fee_local as booking_fee_in_gbp,
brpp.booking_fee_charge_at_utc,
cast(brpp.booking_fee_charge_at_utc as date) as booking_fee_charge_date_utc
cast(brpp.booking_fee_charge_at_utc as date) as booking_fee_charge_date_utc,
case
when date(booking_fee_charge_at_utc) < date(booking_created_at_utc) then true
else false
end as is_booking_created_after_charging_date
from untied_bookings ub
left join
booking_with_relevant_price_plans brpp
on ub.id_booking = brpp.id_booking
and ub.id_price_plan = brpp.id_price_plan
left join
int_simple_exchange_rates ser
on (
ser.from_currency = brpp.account_currency_iso4217
and ser.rate_date_utc = cast(brpp.booking_fee_charge_at_utc as date)
and ser.to_currency = 'GBP'
)

View file

@ -0,0 +1,48 @@
with
int_core__price_plans as (select * from {{ ref("int_core__price_plans") }}),
int_dates as (select * from {{ ref("int_dates") }}),
active_price_plans_per_month as (
select distinct
pp.id_price_plan,
pp.id_user_host,
pp.price_plan_charged_by_type,
pp.start_at_utc as price_plan_start_at_utc,
pp.end_at_utc as price_plan_end_at_utc,
pp.created_at_utc as price_plan_created_at_utc,
d.month_start_date as active_in_month_start_date_utc,
d.month_end_date as active_in_month_end_date_utc
from int_core__price_plans pp
inner join
int_dates d on d.date_day between pp.start_date_utc and pp.end_date_utc
),
sorted_price_plans as (
select
id_price_plan,
id_user_host,
price_plan_charged_by_type,
price_plan_start_at_utc,
price_plan_end_at_utc,
price_plan_created_at_utc,
active_in_month_start_date_utc,
active_in_month_end_date_utc,
count(*) over (
partition by id_user_host, active_in_month_end_date_utc
) as price_plans_active_in_month,
row_number() over (
partition by id_user_host, active_in_month_end_date_utc
order by price_plan_created_at_utc desc
) as rn
from active_price_plans_per_month
)
select
id_price_plan,
id_user_host,
price_plan_charged_by_type,
price_plan_start_at_utc,
price_plan_end_at_utc,
price_plan_created_at_utc,
active_in_month_start_date_utc,
active_in_month_end_date_utc,
price_plans_active_in_month
from sorted_price_plans
where rn = 1

View file

@ -62,8 +62,21 @@ models:
As for the charge dates: the exact point in time at which we consider
that we should be charging a fee depends on billing details of the host
customer. For some bookings, this will be the check-in. For others, its
when the guest begins the verification process.
customer. For some bookings, this will be the check-in. For others, it's
when the guest 'begins the verification process'. Here, depending on whether
the booking has or not a verification request linked to it, we will consider
either by when the guest joined or when the verification request was last
updated. Be aware that this 'begins the verification process' logic is different
from other DWH models; the current one explained here aiming to replicate what
is currently being done for invoicing.
Note: even though we aim to replicate what is happening for the invoicing process,
you need to be aware that the monthly volumes do not match exactly with what we
have in the invoicing exporter.
An additional column called is_booking_created_after_charging_date exemplifies the
fact that some bookings will get created after the verification process should have
happened - thus these are likely billable bookings that have not been billed.
Not all bookings appear here since we don't charge a fee for all
bookings.
@ -72,15 +85,28 @@ models:
- name: id_booking
data_type: bigint
description: The unique, Superhog generated id for this booking.
tests:
- not_null
- unique
- name: id_price_plan
data_type: bigint
description: The id of the price plan that relates to this booking.
tests:
- not_null
- name: booking_fee_local
data_type: numeric
description: The fee to apply to the booking, in host currency.
- name: account_currency_iso4217
data_type: character varying
description: Currency used by host/pm/platform users.
- name: booking_fee_in_gbp
data_type: numeric
description: The fee to apply to the booking, in GBP.
- name: booking_fee_charge_at_utc
data_type: timestamp without time zone
description: |
@ -88,6 +114,8 @@ models:
This could be the check-in date of the booking or the date in which the guest verification
started, depending on the billing settings of the host.
tests:
- not_null
- name: booking_fee_charge_date_utc
data_type: date
@ -96,6 +124,15 @@ models:
This could be the check-in date of the booking or the date in which the guest verification
started, depending on the billing settings of the host.
tests:
- not_null
- name: is_booking_created_after_charging_date
data_type: boolean
description: |
Flag to identify if the booking was created after the
expected charge date. It can identify cases that might not
be covered within the current invoicing process.
- name: int_core__check_in_cover_prices
description: |
@ -2782,3 +2819,112 @@ models:
data_type: integer
description: |
Integer-based flag version of total_bookings_with_product_bundle_with_paid_service.
- name: int_core__invoicing_price_plans_per_month
description: |
This model contains the price plans that were considered as active
for the invoicing process each month. This is, given that more than
one plan coexist within the same month, we take the price plan that
was active at the end of the month. This price plan is the one that
should apply for the invoicing of that month, indisctintly of the
fact that there was other plans active before.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- id_user_host
- id_price_plan
- active_in_month_start_date_utc
columns:
- name: id_price_plan
data_type: bigint
description: |
The unique identifier of this table, representing
the identifier of the price plan.
tests:
- not_null
columns:
- name: id_user_host
data_type: string
description: |
The unique identifier of the user host that has
a price plan.
tests:
- not_null
columns:
- name: price_plan_charged_by_type
data_type: string
description: |
Type of price plan that determines that will affect
the billing logic applied.
tests:
- not_null
columns:
- name: price_plan_start_at_utc
data_type: timestamp
description: |
Original timestamp of when a given price plan
started to be active.
tests:
- not_null
columns:
- name: price_plan_end_at_utc
data_type: timestamp
description: |
Original timestamp of when a given price plan
ended to be active. If it's currently active,
a default end date on 2099 will apply.
tests:
- not_null
columns:
- name: price_plan_created_at_utc
data_type: timestamp
description: |
Original timestamp of when a given price plan
was created.
tests:
- not_null
columns:
- name: active_in_month_start_date_utc
data_type: date
description: |
Date that refers to the first day of the
month on which we will consider a price plan
as active.
If we're interested in retrieving the information from
June, this date will be the 1st of June.
tests:
- not_null
columns:
- name: active_in_month_end_date_utc
data_type: date
description: |
Date that refers to the last day of the
month on which we will consider a price plan
as active.
If we're interested in retrieving the information from
June, this date will be the 30th of June.
tests:
- not_null
columns:
- name: price_plans_active_in_month
data_type: integer
description: |
Integer that refers to how many price plans
have been active in a given month for a user
host. The price plan retrieved will be the
last one that was active, so this is just an
informative field of how many changes of price
plans happened during that month.
tests:
- not_null