From 4cfc0dcd450a36213894084bece717a589a6125b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Oriol=20Roqu=C3=A9=20Paniagua?= Date: Tue, 3 Sep 2024 13:15:40 +0000 Subject: [PATCH] Merged PR 2642: Booking Charge Events to have a similar logic as invoicing MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit # Description Based on the Notion page [here](https://www.notion.so/knowyourguest-superhog/Data-quality-assessment-Billable-Bookings-97008b7f1cbb4beb98295a22528acd03), this PR mainly adds: - Charge at verification depends on when the Guest joined or the VR was updated (depending on if the verification request associated exists does not exists or it does, respectively) - Add the logic to retrieve the last plan that is available at the beginning of each month. - Additional where conditions, relatively documented, to imitate was is available in the invoicing process. This includes removal of duplicated bookings, guest verification and guest user existing. Additional changes: - Remove select star :) - Added dbt tests that didn't exist before - Add informative fields on 1) how many price plans were active in a given month, even though we just keep the last one and 2) cases in which bookings are created after the booking is supposed to be charged. Data quality:ยด - I have mixed feelings. This does not correspond 100% to the volumes provided by the exporter, though are quite close. For April, May and June 2024, this logic has more than 95% of accuracy. Still, the fact of using the guest joined, and especially the updated date, I feel like this will make past data "disappear" if the guest has another journey. I don't know for sure since we do not store incremental updates of user information. I'd propose to move forward to have an estimated metric available anyway - with this or a similar logic, even the previous one based on the used link at but fixing the cases in which there's no VR associated. Let's discuss it! # Checklist - [X] The edited models and dependants run properly with production data. - [X] The edited models are sufficiently documented. - [X] The edited models contain PK tests, and I've ran and passed them. - [X] I have checked for DRY opportunities with other models and docs. - [X] I've picked the right materialization for the affected models. # Other - [ ] Check if a full-refresh is required after this PR is merged. Related work items: #18111 --- .../core/int_core__booking_charge_events.sql | 77 +++++++-- ..._core__invoicing_price_plans_per_month.sql | 48 ++++++ models/intermediate/core/schema.yaml | 154 +++++++++++++++++- 3 files changed, 257 insertions(+), 22 deletions(-) create mode 100644 models/intermediate/core/int_core__invoicing_price_plans_per_month.sql diff --git a/models/intermediate/core/int_core__booking_charge_events.sql b/models/intermediate/core/int_core__booking_charge_events.sql index 8e635bf..016214d 100644 --- a/models/intermediate/core/int_core__booking_charge_events.sql +++ b/models/intermediate/core/int_core__booking_charge_events.sql @@ -1,43 +1,72 @@ {{ config(materialized="table", unique_key="id_booking") }} with stg_core__booking as (select * from {{ ref("stg_core__booking") }}), + int_core__verification_requests as (select * from {{ ref("int_core__verification_requests") }}), int_core__price_plans as (select * from {{ ref("int_core__price_plans") }}), - int_core__verification_requests as ( - select * from {{ ref("int_core__verification_requests") }} - ), + int_core__invoicing_price_plans_per_month as (select * from {{ ref("int_core__invoicing_price_plans_per_month") }}), + int_core__unified_user as (select * from {{ ref("int_core__unified_user") }}), + int_core__user_host as (select * from {{ ref("int_core__user_host") }}), + int_simple_exchange_rates as (select * from {{ ref("int_simple_exchange_rates") }}), + int_core__duplicate_bookings as (select * from {{ ref("int_core__duplicate_bookings") }}), booking_with_relevant_price_plans as ( select - *, + b.id_booking, + pp.id_price_plan, + uh.account_currency_iso4217, + pp2.booking_fee_local, + pp.price_plan_created_at_utc, + b.created_at_utc as booking_created_at_utc, case when pp.price_plan_charged_by_type = 'CheckInDate' then b.check_in_at_utc - when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All') - then vr.verification_estimated_started_at_utc + when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All') and b.id_verification_request is not null + then vr.updated_at_utc + when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All') and b.id_verification_request is null + then uu.joined_at_utc end as booking_fee_charge_at_utc from stg_core__booking b - left join - int_core__verification_requests vr - on b.id_verification_request = vr.id_verification_request - left join int_core__price_plans pp on b.id_user_host = pp.id_user_host + left join int_core__user_host uh on b.id_user_host = uh.id_user_host + left join int_core__duplicate_bookings dupb on b.id_booking = dupb.id_booking + left join int_core__verification_requests vr on b.id_verification_request = vr.id_verification_request + left join int_core__invoicing_price_plans_per_month pp + on b.id_user_host = pp.id_user_host + left join int_core__price_plans pp2 on pp.id_price_plan = pp2.id_price_plan + left join int_core__unified_user uu on b.id_user_guest = uu.id_user where -- The dates that defines which price plan applies to the booking depends - -- on charged by type. With the below case, we evaluate if a certain price - -- plan relates to the booking + -- on charged by type. Additionally, having a verification request linked + -- or not to the booking can affect the verification or guest-related date to consider. + -- With the below case, we evaluate if a certain price plan relates to the booking case when pp.price_plan_charged_by_type = 'CheckInDate' - then b.check_in_at_utc between pp.start_at_utc and pp.end_at_utc - when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All') + then b.check_in_date_utc between pp.active_in_month_start_date_utc and pp.active_in_month_end_date_utc + when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All') and b.id_verification_request is not null then coalesce( ( - vr.verification_estimated_started_at_utc - between pp.start_at_utc and pp.end_at_utc + vr.updated_date_utc + between pp.active_in_month_start_date_utc and pp.active_in_month_end_date_utc + ), + false + ) + when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All') and b.id_verification_request is null + then + coalesce( + ( + uu.joined_date_utc + between pp.active_in_month_start_date_utc and pp.active_in_month_end_date_utc ), false ) else false end = true + -- Guest verification status exists + and uu.id_user_verification_status is not null + -- Guest user exists + and uu.id_user is not null + -- Booking is not duplicated + and dupb.id_booking is null ), untied_bookings as ( -- If a booking has two valid price plans, take the earliest @@ -45,15 +74,27 @@ with from booking_with_relevant_price_plans brpp group by id_booking ) - select ub.id_booking, ub.id_price_plan, brpp.booking_fee_local, + brpp.account_currency_iso4217, + ser.rate * brpp.booking_fee_local as booking_fee_in_gbp, brpp.booking_fee_charge_at_utc, - cast(brpp.booking_fee_charge_at_utc as date) as booking_fee_charge_date_utc + cast(brpp.booking_fee_charge_at_utc as date) as booking_fee_charge_date_utc, + case + when date(booking_fee_charge_at_utc) < date(booking_created_at_utc) then true + else false + end as is_booking_created_after_charging_date from untied_bookings ub left join booking_with_relevant_price_plans brpp on ub.id_booking = brpp.id_booking and ub.id_price_plan = brpp.id_price_plan +left join + int_simple_exchange_rates ser + on ( + ser.from_currency = brpp.account_currency_iso4217 + and ser.rate_date_utc = cast(brpp.booking_fee_charge_at_utc as date) + and ser.to_currency = 'GBP' + ) \ No newline at end of file diff --git a/models/intermediate/core/int_core__invoicing_price_plans_per_month.sql b/models/intermediate/core/int_core__invoicing_price_plans_per_month.sql new file mode 100644 index 0000000..6c9bdfc --- /dev/null +++ b/models/intermediate/core/int_core__invoicing_price_plans_per_month.sql @@ -0,0 +1,48 @@ +with + int_core__price_plans as (select * from {{ ref("int_core__price_plans") }}), + int_dates as (select * from {{ ref("int_dates") }}), + active_price_plans_per_month as ( + select distinct + pp.id_price_plan, + pp.id_user_host, + pp.price_plan_charged_by_type, + pp.start_at_utc as price_plan_start_at_utc, + pp.end_at_utc as price_plan_end_at_utc, + pp.created_at_utc as price_plan_created_at_utc, + d.month_start_date as active_in_month_start_date_utc, + d.month_end_date as active_in_month_end_date_utc + from int_core__price_plans pp + inner join + int_dates d on d.date_day between pp.start_date_utc and pp.end_date_utc + ), + sorted_price_plans as ( + select + id_price_plan, + id_user_host, + price_plan_charged_by_type, + price_plan_start_at_utc, + price_plan_end_at_utc, + price_plan_created_at_utc, + active_in_month_start_date_utc, + active_in_month_end_date_utc, + count(*) over ( + partition by id_user_host, active_in_month_end_date_utc + ) as price_plans_active_in_month, + row_number() over ( + partition by id_user_host, active_in_month_end_date_utc + order by price_plan_created_at_utc desc + ) as rn + from active_price_plans_per_month + ) +select + id_price_plan, + id_user_host, + price_plan_charged_by_type, + price_plan_start_at_utc, + price_plan_end_at_utc, + price_plan_created_at_utc, + active_in_month_start_date_utc, + active_in_month_end_date_utc, + price_plans_active_in_month +from sorted_price_plans +where rn = 1 diff --git a/models/intermediate/core/schema.yaml b/models/intermediate/core/schema.yaml index f90c0e8..bcb8f6d 100644 --- a/models/intermediate/core/schema.yaml +++ b/models/intermediate/core/schema.yaml @@ -62,8 +62,21 @@ models: As for the charge dates: the exact point in time at which we consider that we should be charging a fee depends on billing details of the host - customer. For some bookings, this will be the check-in. For others, its - when the guest begins the verification process. + customer. For some bookings, this will be the check-in. For others, it's + when the guest 'begins the verification process'. Here, depending on whether + the booking has or not a verification request linked to it, we will consider + either by when the guest joined or when the verification request was last + updated. Be aware that this 'begins the verification process' logic is different + from other DWH models; the current one explained here aiming to replicate what + is currently being done for invoicing. + + Note: even though we aim to replicate what is happening for the invoicing process, + you need to be aware that the monthly volumes do not match exactly with what we + have in the invoicing exporter. + + An additional column called is_booking_created_after_charging_date exemplifies the + fact that some bookings will get created after the verification process should have + happened - thus these are likely billable bookings that have not been billed. Not all bookings appear here since we don't charge a fee for all bookings. @@ -72,15 +85,28 @@ models: - name: id_booking data_type: bigint description: The unique, Superhog generated id for this booking. + tests: + - not_null + - unique - name: id_price_plan data_type: bigint description: The id of the price plan that relates to this booking. + tests: + - not_null - name: booking_fee_local data_type: numeric description: The fee to apply to the booking, in host currency. + - name: account_currency_iso4217 + data_type: character varying + description: Currency used by host/pm/platform users. + + - name: booking_fee_in_gbp + data_type: numeric + description: The fee to apply to the booking, in GBP. + - name: booking_fee_charge_at_utc data_type: timestamp without time zone description: | @@ -88,6 +114,8 @@ models: This could be the check-in date of the booking or the date in which the guest verification started, depending on the billing settings of the host. + tests: + - not_null - name: booking_fee_charge_date_utc data_type: date @@ -96,7 +124,16 @@ models: This could be the check-in date of the booking or the date in which the guest verification started, depending on the billing settings of the host. - + tests: + - not_null + + - name: is_booking_created_after_charging_date + data_type: boolean + description: | + Flag to identify if the booking was created after the + expected charge date. It can identify cases that might not + be covered within the current invoicing process. + - name: int_core__check_in_cover_prices description: | @@ -2781,4 +2818,113 @@ models: - name: has_bookings_with_product_bundle_with_paid_service data_type: integer description: | - Integer-based flag version of total_bookings_with_product_bundle_with_paid_service. \ No newline at end of file + Integer-based flag version of total_bookings_with_product_bundle_with_paid_service. + + - name: int_core__invoicing_price_plans_per_month + description: | + This model contains the price plans that were considered as active + for the invoicing process each month. This is, given that more than + one plan coexist within the same month, we take the price plan that + was active at the end of the month. This price plan is the one that + should apply for the invoicing of that month, indisctintly of the + fact that there was other plans active before. + + tests: + - dbt_utils.unique_combination_of_columns: + combination_of_columns: + - id_user_host + - id_price_plan + - active_in_month_start_date_utc + + columns: + - name: id_price_plan + data_type: bigint + description: | + The unique identifier of this table, representing + the identifier of the price plan. + tests: + - not_null + + columns: + - name: id_user_host + data_type: string + description: | + The unique identifier of the user host that has + a price plan. + tests: + - not_null + + columns: + - name: price_plan_charged_by_type + data_type: string + description: | + Type of price plan that determines that will affect + the billing logic applied. + tests: + - not_null + + columns: + - name: price_plan_start_at_utc + data_type: timestamp + description: | + Original timestamp of when a given price plan + started to be active. + tests: + - not_null + + columns: + - name: price_plan_end_at_utc + data_type: timestamp + description: | + Original timestamp of when a given price plan + ended to be active. If it's currently active, + a default end date on 2099 will apply. + tests: + - not_null + + columns: + - name: price_plan_created_at_utc + data_type: timestamp + description: | + Original timestamp of when a given price plan + was created. + tests: + - not_null + + columns: + - name: active_in_month_start_date_utc + data_type: date + description: | + Date that refers to the first day of the + month on which we will consider a price plan + as active. + If we're interested in retrieving the information from + June, this date will be the 1st of June. + tests: + - not_null + + columns: + - name: active_in_month_end_date_utc + data_type: date + description: | + Date that refers to the last day of the + month on which we will consider a price plan + as active. + If we're interested in retrieving the information from + June, this date will be the 30th of June. + tests: + - not_null + + columns: + - name: price_plans_active_in_month + data_type: integer + description: | + Integer that refers to how many price plans + have been active in a given month for a user + host. The price plan retrieved will be the + last one that was active, so this is just an + informative field of how many changes of price + plans happened during that month. + tests: + - not_null + \ No newline at end of file