576 lines
26 KiB
Markdown
576 lines
26 KiB
Markdown
|
|
# Data quality assessment: Billable Bookings
|
|||
|
|
|
|||
|
|
**2024-07-17, by Uri**
|
|||
|
|
|
|||
|
|
This page aims to document the differences in Billable Bookings observed from Finance point of view vs. DWH point of view. This originates from the fact that, wanting to expose this KPI through DWH prove to be inconsistent with the billable bookings coming from Finance side.
|
|||
|
|
|
|||
|
|
# Finance side
|
|||
|
|
|
|||
|
|
Finance retrieves the Billable Bookings from the monthly accounting reports, that are generated usually on the first working day of each month by the project [data-invoicing-exporter](https://guardhog.visualstudio.com/Data/_git/data-invoicing-exporter). Specifically, they retrieve it from the **Account Reports Summary**, summing the tab of **BillableBookingCount**. Jamie provided the file they were using for June 2024 so I (Uri) could dig more into this. There was no amendment made this month.
|
|||
|
|
|
|||
|
|
The number of billable bookings according to Finance in June 2024 is: **25,538**
|
|||
|
|
|
|||
|
|
# DWH side
|
|||
|
|
|
|||
|
|
The billable bookings should come from the table **int_core__booking_charge_events**. Mainly, there’s 2 ways of charge a booking: either when the **verification** starts, or when the **check-in** starts.
|
|||
|
|
|
|||
|
|
The verification start date that is used for the logic has changed recently (mid June 24) because of the new estimated Guest Journey Start Date, that is when the link is used. Here you can see the changes:
|
|||
|
|
|
|||
|
|
- [Commit 4f672800 - Change on int_core__verification_requests](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project/commit/4f6728003a30f7188f10c48cc60eb2191d604907?refName=refs/heads/master&path=/models/intermediate/core/int_core__verification_requests.sql&_a=compare)
|
|||
|
|
- [Commit 4f672800 - Change on int_core__booking_charge_events](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project/commit/4f6728003a30f7188f10c48cc60eb2191d604907?refName=refs%2Fheads%2Fmaster&path=%2Fmodels%2Fintermediate%2Fcore%2Fint_core__booking_charge_events.sql&_a=compare)
|
|||
|
|
|
|||
|
|
Therefore, before this change, the booking charge events was considered a verification start that corresponded to **joined_at_utc** coming from the **int_core__unified_user** table.
|
|||
|
|
|
|||
|
|
In a nutshell, running this on 17th July 2024 for the month of June, the values retrieved are:
|
|||
|
|
|
|||
|
|
- Using verification estimated start date (link used at): **23,440**
|
|||
|
|
- Using verification guest joined date: **21,604**
|
|||
|
|
|
|||
|
|
### Query available here
|
|||
|
|
|
|||
|
|
```sql
|
|||
|
|
with
|
|||
|
|
booking_charge_events_joined_at as (
|
|||
|
|
with
|
|||
|
|
stg_core__booking as (select * from staging.stg_core__booking),
|
|||
|
|
int_core__price_plans as (select * from intermediate.int_core__price_plans),
|
|||
|
|
int_core__verification_requests as (
|
|||
|
|
select * from intermediate.int_core__verification_requests
|
|||
|
|
),
|
|||
|
|
booking_with_relevant_price_plans as (
|
|||
|
|
select
|
|||
|
|
*,
|
|||
|
|
case
|
|||
|
|
when pp.price_plan_charged_by_type = 'CheckInDate'
|
|||
|
|
then b.check_in_at_utc
|
|||
|
|
when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All')
|
|||
|
|
then icuu.joined_at_utc
|
|||
|
|
end as booking_fee_charge_at_utc
|
|||
|
|
from stg_core__booking b
|
|||
|
|
left join
|
|||
|
|
int_core__verification_requests vr
|
|||
|
|
on b.id_verification_request = vr.id_verification_request
|
|||
|
|
left join intermediate.int_core__unified_user icuu
|
|||
|
|
on vr.id_user_guest = icuu.id_user
|
|||
|
|
left join int_core__price_plans pp on b.id_user_host = pp.id_user_host
|
|||
|
|
where
|
|||
|
|
-- The dates that defines which price plan applies to the booking depends
|
|||
|
|
-- on charged by type. With the below case, we evaluate if a certain price
|
|||
|
|
-- plan relates to the booking
|
|||
|
|
case
|
|||
|
|
when pp.price_plan_charged_by_type = 'CheckInDate'
|
|||
|
|
then b.check_in_at_utc between pp.start_at_utc and pp.end_at_utc
|
|||
|
|
when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All')
|
|||
|
|
then
|
|||
|
|
coalesce(
|
|||
|
|
(
|
|||
|
|
icuu.joined_at_utc
|
|||
|
|
between pp.start_at_utc and pp.end_at_utc
|
|||
|
|
),
|
|||
|
|
false
|
|||
|
|
)
|
|||
|
|
else false
|
|||
|
|
end
|
|||
|
|
= true
|
|||
|
|
),
|
|||
|
|
untied_bookings as (
|
|||
|
|
-- If a booking has two valid price plans, take the earliest
|
|||
|
|
select id_booking, min(id_price_plan) as id_price_plan
|
|||
|
|
from booking_with_relevant_price_plans brpp
|
|||
|
|
group by id_booking
|
|||
|
|
)
|
|||
|
|
|
|||
|
|
select
|
|||
|
|
ub.id_booking,
|
|||
|
|
ub.id_price_plan,
|
|||
|
|
brpp.booking_fee_local,
|
|||
|
|
brpp.booking_fee_charge_at_utc,
|
|||
|
|
cast(brpp.booking_fee_charge_at_utc as date) as booking_fee_charge_date_utc
|
|||
|
|
from untied_bookings ub
|
|||
|
|
left join
|
|||
|
|
booking_with_relevant_price_plans brpp
|
|||
|
|
on ub.id_booking = brpp.id_booking
|
|||
|
|
and ub.id_price_plan = brpp.id_price_plan
|
|||
|
|
|
|||
|
|
),
|
|||
|
|
booking_charge_events_estimated_at as (
|
|||
|
|
with
|
|||
|
|
stg_core__booking as (select * from staging.stg_core__booking),
|
|||
|
|
int_core__unified_user as (select * from intermediate.int_core__unified_user),
|
|||
|
|
int_core__price_plans as (select * from intermediate.int_core__price_plans),
|
|||
|
|
int_core__verification_requests as (
|
|||
|
|
select * from intermediate.int_core__verification_requests
|
|||
|
|
),
|
|||
|
|
booking_with_relevant_price_plans as (
|
|||
|
|
select
|
|||
|
|
*,
|
|||
|
|
case
|
|||
|
|
when pp.price_plan_charged_by_type = 'CheckInDate'
|
|||
|
|
then b.check_in_at_utc
|
|||
|
|
when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All')
|
|||
|
|
then vr.verification_estimated_started_at_utc
|
|||
|
|
end as booking_fee_charge_at_utc
|
|||
|
|
from stg_core__booking b
|
|||
|
|
left join
|
|||
|
|
int_core__verification_requests vr
|
|||
|
|
on b.id_verification_request = vr.id_verification_request
|
|||
|
|
left join int_core__price_plans pp on b.id_user_host = pp.id_user_host
|
|||
|
|
where
|
|||
|
|
-- The dates that defines which price plan applies to the booking depends
|
|||
|
|
-- on charged by type. With the below case, we evaluate if a certain price
|
|||
|
|
-- plan relates to the booking
|
|||
|
|
case
|
|||
|
|
when pp.price_plan_charged_by_type = 'CheckInDate'
|
|||
|
|
then b.check_in_at_utc between pp.start_at_utc and pp.end_at_utc
|
|||
|
|
when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All')
|
|||
|
|
then
|
|||
|
|
coalesce(
|
|||
|
|
(
|
|||
|
|
vr.verification_estimated_started_at_utc
|
|||
|
|
between pp.start_at_utc and pp.end_at_utc
|
|||
|
|
),
|
|||
|
|
false
|
|||
|
|
)
|
|||
|
|
else false
|
|||
|
|
end
|
|||
|
|
= true
|
|||
|
|
),
|
|||
|
|
untied_bookings as (
|
|||
|
|
-- If a booking has two valid price plans, take the earliest
|
|||
|
|
select id_booking, min(id_price_plan) as id_price_plan
|
|||
|
|
from booking_with_relevant_price_plans brpp
|
|||
|
|
group by id_booking
|
|||
|
|
)
|
|||
|
|
|
|||
|
|
select
|
|||
|
|
ub.id_booking,
|
|||
|
|
ub.id_price_plan,
|
|||
|
|
brpp.booking_fee_local,
|
|||
|
|
brpp.booking_fee_charge_at_utc,
|
|||
|
|
cast(brpp.booking_fee_charge_at_utc as date) as booking_fee_charge_date_utc
|
|||
|
|
from untied_bookings ub
|
|||
|
|
left join
|
|||
|
|
booking_with_relevant_price_plans brpp
|
|||
|
|
on ub.id_booking = brpp.id_booking
|
|||
|
|
and ub.id_price_plan = brpp.id_price_plan
|
|||
|
|
|
|||
|
|
)
|
|||
|
|
select
|
|||
|
|
'verification_estimated_at' as type,
|
|||
|
|
COUNT(distinct id_booking) as billable_bookings
|
|||
|
|
from booking_charge_events_estimated_at
|
|||
|
|
where date_trunc('month', booking_fee_charge_date_utc) = '2024-06-01'
|
|||
|
|
union all
|
|||
|
|
select
|
|||
|
|
'verification_guest_joined_at' as type,
|
|||
|
|
COUNT(distinct id_booking) as billable_bookings
|
|||
|
|
from booking_charge_events_joined_at
|
|||
|
|
where date_trunc('month', booking_fee_charge_date_utc) = '2024-06-01'
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
# First volume check
|
|||
|
|
|
|||
|
|
Well, given that there’s clearly a difference between Finance numbers and what we can obtain from DWH (in either case…), before jumping into some python-based project that runs queries in the Core schema directly… maybe let’s do some simple checks.
|
|||
|
|
|
|||
|
|
The Account Reports Summary excel is split per Company name, and it should be easy to retrieve the Deal Id linked to the company name (at least manually) for the most notable cases. So I just ordered the file in descending order of billable bookings and retrieved the first one:
|
|||
|
|
|
|||
|
|
- Company Name: LavandaBilling Account
|
|||
|
|
- Billable Bookings: 3,178
|
|||
|
|
|
|||
|
|
This has a couple of id_deals linked to it: '1604445496','20529225110’
|
|||
|
|
|
|||
|
|

|
|||
|
|
|
|||
|
|
### Query available here
|
|||
|
|
|
|||
|
|
```sql
|
|||
|
|
select id_user as id_user_host, id_deal, company_name
|
|||
|
|
from intermediate.int_core__unified_user icuu
|
|||
|
|
where id_deal IN ('1604445496','20529225110')
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
The good news is that we have the KPIs view by Deal, so we can retrieve the information for these 2 Deals super fast. It will be considering the estimated start date, as it’s the current behaviour for DWH.
|
|||
|
|
|
|||
|
|
The bad news is that… well… there’s no billable bookings at all!
|
|||
|
|
|
|||
|
|

|
|||
|
|
|
|||
|
|
### Query available here
|
|||
|
|
|
|||
|
|
```sql
|
|||
|
|
select
|
|||
|
|
*
|
|||
|
|
from intermediate.int_monthly_aggregated_metrics_history_by_deal
|
|||
|
|
where date = '2024-06-30'
|
|||
|
|
and id_deal in ('20529225110','1604445496')
|
|||
|
|
order by date desc
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
So clearly there’s a big difference hiding somewhere.
|
|||
|
|
|
|||
|
|
Do we have a similar behaviour if using the **joined_at_utc** for verification start in **booking_charge_events**?
|
|||
|
|
|
|||
|
|
… well, yes. Exactly 0 billable bookings!
|
|||
|
|
|
|||
|
|

|
|||
|
|
|
|||
|
|
### Query available here
|
|||
|
|
|
|||
|
|
```sql
|
|||
|
|
with
|
|||
|
|
booking_charge_events_joined_at as (
|
|||
|
|
with
|
|||
|
|
stg_core__booking as (select * from staging.stg_core__booking),
|
|||
|
|
int_core__price_plans as (select * from intermediate.int_core__price_plans),
|
|||
|
|
int_core__verification_requests as (
|
|||
|
|
select * from intermediate.int_core__verification_requests
|
|||
|
|
),
|
|||
|
|
booking_with_relevant_price_plans as (
|
|||
|
|
select
|
|||
|
|
*,
|
|||
|
|
case
|
|||
|
|
when pp.price_plan_charged_by_type = 'CheckInDate'
|
|||
|
|
then b.check_in_at_utc
|
|||
|
|
when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All')
|
|||
|
|
then icuu.joined_at_utc
|
|||
|
|
end as booking_fee_charge_at_utc
|
|||
|
|
from stg_core__booking b
|
|||
|
|
left join
|
|||
|
|
int_core__verification_requests vr
|
|||
|
|
on b.id_verification_request = vr.id_verification_request
|
|||
|
|
left join intermediate.int_core__unified_user icuu
|
|||
|
|
on vr.id_user_guest = icuu.id_user
|
|||
|
|
left join int_core__price_plans pp on b.id_user_host = pp.id_user_host
|
|||
|
|
where
|
|||
|
|
-- The dates that defines which price plan applies to the booking depends
|
|||
|
|
-- on charged by type. With the below case, we evaluate if a certain price
|
|||
|
|
-- plan relates to the booking
|
|||
|
|
case
|
|||
|
|
when pp.price_plan_charged_by_type = 'CheckInDate'
|
|||
|
|
then b.check_in_at_utc between pp.start_at_utc and pp.end_at_utc
|
|||
|
|
when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All')
|
|||
|
|
then
|
|||
|
|
coalesce(
|
|||
|
|
(
|
|||
|
|
icuu.joined_at_utc
|
|||
|
|
between pp.start_at_utc and pp.end_at_utc
|
|||
|
|
),
|
|||
|
|
false
|
|||
|
|
)
|
|||
|
|
else false
|
|||
|
|
end
|
|||
|
|
= true
|
|||
|
|
),
|
|||
|
|
untied_bookings as (
|
|||
|
|
-- If a booking has two valid price plans, take the earliest
|
|||
|
|
select id_booking, min(id_price_plan) as id_price_plan
|
|||
|
|
from booking_with_relevant_price_plans brpp
|
|||
|
|
group by id_booking
|
|||
|
|
)
|
|||
|
|
|
|||
|
|
select
|
|||
|
|
ub.id_booking,
|
|||
|
|
ub.id_price_plan,
|
|||
|
|
brpp.booking_fee_local,
|
|||
|
|
brpp.booking_fee_charge_at_utc,
|
|||
|
|
cast(brpp.booking_fee_charge_at_utc as date) as booking_fee_charge_date_utc
|
|||
|
|
from untied_bookings ub
|
|||
|
|
left join
|
|||
|
|
booking_with_relevant_price_plans brpp
|
|||
|
|
on ub.id_booking = brpp.id_booking
|
|||
|
|
and ub.id_price_plan = brpp.id_price_plan
|
|||
|
|
|
|||
|
|
),
|
|||
|
|
booking_charge_events_estimated_at as (
|
|||
|
|
with
|
|||
|
|
stg_core__booking as (select * from staging.stg_core__booking),
|
|||
|
|
int_core__unified_user as (select * from intermediate.int_core__unified_user),
|
|||
|
|
int_core__price_plans as (select * from intermediate.int_core__price_plans),
|
|||
|
|
int_core__verification_requests as (
|
|||
|
|
select * from intermediate.int_core__verification_requests
|
|||
|
|
),
|
|||
|
|
booking_with_relevant_price_plans as (
|
|||
|
|
select
|
|||
|
|
*,
|
|||
|
|
case
|
|||
|
|
when pp.price_plan_charged_by_type = 'CheckInDate'
|
|||
|
|
then b.check_in_at_utc
|
|||
|
|
when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All')
|
|||
|
|
then vr.verification_estimated_started_at_utc
|
|||
|
|
end as booking_fee_charge_at_utc
|
|||
|
|
from stg_core__booking b
|
|||
|
|
left join
|
|||
|
|
int_core__verification_requests vr
|
|||
|
|
on b.id_verification_request = vr.id_verification_request
|
|||
|
|
left join int_core__price_plans pp on b.id_user_host = pp.id_user_host
|
|||
|
|
where
|
|||
|
|
-- The dates that defines which price plan applies to the booking depends
|
|||
|
|
-- on charged by type. With the below case, we evaluate if a certain price
|
|||
|
|
-- plan relates to the booking
|
|||
|
|
case
|
|||
|
|
when pp.price_plan_charged_by_type = 'CheckInDate'
|
|||
|
|
then b.check_in_at_utc between pp.start_at_utc and pp.end_at_utc
|
|||
|
|
when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All')
|
|||
|
|
then
|
|||
|
|
coalesce(
|
|||
|
|
(
|
|||
|
|
vr.verification_estimated_started_at_utc
|
|||
|
|
between pp.start_at_utc and pp.end_at_utc
|
|||
|
|
),
|
|||
|
|
false
|
|||
|
|
)
|
|||
|
|
else false
|
|||
|
|
end
|
|||
|
|
= true
|
|||
|
|
),
|
|||
|
|
untied_bookings as (
|
|||
|
|
-- If a booking has two valid price plans, take the earliest
|
|||
|
|
select id_booking, min(id_price_plan) as id_price_plan
|
|||
|
|
from booking_with_relevant_price_plans brpp
|
|||
|
|
group by id_booking
|
|||
|
|
)
|
|||
|
|
|
|||
|
|
select
|
|||
|
|
ub.id_booking,
|
|||
|
|
ub.id_price_plan,
|
|||
|
|
brpp.booking_fee_local,
|
|||
|
|
brpp.booking_fee_charge_at_utc,
|
|||
|
|
cast(brpp.booking_fee_charge_at_utc as date) as booking_fee_charge_date_utc
|
|||
|
|
from untied_bookings ub
|
|||
|
|
left join
|
|||
|
|
booking_with_relevant_price_plans brpp
|
|||
|
|
on ub.id_booking = brpp.id_booking
|
|||
|
|
and ub.id_price_plan = brpp.id_price_plan
|
|||
|
|
|
|||
|
|
)
|
|||
|
|
select
|
|||
|
|
'verification_estimated_at' as type,
|
|||
|
|
COUNT(distinct bce.id_booking) as billable_bookings
|
|||
|
|
from booking_charge_events_estimated_at bce
|
|||
|
|
inner join intermediate.int_core__bookings b
|
|||
|
|
on bce.id_booking = b.id_booking
|
|||
|
|
inner join intermediate.int_core__unified_user u
|
|||
|
|
on b.id_user_host = u.id_user
|
|||
|
|
and u.id_deal in ('1604445496','20529225110')
|
|||
|
|
where date_trunc('month', bce.booking_fee_charge_date_utc) = '2024-06-01'
|
|||
|
|
union all
|
|||
|
|
select
|
|||
|
|
'verification_guest_joined_at' as type,
|
|||
|
|
COUNT(distinct bce.id_booking) as billable_bookings
|
|||
|
|
from booking_charge_events_joined_at bce
|
|||
|
|
inner join intermediate.int_core__bookings b
|
|||
|
|
on bce.id_booking = b.id_booking
|
|||
|
|
inner join intermediate.int_core__unified_user u
|
|||
|
|
on b.id_user_host = u.id_user
|
|||
|
|
and u.id_deal in ('1604445496','20529225110')
|
|||
|
|
where date_trunc('month', bce.booking_fee_charge_date_utc) = '2024-06-01'
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
So clearly we have a situation here. Thankfully, Clay took the time to explain to me that this company is a special case in which these bookings do not have Guest Journeys, so looks suspicious to me in the sense that probably this is affecting somehow the logic in DWH. This is because it’s an “autohost account”, that might probably come from Partner API.
|
|||
|
|
|
|||
|
|
I also tried with the 2nd Company with more Billable Bookings in June 2024, namely:
|
|||
|
|
|
|||
|
|
- Company Name: Home Team Vacation Rentals LLC
|
|||
|
|
- Billable Bookings: 1,350
|
|||
|
|
- Deal Id: ‘15463726437’
|
|||
|
|
|
|||
|
|
Reusing the previous query and changing the deal id, we get these results:
|
|||
|
|
|
|||
|
|

|
|||
|
|
|
|||
|
|
So it’s looking MUCH better, even though it’s true that we miss/exceed the 1,350 for a few bookings. Let’s be pragmatic and focus first on the big issues, and then on the small ones…
|
|||
|
|
|
|||
|
|
# Debugging data-invoicing-exporter
|
|||
|
|
|
|||
|
|
I’ve already put some comments on the DevOps ticket, but might be worth to aim to understand where these Lavanda Billable Bookings come from.
|
|||
|
|
|
|||
|
|
Billable Booking Count is computed in `dashboard_tools.py` [here](https://guardhog.visualstudio.com/Data/_git/data-invoicing-exporter?path=/sh_invoicing/dashboard_tools.py&version=GBmain&line=122&lineEnd=123&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents), which depends on `user_report_contents` [here](https://guardhog.visualstudio.com/Data/_git/data-invoicing-exporter?path=/sh_invoicing/dashboard_tools.py&version=GBmain&line=81&lineEnd=82&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents). In turn, this is retrieved from the function `get_user_data`, defined [here](https://guardhog.visualstudio.com/Data/_git/data-invoicing-exporter?path=/sh_invoicing/dashboard_tools.py&version=GBmain&line=186&lineEnd=187&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents).
|
|||
|
|
|
|||
|
|
From here, we can retrieve the query being used on bookings `queries.get_query_booking_data_of_user` [here](https://guardhog.visualstudio.com/Data/_git/data-invoicing-exporter?path=/sh_invoicing/dashboard_tools.py&version=GBmain&line=200&lineEnd=201&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents).
|
|||
|
|
|
|||
|
|
Time to switch to queries.py, where the query we’re interested in is [here](https://guardhog.visualstudio.com/Data/_git/data-invoicing-exporter?path=/sh_invoicing/queries.py&version=GBmain&line=126&lineEnd=127&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents). This query has some transactional sql.
|
|||
|
|
|
|||
|
|
Firstly, [here](https://guardhog.visualstudio.com/Data/_git/data-invoicing-exporter?path=/sh_invoicing/queries.py&version=GBmain&line=140&lineEnd=141&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents), it creates a snapshot of the latest available `PricePlanToUser` at the moment of the extraction, **to be applied to all the bookings of the exporting time period considered**. These are the ones that are going to be used for the export. This is, technically, different from what we do in DWH in which **we apply the price plan according on what was active at the moment of verification start or check-in**. In case of more than 1 active price plan for a booking, we take the first one. This could explain changes on the volumes, but probably not the massive difference. We can check this later on.
|
|||
|
|
|
|||
|
|
Secondly, [here](https://guardhog.visualstudio.com/Data/_git/data-invoicing-exporter?path=/sh_invoicing/queries.py&version=GBmain&line=166&lineEnd=167&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents), it somehow creates a unique booking table based on Guest, Accommodation and CheckIn date. This is very similar on how we handle it in DWH in `int_core__duplicate_booking` model. To keep in mind though that we’re not forcing this duplicate booking in DWH for booking charge events.
|
|||
|
|
|
|||
|
|
Thirdly, [here](https://guardhog.visualstudio.com/Data/_git/data-invoicing-exporter?path=/sh_invoicing/queries.py&version=GBmain&line=174&lineEnd=175&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents), it computes a variable called `ListingStartDateByField` (which looks like a mistake in the name) that mainly retrieves the `PricePlanChargedByTypeId`. This comes from the snapshot made on point 1.
|
|||
|
|
|
|||
|
|
The last part is mainly a split on whether the price plan charge by type id is =3 or not. This for human beings means a different logic between A) if the billing needs to be considered in the CheckInDate and B) if it needs to be considered in the VerificationStartDate or All.
|
|||
|
|
|
|||
|
|

|
|||
|
|
|
|||
|
|
This is quite similar as we do in DWH in Booking Charge Events, that we just filter by the name instead of the ID.
|
|||
|
|
|
|||
|
|
These queries are massive monsters with around ~10 tables joined. However, these are all left joins (except for the unique bookings mentioned in the second point), so either the booking does not exist in Bookings or the difference lies in the WHERE clauses. For the sake of me not getting a stroke, I hope it’s the second, so let’s go.
|
|||
|
|
|
|||
|
|
Billable at check in [here](https://guardhog.visualstudio.com/Data/_git/data-invoicing-exporter?path=/sh_invoicing/queries.py&version=GBmain&line=180&lineEnd=181&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents)
|
|||
|
|
|
|||
|
|
- UserVerificationStatusId IS NOT NULL, from SuperhogUser table, understanding this as the guest user because of how it’s joined
|
|||
|
|
- GuestUserId IS NOT NULL, from User table
|
|||
|
|
- CreatedByUserId equals the user id we’re retrieving
|
|||
|
|
- CheckIn date is between Extraction Start and Extraction End, in a nutshell
|
|||
|
|
|
|||
|
|
Else (meaning billable at verification start ) [here](https://guardhog.visualstudio.com/Data/_git/data-invoicing-exporter?path=/sh_invoicing/queries.py&version=GBmain&line=237&lineEnd=238&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents)
|
|||
|
|
|
|||
|
|
- UserVerificationStatusId IS NOT NULL, from SuperhogUser table, understanding this as the guest user because of how it’s joined
|
|||
|
|
- GuestUserId IS NOT NULL, from User table
|
|||
|
|
- CreatedByUserId equals the user id we’re retrieving
|
|||
|
|
- and at least one of these conditions needs to be true
|
|||
|
|
- If the VerificationRequestId from Booking table IS NOT NULL then the UpdatedDate from the VerificationRequest needs to be between Extraction Start and Extraction End
|
|||
|
|
- If the VerificationRequestId from Booking table IS NULL then the JoinDate from the User table (Guest) needs to be between Extraction Start and Extraction End
|
|||
|
|
|
|||
|
|
If checking this versus what DWH is computing… well, it seems we do not have (or ever had) the UpdatedDate condition for the verification start. It would make sense that the main problem for the Lavanda subject comes from this point, after all, this condition only applies if VerificationRequestId is not filled, meaning it’s a booking that does not have a verification process because we ‘trust’ autohost verifications, and that would be in line with what Clay explained.
|
|||
|
|
|
|||
|
|
Let’s run a quick query to replicate the verification request being null/not null in DWH for this case. I’ll also retrieve the latest price plan for the users assigned to the Id Deals.
|
|||
|
|
|
|||
|
|
### Query here!
|
|||
|
|
|
|||
|
|
```sql
|
|||
|
|
with pp as (
|
|||
|
|
select pp.*
|
|||
|
|
from intermediate.int_core__price_plans pp
|
|||
|
|
inner join intermediate.int_core__unified_user uu
|
|||
|
|
on pp.id_user_host = uu.id_user
|
|||
|
|
and uu.id_deal in ('20529225110', '1604445496')
|
|||
|
|
where pp.end_date_utc >= '2024-07-01' and pp.start_date_utc < '2024-07-01'
|
|||
|
|
)
|
|||
|
|
select
|
|||
|
|
host.id_deal,
|
|||
|
|
b.id_user_host,
|
|||
|
|
count(distinct b.id_booking) as cd_booking,
|
|||
|
|
count(1) as count
|
|||
|
|
from
|
|||
|
|
intermediate.int_core__bookings b
|
|||
|
|
inner join intermediate.int_core__unified_user host
|
|||
|
|
on
|
|||
|
|
b.id_user_host = host.id_user
|
|||
|
|
inner join intermediate.int_core__unified_user guest
|
|||
|
|
on
|
|||
|
|
b.id_user_guest = guest.id_user
|
|||
|
|
left join intermediate.int_core__verification_requests vr
|
|||
|
|
on
|
|||
|
|
b.id_verification_request = vr.id_verification_request
|
|||
|
|
left join pp on pp.id_user_host = b.id_user_host
|
|||
|
|
where
|
|||
|
|
guest.id_user_verification_status is not null
|
|||
|
|
and guest.id_user is not null
|
|||
|
|
and host.id_deal in ('20529225110', '1604445496')
|
|||
|
|
and pp.id_price_plan <> 3
|
|||
|
|
and
|
|||
|
|
(
|
|||
|
|
b.id_verification_request is null
|
|||
|
|
and date_trunc('month',
|
|||
|
|
guest.joined_date_utc) = '2024-06-01'
|
|||
|
|
or
|
|||
|
|
b.id_verification_request is not null
|
|||
|
|
and date_trunc('month',
|
|||
|
|
vr.updated_date_utc) = '2024-06-01'
|
|||
|
|
)
|
|||
|
|
group by 1 ,2
|
|||
|
|
order by cd_booking desc
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
The result of the query is the following:
|
|||
|
|
|
|||
|
|

|
|||
|
|
|
|||
|
|
and as you can see it’s quite close the Lavanda billable bookings from Finance:
|
|||
|
|
|
|||
|
|

|
|||
|
|
|
|||
|
|
At this stage I wonder if there’s 1 day difference in the period considered between the DWH query vs. Finance export; or if the data is exactly extracted over the last month.
|
|||
|
|
|
|||
|
|
By removing the filter on the deal id and including the 2 types of billing, meaning at verification start and at check in, we get:
|
|||
|
|
|
|||
|
|

|
|||
|
|
|
|||
|
|
Which is quite close to the 25,538 billable bookings from Finance. The fact that cd_booking and count display different numbers might be worth to double check to ensure there’s no duplicates.
|
|||
|
|
|
|||
|
|

|
|||
|
|
|
|||
|
|
### Query here!
|
|||
|
|
|
|||
|
|
```sql
|
|||
|
|
with pp as (
|
|||
|
|
select pp.*
|
|||
|
|
from intermediate.int_core__price_plans pp
|
|||
|
|
inner join intermediate.int_core__unified_user uu
|
|||
|
|
on pp.id_user_host = uu.id_user
|
|||
|
|
where pp.end_date_utc >= '2024-07-01' and pp.start_date_utc < '2024-07-01'
|
|||
|
|
),
|
|||
|
|
verification_start as (
|
|||
|
|
select
|
|||
|
|
host.id_deal,
|
|||
|
|
b.id_user_host,
|
|||
|
|
'verification_start' as type,
|
|||
|
|
count(distinct b.id_booking) as cd_booking,
|
|||
|
|
count(1) as count
|
|||
|
|
from
|
|||
|
|
intermediate.int_core__bookings b
|
|||
|
|
inner join intermediate.int_core__unified_user host
|
|||
|
|
on
|
|||
|
|
b.id_user_host = host.id_user
|
|||
|
|
inner join intermediate.int_core__unified_user guest
|
|||
|
|
on
|
|||
|
|
b.id_user_guest = guest.id_user
|
|||
|
|
left join intermediate.int_core__verification_requests vr
|
|||
|
|
on
|
|||
|
|
b.id_verification_request = vr.id_verification_request
|
|||
|
|
left join pp on pp.id_user_host = b.id_user_host
|
|||
|
|
where
|
|||
|
|
guest.id_user_verification_status is not null
|
|||
|
|
and guest.id_user is not null
|
|||
|
|
and pp.id_price_plan <> 3
|
|||
|
|
and
|
|||
|
|
(
|
|||
|
|
b.id_verification_request is null
|
|||
|
|
and date_trunc('month',
|
|||
|
|
guest.joined_date_utc) = '2024-06-01'
|
|||
|
|
or
|
|||
|
|
b.id_verification_request is not null
|
|||
|
|
and date_trunc('month',
|
|||
|
|
vr.updated_date_utc) = '2024-06-01'
|
|||
|
|
)
|
|||
|
|
group by 1,2,3
|
|||
|
|
),
|
|||
|
|
check_in as (
|
|||
|
|
select
|
|||
|
|
host.id_deal,
|
|||
|
|
b.id_user_host,
|
|||
|
|
'check-in' as type,
|
|||
|
|
count(distinct b.id_booking) as cd_booking,
|
|||
|
|
count(1) as count
|
|||
|
|
from
|
|||
|
|
intermediate.int_core__bookings b
|
|||
|
|
inner join intermediate.int_core__unified_user host
|
|||
|
|
on
|
|||
|
|
b.id_user_host = host.id_user
|
|||
|
|
inner join intermediate.int_core__unified_user guest
|
|||
|
|
on
|
|||
|
|
b.id_user_guest = guest.id_user
|
|||
|
|
left join intermediate.int_core__verification_requests vr
|
|||
|
|
on
|
|||
|
|
b.id_verification_request = vr.id_verification_request
|
|||
|
|
left join pp on pp.id_user_host = b.id_user_host
|
|||
|
|
where
|
|||
|
|
guest.id_user_verification_status is not null
|
|||
|
|
and guest.id_user is not null
|
|||
|
|
and pp.id_price_plan = 3
|
|||
|
|
and b.id_verification_request is not null
|
|||
|
|
and date_trunc('month',
|
|||
|
|
b.check_in_at_utc) = '2024-06-01'
|
|||
|
|
group by 1,2,3
|
|||
|
|
),
|
|||
|
|
totals as (
|
|||
|
|
select * from check_in
|
|||
|
|
union all
|
|||
|
|
select * from verification_start
|
|||
|
|
)
|
|||
|
|
select
|
|||
|
|
type,
|
|||
|
|
sum(cd_booking) as cd_booking,
|
|||
|
|
sum(count) as count
|
|||
|
|
from totals
|
|||
|
|
group by 1
|
|||
|
|
```
|