26 KiB
Data quality assessment: Billable Bookings
2024-07-17, by Uri
This page aims to document the differences in Billable Bookings observed from Finance point of view vs. DWH point of view. This originates from the fact that, wanting to expose this KPI through DWH prove to be inconsistent with the billable bookings coming from Finance side.
Finance side
Finance retrieves the Billable Bookings from the monthly accounting reports, that are generated usually on the first working day of each month by the project data-invoicing-exporter. Specifically, they retrieve it from the Account Reports Summary, summing the tab of BillableBookingCount. Jamie provided the file they were using for June 2024 so I (Uri) could dig more into this. There was no amendment made this month.
The number of billable bookings according to Finance in June 2024 is: 25,538
DWH side
The billable bookings should come from the table int_core__booking_charge_events. Mainly, there’s 2 ways of charge a booking: either when the verification starts, or when the check-in starts.
The verification start date that is used for the logic has changed recently (mid June 24) because of the new estimated Guest Journey Start Date, that is when the link is used. Here you can see the changes:
- Commit 4f672800 - Change on int_core__verification_requests
- Commit 4f672800 - Change on int_core__booking_charge_events
Therefore, before this change, the booking charge events was considered a verification start that corresponded to joined_at_utc coming from the int_core__unified_user table.
In a nutshell, running this on 17th July 2024 for the month of June, the values retrieved are:
- Using verification estimated start date (link used at): 23,440
- Using verification guest joined date: 21,604
Query available here
with
booking_charge_events_joined_at as (
with
stg_core__booking as (select * from staging.stg_core__booking),
int_core__price_plans as (select * from intermediate.int_core__price_plans),
int_core__verification_requests as (
select * from intermediate.int_core__verification_requests
),
booking_with_relevant_price_plans as (
select
*,
case
when pp.price_plan_charged_by_type = 'CheckInDate'
then b.check_in_at_utc
when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All')
then icuu.joined_at_utc
end as booking_fee_charge_at_utc
from stg_core__booking b
left join
int_core__verification_requests vr
on b.id_verification_request = vr.id_verification_request
left join intermediate.int_core__unified_user icuu
on vr.id_user_guest = icuu.id_user
left join int_core__price_plans pp on b.id_user_host = pp.id_user_host
where
-- The dates that defines which price plan applies to the booking depends
-- on charged by type. With the below case, we evaluate if a certain price
-- plan relates to the booking
case
when pp.price_plan_charged_by_type = 'CheckInDate'
then b.check_in_at_utc between pp.start_at_utc and pp.end_at_utc
when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All')
then
coalesce(
(
icuu.joined_at_utc
between pp.start_at_utc and pp.end_at_utc
),
false
)
else false
end
= true
),
untied_bookings as (
-- If a booking has two valid price plans, take the earliest
select id_booking, min(id_price_plan) as id_price_plan
from booking_with_relevant_price_plans brpp
group by id_booking
)
select
ub.id_booking,
ub.id_price_plan,
brpp.booking_fee_local,
brpp.booking_fee_charge_at_utc,
cast(brpp.booking_fee_charge_at_utc as date) as booking_fee_charge_date_utc
from untied_bookings ub
left join
booking_with_relevant_price_plans brpp
on ub.id_booking = brpp.id_booking
and ub.id_price_plan = brpp.id_price_plan
),
booking_charge_events_estimated_at as (
with
stg_core__booking as (select * from staging.stg_core__booking),
int_core__unified_user as (select * from intermediate.int_core__unified_user),
int_core__price_plans as (select * from intermediate.int_core__price_plans),
int_core__verification_requests as (
select * from intermediate.int_core__verification_requests
),
booking_with_relevant_price_plans as (
select
*,
case
when pp.price_plan_charged_by_type = 'CheckInDate'
then b.check_in_at_utc
when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All')
then vr.verification_estimated_started_at_utc
end as booking_fee_charge_at_utc
from stg_core__booking b
left join
int_core__verification_requests vr
on b.id_verification_request = vr.id_verification_request
left join int_core__price_plans pp on b.id_user_host = pp.id_user_host
where
-- The dates that defines which price plan applies to the booking depends
-- on charged by type. With the below case, we evaluate if a certain price
-- plan relates to the booking
case
when pp.price_plan_charged_by_type = 'CheckInDate'
then b.check_in_at_utc between pp.start_at_utc and pp.end_at_utc
when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All')
then
coalesce(
(
vr.verification_estimated_started_at_utc
between pp.start_at_utc and pp.end_at_utc
),
false
)
else false
end
= true
),
untied_bookings as (
-- If a booking has two valid price plans, take the earliest
select id_booking, min(id_price_plan) as id_price_plan
from booking_with_relevant_price_plans brpp
group by id_booking
)
select
ub.id_booking,
ub.id_price_plan,
brpp.booking_fee_local,
brpp.booking_fee_charge_at_utc,
cast(brpp.booking_fee_charge_at_utc as date) as booking_fee_charge_date_utc
from untied_bookings ub
left join
booking_with_relevant_price_plans brpp
on ub.id_booking = brpp.id_booking
and ub.id_price_plan = brpp.id_price_plan
)
select
'verification_estimated_at' as type,
COUNT(distinct id_booking) as billable_bookings
from booking_charge_events_estimated_at
where date_trunc('month', booking_fee_charge_date_utc) = '2024-06-01'
union all
select
'verification_guest_joined_at' as type,
COUNT(distinct id_booking) as billable_bookings
from booking_charge_events_joined_at
where date_trunc('month', booking_fee_charge_date_utc) = '2024-06-01'
First volume check
Well, given that there’s clearly a difference between Finance numbers and what we can obtain from DWH (in either case…), before jumping into some python-based project that runs queries in the Core schema directly… maybe let’s do some simple checks.
The Account Reports Summary excel is split per Company name, and it should be easy to retrieve the Deal Id linked to the company name (at least manually) for the most notable cases. So I just ordered the file in descending order of billable bookings and retrieved the first one:
- Company Name: LavandaBilling Account
- Billable Bookings: 3,178
This has a couple of id_deals linked to it: '1604445496','20529225110’
Query available here
select id_user as id_user_host, id_deal, company_name
from intermediate.int_core__unified_user icuu
where id_deal IN ('1604445496','20529225110')
The good news is that we have the KPIs view by Deal, so we can retrieve the information for these 2 Deals super fast. It will be considering the estimated start date, as it’s the current behaviour for DWH.
The bad news is that… well… there’s no billable bookings at all!
Query available here
select
*
from intermediate.int_monthly_aggregated_metrics_history_by_deal
where date = '2024-06-30'
and id_deal in ('20529225110','1604445496')
order by date desc
So clearly there’s a big difference hiding somewhere.
Do we have a similar behaviour if using the joined_at_utc for verification start in booking_charge_events?
… well, yes. Exactly 0 billable bookings!
Query available here
with
booking_charge_events_joined_at as (
with
stg_core__booking as (select * from staging.stg_core__booking),
int_core__price_plans as (select * from intermediate.int_core__price_plans),
int_core__verification_requests as (
select * from intermediate.int_core__verification_requests
),
booking_with_relevant_price_plans as (
select
*,
case
when pp.price_plan_charged_by_type = 'CheckInDate'
then b.check_in_at_utc
when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All')
then icuu.joined_at_utc
end as booking_fee_charge_at_utc
from stg_core__booking b
left join
int_core__verification_requests vr
on b.id_verification_request = vr.id_verification_request
left join intermediate.int_core__unified_user icuu
on vr.id_user_guest = icuu.id_user
left join int_core__price_plans pp on b.id_user_host = pp.id_user_host
where
-- The dates that defines which price plan applies to the booking depends
-- on charged by type. With the below case, we evaluate if a certain price
-- plan relates to the booking
case
when pp.price_plan_charged_by_type = 'CheckInDate'
then b.check_in_at_utc between pp.start_at_utc and pp.end_at_utc
when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All')
then
coalesce(
(
icuu.joined_at_utc
between pp.start_at_utc and pp.end_at_utc
),
false
)
else false
end
= true
),
untied_bookings as (
-- If a booking has two valid price plans, take the earliest
select id_booking, min(id_price_plan) as id_price_plan
from booking_with_relevant_price_plans brpp
group by id_booking
)
select
ub.id_booking,
ub.id_price_plan,
brpp.booking_fee_local,
brpp.booking_fee_charge_at_utc,
cast(brpp.booking_fee_charge_at_utc as date) as booking_fee_charge_date_utc
from untied_bookings ub
left join
booking_with_relevant_price_plans brpp
on ub.id_booking = brpp.id_booking
and ub.id_price_plan = brpp.id_price_plan
),
booking_charge_events_estimated_at as (
with
stg_core__booking as (select * from staging.stg_core__booking),
int_core__unified_user as (select * from intermediate.int_core__unified_user),
int_core__price_plans as (select * from intermediate.int_core__price_plans),
int_core__verification_requests as (
select * from intermediate.int_core__verification_requests
),
booking_with_relevant_price_plans as (
select
*,
case
when pp.price_plan_charged_by_type = 'CheckInDate'
then b.check_in_at_utc
when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All')
then vr.verification_estimated_started_at_utc
end as booking_fee_charge_at_utc
from stg_core__booking b
left join
int_core__verification_requests vr
on b.id_verification_request = vr.id_verification_request
left join int_core__price_plans pp on b.id_user_host = pp.id_user_host
where
-- The dates that defines which price plan applies to the booking depends
-- on charged by type. With the below case, we evaluate if a certain price
-- plan relates to the booking
case
when pp.price_plan_charged_by_type = 'CheckInDate'
then b.check_in_at_utc between pp.start_at_utc and pp.end_at_utc
when pp.price_plan_charged_by_type in ('VerificationStartDate', 'All')
then
coalesce(
(
vr.verification_estimated_started_at_utc
between pp.start_at_utc and pp.end_at_utc
),
false
)
else false
end
= true
),
untied_bookings as (
-- If a booking has two valid price plans, take the earliest
select id_booking, min(id_price_plan) as id_price_plan
from booking_with_relevant_price_plans brpp
group by id_booking
)
select
ub.id_booking,
ub.id_price_plan,
brpp.booking_fee_local,
brpp.booking_fee_charge_at_utc,
cast(brpp.booking_fee_charge_at_utc as date) as booking_fee_charge_date_utc
from untied_bookings ub
left join
booking_with_relevant_price_plans brpp
on ub.id_booking = brpp.id_booking
and ub.id_price_plan = brpp.id_price_plan
)
select
'verification_estimated_at' as type,
COUNT(distinct bce.id_booking) as billable_bookings
from booking_charge_events_estimated_at bce
inner join intermediate.int_core__bookings b
on bce.id_booking = b.id_booking
inner join intermediate.int_core__unified_user u
on b.id_user_host = u.id_user
and u.id_deal in ('1604445496','20529225110')
where date_trunc('month', bce.booking_fee_charge_date_utc) = '2024-06-01'
union all
select
'verification_guest_joined_at' as type,
COUNT(distinct bce.id_booking) as billable_bookings
from booking_charge_events_joined_at bce
inner join intermediate.int_core__bookings b
on bce.id_booking = b.id_booking
inner join intermediate.int_core__unified_user u
on b.id_user_host = u.id_user
and u.id_deal in ('1604445496','20529225110')
where date_trunc('month', bce.booking_fee_charge_date_utc) = '2024-06-01'
So clearly we have a situation here. Thankfully, Clay took the time to explain to me that this company is a special case in which these bookings do not have Guest Journeys, so looks suspicious to me in the sense that probably this is affecting somehow the logic in DWH. This is because it’s an “autohost account”, that might probably come from Partner API.
I also tried with the 2nd Company with more Billable Bookings in June 2024, namely:
- Company Name: Home Team Vacation Rentals LLC
- Billable Bookings: 1,350
- Deal Id: ‘15463726437’
Reusing the previous query and changing the deal id, we get these results:
So it’s looking MUCH better, even though it’s true that we miss/exceed the 1,350 for a few bookings. Let’s be pragmatic and focus first on the big issues, and then on the small ones…
Debugging data-invoicing-exporter
I’ve already put some comments on the DevOps ticket, but might be worth to aim to understand where these Lavanda Billable Bookings come from.
Billable Booking Count is computed in dashboard_tools.py here, which depends on user_report_contents here. In turn, this is retrieved from the function get_user_data, defined here.
From here, we can retrieve the query being used on bookings queries.get_query_booking_data_of_user here.
Time to switch to queries.py, where the query we’re interested in is here. This query has some transactional sql.
Firstly, here, it creates a snapshot of the latest available PricePlanToUser at the moment of the extraction, to be applied to all the bookings of the exporting time period considered. These are the ones that are going to be used for the export. This is, technically, different from what we do in DWH in which we apply the price plan according on what was active at the moment of verification start or check-in. In case of more than 1 active price plan for a booking, we take the first one. This could explain changes on the volumes, but probably not the massive difference. We can check this later on.
Secondly, here, it somehow creates a unique booking table based on Guest, Accommodation and CheckIn date. This is very similar on how we handle it in DWH in int_core__duplicate_booking model. To keep in mind though that we’re not forcing this duplicate booking in DWH for booking charge events.
Thirdly, here, it computes a variable called ListingStartDateByField (which looks like a mistake in the name) that mainly retrieves the PricePlanChargedByTypeId. This comes from the snapshot made on point 1.
The last part is mainly a split on whether the price plan charge by type id is =3 or not. This for human beings means a different logic between A) if the billing needs to be considered in the CheckInDate and B) if it needs to be considered in the VerificationStartDate or All.
This is quite similar as we do in DWH in Booking Charge Events, that we just filter by the name instead of the ID.
These queries are massive monsters with around ~10 tables joined. However, these are all left joins (except for the unique bookings mentioned in the second point), so either the booking does not exist in Bookings or the difference lies in the WHERE clauses. For the sake of me not getting a stroke, I hope it’s the second, so let’s go.
Billable at check in here
- UserVerificationStatusId IS NOT NULL, from SuperhogUser table, understanding this as the guest user because of how it’s joined
- GuestUserId IS NOT NULL, from User table
- CreatedByUserId equals the user id we’re retrieving
- CheckIn date is between Extraction Start and Extraction End, in a nutshell
Else (meaning billable at verification start ) here
- UserVerificationStatusId IS NOT NULL, from SuperhogUser table, understanding this as the guest user because of how it’s joined
- GuestUserId IS NOT NULL, from User table
- CreatedByUserId equals the user id we’re retrieving
- and at least one of these conditions needs to be true
- If the VerificationRequestId from Booking table IS NOT NULL then the UpdatedDate from the VerificationRequest needs to be between Extraction Start and Extraction End
- If the VerificationRequestId from Booking table IS NULL then the JoinDate from the User table (Guest) needs to be between Extraction Start and Extraction End
If checking this versus what DWH is computing… well, it seems we do not have (or ever had) the UpdatedDate condition for the verification start. It would make sense that the main problem for the Lavanda subject comes from this point, after all, this condition only applies if VerificationRequestId is not filled, meaning it’s a booking that does not have a verification process because we ‘trust’ autohost verifications, and that would be in line with what Clay explained.
Let’s run a quick query to replicate the verification request being null/not null in DWH for this case. I’ll also retrieve the latest price plan for the users assigned to the Id Deals.
Query here!
with pp as (
select pp.*
from intermediate.int_core__price_plans pp
inner join intermediate.int_core__unified_user uu
on pp.id_user_host = uu.id_user
and uu.id_deal in ('20529225110', '1604445496')
where pp.end_date_utc >= '2024-07-01' and pp.start_date_utc < '2024-07-01'
)
select
host.id_deal,
b.id_user_host,
count(distinct b.id_booking) as cd_booking,
count(1) as count
from
intermediate.int_core__bookings b
inner join intermediate.int_core__unified_user host
on
b.id_user_host = host.id_user
inner join intermediate.int_core__unified_user guest
on
b.id_user_guest = guest.id_user
left join intermediate.int_core__verification_requests vr
on
b.id_verification_request = vr.id_verification_request
left join pp on pp.id_user_host = b.id_user_host
where
guest.id_user_verification_status is not null
and guest.id_user is not null
and host.id_deal in ('20529225110', '1604445496')
and pp.id_price_plan <> 3
and
(
b.id_verification_request is null
and date_trunc('month',
guest.joined_date_utc) = '2024-06-01'
or
b.id_verification_request is not null
and date_trunc('month',
vr.updated_date_utc) = '2024-06-01'
)
group by 1 ,2
order by cd_booking desc
The result of the query is the following:
and as you can see it’s quite close the Lavanda billable bookings from Finance:
At this stage I wonder if there’s 1 day difference in the period considered between the DWH query vs. Finance export; or if the data is exactly extracted over the last month.
By removing the filter on the deal id and including the 2 types of billing, meaning at verification start and at check in, we get:
Which is quite close to the 25,538 billable bookings from Finance. The fact that cd_booking and count display different numbers might be worth to double check to ensure there’s no duplicates.
Query here!
with pp as (
select pp.*
from intermediate.int_core__price_plans pp
inner join intermediate.int_core__unified_user uu
on pp.id_user_host = uu.id_user
where pp.end_date_utc >= '2024-07-01' and pp.start_date_utc < '2024-07-01'
),
verification_start as (
select
host.id_deal,
b.id_user_host,
'verification_start' as type,
count(distinct b.id_booking) as cd_booking,
count(1) as count
from
intermediate.int_core__bookings b
inner join intermediate.int_core__unified_user host
on
b.id_user_host = host.id_user
inner join intermediate.int_core__unified_user guest
on
b.id_user_guest = guest.id_user
left join intermediate.int_core__verification_requests vr
on
b.id_verification_request = vr.id_verification_request
left join pp on pp.id_user_host = b.id_user_host
where
guest.id_user_verification_status is not null
and guest.id_user is not null
and pp.id_price_plan <> 3
and
(
b.id_verification_request is null
and date_trunc('month',
guest.joined_date_utc) = '2024-06-01'
or
b.id_verification_request is not null
and date_trunc('month',
vr.updated_date_utc) = '2024-06-01'
)
group by 1,2,3
),
check_in as (
select
host.id_deal,
b.id_user_host,
'check-in' as type,
count(distinct b.id_booking) as cd_booking,
count(1) as count
from
intermediate.int_core__bookings b
inner join intermediate.int_core__unified_user host
on
b.id_user_host = host.id_user
inner join intermediate.int_core__unified_user guest
on
b.id_user_guest = guest.id_user
left join intermediate.int_core__verification_requests vr
on
b.id_verification_request = vr.id_verification_request
left join pp on pp.id_user_host = b.id_user_host
where
guest.id_user_verification_status is not null
and guest.id_user is not null
and pp.id_price_plan = 3
and b.id_verification_request is not null
and date_trunc('month',
b.check_in_at_utc) = '2024-06-01'
group by 1,2,3
),
totals as (
select * from check_in
union all
select * from verification_start
)
select
type,
sum(cd_booking) as cd_booking,
sum(count) as count
from totals
group by 1








