data-dwh-dbt-project/models/intermediate/athena/int_athena__verifications_with_fees.sql
Oriol Roqué Paniagua 06bdb81cfe Merged PR 4106: Avoids duplicates in athena verification with fees after price changes
# Description

There was some duplicated records that effectively raised alarms in data tests.

The fix is simple:
* Ensure that is using the timestamp field, rather than the date of checkout.
* This was not enough because in some cases the timestamp is exactly at midnight. So I changed the between to an explicit logic - start condition is inclusive, while end condition is exclusive.

# Checklist

- [ ] The edited models and dependants run properly with production data.
- [ ] The edited models are sufficiently documented.
- [ ] The edited models contain PK tests, and I've ran and passed them.
- [ ] I have checked for DRY opportunities with other models and docs.
- [ ] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #26622
2025-01-19 10:07:39 +00:00

42 lines
1.5 KiB
SQL

{% set ok_status = "Approved" %}
with
int_athena__verifications as (select * from {{ ref("int_athena__verifications") }}),
stg_seed__athena_price_history as (
select * from {{ ref("stg_seed__athena_price_history") }}
),
-- CTE to rank verifications by updated_at_utc per id_booking
ranked_verifications as (
select
v.*,
row_number() over (
partition by v.id_booking order by v.updated_at_utc asc
) as rn
from int_athena__verifications v
where v.version = 'V1' and v.id_booking is not null
)
select
v.id_verification,
v.id_booking,
v.verification_status,
v.is_cancelled,
-- Charge for 1 night if number_nights = 0
case
when v.number_nights = 0 and v.verification_status = '{{ ok_status }}'
then ph.fee_per_night_gbp
when v.verification_status = '{{ ok_status }}'
then v.number_nights * ph.fee_per_night_gbp
else 0
end as ok_status_fee_in_gbp,
v.created_date_utc,
v.checkout_date_utc
from ranked_verifications v
left join
stg_seed__athena_price_history ph
-- The following condition ensures avoiding duplicates.
-- Keep in mind that the start_at_utc is inclusive to the price,
-- while the end_at_utc is exclusive.
on v.checkout_at_utc >= ph.start_at_utc
and v.checkout_at_utc < ph.end_at_utc
where
-- Select only the most recent verification for each id_booking
v.rn = 1