# Description
Creates the skeleton of the new KPIs data flow for the `created_bookings` metric. Details are available [here](https://www.notion.so/knowyourguest-superhog/KPIs-Refactor-Let-s-go-daily-2024-10-23-1280446ff9c980dc87a3dc7453e95f06?pvs=4#12a0446ff9c98085bf4dfc77f6fc22f7).
In essence:
* Models are created in intermediate in a kpis folder.
* Models have a daily granularity. This includes the `created_bookings` models, but also the daily lifecycle per listing and the segmentation. It also adds a `dimension_dates` model specific to KPIs. The daily models already have all the dimensions in place and handle all the complex logic.
* Other time aggregation models simply read from the existing daily models, which makes them much simpler (`int_kpis__metric_mtd_created_bookings` and `int_kpis__metric_monthly_created_bookings`).
* Dimensionality aggregation can be easily added within a given timeframe (daily, mtd, monthly). For instance, I do it for mtd in `int_kpis__aggregated_mtd_created_bookings` and for monthly in `int_kpis__aggregated_monthly_created_bookings`.
* Macro configuration for dimensions: allows setting any specific dimension for `aggregated` models. By default, the subset of global, by billing country, by number of listings and by deal applies, since these are needed for Main KPIs. I added an example with Dash Source, a dimension that does not exist yet and is currently configured to appear only for created bookings.
* Testing `aggregated` models completeness. A new macro called `assert_dimension_completeness` ensures that additive metrics are consistent with the global result. It is configurable at schema level.
* Testing refactor impact. I'm aware that changing the lifecycle model to daily impacts the volumes for listing segments. For the rest, I added a `tmp` test that checks that the dimension and dimension value per date match exactly between the new and the old computation.
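To illustrate the idea behind the completeness test, here is a minimal Python sketch. It is not the `assert_dimension_completeness` macro itself: the row layout, the sample values and the `find_incomplete_dimensions` helper are assumptions made for illustration only. The logic is that, for an additive metric, the sum over the values of any dimension on a given date should equal the global value for that date.

```python
from collections import defaultdict

# Hypothetical rows of an aggregated model:
# (date, dimension, dimension_value, created_bookings)
rows = [
    ("2024-10-01", "global", "global", 10),
    ("2024-10-01", "by_billing_country", "ES", 6),
    ("2024-10-01", "by_billing_country", "GB", 4),
    ("2024-10-01", "by_deal", "deal_a", 7),
    ("2024-10-01", "by_deal", "deal_b", 2),  # 7 + 2 = 9, which differs from the global 10
]


def find_incomplete_dimensions(rows, global_dimension="global"):
    """Return (date, dimension, total, global_total) for every mismatch."""
    totals = defaultdict(int)
    for date, dimension, _value, metric in rows:
        totals[(date, dimension)] += metric

    failures = []
    for (date, dimension), total in sorted(totals.items()):
        if dimension == global_dimension:
            continue
        global_total = totals.get((date, global_dimension), 0)
        if total != global_total:
            failures.append((date, dimension, total, global_total))
    return failures


failures = find_incomplete_dimensions(rows)
for failure in failures:
    print(failure)
```

An empty result means every dimension adds up to the global figure. Any returned row points at the date and dimension to investigate.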
Latest edits:
* Changed naming convention
* Split of MTD and Monthly. Now these are 2 different entities, as stated in `int_kpis__dimension_dates`.
* Added `start_date` and `end_date` for models that cover a date range (mtd, monthly).
* Added a small readme entry in the kpis folders. Mostly it states nomenclature and some first conventions.
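As a rough illustration of why MTD and Monthly are treated as 2 different entities, the sketch below computes a `start_date` and `end_date` for each. The actual definitions live in `int_kpis__dimension_dates` and are not reproduced here, so the two helper functions and the exact range semantics (MTD ends on the given date, Monthly ends on the last day of the month) are assumptions.

```python
import calendar
from datetime import date


def mtd_range(day):
    """Assumed MTD range: first day of the month up to the given day."""
    return date(day.year, day.month, 1), day


def monthly_range(day):
    """Assumed Monthly range: first to last day of the given day's month."""
    last_day = calendar.monthrange(day.year, day.month)[1]
    return date(day.year, day.month, 1), date(day.year, day.month, last_day)


start_date, end_date = mtd_range(date(2024, 10, 23))
print("mtd:", start_date, end_date)

start_date, end_date = monthly_range(date(2024, 10, 23))
print("monthly:", start_date, end_date)
```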
Dbt docs:

# Checklist
- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've run and passed them.
- [ ] I have checked for DRY opportunities with other models and docs. **Likely we'll be able to add macros for mtd and dim_agg models. We will see later on.**
- [ ] I've picked the right materialization for the affected models. **Models run ok except for the daily lifecycle of listings, which lasts several minutes in the first run. Model curr...
# Description
Before deploying KPIs by Billing Country, we spotted inflated volumes for every Deal-based metric in the by billing country dimension. This affects `int_core__mtd_deal_metrics` and `int_xero__mtd_invoicing_metrics`.
This PR changes the following:
* The 2 models mentioned above now depend on the `int_core__deal` model instead of `int_core__user_host`, which removes the duplicates.
* All models now use the main billing country at deal level instead of at host level. The reason is that a small number of hosts that share the same deal can have different billing countries. To avoid inconsistencies, everything points to this simplification, which in general does not change the output much.
* To make this easy, the 3 main billing country per deal fields have been propagated to `int_core__user_host`.
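The duplication can be reproduced with a small Python sketch. The sample hosts and helper functions are hypothetical. In particular, taking the most common country among a deal's hosts as its "main" billing country (ties broken alphabetically) is only an assumption for this example and not necessarily how `int_core__deal` derives it.

```python
from collections import Counter

# Hypothetical hosts: (id_user_host, id_deal, host_billing_country)
hosts = [
    ("host_1", "deal_a", "GB"),
    ("host_2", "deal_a", "GB"),
    ("host_3", "deal_a", "ES"),  # same deal, different billing country
    ("host_4", "deal_b", "FR"),
]


def deals_by_host_country(hosts):
    """Count deals per country using the host-level billing country."""
    pairs = {(deal, country) for _host, deal, country in hosts}
    return Counter(country for _deal, country in pairs)


def main_country_per_deal(hosts):
    """Assumption: the main country is the most common one among the deal's hosts."""
    counts = {}
    for _host, deal, country in hosts:
        counts.setdefault(deal, Counter())[country] += 1
    return {
        deal: min(counter.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        for deal, counter in counts.items()
    }


def deals_by_deal_country(hosts):
    """Count deals per country using a single main country per deal."""
    return Counter(main_country_per_deal(hosts).values())


by_host = deals_by_host_country(hosts)
by_deal = deals_by_deal_country(hosts)
print(sum(by_host.values()), sum(by_deal.values()))
```

With host-level countries, `deal_a` is counted once under GB and once under ES, so the by billing country total exceeds the number of deals. With one main country per deal, each deal is counted exactly once.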
To exemplify the solution, find here a snapshot of the differences in behavior:
```sql
select
    dimension,
    sum(deals_booked_in_month) as deals_booked_1,
    sum(deals_booked_in_6_months) as deals_booked_6,
    sum(deals_booked_in_12_months) as deals_booked_12,
    sum(total_revenue_in_gbp) as total_revenue,
    sum(xero_operator_net_fees_in_gbp) as operator_revenue,
    sum(xero_booking_net_fees_in_gbp) as booking_fees,
    sum(xero_listing_net_fees_in_gbp) as listing_fees,
    sum(xero_verification_net_fees_in_gbp) as verification_fees,
    sum(total_guest_revenue_in_gbp) as guest_revenue,
    sum(xero_waiver_paid_back_to_host_in_gbp) as waiver_paid_back_to_hosts,
    sum(waiver_net_fees_in_gbp) as waiver_net_fees
from intermediate.int_mtd_vs_previous_year_metrics
where date in ('2024-01-31')
group by 1
order by 1
```
Production:

vs.
Local:

Keep in mind that the Global dimension can still be greater than the aggregate of any other dimension, since not all users have a deal. Mismatches between the other 2 dimensions might be linked to the dump.
The commits are meaningful and help navigate the changes.
# Checklist
- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've run and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.
# Other
- [ ] Check if a full-refresh is required after this PR is merged.
Related work items: #20823
# Description
Adds Billing Country dimension in KPIs, but does not expose them to reporting yet.
One annoyance: because of how I built the macros, I cannot make incremental changes without changing all models. This will need to be adapted, and I'm happy to hear your thoughts on how to do it.
Additionally, the model `mtd_guest_payments_metrics` performs poorly. It takes around 5 min to execute, although the end-to-end run technically completes in one shot without breaking.
It's a complex PR because it changes many files, but you will see that:
* It mostly changes the join conditions for the dimensions or the schema tests,
* I tried to be very careful and add things step-by-step in the commits.
The goal is NOT to complete the PR until we see how we can improve performance. That said, the data looks ok to me end-to-end, but it would benefit from a check against production data for the new dimension.
Update 30th Aug
* Added a new commit that includes `id_user_host` in `int_core__verification_payments`. Happy to discuss whether it makes sense or not, but it cuts the execution time from ~600 sec to ~6 sec because it avoids a massive repeated join with `verification_requests`.
# Checklist
- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've run and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [ ] I've picked the right materialization for the affected models. **To check because of performance issues**
# Other
- [ ] Check if a full-refresh is required after this PR is merged.
Related work items: #19082
# Description
Sets the parameter that displays the KPIs by number of listings in production. I will move forward without a review, as we need a simultaneous deployment. The combination of changes was reviewed yesterday in local.
# Checklist
- [ ] The edited models and dependants run properly with production data.
- [ ] The edited models are sufficiently documented.
- [ ] The edited models contain PK tests, and I've run and passed them.
- [ ] I have checked for DRY opportunities with other models and docs.
- [ ] I've picked the right materialization for the affected models.
# Other
- [ ] Check if a full-refresh is required after this PR is merged.
Related work items: #19325
# Description
Changes:
* Separate 1) the internal naming of the dimensions available within the DWH from 2) the display names of the dimensions in reporting. Mainly, it changes "by_number_of_listings" to display as "By # of Listings Booked in 12 Months". I edited the production macro since, to me, it's linked to when things are available for display.
* Add leading zeros to the segmentation so it's ordered correctly. Before, the segment 21-60 was displayed before the segment 6-20.
* Also added some capital letters to the schema config of the reporting model :)
I attach a screenshot of how it looks in PBI in my local development branch to exemplify why this is "Beautification". Be aware that merging this also puts the dimensions in production.
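The ordering issue comes from segments being sorted as text. A minimal Python sketch of the fix, assuming segment labels shaped like the ones below (only 6-20 and 21-60 are taken from this PR, the rest, as well as the `pad_segment` helper, are hypothetical):

```python
import re


def pad_segment(label, width=2):
    """Left-pad every number in the label with zeros up to `width` digits."""
    return re.sub(r"\d+", lambda match: match.group().zfill(width), label)


segments = ["1-5", "6-20", "21-60", "61+"]

# Text ordering without padding puts 21-60 before 6-20.
print(sorted(segments))

# With leading zeros, text ordering matches numeric ordering.
padded = [pad_segment(segment) for segment in segments]
print(sorted(padded))
```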

# Checklist
- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've run and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.
# Other
- [ ] Check if a full-refresh is required after this PR is merged.
Related work items: #19325
# Description
Takes into account @<Pablo Martín>'s feedback from the previous PR, slightly modified. This PR separates 1) the dimensions available while developing from 2) the dimensions once they are available for production. Both live in the same macro configuration file for KPIs, namely `business_kpis_configuration`.
The end goal is that all CTEs in `int_mtd_vs_previous_year_metrics` read from the new macro `get_kpi_dimensions_for_production`, so that eventually nothing needs to be hardcoded when we add new dimensions. In the meantime, I'll be adding this new line in each PR (still 2 missing :D)
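The separation can be pictured with the Python sketch below. It only mirrors the idea and is not the Jinja code in `business_kpis_configuration`: the list of dimensions, the `in_production` flag and the function bodies are assumptions made for illustration.

```python
# Hypothetical configuration: every dimension is known while developing,
# and a flag marks the ones that are ready to be exposed in production.
KPI_DIMENSIONS = [
    {"dimension": "global", "in_production": True},
    {"dimension": "by_number_of_listings", "in_production": True},
    {"dimension": "by_billing_country", "in_production": False},
]


def get_kpi_dimensions():
    """All dimensions, including those still under development."""
    return [entry["dimension"] for entry in KPI_DIMENSIONS]


def get_kpi_dimensions_for_production():
    """Only the dimensions flagged as available for production."""
    return [entry["dimension"] for entry in KPI_DIMENSIONS if entry["in_production"]]


print(get_kpi_dimensions())
print(get_kpi_dimensions_for_production())
```

With a setup like this, exposing a new dimension means flipping one flag instead of editing every CTE that lists the dimensions.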
# Checklist
- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [ ] The edited models contain PK tests, and I've run and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [ ] I've picked the right materialization for the affected models.
# Other
- [ ] Check if a full-refresh is required after this PR is merged.
Related work items: #19325
2024-08-19 09:57:28 +00:00
Renamed from macros/get_kpi_dimensions.sql