Commit graph

337 commits

Author SHA1 Message Date
Oriol Roqué Paniagua
8a639413f1 Merged PR 2290: Refactor mtd joins to improve performance
Refactor mtd joins to improve performance, as stated in the ticket:

We noticed that some of the new models for MTD purposes (KPIs reporting) take quite a bit of time to run some simple joins.

The main reason is that there's a double join that can be simplified. The current state is:

```
from int_dates_mtd d
        inner join
            sometable t
            on extract(year from t.table_date) = d.year
            and extract(month from t.table_date) = d.month
            and extract(day from t.table_date) <= d.day
```

and it can be changed to:

```
from int_dates_mtd d
        inner join
            sometable t
            on date_trunc('month', t.table_date)::date = d.first_day_month
            and extract(day from t.table_date) <= d.day
```

which is much faster and returns the same result.
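As a sanity check of the equivalence claimed above, here is a minimal Python sketch that evaluates both join conditions over two years of dates. It assumes that `first_day_month` in `int_dates_mtd` is the first day of the row's (`year`, `month`); the helper names and the sample date range are hypothetical and not taken from the repository.

```python
from datetime import date, timedelta

def make_mtd_row(mtd_date):
    # One row of a hypothetical int_dates_mtd table (assumed column meanings).
    return {
        "year": mtd_date.year,
        "month": mtd_date.month,
        "day": mtd_date.day,
        "first_day_month": mtd_date.replace(day=1),
    }

def old_predicate(table_date, d):
    # Mirrors the extract(year/month) equalities plus the day cut-off.
    return (
        table_date.year == d["year"]
        and table_date.month == d["month"]
        and table_date.day <= d["day"]
    )

def new_predicate(table_date, d):
    # Mirrors date_trunc('month', table_date)::date plus the same day cut-off.
    return (
        table_date.replace(day=1) == d["first_day_month"]
        and table_date.day <= d["day"]
    )

start = date(2023, 1, 1)
all_days = [start + timedelta(days=i) for i in range(730)]
mtd_rows = [make_mtd_row(d) for d in all_days[::7]]

pairs = [(t, d) for t in all_days for d in mtd_rows]
mismatches = [(t, d) for t, d in pairs if old_predicate(t, d) != new_predicate(t, d)]
matches = sum(1 for t, d in pairs if new_predicate(t, d))
print(len(mismatches), matches)
```

If the assumption about `first_day_month` holds, no pair of rows should be treated differently by the two conditions.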

Related work items: #18330
2024-07-12 12:53:00 +00:00
Joaquin Ossa
5f05b725b9 Removed database sources on select 2024-07-12 13:11:56 +02:00
Joaquin Ossa
a837760382 Changed staging to intermediate 2024-07-12 12:45:46 +02:00
Joaquin Ossa
4d5af3ba21 Added verification_request_booking_source to int_core__guest_satisfaction_responses model 2024-07-12 12:38:52 +02:00
Joaquin Ossa
cf467145d0 Fixed schema names 2024-07-12 11:06:09 +02:00
Joaquin Ossa
3001db919a Added models in schema for intermediate 2024-07-12 10:11:51 +02:00
Joaquin Ossa
2bfe3ccf3c Added the new field to int_core__bookings and to int_core__verification_requests 2024-07-11 17:16:51 +02:00
Joaquin Ossa
70c2c5f6bf Removed guests_id and modified query structure 2024-07-11 16:49:24 +02:00
Joaquin Ossa
00502b1597 Removed bookings_id so we have unique values for id_verification_request, added details as to how we classify the hosts 2024-07-11 16:26:41 +02:00
Joaquin Ossa
db04615039 Removed bookings_id so we have unique values for id_verification_request, added details as to how we classify the hosts 2024-07-11 16:19:33 +02:00
Joaquin Ossa
5efd91dfbb Created model int_core__verification_request_booking_source to have easier access to host category type in different models 2024-07-11 15:36:32 +02:00
Oriol Roqué Paniagua
3b75d9eefb Merged PR 2257: Expose guest revenue and guest journey payment metrics
This PR aims to expose the new metrics to the business KPIs report.
The new metrics exposed are, for the global and the by deal view:
- Guest Revenue
- Guest Revenue per Guest Journey Completed
- Guest Revenue per Guest Journey with Payment
- Guest Payments
- Guest Payments per Guest Journey Completed
- Guest Payments per Guest Journey with Payment
- Guest Journey with Payment
- Guest Journey Payment Rate

Changes:
- Small naming change in the by deal view for `payment_rate_guest_journey`, to be consistent with the global view.
- Small fix: I had missed some GJ payment metric in the view by deal id.
- Added a new number format called `currency_gbp` - only for monetary metrics, available in the schema files
- Usual procedure to publish metrics: for global metrics, add them in the `int_mtd_aggregated_metrics`. I also changed the order of display.
- **Important**: to avoid displaying revenue figures until Xero invoicing is handled, I created a macro called `is_date_before_previous_month` that is called in the reporting equivalent models: `mtd_aggregated_metrics` in the where section and in the `monthly_aggregated_metrics_history_by_deal` as a case-when.
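
The macro itself is not shown in this log. As a rough illustration only, the date rule it might encode can be sketched in Python, assuming that "before previous month" means strictly earlier than the first day of the month preceding the current date. The function bodies and the `today` parameter are assumptions, not the macro's actual implementation.

```python
from datetime import date

def first_day_of_previous_month(today):
    # January rolls back to December of the previous year.
    if today.month == 1:
        return date(today.year - 1, 12, 1)
    return date(today.year, today.month - 1, 1)

def is_date_before_previous_month(d, today):
    # Assumed semantics: True only for dates older than the whole previous month.
    return d < first_day_of_previous_month(today)

today = date(2024, 7, 10)
print(is_date_before_previous_month(date(2024, 5, 31), today))
print(is_date_before_previous_month(date(2024, 6, 1), today))
```

Under this reading, revenue figures for the current and the previous month would be hidden, and older months would be displayed.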

This should allow exposing all the new metrics and enable publishing a new update of the business KPIs!

Related work items: #18107
2024-07-10 14:17:05 +00:00
Oriol Roqué Paniagua
20e7220ffe Merged PR 2246: KPIs refactor: naming convention and PBI sources replication
Changing naming to follow convention.
This PR has the following changes:
- the model `int_core__mtd_aggregated_metrics` has been moved to cross and changed the name to `int_mtd_aggregated_metrics`
- the model `int_core__monthly_aggregated_metrics_history_by_deal` has been moved to cross and changed the name to `int_monthly_aggregated_metrics_history_by_deal`
- the reporting models `core__mtd_aggregated_metrics` and `core__monthly_aggregated_metrics_history_by_deal` now source the `int_mtd_aggregated_metrics` and `int_monthly_aggregated_metrics_history_by_deal` to avoid breaking the production dashboard
- the reporting models have been duplicated from core into general with the correct names, i.e., `mtd_aggregated_metrics` and `monthly_aggregated_metrics_history_by_deal`
- Documentation has been moved in intermediate and replicated in reporting, adding comments on the models currently in use that are going to be retired soon.

This will allow for a transition of the PBI dashboard from one source to another. Exposures file still not touched since technically the report is still sourcing the 'legacy' models. Documentation of the refactor here: https://www.notion.so/knowyourguest-superhog/Refactoring-Business-KPIs-5deb6aadddb34884ae90339402ac16e3

Related work items: #18202
2024-07-09 15:14:50 +00:00
Oriol Roqué Paniagua
ca8334f1da Merged PR 2236: Refactor of already exposed metrics: listings, deals and guest journeys
Following yesterday's refactor of booking metrics, this PR provides a refactor of already exposed metrics: listings, deals and guest journeys.
-> Data is consistent with values already exposed.

Changes:
- for `int_core__mtd_listing_metrics`, `int_core__mtd_deal_metrics` and `int_core__mtd_guest_journey_metrics`:
  1. Remove the computation of the previous year metric value and the relative increment (last part of the query)
  2. Re-apply the formatting
- for `int_mtd_vs_previous_year_metrics`:
  1. Reference listings, deals and GJ models
  2. Include the metrics for these types in the `plain_kpi_combination` CTE
  3. Add the computation of previous year and relative increment using the macro
- for `int_core__mtd_aggregated_metrics`:
  1. Remove and "hardcode" sources since all metrics now depend exclusively on `int_mtd_vs_previous_year_metrics`

This PR does not alter the exposed metrics in the production report. It does not aim to change the name of the reporting/intermediate models that expose the information; that will be done in a separate PR.
Documentation: https://www.notion.so/knowyourguest-superhog/Refactoring-Business-KPIs-5deb6aadddb34884ae90339402ac16e3

Related work items: #18202
2024-07-09 13:00:43 +00:00
Oriol Roqué Paniagua
409ac47591 Merged PR 2232: KPI refactor - 1st step, bookings
First step on refactor of kpis:
- Remove the relative increment vs. previous year computation from the source model (`mtd_booking_metrics`, in this case)
- Aggregate the source mtd global metrics models into a single model: `int_mtd_vs_previous_year_metrics` (to enable multi-source weighted metric computation) and compute previous year value and relative increment. Now this logic is encapsulated into a macro `calculate_safe_relative_increment`, easing readability and providing a bit more robustness.
- End-to-end continuity to not break the existing dashboard display in `int_core__mtd_aggregated_metrics`
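
The body of `calculate_safe_relative_increment` is not part of this log. A minimal Python sketch of what a "safe" relative increment usually means is given below, assuming the result is null when either value is missing or when the previous year value is zero. The function name and the null handling are assumptions for illustration.

```python
from typing import Optional

def safe_relative_increment(
    current: Optional[float], previous: Optional[float]
) -> Optional[float]:
    # Assumed rule: no result when a value is missing or the base is zero,
    # which avoids division-by-zero errors in the aggregated model.
    if current is None or previous is None or previous == 0:
        return None
    return (current - previous) / previous

print(safe_relative_increment(120, 100))
print(safe_relative_increment(50, 0))
```

Keeping this rule in one place means every metric gets the same treatment of missing or zero previous year values.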

This is a substep of the global change. All info can be found in the documentation [here](https://www.notion.so/knowyourguest-superhog/Refactoring-Business-KPIs-5deb6aadddb34884ae90339402ac16e3)

Related work items: #18202
2024-07-08 15:58:36 +00:00
Joaquin Ossa
03dbe60b38 Fixed join on model taking id_verification instead of id_verification_request 2024-07-05 17:24:06 +02:00
Joaquin Ossa
34df8a7ef9 Merged PR 2201: Fixing errors raised by texts
Tiny PR to fix errors raised by texts
2024-07-05 15:12:04 +00:00
Joaquin Ossa
2738c8617d Fixing errors raised by texts 2024-07-04 16:41:41 +02:00
Joaquin Ossa
2efc1d8b65 Merged PR 2178: New model for guests satisfaction report
New model for the guest satisfaction report. I also included columns showing what the guest is paying for, which might be helpful for analysis.

Related work items: #16947
2024-07-04 10:13:11 +00:00
Joaquin Ossa
2c366414ce Merged PR 2188: Added date of adaptation
Added date of adaptation when hosts first started using check-in cover
2024-07-04 10:08:13 +00:00
Oriol Roqué Paniagua
1781031c9d Merged PR 2195: Computes GJ with Payment and GJ Payment Rate metrics
Adds the following metrics:
- Guest Journey with Payment
- Guest Journey Payment Rate

for both views (global and by deal id)

**Important**: it does not expose these metrics to the dashboard; this will be done after we have feedback from Ben R. on the paid GJ without GJ completeness. The missing steps to make them appear are to adapt `int_core__mtd_aggregated_metrics` and `int_core__monthly_aggregated_metrics_history_by_deal` and their respective reporting counterparts.

It adapts:
- `int_core__mtd_guest_journey_metrics`
- `int_core__monthly_guest_journey_history_by_deal`

The approaches are similar in the sense that we join with `int_core__verification_payments` and filter by a PAID status, which has been defined in the `dbt_project.yml` in a similar manner as we did with cancelled bookings. The same verification request can have multiple payments (see screenshot); in this case we keep the first date on which a paid payment happens. The volume is quite low anyway.

![image.png](https://guardhog.visualstudio.com/4148d95f-4b6d-4205-bcff-e9c8e0d2ca65/_apis/git/repositories/54ac356f-aad7-46d2-b62c-e8c5b3bb8ebf/pullRequests/2195/attachments/image.png)
code for the screenshot:

```
with pre as (
select
	id_verification_request,
	count(distinct icvp.id_payment) as total_paid_payments
from intermediate.int_core__verification_payments icvp
where icvp.payment_status = 'Paid'
group by 1
)
select
	case when total_paid_payments > 2 then 'more than 2'
	when total_paid_payments = 2 then '2'
	when total_paid_payments = 1 then '1'
	end as payment_volume_category,
	count(1) as vr_volume
from pre
group by 1
order by 2 desc
```
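
The "keep the first paid date" rule described above can be sketched in Python as follows. The column names `id_verification_request` and `payment_status` and the value `'Paid'` come from the query above; the `payment_date` field and the function name are assumptions for illustration.

```python
from datetime import date

PAID_STATUS = "Paid"

def first_paid_date_per_request(payments):
    # Map each verification request to the earliest date of a paid payment.
    first_paid = {}
    for p in payments:
        if p["payment_status"] != PAID_STATUS:
            continue
        vr = p["id_verification_request"]
        if vr not in first_paid or p["payment_date"] < first_paid[vr]:
            first_paid[vr] = p["payment_date"]
    return first_paid

sample_payments = [
    {"id_verification_request": 1, "payment_status": "Paid", "payment_date": date(2024, 6, 20)},
    {"id_verification_request": 1, "payment_status": "Paid", "payment_date": date(2024, 6, 3)},
    {"id_verification_request": 2, "payment_status": "Refunded", "payment_date": date(2024, 6, 5)},
    {"id_verification_request": 3, "payment_status": "Paid", "payment_date": date(2024, 7, 1)},
]
first_paid = first_paid_date_per_request(sample_payments)
print(first_paid)
```

With this rule, a verification request with several paid payments is counted once, on the date of its first paid payment.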

I also added a missing reference in the intermediate `schema.yaml` for `int_core__mtd_guest_journey_metrics`.

Related work items: #18105
2024-07-04 09:54:41 +00:00
Joaquin Ossa
205bc6534d Modified date name to stick to convention 2024-07-04 08:24:23 +02:00
Joaquin Ossa
6bc26a66ff Changed date name to check_in_cover_added_date and included model in reporting 2024-07-03 17:43:27 +02:00
Joaquin Ossa
5bd6a2c254 Changed case to coalesce 2024-07-03 17:38:00 +02:00
Joaquin Ossa
3b22832d8b Created model in reporting as well with schema 2024-07-03 16:34:33 +02:00
Joaquin Ossa
fb61c69714 Changed names of types of payments for better clarity 2024-07-03 16:23:52 +02:00
Joaquin Ossa
3ce81dfdd1 Fixed the filter for check-in cover verification set and changed name to adopted 2024-07-03 16:16:58 +02:00
Joaquin Ossa
35e4735720 Added date of birth and kept age 2024-07-03 15:18:05 +02:00
Joaquin Ossa
fedf808b6b Added date of adaptation 2024-07-03 15:05:12 +02:00
Joaquin Ossa
af4ab70b96 Addressed Uri's comments and also included new reports in exposures.yaml 2024-07-03 12:29:01 +02:00
Oriol Roqué Paniagua
ed5d7828a7 Merged PR 2179: Computes aggregated metrics by deal id and exposes it to reporting
This PR creates 2 new models:
- `int_core__monthly_aggregated_metrics_history_by_deal`, which just gathers the information of the previously created models that compute the kpis by deal id.
- `core__monthly_aggregated_metrics_history_by_deal`, effectively a copy from intermediate to reporting

It also includes documentation of these 2 models, differences between these and the `mtd_aggregated_metrics` equivalents and references it to exposures. I took the opportunity to update the documentation of the `core__mtd_aggregated_metrics` now that it's a bit more mature.

This should be the last PR for the first draft of 'by deal' metrics.

Related work items: #17689
2024-07-03 07:06:34 +00:00
Joaquin Ossa
0cda63d1a7 New model for guests satisfaction report 2024-07-02 14:22:04 +02:00
Oriol Roqué Paniagua
f9741d6f69 Merged PR 2172: Adding accommodation metrics by deal id
Adding accommodation metrics by deal id with the model `int_core__monthly_accommodation_history_by_deal`.

With this PR, we have the full set of batch 1 metrics by deal id completed, although split across different tables. Aggregation will come in a separate PR.

Similarly to the previous PR, this one is a mix of the logic of `int_core__mtd_accommodation_metrics` and the existing logic for the `int_core__monthly_X_history_by_deal` models. It also adds the tests in schema.

Related work items: #17689
2024-07-02 09:32:52 +00:00
Oriol Roqué Paniagua
1a4b6b4c14 Merged PR 2171: Adding Guest Journey metrics by deal id
Adding the 6 Guest Journey metrics by deal id by creating the model `int_core__monthly_guest_journey_history_by_deal`

The structure for the deal id detail follows yesterday's approach on bookings, namely `int_core__monthly_booking_history_by_deal`, but considering the metric computation of the guest journey, namely `int_core__mtd_guest_journey_metrics`.

It also adds the dbt tests ensuring that date and id_deal are not null and that the combination of both is unique.

Related work items: #17689
2024-07-02 07:26:20 +00:00
Oriol Roqué Paniagua
010135fb63 Merged PR 2164: Adding booking metrics by deal id for business kpis
This is a first approach to compute some easy metrics for the "deal" based business kpis. At this stage, it contains the information of bookings (created, checkout, cancelled) per deal and month, including both historic months and the current one. This does not contain MTD computation because it's overkill to do an MTD at deal level (+ we have 1k deals, so scalability can become a problem in the future).

Models:
- **int_dates_by_deal**: simple model that reads from **int_dates** and just joins it with **unified_users** to retrieve the deals. It will be used as the 'source of truth' for which deals should be considered in a given month: a deal counts from the moment the first host associated with it is created (not necessarily booked).
- **int_core__monthly_booking_history_by_deal**: it contains the history of bookings per deal id on a monthly basis. It should be easy enough to integrate B2B macro segmentation here in the future, if needed.

In terms of performance, comparing the models **int_core__monthly_booking_history_by_deal** and **int_core__mtd_booking_metrics**, you'll see that I removed the join with **int_dates_xxx** in the CTEs. This is because I want to avoid a double join of date & deal: I tried it and stopped the run after 5 minutes. Since this computation is on a monthly basis (no MTD), it's easy enough to just apply **int_dates_by_deal** in the last part of the query. With this approach, it runs in 7 seconds.

Related work items: #17689
2024-07-01 16:00:14 +00:00
Joaquin Ossa
256c638b04 Added id_deal to both intermediate and report model 2024-07-01 10:48:00 +02:00
Joaquin Ossa
cab300f7dd Addressed comments 2024-06-27 12:13:27 +02:00
Joaquin Ossa
a4b16e7410 Completed the schema for the model 2024-06-27 10:40:44 +02:00
Joaquin Ossa
9950c4d9ae Fixed merge error 2024-06-27 10:18:29 +02:00
Joaquin Ossa
04de0c8227 Changed name of model 2024-06-27 10:13:18 +02:00
Joaquin Ossa
431182a098 Removed accommodations from the model 2024-06-27 10:11:48 +02:00
Joaquin Ossa
2a897c6ead fixed model name and dump tables 2024-06-27 10:07:47 +02:00
Joaquin Ossa
cbd1b4414f Created int model for host accommodations with check in hero 2024-06-27 10:07:47 +02:00
Oriol Roqué Paniagua
5c12dd3b13 Merged PR 2125: Fixing accommodation host
Fixing accommodation host by using accommodation to user, after discussion with Ben R.
This improves data quality, even though some duplicates are removed.
I checked and it effectively removes accommodations that mostly were considered as 'Never Booked', thus not a massive impact is expected for the business kpis. But in any case, let's do things properly :)

Related work items: #17538
2024-06-26 14:47:15 +00:00
Oriol Roqué Paniagua
6c053a0753 Merged PR 2107: Adds host lifecycle metrics into biz kpis
This PR closes the first draft of the first batch of business kpis. Host logic has changed to be applied at deal id level.
It's mostly an adapted copy-paste from the accommodation counterpart, specifically:
- `int_core__mtd_deal_lifecycle`: computes the historic deal lifecycle. One line for each deal and MTD date. **Important**: _Not all hosts have a deal set. This will need a data quality report for business teams to fix_
- `int_core__mtd_deal_metrics`: computes the aggregation at MTD date level of the metrics per lifecycle state and activity state

Additionally, this PR changes:
- `int_core__mtd_aggregated_metrics`: it includes the new 3 deal metrics and changes the source of the already existing 3 deal metrics from `mtd_booking_metrics` to the new `mtd_deal_metrics`
- `int_core__mtd_booking_metrics`: removes all code needed to compute the remaining deal metrics, speeding it up considerably.

After this PR, the mtd models run (locally) at the following speed:
- `int_core__mtd_accommodation_lifecycle`: 47 sec
- `int_core__mtd_deal_lifecycle`: 3 sec
- `int_core__mtd_accommodation_metrics`: 5 sec
- `int_core__mtd_deal_metrics`: < 1 sec
- `int_core__mtd_booking_metrics`: 8 sec (quite a reduction)
- `int_core__mtd_guest_journey_metrics`: 5 sec
- `int_core__mtd_aggregated_metrics` and `core__mtd_aggregated_metrics`: < 1 sec

Related work items: #17312
2024-06-25 12:20:59 +00:00
Oriol Roqué Paniagua
0655ac8997 Merged PR 2105: Adding listing lifecycle metrics into business KPIs
This PR will compute the listing metrics in an aggregated manner to be displayed in the Main KPIs dashboard, specifically:
- New Listings
- First Time Booked Listings
- Churning Listings
- It also adapts the computation for the already existing metrics of Listings Booked in X months

At code level, it contains the following:
- Adds `int_core__mtd_accommodation_metrics`, which computes the aggregation of the lifecycle of listings at date level (unique), where date is the corresponding date from `int_dates_mtd`
- Changes `int_core__mtd_aggregated_metrics` to take the accommodation metrics from the new model. The 3 already existing metrics (Listings booked in X months) now read from the new model as well.
- Changes `int_core__mtd_booking_metrics` to remove unused computation, making it lighter. Specifically, it removes 1) listing related metrics, since now we have a dedicated model and 2) number of guests booked, since it's not used at all.

The resulting values in local are consistent with what is already reported in the staging report.

Related work items: #17312
2024-06-25 08:14:23 +00:00
Oriol Roqué Paniagua
f23e210129 Merged PR 2094: Removing lifecycle logic from int_core__accommodation
Removing lifecycle logic from int_core__accommodation
This logic is now available on int_core__mtd_accommodation_lifecycle

Related work items: #17312
2024-06-21 14:13:42 +00:00
Oriol Roqué Paniagua
ef80637a9b Merged PR 2090: Adding int_core__mtd_accommodation_lifecycle
Adding int_core__mtd_accommodation_lifecycle. Mainly, it recreates the history of the lifecycle of a listing for each date appearing in the MTD dates (so, last day of month + days for current month + days for current month of the previous year).
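
To make the set of MTD dates described above more concrete, here is a hedged Python sketch that builds it. It assumes "days for current month" means days up to and including today, and that month-end dates are generated from a hypothetical `history_start`; neither detail is confirmed by this log, and the function names are illustrative.

```python
from datetime import date, timedelta

def last_day_of_month(year, month):
    # First day of the following month minus one day.
    next_first = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return next_first - timedelta(days=1)

def mtd_dates(today, history_start):
    dates = set()
    # 1) Last day of every completed month since history_start.
    y, m = history_start.year, history_start.month
    while (y, m) < (today.year, today.month):
        dates.add(last_day_of_month(y, m))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    # 2) Days of the current month up to today, and
    # 3) the same days in the same month of the previous year.
    prev_year_last_day = last_day_of_month(today.year - 1, today.month).day
    for day in range(1, today.day + 1):
        dates.add(date(today.year, today.month, day))
        if day <= prev_year_last_day:  # skips 29 Feb when last year had none
            dates.add(date(today.year - 1, today.month, day))
    return sorted(dates)

dates = mtd_dates(date(2024, 7, 12), date(2024, 5, 1))
print(len(dates), dates[0], dates[-1])
```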

Implementation of lag function makes it much faster than self-join. Runs in approx 17 seconds (in local)
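
The query is not included in this log, but the idea behind replacing a self-join with a lag can be sketched in Python: sort once per listing and date, then carry the previous row's value forward in a single pass instead of matching every row against all others. The field names `id_accommodation`, `date`, `state` and the state labels are assumptions for illustration.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter

def with_previous_state(rows):
    # Emulates LAG(state) OVER (PARTITION BY id_accommodation ORDER BY date).
    ordered = sorted(rows, key=itemgetter("id_accommodation", "date"))
    out = []
    for _, group in groupby(ordered, key=itemgetter("id_accommodation")):
        previous = None
        for row in group:
            out.append({**row, "previous_state": previous})
            previous = row["state"]
    return out

sample_rows = [
    {"id_accommodation": 1, "date": date(2024, 5, 31), "state": "New"},
    {"id_accommodation": 2, "date": date(2024, 6, 30), "state": "New"},
    {"id_accommodation": 1, "date": date(2024, 6, 30), "state": "Active"},
]
lagged = with_previous_state(sample_rows)
print([(r["id_accommodation"], r["date"], r["previous_state"]) for r in lagged])
```

The single sorted pass does work roughly proportional to the number of rows after sorting, whereas a naive self-join compares each row with every other row of the same listing.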

The logic behind the lifecycle is the same, and the most-up-to-date results in my local show the same values for the new model and the int_core__accommodation model (see screenshots)

previous model:
![image.png](https://guardhog.visualstudio.com/4148d95f-4b6d-4205-bcff-e9c8e0d2ca65/_apis/git/repositories/54ac356f-aad7-46d2-b62c-e8c5b3bb8ebf/pullRequests/2090/attachments/image.png)

new model:
![image (2).png](https://guardhog.visualstudio.com/4148d95f-4b6d-4205-bcff-e9c8e0d2ca65/_apis/git/repositories/54ac356f-aad7-46d2-b62c-e8c5b3bb8ebf/pullRequests/2090/attachments/image%20%282%29.png)

Following PRs will focus on readapting logic of int_core__accommodation to avoid the replication of lifecycle computation (just re-use the last available date in int_core__mtd_accommodation_lifecycle) and the creation of the desired metrics for the Biz Overview dashboard, including a refactor of the mtd_bookings to remove the listing logic from there.

Related work items: #17312
2024-06-21 13:59:14 +00:00
Oriol Roqué Paniagua
fe93f594f5 Merged PR 2084: Adding int_core__accommodation
Adding int_core__accommodation

Includes both:
- Main information of the accommodation, mostly coming from stg_core__accommodation and int_core__country.
- Listing lifecycle computation, based on the created bookings from stg_core__bookings. It's just the current state, no history.

Some considerations:
- I opted to use stg_core__bookings and not int_core__bookings. The main reason is that, if at some point we want to add listing-based information to the booking table, this avoids cyclic references.
- I opted to keep all the logic of 1) accommodation info and 2) lifecycle in the same model. This could be easily split into a lifecycle model that reads only from staging, and then an int_core__accommodation model that reads from the staging version to retrieve accommodation attributes plus the lifecycle one. Up to you.

I'd suggest reviewing the documentation in schema first, since it explains the logic applied.

Notion page linked to this task: https://www.notion.so/knowyourguest-superhog/Listing-lifecycle-4dc0311b21ca44f8859969e419872ebd

Related work items: #17312
2024-06-20 16:02:16 +00:00
Oriol Roqué Paniagua
839e5fae1b Merged PR 2077: Adding Country to intermediate
Adding Country to intermediate, both model + documentation.

At this stage, the model is set as a view but we can discuss what is the best approach

Related work items: #17312
2024-06-19 15:34:15 +00:00