Commit graph

197 commits

Author SHA1 Message Date
Pablo Martin
a20d511b93 missing text 2024-09-17 16:12:38 +02:00
Pablo Martin
049c2d3e31 model and docs+tests 2024-09-17 15:47:45 +02:00
Pablo Martin
f3e9985d0f model and docs 2024-09-17 12:16:52 +02:00
Pablo Martin
ac87ab4cfd create file 2024-09-17 11:56:21 +02:00
Joaquin Ossa
ae1e45a937 Merged PR 2869: Update mtd_aggregated_metrics guest payments to be tax-exclusive
# Description

Update mtd_aggregated_metrics guest payments to be tax-exclusive

# Checklist

- [x] The edited models and dependants run properly with production data.
- [x] The edited models are sufficiently documented.
- [x] The edited models contain PK tests, and I've ran and passed them.
- [ ] I have checked for DRY opportunities with other models and docs.
- [ ] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Update mtd_aggregated_metrics guest payments to be tax-exclusive

Related work items: #20045
2024-09-17 07:27:08 +00:00
Oriol Roqué Paniagua
7b9ba021c1 Merged PR 2865: Propagate tax exclusive check in hero payments
# Description

This PR propagates tax exclusive check in hero payments for the reporting of Check in Hero.
Additionally, it keeps propagating the amounts with tax inclusiveness in case, at some point, we need them. These are with the new naming convention.

In order not to break anything, the previous amounts are duplicated and aliased in reporting.

Lastly, I spent some time adding some clarifications and documenting payments set to currency and the dependant used for check in hero.

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #20046
2024-09-17 07:25:09 +00:00
Joaquin Ossa
b21590ab1d Simplified it 2024-09-17 09:05:38 +02:00
Joaquin Ossa
c3e4a07050 Update mtd_aggregated_metrics 2024-09-17 09:05:38 +02:00
Pablo Martin
60ea19f5ef remove dangling comment 2024-09-17 09:05:38 +02:00
Pablo Martin
fcbaae1381 update schemas 2024-09-17 09:05:38 +02:00
Pablo Martin
fd3b67935b remove old booking metrics 2024-09-17 09:05:37 +02:00
Pablo Martin
4b5c4d1967 create individual models for each booking metric 2024-09-17 09:04:44 +02:00
Joaquin Ossa
50a8a512fe Update mtd_aggregated_metrics guest payments to be tax-exclusive 2024-09-17 08:38:33 +02:00
Joaquin Ossa
3d8ede273e Simplified it 2024-09-16 16:20:36 +02:00
Joaquin Ossa
54af8af30e Update mtd_aggregated_metrics 2024-09-16 16:01:49 +02:00
Oriol Roqué Paniagua
c1b97e17e6 Merged PR 2852: Fix: ensure priority selection on user migration
# Description

Fixing logic to ensure priority selection of claims when user satisfies multiple claim conditions.
It adds a new parameter that forcefully prioritises the selection of the date value for a certain claim over the others. If the value is repeated among claims, it will select the earliest date.

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #20773
2024-09-16 09:34:44 +00:00
Oriol Roqué Paniagua
cf1d6e28cc Merged PR 2849: Fix: KYG lite migration with proper date migration handling
# Description

This PR fixes the New Dash migration issue that happened on September 10th 2024. In this migration, users were directly assigned the claim of KygMvp that does not contain a date value. We were using a default hardcode of the first MVP migration, thus in DWH all users have been considered to be migrated late July instead of splitting the first 22 in late July and the ~200 others in September.

The issue lies in the fact that users have configured a ProductBundle and can have Bookings with ProductBundle BEFORE the migration date, which greatly breaks the logic of a migration monitoring.

Changes:
* New migration phase added based on the claim MvpMigratedUser, that Ben created on Friday 13th
* Adaptation of the code in int_core__user_migration to detect if the claim_value (a text field) has a date or not. If so, use that date as long as it's equal or greater than the deployment date, if not use the deployment date. If the claim does not contain a date, use the deployment date (this is the case for the first true 22 migrated users)

I checked that volumes now look correct with this fix.

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #20773
2024-09-16 07:57:41 +00:00
Pablo Martin
6434c5c88d replace custom generic test with standard range between one 2024-09-13 15:21:22 +02:00
Pablo Martin
4cf74db49a replace hardcode with variable 2024-09-13 15:21:22 +02:00
Pablo Martin
092b8bd725 clarify refund is not in GBP 2024-09-13 15:21:22 +02:00
Pablo Martin
ec8490527d reorder case for readability 2024-09-13 15:20:39 +02:00
Pablo Martin
0312a1ccd8 remove explicit version reference, rely on latest 2024-09-13 15:20:39 +02:00
Pablo Martin
da3070949a finish columns and schema 2024-09-13 15:20:39 +02:00
Pablo Martin
63599d7a9b add columns 2024-09-13 15:20:39 +02:00
Pablo Martin
8c96b96ee5 small comment 2024-09-13 15:20:39 +02:00
Pablo Martin
6136db8403 cte for pre-defining if vat is applicable or not 2024-09-13 15:20:39 +02:00
Pablo Martin
f58e97a3f2 set references for v1 2024-09-13 15:20:39 +02:00
Pablo Martin
2cd115eede remove unnecessary alias, remove unrelated errors in yaml 2024-09-13 15:20:39 +02:00
Pablo Martin
1974da99f7 fix typos 2024-09-13 15:20:39 +02:00
Pablo Martin
d4b797c741 add alias 2024-09-13 15:20:39 +02:00
Pablo Martin
4efc1ba50f v1 goes back to original state 2024-09-13 15:20:39 +02:00
Pablo Martin
bbb9558f62 yaml for new version, plus deprecation 2024-09-13 15:20:39 +02:00
Pablo Martin
4ba6c80d6f v2 model 2024-09-13 15:20:15 +02:00
Pablo Martin
bde6f12404 add new columns 2024-09-13 15:20:15 +02:00
Pablo Martin
0aaec6a619 fuck around with Uri 2024-09-13 15:20:15 +02:00
Pablo Martin
05d5cc6d10 fix schemas in intermediate 2024-09-12 15:38:50 +02:00
Oriol Roqué Paniagua
c336081d3d Merged PR 2825: Propagates deal Name and Billing Country in int_monthly_aggregated_metrics_history_by_deal
# Description

Changes (only in intermediate):
* Applies sqlfmt in KPIs source models (for some of them it was already applied). Specifically, the 3 Core models ONLY contains formatting changes
![image.png](https://guardhog.visualstudio.com/4148d95f-4b6d-4205-bcff-e9c8e0d2ca65/_apis/git/repositories/54ac356f-aad7-46d2-b62c-e8c5b3bb8ebf/pullRequests/2825/attachments/image.png)

* Adds `main_deal_name` and `main_billing_country_iso_3_per_deal` in `int_monthly_aggregated_metrics_history_by_deal`

* Adds the 2 new fields in the schema entry of `int_monthly_aggregated_metrics_history_by_deal`, including the dbt test not null in the deal name.

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #18911, #19083
2024-09-12 12:04:04 +00:00
Pablo Martín
2733adfd31 Merged PR 2774: Inline CTEs on int_core__mtd_booking_metrics
# Description

This PR is a pure refactor (as in, doesn't change the output of the model at all).

The only purpose of the changes in this PR is to improve the performance on the model. The strategy to achieve this has been to inline the basic CTEs of the model (inline=replace references to the CTE with direct `ref` inside the model).

This works because it breaks an optimization fence.

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #20835
2024-09-09 12:55:24 +00:00
Pablo Martin
afaceefe88 comment a bit more 2024-09-09 12:31:34 +02:00
Pablo Martin
c22a840084 inline the CTEs of the model 2024-09-09 12:05:44 +02:00
uri
f18a2eb520 First version of name unification for a deal 2024-09-06 17:08:50 +02:00
Pablo Martin
01c9c0d8ad remove distincts from booking counts 2024-09-05 15:17:45 +02:00
Oriol Roqué Paniagua
435db55c1e Merged PR 2743: Fixes deal-based issues on the billing country dimension
# Description

Before deploying KPIs by Billing Country, we spotted some issues that were basically increases on the volumes of any metric on the by billing country dimension that was based on Deal. This means, `int_core__mtd_deal_metrics` and `int_xero__mtd_invoicing_metrics`.

This PR changes the following:
* Now the 2 abovementioned models depend on the `int_core__deal` model, instead of `int_core__user_host` (thus removing duplicated stuff)
* Now all models use the main billing country at deal level, instead of doing it so at host level. The reason is that some small amount of hosts that share the same deal can have a different billing country. To avoid weird stuff, everything points to this simplification - that in general, it's not a massive change in the output.
* In order to do so easily, the 3 main billing country per deal fields have been propagated to `int_core__user_host`

To exemplify the solution, find here a snapshot of the differences in behavior:

```
select
    dimension,
    sum(deals_booked_in_month) as deals_booked_1,
    sum(deals_booked_in_6_months) as deals_booked_6,
    sum(deals_booked_in_12_months) as deals_booked_12,
    sum(total_revenue_in_gbp) as total_revenue,
    sum(xero_operator_net_fees_in_gbp) as operator_revenue,
    sum(xero_booking_net_fees_in_gbp) as booking_fees,
    sum(xero_listing_net_fees_in_gbp) as listing_fees,
    sum(xero_verification_net_fees_in_gbp) as verification_fees,
    sum(total_guest_revenue_in_gbp) as guest_revenue,
    sum(xero_waiver_paid_back_to_host_in_gbp) as waiver_paid_back_to_hosts,
    sum(waiver_net_fees_in_gbp) as waiver_net_fees
from intermediate.int_mtd_vs_previous_year_metrics
where date in ('2024-01-31')
group by 1
order by 1
```
Production:
![image.png](https://guardhog.visualstudio.com/4148d95f-4b6d-4205-bcff-e9c8e0d2ca65/_apis/git/repositories/54ac356f-aad7-46d2-b62c-e8c5b3bb8ebf/pullRequests/2743/attachments/image.png)

vs.
Local:
![image (2).png](https://guardhog.visualstudio.com/4148d95f-4b6d-4205-bcff-e9c8e0d2ca65/_apis/git/repositories/54ac356f-aad7-46d2-b62c-e8c5b3bb8ebf/pullRequests/2743/attachments/image%20%282%29.png)

Keep in mind that still Global dimension can be greater than any other dimension aggregated since not all users have a deal. Mismatches between the other 2 dimensions might be linked to the dump.

Commits are meaningful and help navigate in the changes.

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #20823
2024-09-05 09:53:16 +00:00
Oriol Roqué Paniagua
fc95bd481a Merged PR 2742: Creation of int_core__deal
# Description

Creates a new master table containing the Deals. At this stage, the information is quite limited - I only included those fields that are useful for fixing the KPIs issues. To be enriched later on.

# Checklist

- [ ] The edited models and dependants run properly with production data. **Question here. Do I need to have the models materialised in local to see the dependants on DBT Docs?  I didn't see any uses of the previous model int_core__deal_id_master_list.**
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #20822
2024-09-04 16:03:36 +00:00
Oriol Roqué Paniagua
556a52e991 Merged PR 2689: KPIs by Billing Country
# Description

Adds Billing Country dimension in KPIs, but does not expose them to reporting yet.
Silly thing, based on the macros I built, I cannot make incremental changes unless changing all models. This will need to be adapted, happy to hear your thoughts on how we do it.
Additionally, I have lack of performance of the model `mtd_guest_payments_metrics`. It takes around 5 min to execute, but technically the end-to-end runs in one shoot without breaking.
It's a complex PR because it changes many files, but you will see that:
* It mostly changes the join conditions for the dimensions or the schema tests,
* I tried to be very careful and add things step-by-step in the commits.

Goal is NOT to complete the PR yet until we see how we can improve performance. I can say though that data end-to-end looks ok to me, but would benefit from checking with production data for the new dimension

Update 30th Aug
* Added a new commit that includes `id_user_host` in `int_core__verification_payments`. Happy to discuss if it makes sense or not. But it changes the execution from ~600 sec to ~6 sec because it avoids a massive repeated join with `verification_requests`.

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [ ] I've picked the right materialization for the affected models. **To check because of performance issues**

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #19082
2024-09-04 10:17:12 +00:00
Oriol Roqué Paniagua
6d59e21310 Merged PR 2725: Force id user field to lower in staging
# Description

Forces lower case to all id_users in staging. Removes hardcoded lower case in intermediate. Adapts readme to contemplate the lowering of id users.

I propose to merge, run in prod and run tests in prod as a proper evaluation method.

BTW, I only find one id_user_host that was in capital letters, so that's why probably we didn't care that much about this. Still, I prefer have things clean from the start!

```
select *
from staging.stg_core__booking scb
left join intermediate.int_core__unified_user icuu
on lower(scb.id_user_host) = lower(icuu.id_user)
where scb.id_user_host <> icuu.id_user
```

# Checklist

- [ ] The edited models and dependants run properly with production data. **All models run in stg, did not check all the dependants**
- [ ] The edited models are sufficiently documented. **Have not checked**
- [ ] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [ ] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #20776
2024-09-03 14:36:21 +00:00
Oriol Roqué Paniagua
1b30fbbca9 Merged PR 2724: Removing coalesce from gbp conversion in int_core__host_booking_fees
# Description

Removing coalesce from gbp conversion in `int_core__host_booking_fees`

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs. **Message sent in data team channel**
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.
2024-09-03 13:34:08 +00:00
Oriol Roqué Paniagua
4cfc0dcd45 Merged PR 2642: Booking Charge Events to have a similar logic as invoicing
# Description

Based on the Notion page [here](https://www.notion.so/knowyourguest-superhog/Data-quality-assessment-Billable-Bookings-97008b7f1cbb4beb98295a22528acd03), this PR mainly adds:

- Charge at verification depends on when the Guest joined or the VR was updated (depending on if the verification request associated exists does not exists or it does, respectively)
- Add the logic to retrieve the last plan that is available at the beginning of each month.
- Additional where conditions, relatively documented, to imitate was is available in the invoicing process. This includes removal of duplicated bookings, guest verification and guest user existing.

Additional changes:
- Remove select star :)
- Added dbt tests that didn't exist before
- Add informative fields on 1) how many price plans were active in a given month, even though we just keep the last one and 2) cases in which bookings are created after the booking is supposed to be charged.

Data quality:´
- I have mixed feelings. This does not correspond 100% to the volumes provided by the exporter, though are quite close. For April, May and June 2024, this logic has more than 95% of accuracy. Still, the fact of using the guest joined, and especially the updated date, I feel like this will make past data "disappear" if the guest has another journey. I don't know for sure since we do not store incremental updates of user information.
I'd propose to move forward to have an estimated metric available anyway - with this or a similar logic, even the previous one based on the used link at but fixing the cases in which there's no VR associated.
Let's discuss it!

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #18111
2024-09-03 13:15:40 +00:00
Joaquin Ossa
ccef428020 Merged PR 2694: Basic model changes for edeposit
# Description

Basic model changes for edeposit

# Checklist

- [x] The edited models and dependants run properly with production data.
- [x] The edited models are sufficiently documented.
- [x] The edited models contain PK tests, and I've ran and passed them.
- [x] I have checked for DRY opportunities with other models and docs.
- [x] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #20125
2024-09-02 15:01:40 +00:00
Oriol Roqué Paniagua
7ba65999c3 Merged PR 2687: Materialize int_core__verification_payments as a table
# Description

Just materializes `int_core__verification_payments` as a table instead as a view to enhance compute.

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them. **Technically I get errors in local but from what I see it's because I have different dumps for the currency conversion and the other sources. There's no such cases in prod from what I observed.**
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #19082
2024-08-29 14:25:18 +00:00