Commit graph

799 commits

Author SHA1 Message Date
uri
cb8beab129 Modifying strenght by tolerance to clarify naming 2024-09-06 09:15:33 +02:00
uri
0f73b70942 Adding new column test: latest_date_is_yesterday 2024-09-06 09:13:27 +02:00
Joaquin Ossa
1e89966153 Merged PR 2756: guesty_verifications model
# Description

I added verification status to be able to show the amount of bookings that have been approved vs not approved (rejected, flagged, or missing info)

# Checklist

- [x] The edited models and dependants run properly with production data.
- [x] The edited models are sufficiently documented.
- [x] The edited models contain PK tests, and I've ran and passed them.
- [x] I have checked for DRY opportunities with other models and docs.
- [x] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #20125
2024-09-05 15:32:43 +00:00
Joaquin Ossa
7036b24847 guesty_verifications model 2024-09-05 17:21:48 +02:00
Oriol Roqué Paniagua
00a97a3aff Merged PR 2755: Exposes By Billing Country dimension in KPIs to production
# Description

Parameter configuration to show the new dimension now that tests are implemented and look ok.

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #19082, #20827
2024-09-05 14:55:30 +00:00
Pablo Martín
f553e73089 Merged PR 2753: Remove distincts from booking counts
# Description

This PR removes multiple `distinct` statements from counts on the booking metrics.

I'm doing this:
- Because they are making the model query perform terribly, and it's already starting to be way too slow.
- They are not that necessary since the uniqueness of the fields being `distinct`-ed it's already tested in upstream models.

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #20835
2024-09-05 14:54:50 +00:00
uri
836be7903b Fix config - again 2024-09-05 16:34:17 +02:00
uri
05058f9e05 Fix - wrong config 2024-09-05 16:18:49 +02:00
uri
2662527790 Enable By Billing Country dimension in prod 2024-09-05 16:17:05 +02:00
Oriol Roqué Paniagua
4322206091 Merged PR 2752: Adding specific business tests for kpis
# Description

Adds 2 tests for KPIs to:
1. Check that the values observed in the last run for the dimensions other than Global are consistent for additive metrics. Consistent does not mean exact, though
2. Check that the values observed in the last run for Global dimension are similar to what was observed previously, to raise an alert in case of outliers. This one is tricky because there's possibilities to have false positives, so extensive documentation on the test and parameters has been provided.

Note: This runs well targeting production. It also detects the cancelled bookings issue if it was supposed to run on 31st of August. Once an alert is raised, since it only takes into account the last update, usually will not raise it in the next day.

# Checklist (does not apply)

- [ ] The edited models and dependants run properly with production data.
- [ ] The edited models are sufficiently documented.
- [ ] The edited models contain PK tests, and I've ran and passed them.
- [ ] I have checked for DRY opportunities with other models and docs.
- [ ] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Adding specific business tests for kpis

Related work items: #20824
2024-09-05 14:01:58 +00:00
Pablo Martin
01c9c0d8ad remove distincts from booking counts 2024-09-05 15:17:45 +02:00
Joaquin Ossa
d7c8e2a84e Merged PR 2750: Changed model for bookings with 0 nights
# Description

Changed model for bookings with 0 nights

# Checklist

- [x] The edited models and dependants run properly with production data.
- [x] The edited models are sufficiently documented.
- [x] The edited models contain PK tests, and I've ran and passed them.
- [x] I have checked for DRY opportunities with other models and docs.
- [x] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #20125
2024-09-05 12:27:50 +00:00
Joaquin Ossa
920f6f7e51 Changed model for bookings with 0 nights 2024-09-05 13:56:07 +02:00
Joaquin Ossa
81f57e0b5a Merged PR 2748: guesty_verifications to reporting
# Description

guesty_verifications to reporting

# Checklist

- [x] The edited models and dependants run properly with production data.
- [x] The edited models are sufficiently documented.
- [x] The edited models contain PK tests, and I've ran and passed them.
- [x] I have checked for DRY opportunities with other models and docs.
- [x] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #20125
2024-09-05 11:38:04 +00:00
Joaquin Ossa
01115e0594 Merged PR 2749: Silly mistake fixed
# Description

_Describe your motivation and changes here._

# Checklist

- [x] The edited models and dependants run properly with production data.
- [x] The edited models are sufficiently documented.
- [x] The edited models contain PK tests, and I've ran and passed them.
- [x] I have checked for DRY opportunities with other models and docs.
- [x] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Silly mistake fixed
2024-09-05 10:47:13 +00:00
Joaquin Ossa
82629e040b Silly mistake fixed 2024-09-05 12:45:09 +02:00
Joaquin Ossa
8a49580888 guesty_verifications to reporting 2024-09-05 11:59:57 +02:00
Oriol Roqué Paniagua
435db55c1e Merged PR 2743: Fixes deal-based issues on the billing country dimension
# Description

Before deploying KPIs by Billing Country, we spotted some issues that were basically increases on the volumes of any metric on the by billing country dimension that was based on Deal. This means, `int_core__mtd_deal_metrics` and `int_xero__mtd_invoicing_metrics`.

This PR changes the following:
* Now the 2 abovementioned models depend on the `int_core__deal` model, instead of `int_core__user_host` (thus removing duplicated stuff)
* Now all models use the main billing country at deal level, instead of doing it so at host level. The reason is that some small amount of hosts that share the same deal can have a different billing country. To avoid weird stuff, everything points to this simplification - that in general, it's not a massive change in the output.
* In order to do so easily, the 3 main billing country per deal fields have been propagated to `int_core__user_host`

To exemplify the solution, find here a snapshot of the differences in behavior:

```
select
    dimension,
    sum(deals_booked_in_month) as deals_booked_1,
    sum(deals_booked_in_6_months) as deals_booked_6,
    sum(deals_booked_in_12_months) as deals_booked_12,
    sum(total_revenue_in_gbp) as total_revenue,
    sum(xero_operator_net_fees_in_gbp) as operator_revenue,
    sum(xero_booking_net_fees_in_gbp) as booking_fees,
    sum(xero_listing_net_fees_in_gbp) as listing_fees,
    sum(xero_verification_net_fees_in_gbp) as verification_fees,
    sum(total_guest_revenue_in_gbp) as guest_revenue,
    sum(xero_waiver_paid_back_to_host_in_gbp) as waiver_paid_back_to_hosts,
    sum(waiver_net_fees_in_gbp) as waiver_net_fees
from intermediate.int_mtd_vs_previous_year_metrics
where date in ('2024-01-31')
group by 1
order by 1
```
Production:
![image.png](https://guardhog.visualstudio.com/4148d95f-4b6d-4205-bcff-e9c8e0d2ca65/_apis/git/repositories/54ac356f-aad7-46d2-b62c-e8c5b3bb8ebf/pullRequests/2743/attachments/image.png)

vs.
Local:
![image (2).png](https://guardhog.visualstudio.com/4148d95f-4b6d-4205-bcff-e9c8e0d2ca65/_apis/git/repositories/54ac356f-aad7-46d2-b62c-e8c5b3bb8ebf/pullRequests/2743/attachments/image%20%282%29.png)

Keep in mind that still Global dimension can be greater than any other dimension aggregated since not all users have a deal. Mismatches between the other 2 dimensions might be linked to the dump.

Commits are meaningful and help navigate in the changes.

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #20823
2024-09-05 09:53:16 +00:00
Joaquin Ossa
bc77e5df08 Merged PR 2744: verifications model for Guesty
# Description

new model with Guesty verifications for business overview report, this one is a lot more simple since it only charges per nights of the booking. Though there is a slight problem that right now there are some verifications with bookings with 0 nights, those are being filtered out for the moment so they don't crash the model tests.

# Checklist

- [x] The edited models and dependants run properly with production data.
- [x] The edited models are sufficiently documented.
- [x] The edited models contain PK tests, and I've ran and passed them.
- [x] I have checked for DRY opportunities with other models and docs.
- [x] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #20125
2024-09-05 09:51:08 +00:00
Joaquin Ossa
7087e9193c changed ok_status_fee_in_gbp and added to description 2024-09-05 11:26:37 +02:00
Joaquin Ossa
126973ce1c included filter of number of nights as a temporary solution for the problem with the data 2024-09-05 09:09:40 +02:00
Oriol Roqué Paniagua
fc95bd481a Merged PR 2742: Creation of int_core__deal
# Description

Creates a new master table containing the Deals. At this stage, the information is quite limited - I only included those fields that are useful for fixing the KPIs issues. To be enriched later on.

# Checklist

- [ ] The edited models and dependants run properly with production data. **Question here. Do I need to have the models materialised in local to see the dependants on DBT Docs?  I didn't see any uses of the previous model int_core__deal_id_master_list.**
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #20822
2024-09-04 16:03:36 +00:00
Joaquin Ossa
d189aaa797 1st commit 2024-09-04 17:24:55 +02:00
Pablo Martín
dd57c28768 Merged PR 2728: edeposit verifications docs and small refactors
# Description

This PR:
- Adds half-decent docs to `stg_edeposit__verifications` and tests. I say half-decent because I would describe our tests as "as strict as the backend guidance allows". But we can't do miracles, so it stays this way for now.
- Shifts a few column operations that were being done in the `int` layer into the `stg` layer.
- Also removes a couple of fields from `int` that were marked as deprecated by Ray. Would rather not have them at all beyond `stg`.

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #20123
2024-09-04 11:07:58 +00:00
Oriol Roqué Paniagua
556a52e991 Merged PR 2689: KPIs by Billing Country
# Description

Adds Billing Country dimension in KPIs, but does not expose them to reporting yet.
Silly thing, based on the macros I built, I cannot make incremental changes unless changing all models. This will need to be adapted, happy to hear your thoughts on how we do it.
Additionally, I have lack of performance of the model `mtd_guest_payments_metrics`. It takes around 5 min to execute, but technically the end-to-end runs in one shoot without breaking.
It's a complex PR because it changes many files, but you will see that:
* It mostly changes the join conditions for the dimensions or the schema tests,
* I tried to be very careful and add things step-by-step in the commits.

Goal is NOT to complete the PR yet until we see how we can improve performance. I can say though that data end-to-end looks ok to me, but would benefit from checking with production data for the new dimension

Update 30th Aug
* Added a new commit that includes `id_user_host` in `int_core__verification_payments`. Happy to discuss if it makes sense or not. But it changes the execution from ~600 sec to ~6 sec because it avoids a massive repeated join with `verification_requests`.

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [ ] I've picked the right materialization for the affected models. **To check because of performance issues**

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #19082
2024-09-04 10:17:12 +00:00
Oriol Roqué Paniagua
940896824f Merged PR 2730: Exposing Billable Bookings metric for KPIs
# Description

Exposes Billable Bookings metric for KPIs, both in the "global+dimension" view and in the "deal" view.

Metrics have already been created for a while. Exposing them now after the changes carried out in the model `int_core__booking_charge_events`. Based on the current quality of the data, I opted for "Est. Billable Bookings" to account for the fact that this is an estimation. If you don't feel comfortable with it, let's remove the "Est. ".

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #18111
2024-09-04 08:15:37 +00:00
Joaquin Ossa
7e92037c66 Merged PR 2731: edeposit_verifications_fees to reporting
# Description

E-deposit_verifications_fees to reporting, this would be the model to use in the business overview report.
As discussed all the grouping calculations will be done inside power bi

# Checklist

- [x] The edited models and dependants run properly with production data.
- [x] The edited models are sufficiently documented.
- [x] The edited models contain PK tests, and I've ran and passed them.
- [x] I have checked for DRY opportunities with other models and docs.
- [x] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #20125
2024-09-04 08:07:22 +00:00
Joaquin Ossa
a3e6ad27c2 Set at 5% 2024-09-04 10:06:37 +02:00
Joaquin Ossa
f8131d9111 edeposit_verifications_fees to reporting, this would be the model to use in the business overview report 2024-09-04 09:27:13 +02:00
Pablo Martin
08abcb5373 add date fields to docs 2024-09-04 09:21:22 +02:00
Pablo Martin
443216bfaf more fixes in docs 2024-09-04 09:15:33 +02:00
Pablo Martin
cfab0b4b33 add missing field 2024-09-04 09:12:21 +02:00
Pablo Martin
475116a32b rename field in docs 2024-09-04 09:12:17 +02:00
Joaquin Ossa
470e5c7990 Merged PR 2679: edeposit_agg_fee_per_user to reporting
# Description

edeposit fees per verification
I modified the model so it contains only the fees charged per verification so then we can do the grouping and filters in power bi depending on how it needs to be displayed.
# Checklist

- [x] The edited models and dependants run properly with production data.
- [x] The edited models are sufficiently documented.
- [x] The edited models contain PK tests, and I've ran and passed them.
- [x] I have checked for DRY opportunities with other models and docs.
- [x] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #20125
2024-09-04 07:08:38 +00:00
Pablo Martin
e750e288c5 fix typo in schema 2024-09-03 18:19:17 +02:00
Pablo Martin
9f80f0a916 shit date casts left 2024-09-03 18:03:19 +02:00
Pablo Martin
37856ec606 shift renames left, remove deprecated fields from int 2024-09-03 17:55:24 +02:00
Pablo Martin
c35c5cb033 schema and tests 2024-09-03 17:55:24 +02:00
Pablo Martin
9a6490e7fd minor change in model 2024-09-03 17:55:24 +02:00
Joaquin Ossa
97e9f0cf76 Merged PR 2727: changed athena name
# Description

changed athena name I forgot in model and only changed in schema

# Checklist

- [x] The edited models and dependants run properly with production data.
- [x] The edited models are sufficiently documented.
- [x] The edited models contain PK tests, and I've ran and passed them.
- [x] I have checked for DRY opportunities with other models and docs.
- [x] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #20125
2024-09-03 15:23:57 +00:00
Joaquin Ossa
2602be2567 changed athena name 2024-09-03 17:09:52 +02:00
Joaquin Ossa
7d8983e0dd Fixed model, added created_date and checkout_date 2024-09-03 17:05:13 +02:00
Oriol Roqué Paniagua
6d59e21310 Merged PR 2725: Force id user field to lower in staging
# Description

Forces lower case to all id_users in staging. Removes hardcoded lower case in intermediate. Adapts readme to contemplate the lowering of id users.

I propose to merge, run in prod and run tests in prod as a proper evaluation method.

BTW, I only find one id_user_host that was in capital letters, so that's why probably we didn't care that much about this. Still, I prefer have things clean from the start!

```
select *
from staging.stg_core__booking scb
left join intermediate.int_core__unified_user icuu
on lower(scb.id_user_host) = lower(icuu.id_user)
where scb.id_user_host <> icuu.id_user
```

# Checklist

- [ ] The edited models and dependants run properly with production data. **All models run in stg, did not check all the dependants**
- [ ] The edited models are sufficiently documented. **Have not checked**
- [ ] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [ ] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #20776
2024-09-03 14:36:21 +00:00
Oriol Roqué Paniagua
1b30fbbca9 Merged PR 2724: Removing coalesce from gbp conversion in int_core__host_booking_fees
# Description

Removing coalesce from gbp conversion in `int_core__host_booking_fees`

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs. **Message sent in data team channel**
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.
2024-09-03 13:34:08 +00:00
Joaquin Ossa
c24c875336 Fixed test error 2024-09-03 15:19:04 +02:00
Joaquin Ossa
560e2a5994 Modified model to only have fees 2024-09-03 15:19:04 +02:00
Joaquin Ossa
920733fceb Updating with Ray's comments 2024-09-03 15:19:04 +02:00
Joaquin Ossa
a99d4f622f Modified model to only have fees 2024-09-03 15:18:10 +02:00
Joaquin Ossa
906bccce0e modified model 2024-09-03 15:17:43 +02:00
Oriol Roqué Paniagua
4cfc0dcd45 Merged PR 2642: Booking Charge Events to have a similar logic as invoicing
# Description

Based on the Notion page [here](https://www.notion.so/knowyourguest-superhog/Data-quality-assessment-Billable-Bookings-97008b7f1cbb4beb98295a22528acd03), this PR mainly adds:

- Charge at verification depends on when the Guest joined or the VR was updated (depending on if the verification request associated exists does not exists or it does, respectively)
- Add the logic to retrieve the last plan that is available at the beginning of each month.
- Additional where conditions, relatively documented, to imitate was is available in the invoicing process. This includes removal of duplicated bookings, guest verification and guest user existing.

Additional changes:
- Remove select star :)
- Added dbt tests that didn't exist before
- Add informative fields on 1) how many price plans were active in a given month, even though we just keep the last one and 2) cases in which bookings are created after the booking is supposed to be charged.

Data quality:´
- I have mixed feelings. This does not correspond 100% to the volumes provided by the exporter, though are quite close. For April, May and June 2024, this logic has more than 95% of accuracy. Still, the fact of using the guest joined, and especially the updated date, I feel like this will make past data "disappear" if the guest has another journey. I don't know for sure since we do not store incremental updates of user information.
I'd propose to move forward to have an estimated metric available anyway - with this or a similar logic, even the previous one based on the used link at but fixing the cases in which there's no VR associated.
Let's discuss it!

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #18111
2024-09-03 13:15:40 +00:00