Merged PR 2770: First version of name unification for a deal

# Description

This PR aims to add a new field in the master table of deals, `int_core__deal`, to incorporate the name of the account/client in a new field called `main_deal_name`.

Since a deal can have several accounts and there's no deal name existing in the backend, I just use `first_name`, `last_name` and `company_name` as the sources for the potential names. With some character removal via regex, I take the name that gets repeated the most within certain obvious exceptions (i.e., no "Bookings", "LLC", etc).

This is opinionated and could be done differently, happy to hear your thoughts.

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #18911
This commit is contained in:
Oriol Roqué Paniagua 2024-09-09 10:23:03 +00:00 committed by Pablo Martín
commit 6daec3135c
2 changed files with 65 additions and 11 deletions

View file

@ -25,22 +25,63 @@ with
c.iso_2 as main_billing_country_iso_2_per_deal, c.iso_2 as main_billing_country_iso_2_per_deal,
c.iso_3 as main_billing_country_iso_3_per_deal c.iso_3 as main_billing_country_iso_3_per_deal
from billing_countries_per_deal bcpd from billing_countries_per_deal bcpd
left join int_core__country c left join int_core__country c on bcpd.id_billing_country = c.id_country
on bcpd.id_billing_country = c.id_country
where bcpd.rn = 1 where bcpd.rn = 1
),
potential_name_per_deal as (
select
id_deal,
trim(
regexp_replace(potential_name, '[0-9\(\)\-\.\,@]', '', 'g')
) as potential_deal_name,
count(*) as potential_name_count
from
(
select id_deal, first_name as potential_name
from int_core__unified_user
where id_deal is not null
union all
select id_deal, last_name as potential_name
from int_core__unified_user
where id_deal is not null
union all
select id_deal, company_name as potential_name
from int_core__unified_user
where id_deal is not null
) as names
-- The following removes tons of names that are just Bookings,
-- LTD, LLC, INC, or empty strings
where
trim(upper(potential_name)) not in ('BOOKINGS', 'LTD', 'LLC', 'INC', '')
and potential_name is not null
group by 1, 2
),
unique_name_per_deal as (
select id_deal, potential_deal_name as main_deal_name
from
(
select
id_deal,
potential_deal_name,
potential_name_count,
row_number() over (
partition by id_deal order by potential_name_count desc
) as rn
from potential_name_per_deal
) as ranked_names
where rn = 1
) )
select select
uu.id_deal, uu.id_deal,
unpd.main_deal_name,
mbcpd.main_id_billing_country_per_deal, mbcpd.main_id_billing_country_per_deal,
mbcpd.main_billing_country_name_per_deal, mbcpd.main_billing_country_name_per_deal,
mbcpd.main_billing_country_iso_2_per_deal, mbcpd.main_billing_country_iso_2_per_deal,
mbcpd.main_billing_country_iso_3_per_deal, mbcpd.main_billing_country_iso_3_per_deal,
count(distinct uu.id_user) as users_with_this_id_deal, count(distinct uu.id_user) as users_with_this_id_deal,
count( count(distinct uu.billing_country_iso_3) as billing_countries_for_this_id_deal
distinct uu.billing_country_iso_3
) as billing_countries_for_this_id_deal
from main_billing_country_per_deal mbcpd from main_billing_country_per_deal mbcpd
left join int_core__unified_user uu left join int_core__unified_user uu on uu.id_deal = mbcpd.id_deal
on uu.id_deal = mbcpd.id_deal left join unique_name_per_deal unpd on uu.id_deal = unpd.id_deal
where mbcpd.id_deal is not null where mbcpd.id_deal is not null
group by 1, 2, 3, 4, 5 group by 1, 2, 3, 4, 5, 6

View file

@ -2971,6 +2971,8 @@ models:
the same time, not all users that act as hosts have a deal associated. One the same time, not all users that act as hosts have a deal associated. One
example is Know Your Guest (KYG) Lite accounts. However, there's also historical example is Know Your Guest (KYG) Lite accounts. However, there's also historical
cases that for whatever reason there's no Deal associated. cases that for whatever reason there's no Deal associated.
For this model, the billing country and the deal name are estimated based on
the information available in int_core__unified_user.
columns: columns:
- name: id_deal - name: id_deal
@ -2979,6 +2981,17 @@ models:
tests: tests:
- not_null - not_null
- unique - unique
- name: main_deal_name
data_type: string
description: |
Main name for this ID deal. It's a clean version of
the most repeated name within the user tables in the
fields of first_name, last_name and company name.
This field should be modified at the moment we have
a proper way to retrieve a common account name per deal.
It can contain duplicates.
tests:
- not_null
- name: main_id_billing_country_per_deal - name: main_id_billing_country_per_deal
data_type: integer data_type: integer
description: | description: |