Merged PR 3922: Id_deal consolidation model

# Description

Intermediate model that joins all `id_deal` sources into one table.
I am using the `company_name` from core, the `deal_name` from hubspot and `contact_name` from xero as the `deal_name` for all.
There are many ids that are clearly tests or deleted

# Checklist

- [x] The edited models and dependants run properly with production data.
- [x] The edited models are sufficiently documented.
- [x] The edited models contain PK tests, and I've ran and passed them.
- [ ] I have checked for DRY opportunities with other models and docs.
- [ ] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #25827
This commit is contained in:
Joaquin Ossa 2025-01-03 13:30:48 +00:00
commit c81af72602
2 changed files with 152 additions and 0 deletions

View file

@ -0,0 +1,86 @@
with
int_xero__contacts as (select * from {{ ref("int_xero__contacts") }}),
int_core__unified_user as (select * from {{ ref("int_core__unified_user") }}),
int_hubspot__deal as (select * from {{ ref("int_hubspot__deal") }}),
core_deal_counts as (
select id_deal, count(distinct company_name) as core_company_name_count
from int_core__unified_user
where id_deal is not null
group by id_deal
),
ranked_core_deals as (
select
u.id_deal,
u.company_name as core_company_name,
c.core_company_name_count,
row_number() over (
partition by u.id_deal order by u.company_name
) as row_num
from int_core__unified_user u
left join core_deal_counts c on u.id_deal = c.id_deal
where u.id_deal is not null
),
deals_core as (
select id_deal, core_company_name, core_company_name_count
from ranked_core_deals
where row_num = 1
),
hubspot_deal_counts as (
select id_deal, count(distinct deal_name) as hubspot_deal_name_count
from int_hubspot__deal
where id_deal is not null
group by id_deal
),
ranked_hubspot_deals as (
select
h.id_deal,
h.deal_name as hubspot_deal_name,
c.hubspot_deal_name_count,
row_number() over (partition by h.id_deal order by h.deal_name) as row_num
from int_hubspot__deal h
left join hubspot_deal_counts c on h.id_deal = c.id_deal
where h.id_deal is not null
),
deals_hubspot as (
select id_deal, hubspot_deal_name, hubspot_deal_name_count
from ranked_hubspot_deals
where row_num = 1
),
xero_contact_counts as (
select id_deal, count(distinct contact_name) as xero_contact_name_count
from int_xero__contacts
where id_deal is not null
group by id_deal
),
ranked_xero_deals as (
select
x.id_deal,
x.contact_name as xero_contact_name,
c.xero_contact_name_count,
row_number() over (
partition by x.id_deal order by x.contact_name
) as row_num
from int_xero__contacts x
left join xero_contact_counts c on x.id_deal = c.id_deal
where x.id_deal is not null
),
deals_xero as (
select id_deal, xero_contact_name, xero_contact_name_count
from ranked_xero_deals
where row_num = 1
)
select
*,
case when dc.id_deal is not null then true else false end as is_deal_in_core,
case when dh.id_deal is not null then true else false end as is_deal_in_hubspot,
case when dx.id_deal is not null then true else false end as is_deal_in_xero
from deals_core dc
full outer join deals_hubspot dh using (id_deal)
full outer join deals_xero dx using (id_deal)

View file

@ -1638,3 +1638,69 @@ models:
in the time window. It can be null if no revenue was generated.
In some extreme cases, this ratio can be lower and higher than 0 and 1,
respectively.
- name: int_deals_consolidation
description: |
"This table contains all deal ids from different sources used in Superhog.
It contains the source (Hubspot, Xero or Core), the id_deal and the name"
columns:
- name: id_deal
data_type: character varying
description: "Unique ID for this deal."
tests:
- unique
- not_null
- name: core_company_name
data_type: character varying
description: "Company name of the deal as shown in Core."
- name: core_company_name_count
data_type: integer
description: "Count of distinct names the deal has in Core.
It might be the case that a deal has ony NULL value for a name,
so the count will be 0"
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
- name: hubspot_deal_name
data_type: character varying
description: "Name of the deal as shown in Hubspot."
- name: hubspot_deal_name_count
data_type: integer
description: "Count of distinct names the deal has in Hubspot.
It might be the case that a deal has ony NULL value for a name,
so the count will be 0"
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
- name: xero_contact_name
data_type: character varying
description: "Contact name of the deal as shown in Xero."
- name: xero_contact_name_count
data_type: integer
description: "Count of distinct names the deal has in Xero.
It might be the case that a deal has ony NULL value for a name,
so the count will be 0"
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
- name: is_deal_in_core
data_type: boolean
description: "Flag to indicate if the deal is in Core."
- name: is_deal_in_hubspot
data_type: boolean
description: "Flag to indicate if the deal is in Hubspot."
- name: is_deal_in_xero
data_type: boolean
description: "Flag to indicate if the deal is in Xero."