Merged PR 3922: Id_deal consolidation model

# Description

Intermediate model that joins all `id_deal` sources into one table.
I am using the `company_name` from core, the `deal_name` from hubspot and `contact_name` from xero as the `deal_name` for all.
There are many ids that are clearly tests or deleted

# Checklist

- [x] The edited models and dependants run properly with production data.
- [x] The edited models are sufficiently documented.
- [x] The edited models contain PK tests, and I've ran and passed them.
- [ ] I have checked for DRY opportunities with other models and docs.
- [ ] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #25827
This commit is contained in:
Joaquin Ossa 2025-01-03 13:30:48 +00:00
commit c81af72602
2 changed files with 152 additions and 0 deletions

View file

@ -0,0 +1,86 @@
with
int_xero__contacts as (select * from {{ ref("int_xero__contacts") }}),
int_core__unified_user as (select * from {{ ref("int_core__unified_user") }}),
int_hubspot__deal as (select * from {{ ref("int_hubspot__deal") }}),
core_deal_counts as (
select id_deal, count(distinct company_name) as core_company_name_count
from int_core__unified_user
where id_deal is not null
group by id_deal
),
ranked_core_deals as (
select
u.id_deal,
u.company_name as core_company_name,
c.core_company_name_count,
row_number() over (
partition by u.id_deal order by u.company_name
) as row_num
from int_core__unified_user u
left join core_deal_counts c on u.id_deal = c.id_deal
where u.id_deal is not null
),
deals_core as (
select id_deal, core_company_name, core_company_name_count
from ranked_core_deals
where row_num = 1
),
hubspot_deal_counts as (
select id_deal, count(distinct deal_name) as hubspot_deal_name_count
from int_hubspot__deal
where id_deal is not null
group by id_deal
),
ranked_hubspot_deals as (
select
h.id_deal,
h.deal_name as hubspot_deal_name,
c.hubspot_deal_name_count,
row_number() over (partition by h.id_deal order by h.deal_name) as row_num
from int_hubspot__deal h
left join hubspot_deal_counts c on h.id_deal = c.id_deal
where h.id_deal is not null
),
deals_hubspot as (
select id_deal, hubspot_deal_name, hubspot_deal_name_count
from ranked_hubspot_deals
where row_num = 1
),
xero_contact_counts as (
select id_deal, count(distinct contact_name) as xero_contact_name_count
from int_xero__contacts
where id_deal is not null
group by id_deal
),
ranked_xero_deals as (
select
x.id_deal,
x.contact_name as xero_contact_name,
c.xero_contact_name_count,
row_number() over (
partition by x.id_deal order by x.contact_name
) as row_num
from int_xero__contacts x
left join xero_contact_counts c on x.id_deal = c.id_deal
where x.id_deal is not null
),
deals_xero as (
select id_deal, xero_contact_name, xero_contact_name_count
from ranked_xero_deals
where row_num = 1
)
select
*,
case when dc.id_deal is not null then true else false end as is_deal_in_core,
case when dh.id_deal is not null then true else false end as is_deal_in_hubspot,
case when dx.id_deal is not null then true else false end as is_deal_in_xero
from deals_core dc
full outer join deals_hubspot dh using (id_deal)
full outer join deals_xero dx using (id_deal)

View file

@ -1638,3 +1638,69 @@ models:
in the time window. It can be null if no revenue was generated. in the time window. It can be null if no revenue was generated.
In some extreme cases, this ratio can be lower and higher than 0 and 1, In some extreme cases, this ratio can be lower and higher than 0 and 1,
respectively. respectively.
- name: int_deals_consolidation
description: |
"This table contains all deal ids from different sources used in Superhog.
It contains the source (Hubspot, Xero or Core), the id_deal and the name"
columns:
- name: id_deal
data_type: character varying
description: "Unique ID for this deal."
tests:
- unique
- not_null
- name: core_company_name
data_type: character varying
description: "Company name of the deal as shown in Core."
- name: core_company_name_count
data_type: integer
description: "Count of distinct names the deal has in Core.
It might be the case that a deal has ony NULL value for a name,
so the count will be 0"
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
- name: hubspot_deal_name
data_type: character varying
description: "Name of the deal as shown in Hubspot."
- name: hubspot_deal_name_count
data_type: integer
description: "Count of distinct names the deal has in Hubspot.
It might be the case that a deal has ony NULL value for a name,
so the count will be 0"
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
- name: xero_contact_name
data_type: character varying
description: "Contact name of the deal as shown in Xero."
- name: xero_contact_name_count
data_type: integer
description: "Count of distinct names the deal has in Xero.
It might be the case that a deal has ony NULL value for a name,
so the count will be 0"
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: false
- name: is_deal_in_core
data_type: boolean
description: "Flag to indicate if the deal is in Core."
- name: is_deal_in_hubspot
data_type: boolean
description: "Flag to indicate if the deal is in Hubspot."
- name: is_deal_in_xero
data_type: boolean
description: "Flag to indicate if the deal is in Xero."