Merged PR 4734: Bugfix - Ensures accounts cannot be duplicated in Xero sales monthly trends

# Description

This fixes the duplication error when the same deal has more than one contact. Impact was not massive, it just affected because of the Home to Host account and this has happened during the past week. There were other cases that contained duplicates but these were not invoiced so it didn't affect the total sum.

Changes:
* Contacts now has a boolean field indicating if a deal exists.
* Contacts now has a boolean field indicating if that contact has a deal informed and it's the last updated one. This is opinionated - it could have been done with creation, for instance.
* Sales monthly trends forces the last updated contact to be true.
* Improved robustness by adding tests.

Note that modifying the logic in intermediate on `int_xero__sales_monthly_trends` by changing `id_deal` per `id_contact` would not have worked - we need to do this compute by deal to ensure that any invoicing is reported MoM and YTD.

# Checklist

- [X] The edited models and dependants run properly with production data.
- [X] The edited models are sufficiently documented.
- [X] The edited models contain PK tests, and I've ran and passed them.
- [X] I have checked for DRY opportunities with other models and docs.
- [X] I've picked the right materialization for the affected models.

# Other

- [ ] Check if a full-refresh is required after this PR is merged.

Related work items: #28576
This commit is contained in:
Oriol Roqué Paniagua 2025-03-19 08:19:20 +00:00
parent c00d02685d
commit 5928a198b6
4 changed files with 133 additions and 46 deletions

View file

@ -1,45 +1,74 @@
with stg_xero__contacts as (select * from {{ ref("stg_xero__contacts") }})
with
stg_xero__contacts as (select * from {{ ref("stg_xero__contacts") }}),
contact_with_deal as (
select
id_contact,
case
when account_number ~ '^[0-9]{10,11}$' then account_number else null
end::text as id_deal,
updated_at_utc
from stg_xero__contacts
),
deals_ranked_per_updated_at_utc as (
select
id_contact,
id_deal,
row_number() over (
partition by id_deal order by updated_at_utc desc
) as deal_rank_per_updated_at_utc
from contact_with_deal
-- Ensuring id deal is set
where id_deal is not null
)
select
id_contact,
account_number,
case when account_number ~ '^[0-9]{10,11}$' then account_number else null end::text
as id_deal,
contact_name,
is_customer,
is_supplier,
tax_number,
contact_first_name,
contact_last_name,
phones,
website,
balances,
discount,
addresses,
attachments,
email_addresses,
payment_terms,
batch_payments,
branding_theme,
contact_groups,
contact_number,
contact_status,
skyper_user_name,
contact_persons,
has_attachments,
updated_at_utc,
updated_date_utc,
xero_network_key,
default_currency_iso_4217,
validation_errors,
bank_account_details,
has_validation_errors,
tracking_category_name,
account_payable_tax_type,
tracking_category_option,
sales_default_account_code,
sales_tracking_categories,
account_receivable_tax_type,
purchases_default_account_code,
purchases_tracking_categories,
dwh_extracted_at_utc
from stg_xero__contacts
c.id_contact,
c.account_number,
cwd.id_deal,
case
when cwd.id_deal is null then true else false
end as is_contact_missing_id_deal,
-- The following ensures is the last updated contact that actually contains a deal
case
when dr.deal_rank_per_updated_at_utc = 1 then true else false
end as is_latest_updated_contact_per_deal,
c.contact_name,
c.is_customer,
c.is_supplier,
c.tax_number,
c.contact_first_name,
c.contact_last_name,
c.phones,
c.website,
c.balances,
c.discount,
c.addresses,
c.attachments,
c.email_addresses,
c.payment_terms,
c.batch_payments,
c.branding_theme,
c.contact_groups,
c.contact_number,
c.contact_status,
c.skyper_user_name,
c.contact_persons,
c.has_attachments,
c.updated_at_utc,
c.updated_date_utc,
c.xero_network_key,
c.default_currency_iso_4217,
c.validation_errors,
c.bank_account_details,
c.has_validation_errors,
c.tracking_category_name,
c.account_payable_tax_type,
c.tracking_category_option,
c.sales_default_account_code,
c.sales_tracking_categories,
c.account_receivable_tax_type,
c.purchases_default_account_code,
c.purchases_tracking_categories,
c.dwh_extracted_at_utc
from stg_xero__contacts c
left join contact_with_deal cwd on c.id_contact = cwd.id_contact
left join deals_ranked_per_updated_at_utc dr on c.id_contact = dr.id_contact

View file

@ -17,6 +17,37 @@ models:
data_tests:
- not_null
- name: int_xero__contacts
columns:
- name: id_contact
data_type: character varying
description: |
Unique identifier of a contact.
data_tests:
- not_null
- unique
- name: id_deal
data_type: character varying
description: |
Identifier of the account. It can be null.
- name: is_contact_missing_id_deal
data_type: boolean
description: |
Indicates whether id deal is null or not.
- name: is_latest_updated_contact_per_deal
data_type: boolean
description: |
Indicates whether a contact is the most recently updated one
associated with a given id_deal. When id_deal is provided, this
field helps identify the latest updated contact for that deal,
ensuring that only the most relevant contact is considered.
This is useful for deduplicating records when modeling data
per deal.
- name: int_xero__credit_notes
columns:
- name: id_credit_note
@ -348,6 +379,15 @@ models:
fiscal year is labeled by the year in which it ends (e.g., FY 2025
represents April 2024 March 2025).
data_tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- document_issued_month_utc
- document_status
- document_type
- id_deal
- accounting_financial_l3_aggregation
columns:
- name: document_issued_month_utc
data_type: date

View file

@ -1460,6 +1460,15 @@ models:
fiscal year is labeled by the year in which it ends (e.g., FY 2025
represents April 2024 March 2025).
data_tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- document_issued_month_utc
- document_status
- document_type
- id_deal
- accounting_financial_l3_aggregation
columns:
- name: document_issued_month_utc
data_type: date
@ -1507,6 +1516,8 @@ models:
- name: id_deal_contact_name
data_type: text
description: "A concatenation of the deal ID and contact name."
data_tests:
- not_null
- name: accounting_financial_l1_aggregation
data_type: text

View file

@ -11,7 +11,11 @@ select
smt.document_type as document_type,
smt.id_deal as id_deal,
c.contact_name as contact_name,
smt.id_deal || '-' || c.contact_name as id_deal_contact_name,
case
when c.contact_name is null
then smt.id_deal
else smt.id_deal || '-' || c.contact_name
end as id_deal_contact_name,
smt.accounting_financial_l1_aggregation as accounting_financial_l1_aggregation,
smt.accounting_financial_l2_aggregation as accounting_financial_l2_aggregation,
smt.accounting_financial_l3_aggregation as accounting_financial_l3_aggregation,
@ -22,4 +26,7 @@ select
smt.previous_year_ytd_amount_wo_taxes_in_gbp
as previous_year_ytd_amount_wo_taxes_in_gbp
from int_xero__sales_monthly_trends smt
left join int_xero__contacts c on smt.id_deal = c.id_deal
left join
int_xero__contacts c
on smt.id_deal = c.id_deal
and c.is_latest_updated_contact_per_deal = true