# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: "dwh_dbt"
version: "1.0.0"
config-version: 2
# This setting configures which "profile" dbt uses for this project.
profile: "dwh_dbt"
# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
clean-targets: # directories to be removed by `dbt clean`
- "target"
- "dbt_packages"
# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models
# In this config we set project-wide defaults plus per-directory overrides for
# how dbt materializes models. Any of these settings can be overridden in the
# individual model files using the `{{ config(...) }}` macro.
models:
+unlogged: true
# ^ This makes all tables created by dbt unlogged. This is a Postgres-specific
# setting that we enable for performance: unlogged tables skip the write-ahead
# log, so writes are much faster, but their contents are truncated after a
# crash and they are not replicated. That trade-off is normally considered
# risky, but it fits our needs because every table here can be rebuilt from
# source with a `dbt run`. You can read more here:
# https://www.crunchydata.com/blog/postgresl-unlogged-tables
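# For a model that must survive a crash, this can be overridden in the model
# file itself. Hypothetical model file (`unlogged` is a config supported by
# the dbt-postgres adapter):
#
#   {{ config(unlogged=false) }}
#   select ...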
dwh_dbt:
+post-hook:
sql: "VACUUM ANALYZE {{ this }}"
transaction: false
# ^ This makes dbt run a VACUUM ANALYZE on each model after building it. The
# `transaction: false` part is required because VACUUM cannot run inside a
# transaction block. It's pointless for views, but harmless: Postgres emits a
# warning and skips them rather than raising an error.
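# In practice, after building a table model dbt issues something like the
# following (illustrative relation name; the actual schema depends on the
# target profile):
#
#   VACUUM ANALYZE "reporting"."fct_bookings";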
staging:
+materialized: table
+schema: staging
intermediate:
+materialized: view
+schema: intermediate
reporting:
+materialized: table
+schema: reporting
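# Note: with dbt's default generate_schema_name macro, these `+schema` values
# are appended to the target schema (e.g. `<target_schema>_reporting`), not
# used verbatim. To get the bare schema names, a project can override the
# macro, e.g. (sketch, assuming this project has no other override):
#
#   {% macro generate_schema_name(custom_schema_name, node) -%}
#     {{ custom_schema_name | trim if custom_schema_name else target.schema }}
#   {%- endmacro %}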
seeds:
dwh_dbt:
+schema: staging
vars:
"dbt_date:time_zone": "Europe/London"
# A general cutoff date for relevance: many models assume they only need to
# handle data from this point in time onwards.
"start_date": "'2020-01-01'"
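# The doubled quotes are deliberate (the same applies to the other string vars
# below): the outer quotes are YAML, the inner single quotes survive into the
# compiled SQL, so the var renders as a ready-made SQL string literal.
# Hypothetical usage in a model:
#
#   select * from {{ ref('stg_bookings') }}
#   where created_at >= {{ var('start_date') }}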
# KPIs Start Date. This is the date from which we start calculating KPIs.
"kpis_start_date": "'2022-04-01'"
# New Dash First Invoicing Date. This is the first date considered for New Dash invoicing.
"new_dash_first_invoicing_date": "'2024-12-31'"
# A distant future date to use as a default when cutoff values are missing.
"end_of_time": "'2050-12-31'"
# Booking state variables
# State values are uppercase string literals; models must apply upper() to the
# source column before comparing against them.
"cancelled_booking_state": "'CANCELLED'"
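# Hypothetical usage in a model, applying the upper() convention above:
#
#   where upper(bookings.state) = {{ var('cancelled_booking_state') }}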
"approved_booking_state": "'APPROVED'"
"flagged_booking_state": "'FLAGGED'"
# Payment state variables
# State values are uppercase string literals; models must apply upper() to the
# source column before comparing against them.
"paid_payment_state": "'PAID'"
# Protection service state variables
# State values are uppercase string literals; models must apply upper() to the
# source column before comparing against them.
"default_service": "'BASIC SCREENING'"