This commit is contained in:
Pablo Martin 2025-07-11 16:15:17 +02:00
parent 729d6d6db4
commit a256b48b01
630 changed files with 16494 additions and 0 deletions

# (Legacy) Technical Documentation - 2024-08-05
This documentation follows a top-down approach. We start with what is visible to the users through PBI and we go backwards to the details of how things are structured and computed within DWH. This way we keep the overall image of the project before jumping into the details of it.
**Table of contents**
# Power BI Reporting
## Overview
We have a single report for Business KPIs at this stage. It is called Main KPIs and it is published in Business Overview. [Link to the repository here](https://guardhog.visualstudio.com/Data/_git/data-pbi-reports?path=/reports/business_overview_main_kpi).
The reporting offers two ways of viewing KPIs: Global KPIs and KPIs by Deal. The mapping of the KPIs per report page is the following:
- Global: MTD, Monthly Overview, Evolution over Time
- by Deal: Detail by Deal, Deal Comparison
Additionally, the report contains a Readme page with a detailed explanation of each tab. Lastly, it contains a Data Glossary that specifies how metrics are computed and whether there are any data quality issues around some metrics.
## Data Sources
Since there are two ways of visualising KPIs, Global and by Deal, this report has two sources. These are, in Reporting:
- Global: `mtd_aggregated_metrics`
- by Deal: `monthly_aggregated_metrics_history_by_deal`
![Untitled](Untitled%203.png)
Note the naming convention: both names contain `aggregated_metrics`, meaning that at this stage metrics from different sources are aggregated within these two models. The main difference is that KPIs by Deal are considered at the `monthly_history_by_deal` level, while Global KPIs are `mtd` (month to date). This is on purpose and has consequences for how the KPIs are computed.
Let's take a look at what these models look like:
For Global KPIs, `mtd_aggregated_metrics`:
![Untitled](Untitled%2010.png)
**For each date and each metric**, we have the `value`, `previous year value` and the `relative increment` between value and previous year value. Other important fields are the number format, which controls how the metric is formatted within Power BI, and order by, which controls how it is ordered within the visualisation of the KPIs, especially in the MTD tab. Lastly, the dates displayed are either the last day of historical months OR any day of the current month, for MTD purposes.
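To make the long format concrete, here is a minimal Python sketch of one row of `mtd_aggregated_metrics` and of the relative increment computation. The sample values, the metric name and the exact formula are illustrative assumptions based on the description above, not the actual model code.

```python
from datetime import date

def relative_increment(value, previous_year_value):
    # Assumed formula: (value - previous_year_value) / previous_year_value,
    # returning None when the previous-year base is zero or missing.
    if not previous_year_value:
        return None
    return (value - previous_year_value) / previous_year_value

# One hypothetical row of the long format: one row per (date, metric).
row = {
    "date": date(2024, 7, 15),        # any day of the current month, for MTD
    "metric": "Created Bookings",     # illustrative metric name
    "value": 1200,
    "previous_year_value": 1000,
    "number_format": "#,0",           # drives formatting in Power BI
    "order_by": 10,                   # drives KPI ordering in the MTD tab
}
row["relative_increment"] = relative_increment(row["value"], row["previous_year_value"])
```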
For KPIs by deal, `monthly_aggregated_metrics_history_by_deal`:
![Untitled](Untitled%2011.png)
**For each date and each id_deal**, we have only the **values of each metric in separate columns**. Note that, unlike the MTD part, this is not aggregated at metric level, and there is no previous year value or relative increment either. This affects how the intermediate aggregations are handled.
# Global vs. By Deal KPIs computation
## Global KPIs schema
![Untitled](Untitled%2012.png)
## KPIs by Deal schema
![Untitled](Untitled%2013.png)
Here are the main goals of each stage, along with similarities and differences to take into account:
- **Reporting**:
- **Goal**: materialise and expose the data that is going to be available for users.
- **Similarities**
- Both flows have a table in reporting that exposes the information for PBI usage.
- **Differences**
- The by Deal part is a replica of what is available in intermediate. However, for Global this is not exactly the case, since in `mtd_aggregated_metrics` we force the exclusion of Xero-based metrics for the current month and the previous one. This is to 1) avoid displaying partial invoicing data affecting figures such as revenue, while 2) ensuring that within DWH all data is up to date, even if the invoicing cycle has not finished. You can find the exclusion condition [here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/reporting/general/mtd_aggregated_metrics.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=22&lineEnd=23&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents).
- The naming convention differs, as explained before, because of how KPIs are computed and how the information is displayed in these two models (see Data Sources in the previous section).
- **Aggregation**:
- **Goal**: aggregates different sources of metrics data into a single model before exposing it.
- **Similarities**
- Both flows have a previous step in intermediate, before reporting, that contains the final computation of KPIs, namely `int_mtd_aggregated_metrics` and `int_monthly_aggregated_metrics_history_by_deal`.
- **Differences**
- The Global KPIs have two steps:
- `int_mtd_vs_previous_year_metrics`: ensures the [plain combination of the sources + the computation of derived metrics](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_mtd_vs_previous_year_metrics.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=27&lineEnd=28&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents) AND [the computation vs. previous year by auto-joining the combined CTE](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_mtd_vs_previous_year_metrics.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=187&lineEnd=188&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents).
- `int_mtd_aggregated_metrics`: ensures the unpivoted display, i.e. all the different metrics are aggregated into a single metric column. [Here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_mtd_aggregated_metrics.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=1&lineEnd=2&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents) we also specify the number format, the order by, and which name tag (metric) corresponds to each value, previous year value and relative increment.
- The KPIs by Deal have just one step:
- `int_monthly_aggregated_metrics_history_by_deal` only handles the [plain combination of the sources + the computation of derived metrics](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_monthly_aggregated_metrics_history_by_deal.sql) on the By Deal basis.
- **Sources**:
- **Goal**: Handle all specific logic for retrieving each metric from intermediate master tables.
- **Similarities**
- All metrics depending on the same sources are encapsulated within each source model.
- All follow a strategy of logic computation within each CTE ([here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/core/int_core__mtd_guest_journey_metrics.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=26&lineEnd=27&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents), [here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/core/int_core__mtd_guest_payments_metrics.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=17&lineEnd=18&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents)) with a final aggregation of a date model left-joined on the different CTEs ([here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/core/int_core__monthly_guest_payments_history_by_deal.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=80&lineEnd=81&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents)). See the links for examples.
- **Differences**:
- Global models need to force a join with `int_dates_mtd` in each CTE to allow aggregating the metric up to a certain day in the past, for MTD purposes. This is resource-intensive; since it is not needed in the By Deal models, you don't actually need to join with `int_dates_by_deal` in the CTEs, only in the final aggregation.
- By Deal models need to have a Deal. Since the Deal is sometimes not available in a source model (e.g. in Guest Journeys, the verification_requests table has no deal), additional joins are needed to retrieve the deal id. This is not needed for Global models, which simplifies the computation.
- **Dates**:
- **Goal**: Provide an empty date framework that serves as the skeleton of the needed dates/granularity for each KPI type.
- **Similarities**:
- Each KPI visualisation type, Global and by Deal, has a single dependency on a Date model.
- **Differences**:
- The `int_dates_mtd` model only contains dates and allows for the MTD aggregation ([here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_dates_mtd.sql)), while `int_dates_by_deal` contains the Deal aggregation (hence the by deal suffix) but does not allow for the MTD aggregation (it has no mtd prefix) ([here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_dates_by_deal.sql)).
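As an illustration of the unpivot step performed by `int_mtd_aggregated_metrics`, here is a minimal Python sketch. The metric names, formats and order values are illustrative assumptions; the real model does this in SQL.

```python
# column -> (display name, number format, order by); all values illustrative.
METRIC_CONFIG = {
    "created_bookings": ("Created Bookings", "#,0", 10),
    "total_revenue": ("Total Revenue", "#,0.00", 20),
}

def unpivot(wide_row):
    """Turn one wide row (metrics in separate columns) into one row per metric."""
    long_rows = []
    for column, (metric, number_format, order_by) in METRIC_CONFIG.items():
        long_rows.append({
            "date": wide_row["date"],
            "metric": metric,           # the name tag displayed in reporting
            "value": wide_row[column],
            "number_format": number_format,
            "order_by": order_by,
        })
    return long_rows

long_rows = unpivot({"date": "2024-07-31", "created_bookings": 120, "total_revenue": 4500.0})
```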
# How to create a new metric?
Follow these steps:
1. Identify whether the metric is Global, by Deal or both. It is likely both, unless you are doing some Deal-based metric for which a by-Deal view might not make sense. This clarifies whether you need to modify one of the branches or both of them.
2. Identify the source of your metric. From here we can have different possibilities:
1. If, for instance, the metric is related to Bookings, you might want to add it in `int_core__mtd_booking_metrics` and `int_core__monthly_bookings_history_by_deal`. A similar rationale applies for Guest Journeys, Invoicing, Guest Payments, Listings, etc.
2. If the metric “type” does not exist yet, such as implementing a Hubspot-based client onboarding opportunities metric, ideally you would create a standalone model by replicating the structure of an already existing source model. Copy-paste and adapt 🙂
3. If your metric is a combination of two or more different sources, such as Total Revenue by Booking Cancelled, you will need to check whether the submetrics are already available. If yes, you can skip this part; if not, go to point a) or b). If it is a derived metric within the same source, such as Guest Journey with Payment per Guest Journey Created, you can directly add it in `int_core__mtd_guest_journey_metrics` and `int_core__monthly_guest_journey_history_by_deal`.
3. Propagate to the intermediate aggregations. Let's split between Global and Deal-based:
1. Global KPIs:
1. Reference your newly created metric in the plain combination of sources in `int_mtd_vs_previous_year_metrics`. If you need to combine multiple metrics from different sources, this is the place to do it. Keep in mind to apply `nullif(coalesce(x,0)+coalesce(y,0),0)`-style structures for combined metrics, so that metrics get combined even when there are nulls while avoiding division-by-zero errors at the final aggregation 🙂. Example [here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_mtd_vs_previous_year_metrics.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=110&lineEnd=111&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents).
2. Use the macro `calculate_safe_relative_increment` to compute the value, previous_year_value and relative_increment in the final query ([here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_mtd_vs_previous_year_metrics.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=187&lineEnd=188&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents)).
2. KPIs by Deal:
1. Reference your newly created metric in the plain combination of sources in `int_monthly_aggregated_metrics_history_by_deal`. If you need to combine multiple metrics from different sources, this is the place to do it. Keep in mind to apply `nullif(coalesce(x,0)+coalesce(y,0),0)`-style structures for combined metrics, so that metrics get combined even when there are nulls while avoiding division-by-zero errors at the final aggregation 🙂. Example [here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_monthly_aggregated_metrics_history_by_deal.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=95&lineEnd=96&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents).
4. Exposure of metrics. Lets split Global and Deal based:
1. Global KPIs:
1. Add the configuration of your new metric in `int_mtd_aggregated_metrics`. You'll need to parametrise the order, the metric (name tag that will be displayed in the reporting), the number format (for formatting in the reporting) and which values it is going to use. Order by is informative, so you could replicate an existing one, although I recommend choosing a value that is not already in use so it is clearer how we want to order the KPIs. **Important: keep in mind that merging and refreshing this will directly make the metric available and visible in the dashboard.**
2. If your metric is or uses an invoicing metric that should not be displayed in the current month or the previous month, validate that the [condition applied](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/reporting/general/mtd_aggregated_metrics.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=38&lineEnd=39&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents) in the reporting file of `mtd_aggregated_metrics` works well.
3. Modify the Data Glossary to include the description of your new metric. Note that there is no need to change anything else in Power BI for Global metrics.
2. Deal KPIs:
1. Propagate the new metric from `int_monthly_aggregated_metrics_history_by_deal` to `monthly_aggregated_metrics_history_by_deal`. If this metric is or uses an invoicing metric, please use the macro `is_date_before_previous_month`. Example [here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/reporting/general/monthly_aggregated_metrics_history_by_deal.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=31&lineEnd=32&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents).
2. In Power BI, once the model in reporting has been refreshed, you will need to manually add the new metrics in the tabs: Detail by Deal and Deal Comparison. For each new metric, in PBI, you will need to manually specify the number format, the order of display and the name of the metric.
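The null-safe combination pattern recommended in the propagation steps can be sketched in Python. The SQL functions are emulated here to illustrate why the pattern combines nullable metrics without risking a division-by-zero at the final aggregation.

```python
def nullif(value, match):
    # SQL NULLIF: None when value equals match, else the value itself.
    return None if value == match else value

def coalesce(*values):
    # SQL COALESCE: first non-None value, else None.
    return next((v for v in values if v is not None), None)

def safe_combined(x, y):
    # nullif(coalesce(x,0) + coalesce(y,0), 0): nulls count as 0 for the sum,
    # but a zero result becomes None so a later division cannot blow up.
    return nullif(coalesce(x, 0) + coalesce(y, 0), 0)

def safe_ratio(numerator, denominator):
    # In SQL, dividing by NULL yields NULL instead of a division-by-zero error.
    return None if denominator is None else numerator / denominator
```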
# Additional notes
1. You've seen that the two ways of displaying data are not consistent at this stage, beyond the fact of having the Deal granularity or not. This has pros and cons and changes how a new metric is created: Global is much more DWH-dependent, while By Deal needs more PBI modifications.
2. At this stage, we want to implement metrics by different dimensions, and this is actually complicated to generalise within the current setup. We are investigating a more scalable solution called MetricFlow that could completely change the structure presented in this Notion page.

# (Legacy) Technical Documentation - 2024-09-20
This documentation follows a top-down approach. We start with what is visible to the users through PBI and we go backwards to the details of how things are structured and computed within DWH. This way we keep the overall image of the project before jumping into the details of it.
**Table of contents**
# Power BI Reporting
## Overview
We have a single report for Business KPIs at this stage. It is called Main KPIs and it is published in Business Overview. [Link to the repository here](https://guardhog.visualstudio.com/Data/_git/data-pbi-reports?path=/reports/business_overview_main_kpi).
The reporting offers two ways of viewing KPIs: **Global KPIs** and **KPIs by Deal**. The mapping of the KPIs per report page is the following:
- **Global**: MTD, Monthly Overview, Global Evolution over Time, Detail by Category
- **by Deal**: Detail by Deal, Deal Comparison
Additionally, the report contains a Readme page with a detailed explanation of each tab. Lastly, it contains a Data Glossary that specifies how metrics are computed and whether there are any data quality issues around some metrics.
You will notice that **Global KPIs include Categories**. These are effectively dimensions by which we slice the data. Even though a “detail by deal” could be considered another dimension, it is treated as a separate entity since its computation is independent from the Global KPIs.
At the time of writing, the list of categories is:
- Global
- By # of Listings Booked in 12 Months
- By Billing Country
## Data Sources
Since there are two ways of visualising KPIs, Global and by Deal, this report has two sources. These are, in Reporting:
- Global: `mtd_aggregated_metrics`
- by Deal: `monthly_aggregated_metrics_history_by_deal`
![Untitled](Untitled%203.png)
Note the naming convention: both names contain `aggregated_metrics`, meaning that at this stage metrics from different sources are aggregated within these two models. The main difference is that KPIs by Deal are considered at the `monthly_history_by_deal` level, while Global KPIs are `mtd` (month to date). This is on purpose and has consequences for how the KPIs are computed.
Let's take a look at what these models look like:
For Global KPIs, `mtd_aggregated_metrics`:
![image.png](image%2051.png)
**For each date, dimension, dimension_value and each metric**, we have the `value`, `previous year value` and the `relative increment` between value and previous year value. Other important fields are the `number format`, which controls how the metric is formatted within Power BI, and `order by`, which controls how it is ordered within the visualisation of the KPIs, especially in the MTD tab. You will also see a `relative increment with sign format`, which is used to apply the red-to-white-to-green conditional formatting in PBI. Lastly, the dates displayed are either the last day of historical months OR any day of the current month, for MTD purposes.
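As an illustration of how such a signed field could be produced, here is a Python sketch. The exact format string and formula are assumptions based on the field's description; the real computation lives in the dbt models.

```python
def relative_increment_with_sign(value, previous_year_value):
    # Assumed behaviour: format the relative increment as a signed percentage
    # string (e.g. "+10.0%"), which PBI conditional formatting can colour
    # from red (negative) through white to green (positive).
    if not previous_year_value:
        return None
    increment = (value - previous_year_value) / previous_year_value
    return f"{increment:+.1%}"
```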
For KPIs by deal, `monthly_aggregated_metrics_history_by_deal`:
![image.png](image%2052.png)
**For each date and each id_deal**, we have only the **values of each metric in separate columns**. Additionally, we have a few deal attributes or informative fields, such as the `main deal name`, `main billing country` and the `deal lifecycle state` for that month. Note that, unlike the MTD part, this is not aggregated at metric level, and there is no previous year value or relative increment either. This affects how the intermediate aggregations are handled.
# Global vs. By Deal KPIs computation
Below you will find a simplified schema documentation. It does not include all dependencies, since it is massive 🙂
It focuses on 4 areas, from bottom left to top right:
- Date models, which act as empty skeletons
- Source models, where all the complex logic of metric & dimension computation happens
- Aggregation models, which mainly aggregate the different source models into a unified model, with additional computations
- Reporting models, which are used to expose the data to Power BI. On top of these, dedicated data tests may be present to ensure certain levels of data quality.
## Global KPIs schema
![image.png](image%2053.png)
## KPIs by Deal schema
![image.png](image%2054.png)
Here are the main goals of each stage, along with similarities and differences to take into account:
- **Reporting**:
- **Goal**: materialise and expose the data that is going to be available for users.
- **Similarities**
- Both flows have a table in reporting that exposes the information for PBI usage.
- **Differences**
- The by Deal part is a replica of what is available in intermediate. However, for Global this is not exactly the case, since in `mtd_aggregated_metrics` we force the exclusion of Xero-based metrics for the current month and the previous one. This is to 1) avoid displaying partial invoicing data affecting figures such as revenue, while 2) ensuring that within DWH all data is up to date, even if the invoicing cycle has not finished. You can find the exclusion condition [here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/reporting/general/mtd_aggregated_metrics.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=22&lineEnd=23&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents).
- The naming convention differs, as explained before, because of how KPIs are computed and how the information is displayed in these two models (see Data Sources in the previous section).
- Additionally, two data tests depend on `mtd_aggregated_metrics`. These ensure 1) a certain consistency between the metric aggregation over all category values, for any category other than global, with respect to global, and 2) that the latest values of the day do not differ excessively from what was observed in previous days, i.e. outlier detection.
- **Aggregation**:
- **Goal**: aggregates different sources of metrics data into a single model before exposing it.
- **Similarities**
- Both flows have a previous step in intermediate, before reporting, that contains the final computation of KPIs, namely `int_mtd_aggregated_metrics` and `int_monthly_aggregated_metrics_history_by_deal`.
- **Differences**
- The Global KPIs have two steps:
- `int_mtd_vs_previous_year_metrics`: ensures the [plain combination of the sources + the computation of derived metrics](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=%2Fmodels%2Fintermediate%2Fcross%2Fint_mtd_vs_previous_year_metrics.sql&version=GBmaster&_a=contents) AND [the computation vs. previous year by auto-joining the combined CTE](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_mtd_vs_previous_year_metrics.sql&version=GBmaster&line=235&lineEnd=236&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents).
- `int_mtd_aggregated_metrics`: ensures the unpivoted display, i.e. all the different metrics are aggregated into a single metric column. [Here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_mtd_aggregated_metrics.sql&version=GBmaster&line=1&lineEnd=2&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents) we also specify the number format, the order by, and which name tag (metric) corresponds to each value, previous year value and relative increment.
- The KPIs by Deal have just one step:
- `int_monthly_aggregated_metrics_history_by_deal` only handles the [plain combination of the sources + the computation of derived metrics](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_monthly_aggregated_metrics_history_by_deal.sql) on the By Deal basis.
- **Sources**:
- **Goal**: Handle all specific logic for retrieving each metric from intermediate master tables.
- **Similarities**
- All metrics depending on the same sources are encapsulated within each source model.
- All follow a strategy of logic computation within each CTE ([here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/core/int_core__mtd_guest_payments_metrics.sql&version=GBmaster&line=29&lineEnd=30&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents)) with a final aggregation of a date model left-joined on the different CTEs ([here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/core/int_core__monthly_guest_payments_history_by_deal.sql&version=GBmaster&line=80&lineEnd=81&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents)). See the links for examples.
- **Differences**:
- Global models have jinja code that loops across the set of categories specified in the macro `get_kpi_dimensions` in [business_kpis_configuration](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/macros/business_kpis_configuration.sql). For each dimension or category, different joins and where conditions can apply. In contrast, By Deal models are far simpler since id_deal is the only granularity.
- Global models need to force a join with `int_dates_mtd` in each CTE to allow aggregating the metric up to a certain day in the past, for MTD purposes. This is resource-intensive; since it is not needed in the By Deal models, you don't actually need to join with `int_dates_by_deal` in the CTEs, only in the final aggregation.
- By Deal models need to have a Deal. Since the Deal is sometimes not available in a source model (e.g. in Guest Journeys, the verification_requests table has no deal), additional joins are needed to retrieve the deal id. This is not needed for some categories in Global models, so the logic might differ.
- Booking metrics are split across 4 different models in the Global view, while there is just one model in the By Deal view. This is the result of a performance optimisation exercise; yes, categorising is expensive.
- **Dates**:
- **Goal**: Provide an empty date framework that serves as the skeleton of the needed dates/granularity for each KPI type.
- **Similarities**:
- Each KPI visualisation type, Global and by Deal, has a single dependency on a Date model.
- **Differences**:
- The `int_dates_mtd_by_category` model contains dates, category and category value, and allows for the MTD aggregation ([here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_dates_mtd_by_dimension.sql)), while `int_dates_by_deal` contains the Deal aggregation (hence the by deal suffix) but does not allow for the MTD aggregation (it has no mtd prefix) ([here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_dates_by_deal.sql)).
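The category loop described above can be sketched in Python. The dimension list and the column mapping are illustrative assumptions; the real implementation is jinja-generated SQL driven by `get_kpi_dimensions`.

```python
# Hypothetical dimension configuration: category name -> slicing column
# (None means no slicing, i.e. the Global aggregate).
KPI_DIMENSIONS = {
    "Global": None,
    "By Billing Country": "billing_country",
}

def aggregate_by_category(rows, value_field):
    """For each category, aggregate the value by that category's values,
    mimicking the per-dimension joins/filters generated by the jinja loop."""
    out = []
    for category, column in KPI_DIMENSIONS.items():
        totals = {}
        for row in rows:
            key = "Global" if column is None else row[column]
            totals[key] = totals.get(key, 0) + row[value_field]
        for category_value, total in totals.items():
            out.append({"category": category,
                        "category_value": category_value,
                        "value": total})
    return out

facts = [{"billing_country": "UK", "revenue": 100.0},
         {"billing_country": "ES", "revenue": 50.0}]
categorised = aggregate_by_category(facts, "revenue")
```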
# How to create a new metric?
Follow these steps:
1. Identify whether the metric is Global, by Deal or both. It is likely both, unless you are doing some Deal-based metric for which a by-Deal view might not make sense. This clarifies whether you need to modify one of the branches or both of them.
2. Identify the source of your metric. From here we can have different possibilities:
1. If, for instance, the metric is related to the Guest Journey, you might want to add it in `int_core__mtd_guest_journey_metrics` and `int_core__monthly_guest_journey_history_by_deal`. A similar rationale applies for Bookings, Invoicing, Guest Payments, Listings, etc.
2. If the metric “type” does not exist yet, such as implementing a Hubspot-based client onboarding opportunities metric, ideally you would create a standalone model by replicating the structure of an already existing source model. Copy-paste and adapt 🙂
3. If your metric is a combination of two or more different sources, such as Total Revenue by Booking Cancelled, you will need to check whether the submetrics are already available. If yes, you can skip this part; if not, go to point a) or b). If it is a derived metric within the same source, such as Guest Journey with Payment per Guest Journey Created, you can directly add it in `int_core__mtd_guest_journey_metrics` and `int_core__monthly_guest_journey_history_by_deal`.
3. Propagate to the intermediate aggregations. Let's split between Global and Deal-based:
1. Global KPIs:
1. Reference your newly created metric in the plain combination of sources in `int_mtd_vs_previous_year_metrics`. If you need to combine multiple metrics from different sources, this is the place to do it. Keep in mind to apply `nullif(coalesce(x,0)+coalesce(y,0),0)`-style structures for combined metrics, so that metrics get combined even when there are nulls while avoiding division-by-zero errors at the final aggregation 🙂. Example [here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_mtd_vs_previous_year_metrics.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=110&lineEnd=111&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents).
2. Use the macro `calculate_safe_relative_increment` to compute the value, previous_year_value and relative_increment in the final query ([here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_mtd_vs_previous_year_metrics.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=187&lineEnd=188&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents)).
2. KPIs by Deal:
1. Reference your newly created metric in the plain combination of sources in `int_monthly_aggregated_metrics_history_by_deal`. If you need to combine multiple metrics from different sources, this is the place to do it. Keep in mind to apply `nullif(coalesce(x,0)+coalesce(y,0),0)`-style structures for combined metrics, so that metrics get combined even when there are nulls while avoiding division-by-zero errors at the final aggregation 🙂. Example [here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_monthly_aggregated_metrics_history_by_deal.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=95&lineEnd=96&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents).
4. Exposure of metrics. Lets split Global and Deal based:
1. Global KPIs:
1. Add the configuration of your new metric in `int_mtd_aggregated_metrics`. You'll need to parametrise the order, the metric (name tag that will be displayed in the reporting), the number format (for formatting in the reporting) and which values it is going to use. Order by is informative, so you could replicate an existing one, although I recommend choosing a value that is not already in use so it is clearer how we want to order the KPIs. **Important: keep in mind that merging and refreshing this will directly make the metric available and visible in the dashboard.**
2. If your metric is or uses an invoicing metric that should not be displayed in the current month or the previous month, validate that the [condition applied](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/reporting/general/mtd_aggregated_metrics.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=38&lineEnd=39&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents) in the reporting file of `mtd_aggregated_metrics` works well.
3. Modify the Data Glossary to include the description of your new metric. Note that there is no need to change anything else in Power BI for Global metrics.
2. Deal KPIs:
1. Propagate the new metric from `int_monthly_aggregated_metrics_history_by_deal` to `monthly_aggregated_metrics_history_by_deal`. If this metric is or uses an invoicing metric, please use the macro `is_date_before_previous_month`. Example [here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/reporting/general/monthly_aggregated_metrics_history_by_deal.sql&version=GBmodels/19382_dbt_metricflow_exploration&line=31&lineEnd=32&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents).
2. In Power BI, once the model in reporting has been refreshed, you will need to manually add the new metrics in the tabs: Detail by Deal and Deal Comparison. For each new metric, in PBI, you will need to manually specify the number format, the order of display and the name of the metric.
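The exclusion logic behind the `is_date_before_previous_month` macro can be sketched as follows. This is an assumed reading of the macro based on its name and how it is used to hide invoicing-based metrics for the current and previous months, not its actual source code.

```python
from datetime import date

def is_date_before_previous_month(d, today):
    # Assumed behaviour: True when d falls before the first day of the previous
    # calendar month, so invoicing-based metrics are hidden for the current and
    # previous months while the invoicing cycle may still be open.
    if today.month > 1:
        first_of_previous = date(today.year, today.month - 1, 1)
    else:
        first_of_previous = date(today.year - 1, 12, 1)  # January wraps to December
    return d < first_of_previous
```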
# Additional notes
1. You've seen that the two ways of displaying data are not consistent at this stage, beyond the fact of having the Deal granularity or not. This has pros and cons and changes how a new metric is created: Global is much more DWH-dependent, while By Deal needs more PBI modifications.
2. At this stage, with the capacity to compute metrics across different dimensions, we are starting to see some performance issues. These could grow significantly as more dimensions are added; an increase in the number of metrics would also have an effect, but at a much lower rate. This could open the door to refactorings such as:
1. Daily based pre-aggregated semantic models at the deepest granularity, incrementally updated. This could look like:
1. Time
1. Date
2. Dimension (with some potential examples)
1. Deal ID
2. Billing Country
3. Listing Country
4. Booking Source
5. Customer Segmentation
6. etc.
3. Metric (daily)
1. Created Bookings
2. Checkout Bookings
3. etc.
in combination with,
2. A fully refreshed upper layer that aggregates the different metrics by looping per dimension, applying the MTD computation and handling converted metrics and other nuances.
This setup is likely more scalable, and it could also integrate the current Global and By Deal views into a single computation.
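To make the refactoring idea concrete, here is a hedged sketch (all names and grains hypothetical, not an actual model) of how an upper layer could roll a daily pre-aggregated model up to MTD per deal:

```python
from collections import defaultdict
from datetime import date

def mtd_by_deal(daily_rows, as_of):
    # daily_rows: (date, deal_id, metric_value) at the deepest granularity,
    # e.g. daily Created Bookings. The upper layer just sums the rows
    # belonging to the as_of month, up to and including as_of (month to date).
    totals = defaultdict(int)
    for day, deal_id, value in daily_rows:
        if (day.year, day.month) == (as_of.year, as_of.month) and day <= as_of:
            totals[deal_id] += value
    return dict(totals)

daily_rows = [
    (date(2024, 8, 1), "deal_a", 5),
    (date(2024, 8, 2), "deal_a", 3),
    (date(2024, 8, 2), "deal_b", 7),
    (date(2024, 7, 31), "deal_a", 9),  # previous month, excluded from August MTD
]
```

The same daily grain could be summed without the MTD filter to serve the By Deal monthly view, which is what would unify the two computations.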

# 2024-07-26 - Glad you're back, Pablo
Things that happened when you were off that might require your attention
# Xe.com incident on July 18th
All details here: [20240718-01 - Xe.com data not retrieved](20240718-01%20-%20Xe%20com%20data%20not%20retrieved%205c283e9aa4834323b38af0bff95477a5.md)
# Revenue figures issues
Revenue figures are not fully consistent between the Data (DWH) side and Finance. Xero-based reporting is generally OK, with some small discrepancies that have minimal impact and can partially be explained. However, there is a massive discrepancy on Guest Revenue (Waivers, Deposit Fees, Guest Products): Data reports it with taxes included, while Finance seems to report it without taxes. This also generates inconsistencies within Data reporting, since Xero-based reporting/metrics are usually tax exclusive. The issue was communicated to all users on 24th July.
[Data quality assessment: DWH vs. Finance revenue figures](Data%20quality%20assessment%20DWH%20vs%20Finance%20revenue%20fig%206e3d6b75cdd4463687de899da8aab6fb.md)
# Grand Welcome invoicing
GW is a franchise. They have 80 different accounts owned by their individual franchisees but they want them all to be billed as one.
On 15th July, it was discussed with Finance (Suzannah and Jamie), Clay, Leo and Uri. The main idea would be to have the invoicing export by deal id, meaning these 80 franchisees would be linked to a single Deal Id. This might have an impact on the invoicing reporting that I (Uri) am not really aware of, so no estimation of the impact or of how much time it will take has been provided. There's going to be a follow-up on this subject by the beginning of August.
Some other subjects:
- This is the famous account of the 9k duplicated bookings in March. In order to repay them for these problems, they have 2 free-of-charge months (which gives us a bit of extra time). **Note: the subject of duplicated listings was re-opened by Clay on Monday 22nd**
- Because of new pricing, they want to change from a listing-based charging type to a booking-based charging type. This is still another discussion that Leo/Clay need to have with the client, but of course this could impact the way it's invoiced.
- Potentially, it could be interesting to create somehow a “super user” that would be able to see the Dashboard for many “users” assigned to them. This was open discussion, not commitment.
# eDeposit and Athena migration
Ana wrote to me the day you started holidays. Apparently, there's a CosmosDB migration that they wanted to do this sprint. In the refinement session it was discovered that this could impact the existing reporting we have on CosmosDB. Long story short, it seems the schema won't change and it's just the URL that changes.
This might have an impact depending on how we're retrieving the information on CosmosDB. We re-looped Ben R, and after discussion with Ray, it seems OK to move forward since it would be a minimal impact on PBI reporting.
Pending update. Information available in #api-data channel.
# Data Priorities check (15th July)
Only Suzannah and Uri were here on this edition. Topics discussed:
- Finance top prios:
- Minimum listing fees subject
- Very important - Check in Hero will be rolled out (offered) to all hosts that are interacting with Guests on Guest Journeys. Check in Hero will have a commission share with hosts, meaning that at the beginning of the new month, we'd need to pay back part of the Check in Hero revenue. Suzannah was to send an e-mail with the details (she did not send it, but Ben C actually asked me (Uri) and Ben R about feasibility - I said that this will take quite a bit of time and effort).
- KPIs / Business Overview
- We'd need to do an exercise on revenue comparison between Business Overview and Finance reports. It seems there are some discrepancies. A potential explanation could be the currency exchange rates (for historical finance figures on guest payments vs. the ones reported now). **See point 2 - Revenue figures :) :) :)**
- Suzannah noticed (and I noticed as well) that a snapshot made on day D of a previous day Z can display different data for that same day Z when viewed on day D+X. I guess there's some past update happening on the database that is being missed since we're fully refreshing the KPIs. We need to investigate this. Partially investigated with the revenues investigation.
- Provide a possibility to chart metrics in the Main KPIs dashboard (done).
- They would like to see the Host split per Client type (1-10 listings, PMs 11-100 listings, Enterprise 100+ listings), Geography (mostly Country, to be discussed: if I'm a host located in England, but I have a Listing in the US, which one should I consider? B2B or B2C?). This has been discussed in the KPIs sessions, the details and recording being here: [https://www.notion.so/knowyourguest-superhog/Business-KPIs-Definition-III-TMT-session-24th-July-2024-1bd5435844ac432f9161b1ccf4c4d062](https://www.notion.so/Business-KPIs-Definition-III-TMT-session-24th-July-2024-1bd5435844ac432f9161b1ccf4c4d062?pvs=21)
- Other
- We need to provide access to all Finance to the Account Report (done)
# Product visibility - Data visibility
Product has been working on creating general guidelines to present roadmaps and initiatives to the different business teams. After checking with Ben, we're also supposed to do it.
Now, since Data is a bit special, Lou A has helped on determining what we should apply and what not. In a nutshell:
- We need to discuss priorities with Ben and Suzannah again because (as you see in this list) a lot of things happened in just 2 weeks. With this we can adapt the roadmap.
- We should adapt each item in the roadmap, ideally filling in the description a bit more. The description template could be useful for us as well.
- Lou A showed me the resolution roadmap she has, with bigger timelines / not fully specified over time. I think this is a nice way to say "hey guys, we will do these during Q3, but I don't commit to do this in a given week". Might be interesting to apply a similar strategy for Data.
- Record a 2 minute video explaining how to interpret the Data roadmap, but not the contents of it.
- Have a session with business teams, but open to everyone, explaining in a bit more detail what we aim to do in Q3 or in the future. This should be recorded. It can be just a matter of a 15 min presentation + questions.
# Billing automation
Within the new dashboard initiative, there's the goal of automating billing. We did a first kickoff on the discovery phase with Product (Dagmara - leading the initiative, Lou D), Finance (Suzannah, Nathan, Jamie) and Tech (Ben R., Gus). Dagmara will need support from the Data side to ensure that we can list the different data points used, the current process and so on.
Dagmara has created a very nice summary with the steps that will follow: [https://www.notion.so/knowyourguest-superhog/Discovery-Plan-New-Dashboard-V3-Automated-Billing-940eb16d61684a4b9d2fca1001a127ea](https://www.notion.so/Discovery-Plan-New-Dashboard-V4-Automated-Billing-940eb16d61684a4b9d2fca1001a127ea?pvs=21)
There's also a Slack channel #proj-automated-billing
# Billable bookings
While working on billable bookings, I started taking a look at the data-invoicing-exporter project. There's a couple of differences that we might need to discuss, especially the fact that charges happening when the verification starts now use a different logic in data-invoicing-exporter (guest user joined date) vs. the DWH in booking_charge_events (guest used link date, the estimated start date).
All details here:
[Data quality assessment: Billable Bookings](Data%20quality%20assessment%20Billable%20Bookings%2097008b7f1cbb4beb98295a22528acd03.md)
# Booking source field
Based on a request from Joan (and actually something that also interested Lou D, and probably other people), we've developed a new Booking source field that has been propagated within the DWH. It might be worth doing a double check just to verify all is OK.
# Data incoherence on guest choices
This comes from a request from Lou A, who wants to know how many guests choose no cover over the other payment options presented to them. Using a query provided by Lawrence E:
```sql
select *,
    CASE WHEN DisabledValidationOptions & 1 > 0 THEN 0 ELSE 1 END AS "Fee(1)",
    CASE WHEN DisabledValidationOptions & 2 > 0 THEN 0 ELSE 1 END AS "Membership(2)",
    CASE WHEN DisabledValidationOptions & 4 > 0 THEN 0 ELSE 1 END AS "FeeWithDeposit(4)",
    CASE WHEN DisabledValidationOptions & 8 > 0 THEN 0 ELSE 1 END AS "Waiver(8)",
    CASE WHEN DisabledValidationOptions & 16 > 0 THEN 0 ELSE 1 END AS "NoCover(16)"
from live.dbo.PaymentValidationSetToCurrency
```
We found some cases where, according to this data, the options chosen by guests don't match the options that were offered to them. We have already brought this data incoherence up with Lawrence, and he is currently working on finding the problem, hopefully with a solution soon.
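The bitwise decoding in the query can be sanity-checked outside SQL; a small Python equivalent of the same flag logic (a set bit means the option was disabled):

```python
# Mirror of the SQL query: each bit of DisabledValidationOptions disables
# one payment option, so "offered" is the inverse of the bit.
FLAGS = {"Fee": 1, "Membership": 2, "FeeWithDeposit": 4, "Waiver": 8, "NoCover": 16}

def offered_options(disabled_mask: int) -> dict:
    return {name: 0 if disabled_mask & bit else 1 for name, bit in FLAGS.items()}
```

For example, a mask of 18 (2 + 16) means Membership and NoCover were disabled while the other three options were offered.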
# Screening API Report ready for deployment
We have the new report for Screening API ready to go as soon as it is needed.
[Link](https://app.powerbi.com/groups/me/apps/043c0aec-20b8-4318-9751-f7164b3634ad/reports/c69e3d40-a669-4dc3-899e-dbc84a0c6c24/ReportSectionbd92a560d1aa856ba993?ctid=862842df-2998-4826-bea9-b726bc01d3a7&experience=power-bi)

# 2024-08-20 - Glad youre back, Uri
- A peculiar PR that reduced the execution time of `int_core__mtd_booking_metrics` from 1100 seconds to 10 seconds.
- [https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project/pullrequest/2774](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project/pullrequest/2774)
- I started out a list of interesting tools: [Cool tools](Cool%20tools%20afdf8f69b4b0498aaee66ad1a520cc0d.md)

# 2024-10-02 - Integrating New Dashboard & New Pricing into DWH
List of Core tables linked to New Pricing (NP) / New Dashboard (ND)
| Table Name | Description | Main fields | Status | DWH usages | Uri's comments |
| --- | --- | --- | --- | --- | --- |
| Claim | Not exclusively for NP/ND, but used to know which users have been switched in different stages from Old Dashboard to New Dashboard | - UserId<br>- ClaimType<br>- ClaimValue | Fully integrated, might need updates | We apply a macro based on the content of this table:<br>- [user_migration_configuration](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/macros/user_migration_configuration.sql)<br>This is later used in the [user_migration](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/core/int_core__user_migration.sql) model to identify migrated users from old to new dash. This is later added into the main table of [user_host](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/core/int_core__user_host.sql) | Likely we'll need updates as new versions of the dashboard are launched. Should be covered already for MVP and V2.<br>Not clear how to tag "new dash users" for those cases where the user gets created directly in the new dash (instead of switched). To be confirmed.<br>- Quick discussion with Daga: what if we take 1) Claim to know which user is in New Dash and 2) Claim to know which user has been switched and when. Then 1)-2) gives the new users in the new dash, and the creation date of the user is the new "start date". |
| ProductBundle | Basic information of a product bundle | - Id (ProductBundleId)<br>- ProtectionPlanId | Not integrated, will not integrate | | Not needed for the moment since UserProductBundle already contains denormalised information of the product bundle (ex: name, protection plan id) |
| ProductBundleDescription | Description of the product bundle | - ProductBundleId | Not integrated, will not integrate | | Not needed for the moment since it only contains an explanation of what the product bundle means for client point of view |
| UserProductBundle | It's the main table: it states that the user has, or has had, the capacity to apply product bundles to a listing. This does not mean, however, that these are/were actually applied. A bundle contains one or more product services and has a certain protection plan. | - Id (UserProductBundleId)<br>- SuperhogUserId<br>- ProductBundleId<br>- ProtectionPlanId<br>- ChosenProductServices<br>- StartDate<br>- EndDate | Fully integrated, might need updates | Main usage in the model:<br>- [user_product_bundle](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/core/int_core__user_product_bundle.sql) | Not all users in this table are in New Dashboard. Thus, specifically for New Dash reporting, we force users to exist in the user_migration model. Also, we create an effective start date of a product bundle so this start date is not before the user switched from old to new dash. |
| AccommodationToProductBundle | Another main table: it states that a listing has, or has had, a product bundle applied; thus affecting the bookings of that listing with the specific product bundle. | - Id (AccommodationToProductBundleId)<br>- UserProductBundleId<br>- AccommodationId<br>- StartDate<br>- EndDate | Fully integrated, no need to update | Main usage in the model:<br>- [accommodation_to_product_bundle](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/core/int_core__accommodation_to_product_bundle.sql) | Important - this table will NOT contain the basic screening product bundle. This is because this bundle applies by default; only product bundles different from the basic screening can be applied here. Also, we create an effective start date of a product bundle so this start date is not before the user configured the bundle (see UserProductBundle comment) |
| BookingToProductBundle | States that a booking has had a product bundle (well, a user product bundle) applied. Thus this can be used to know the product and protection services that were offered (not necessarily those that finally applied). | - Id<br>- UserProductBundleId<br>- BookingId | Fully integrated, no need to update | Main usage in the model:<br>- [booking_to_product_bundle](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/core/int_core__booking_to_product_bundle.sql&version=GBmaster&_a=contents) | Unsure of why we have StartDate and EndDate in this table. Not using it 😀<br>We also enforce that the user needs to have had the product bundle configured before the booking was created (effectively meaning that we exclude bookings from migrated users that were created before the migration date). |
| ProductService | Basic information of the product service | - Id (ProductServiceId)<br>- ProductServiceFlag | Integrated in staging, needs further modelisation | Integrated into staging:<br>- [product_service](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/staging/core/stg_core__product_service.sql) | |
| ProductServiceToPrice | Basic information of product services and their prices. It states that a given product service will have a certain price for a given currency, with a given price base unit, with a certain invoicing trigger and a specific payment type. Additionally, it will state if this price is a default one or a dedicated one for a given user, in case UserProductBundleId is set. | - Id (ProductServiceToPriceId)<br>- ProductServiceId<br>- CurrencyId<br>- UserProductBundleId<br>- BillingMethodId<br>- InvoicingMethodId<br>- PaymentTypeId<br>- StartDate<br>- EndDate<br>- Amount | Integrated in staging, needs further modelisation | Integrated into staging:<br>- [product_service_to_price](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/staging/core/stg_core__product_service_to_price.sql) | We directly denormalise the attributes BillingMethodId, InvoicingMethodId and PaymentTypeId at the staging layer. Also, there's the following rename:<br>- BillingMethod = price_base_unit<br>- InvoicingMethod = invoicing_trigger<br>- PaymentType remains the same |
| BillingMethod | Whether the price of the ProductServiceToPrice is at Booking level or per number of nights | - Id (BillingMethodId) | Fully integrated, no need to update | Integrated directly into the ProductServiceToPrice staging layer as price_base_unit | |
| InvoicingMethod | When the service needs to be invoiced, at which moment of time | - Id (InvoicingMethodId) | Fully integrated, no need to update | Integrated directly into the ProductServiceToPrice staging layer as invoicing_trigger | |
| PaymentType | Whether the price is stated as an Amount or as Percentage | - Id (PaymentTypeId) | Fully integrated, no need to update | Integrated directly into the ProductServiceToPrice staging layer as payment_type | |
| Protection | Basic information of the Protection Services | - Id (ProtectionId)<br>- RequiredProductServices | Integrated in staging, needs further modelisation | | Seems a 1 to 1 relation with ProtectionPlan. If so, I'll just add everything into a single protection_plan table |
| ProtectionPlan | Historification in case there are changes on any Protection | - Id (ProtectionPlanId)<br>- ProtectionId<br>- StartDate<br>- EndDate | Integrated in staging, needs further modelisation | | |
| ProtectionPlanToPrice | Similar contents as ProductServiceToPrice, but for Protection. In essence, how much it costs to have a dedicated protection (for a given currency, price base unit, invoicing trigger, payment type). Also whether it's a default price or a dedicated one for a given user, in case UserProductBundleId is set | - Id (ProtectionPlanToPriceId)<br>- ProtectionPlanId<br>- CurrencyId<br>- UserProductBundleId<br>- BillingMethodId<br>- InvoicingMethodId<br>- PaymentTypeId<br>- StartDate<br>- EndDate<br>- Amount | Integrated in staging, needs further modelisation | | We should follow a similar strategy as for ProductServiceToPrice.<br>Maybe rename internally as ProtectionServiceToPrice? Avoid confusion with ProtectionPlanToCurrency |
| AppliedProductService | Key table to know “this Booking has these Product Services applied”. Currently WIP in backend side, necessary for Revenue computation and Service usage | - TBD | To be integrated, waiting for backend | TBD | We asked for additional id fields so we can link the information with other main tables easily. Also, see if we can follow an insert only approach to keep the history. |
| AppliedProtectionService | Similar as AppliedProductService but for Protection. Does not exist yet | - TBD | To be integrated, waiting for backend | TBD | We asked to have this table created so we can have a similar strategy as we will do for AppliedProductService |
| ProtectionPlanToCurrency | How much we protect per Protection Service and Currency. This contains the protections themselves, rather than prices for the protections, thus can wait for later. | - Id (ProtectionPlanToCurrencyId)<br>- ProtectionPlanId | Integrated in staging, needs further modelisation | | I think I will add a different name to avoid confusion with ProtectionPlanToPrice. |
| DepositManagement | | | ? | | Deposit management is the nomenclature used for Waivers/Deposits services, but these exist in ProductServices and I'm not sure the following tables have a direct link or are strictly needed for our reporting. |
| DepositManagementItem | | | ? | | |
| DepositManagementItemToProtection | | | ? | | |
| DepositManagementItemToProtectionAmount | | | ? | | |
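The "effective start date" clamping mentioned for UserProductBundle and AccommodationToProductBundle boils down to a simple rule; an illustrative sketch (function and argument names are hypothetical, not the actual model code):

```python
from datetime import date

def effective_start_date(bundle_start: date, user_switch_date: date) -> date:
    # A bundle can't be effective before the user switched to the New
    # Dashboard, so the effective start is the later of the two dates.
    return max(bundle_start, user_switch_date)
```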

# 2024-10-03 - Glad youre back, Ben
The Data team is very happy that you're back, Ben! Find below a list with links towards the main aspects that might be relevant for you regarding the Data scope in these past 2 months.
> **Table of contents**
>
# Q3 recap
Probably the easiest way to keep an eye on what happened lately is to check our Q3 Achievements page, in which we summarise both the status of the objectives we set at the beginning of the quarter as well as the new lines of work that popped up.
[Q3 Data Achievements ](Q3%20Data%20Achievements%201130446ff9c9800e84e4f03750b752a1.md)
As usual, the Data OKRs are available in our dedicated Notion page:
[Data OKRs](https://www.notion.so/299e4da6e92043899646d11609c051ae?pvs=21)
For more in-depth, we also suggest checking the latest entries to the Data News:
[Data News](https://www.notion.so/Data-News-7dc6ee1465974e17b0898b41a353b461?pvs=21)
# Q4 planning
We did an exercise with Ana and the TMT to plan for Q4. The draft for Q4 OKRs is available in the dedicated Notion page:
[Data OKRs](https://www.notion.so/299e4da6e92043899646d11609c051ae?pvs=21)
In order to provide a bit more overview of what each initiative implies, there's a more verbose page that we shared with the TMT for our quarterly planning meeting:
[Q4 Data Scopes proposal](Q4%20Data%20Scopes%20proposal%2075bf38ab8092471d910840ab86b0ec60.md)
At the time of writing this Notion page, we still have a pending update on the roadmap for Q4.
# Recap of Power BI reports
This is just an extensive list of all available Power BI apps, including existing ones + new ones. Let us know if you're missing access to any of these.
- [Business Overview](https://app.powerbi.com/groups/me/apps/33e55130-3a65-4fe8-86f2-11979fb2258a/reports/5ceb1ad4-5b87-470b-806d-59ea0b8f2661/ReportSectionddc493aece54c925670a?experience=power-bi) (contains Revenue reports and Main KPIs)
- [Check-in Hero](https://app.powerbi.com/groups/me/apps/14859ed7-b135-431e-b0a6-229961c10c68/reports/8e88ea63-1874-47d9-abce-dfcfcea76bda/ReportSectionddc493aece54c925670a?experience=power-bi)
- [Currency Exchange](https://app.powerbi.com/groups/me/apps/10c41ce2-3ca8-4499-a42c-8321a3dce94b/reports/fcfd0a77-6c2a-4379-89be-aa0b090265d7/64ddecd28ca50dc3f029?experience=power-bi)
- [Superhog reporting (legacy)](https://app.powerbi.com/groups/me/apps/86bd5a07-0cd9-40ab-9e97-71816e3467e8/reports/fe54c090-ae85-4cfd-9f28-3d31ab486bc3/ReportSectiond82bb2cfdd980be42da5?experience=power-bi)
- [Guests Insights](https://app.powerbi.com/groups/me/apps/2464d25c-056c-4b94-9a7f-26b72c7fde33/reports/b6ff2cf4-5abb-4c1b-9341-b6f2dae04900/2f768051ca6abb70b39a?experience=power-bi) (contains Guest satisfaction CSAT score for the moment)
- [Accounting](https://app.powerbi.com/groups/me/apps/4a019abb-880f-4184-adc9-440ebd950e00/reports/86abbd2f-bfa5-4a51-adf5-4c7a3be9de07/b992edecc5478e506a75?experience=power-bi) (contains Host Resolutions + Invoicing and Crediting)
- [API Reports](https://app.powerbi.com/groups/me/apps/043c0aec-20b8-4318-9751-f7164b3634ad/reports/c69e3d40-a669-4dc3-899e-dbc84a0c6c24/ReportSectionbd92a560d1aa856ba993?experience=power-bi) (contains Screening API and E-Deposit Invoice)
- [New Dashboard Reporting](https://app.powerbi.com/groups/me/apps/7197c833-dbf9-4d2c-bca1-95f74aec4b11/reports/f0bad5b7-d9d2-45ba-a3cb-d190dd91b493/1bbfbee419e040409b95?experience=power-bi) (contains User adoption of New Dash, currently MVP)

# 2024-10-24 - Glad youre back, Joaquin
Pablo and Uri hope you had amazing holidays. Some things happened while you were away, so here's a summary!
# Domain Analysts programme has started
We had the first session with Jamie and Alex and explained a bit what we aim to achieve during this Q4 - as well as giving them some SQL homework to do!
There's a new Slack channel named #analyst-guild in which we can discuss directly with them, and you will find more relevant information in there. Check this [Notion page](https://www.notion.so/Q4-Training-and-Onboarding-Plan-1210446ff9c980cb9eb1c2e1895c0f46?pvs=21) to learn more.
# Athena claims analysis
Pablo did some unplanned yet very critical work for Athena. Apparently, it was assumed that Athena was a good source of cash for us, but it seems the amount paid out for claims is huge. After further checks, it seems that the majority of critical claims come from just a few claimants, and thus a re-negotiation has been started by key people of the company. A very good example of why we need Data!
# E-deposit migration was a great success
After some weeks preparing the migration with API squad, now we have two independent flows to feed E-deposit vs. Athena. Everything went according to plan. This effectively means that the current status looks like this:
E-deposit:
![image.png](image%2058.png)
Athena:
![image.png](image%2059.png)
# Hubspot deal data is integrated and being used
We focused on integrating Deal data as soon as possible as we had some max priority needs for Account Managers reporting and Churn definition. Among the different Hubspot entities, we focused first on Deal. This data is already being used in KPIs and new models, as can be seen here:
![image.png](image%2060.png)
We'll discuss what's next for the remaining entities, but so far this has proven to be enough and already very valuable, as you can see in the following entries.
# Churn definition
A big subject has been to define Revenue, Listing and Booking Churn Rates. We did this exercise with Suzannah, Matt and Alex.
In short, we assume Revenue, Listing and Booking Churn to be coming from accounts that are churning. In other words, from Deals being in a Churning state (which lasts only 1 month before becoming Inactive).
First things first, we improved the logic for when we consider a Deal to be Churning. We keep either the already existing definition (i.e., a Deal is Churning if the last booking created was exactly 13 months ago) OR the Deal has offboarded in a given month. This offboarding information comes from Hubspot, from the cancellation date attribute. This is one of the changes that can be seen in the previous screenshot, in which int_mtd_deal_lifecycle now has a dependency on Hubspot deals. You might notice as well that this model is no longer in Core, but in Cross, since it has both Hubspot and Core dependencies.
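As an illustration only (names hypothetical; the real logic lives in the int_mtd_deal_lifecycle model), the combined Churning condition reads roughly as:

```python
from datetime import date
from typing import Optional

def months_between(later: date, earlier: date) -> int:
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def is_churning(as_of: date, last_booking_created: date,
                offboarded: Optional[date]) -> bool:
    # Existing rule: the last booking was created exactly 13 months ago,
    # OR (new rule) the Hubspot cancellation date falls in the as_of month.
    offboarded_this_month = (
        offboarded is not None
        and (offboarded.year, offboarded.month) == (as_of.year, as_of.month)
    )
    return months_between(as_of, last_booking_created) == 13 or offboarded_this_month
```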
Second thing - we need a proper computation of Revenue. If you remember, Revenue was previously deducting the amount that we were paying to hosts in terms of waivers. This is not the case anymore, meaning total revenue figures are closer to the Finance definition (and bigger than before). This has already been deployed for a couple of weeks.
Also, we've created new contribution models that allow us to know the % of Revenue, Listings Booked in Month and Created Bookings each Deal has in a 12 month window. This is a bit more complex since we're not doing an Additive approach but rather an Average one, because of business needs in the definition itself. I encourage you to check the model implementation if you're interested [here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_monthly_12m_window_contribution_by_deal.sql). This "by deal monthly" computation is then used to compute the Main KPIs, meaning that Global KPIs now strictly depend on Monthly KPIs by Deal. With this we have a final model that computes the Churn contribution [here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_monthly_churn_metrics.sql).
These 3 Churn Rates are already deployed since Tuesday 22nd and available in Main KPIs.
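The Additive vs. Average distinction for the 12-month window contribution can be illustrated like this (a sketch of my reading of the idea, not the model's actual SQL):

```python
def additive_contribution(deal_by_month, total_by_month):
    # Additive: the deal's total over the window divided by the grand total
    return sum(deal_by_month) / sum(total_by_month)

def average_contribution(deal_by_month, total_by_month):
    # Average: mean of the deal's monthly shares across the window,
    # so every month weighs the same regardless of its volume
    shares = [d / t for d, t in zip(deal_by_month, total_by_month) if t]
    return sum(shares) / len(shares)
```

With even monthly volumes the two agree; with uneven volumes (e.g. totals of 100 then 200 against a deal doing 10 then 30) the additive share is ~0.133 while the average share is 0.125, which is the kind of nuance the business definition cares about.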
# Churn prevention → top losers → Account managers reporting
Another piece of work related to churn. In this case it's not focused on measuring churn, but rather on providing indicators for each account in terms of "growth" and "impact", so Account Managers and RevOps generally speaking can smartly dedicate effort towards where it's really needed. This, if actioned by AMs, should reduce Churn (thus why it's churn prevention).
Long story short, we have a [new report here](https://app.powerbi.com/groups/me/apps/bb1a782f-cccc-4427-ab1a-efc207d49b62/reports/797e7838-3119-4d0e-ace5-2026ec7b8c0e/cabe954bba6d285c576f?experience=power-bi). Originally it was called top losers (because we categorised accounts as top losers, losers, winners, etc.) but it has now grown a bit in scope, so it's just Account Managers Overview. This report gathers all accounts by deal and each month evaluates the growth and the impact this growth has upon the overall business. Aaaaand with this we just categorise accounts in 5 groups. I'd recommend checking the readme since it's quite detailed, or, if you prefer 423 lines of SQL code, you can check [the model here](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project?path=/models/intermediate/cross/int_monthly_growth_score_by_deal.sql).
Lastly, we've recently integrated some Hubspot information for each Deal so Account Managers and decision-makers have greater detail. For instance, we're able to detect accounts that have not churned yet and are still active, thus potentially actionable on the AM side.
It's extremely useful to explain increases in Churn Rates in specific months - I'll let you check the August 2024 peak and draw your own conclusions 🙂
# General update
We'll talk about this in the first meeting.

# 20240611 Retro
## 🙌 What went well
- Priorities and capacity
- Data team has increased in capacity
- TMT has a lot more visibility and alignment with us
- We've done a good job at structuring demand and keeping pushy stakeholders at bay
- Adhoc/Data Captain deliveries have been flowing orders of magnitude better than in the Pablo-era
- Team organization is working well
- Internal collaboration is quite smooth so far (we're good people 🙂)
- VERY GOOD documentation in Notion, repositories, etc. on the Data stack
- I really appreciate the way we are organizing the team and distributing the responsibilities, and the tools we are using, like the board that makes it very easy to keep track of everyone's assignments.
- I think the dailies are also very helpful to stay in contact and updated on what everyone is doing in their day-to-day work.
- I feel very comfortable with the team and with everyone's disposition to be very helpful inside and outside the office.
## 🌱 What needs improvement
- Development workflow in dbt / PBI could be more agile and frictionless
- Stakeholder visibility/relationship with other teams
- Clear lack of data exposed / reported to business and product teams
- Backlog of engineering dependencies/topics is messy and we drop balls
- Data priorities and tempos visibility to the rest of the business
- Data modelisation problems from the source (ex: guest journey end date needs tons of logic because it was not properly implemented, expected revenue figures, sources of Hosts, etc)
- Have documentation of all the data we can work with, maybe of the source tables.
## 💡 Ideas for what to do differently
- Tooling
- More hands-on development onboarding for Data
- It's a bit complicated to review PBI reports - should we ensure these are exposed in our workspace for Data team reviews?
- More connection with the engineering team
- Start running dbt tests in production
- Reduce bus factor for Data Engineering
## ✔ Action items
- [ ] Formalize further the relationship between Data <> Engineering and dependencies
- [ ] Backend documentation and know-how Productboard item
- [ ] Simplify dumping of prd data to local environment
- [ ] Add to backlog the creation of onboarding-hello-world-challenges
- [ ] Discuss further hands in Engineering with Ben C.
- [ ] Add `dbt test` to dbt run script
- [x] Kidnap staging workspace to make delivery
# 20240619-01 - CheckIn Cover multi-price problem
This page is to track a production bug spotted on 2024-06-19, 12:30 ES time.
The problem was solved on 2024-06-19, 17:30 ES time.
## Executive Summary
- I (Pablo) believe some pieces of data were manually modified in an improper way in the Core Superhog database due to some tech debt and user mistakes we are living with.
- This propagated into data quality issues in the DWH, which eventually led to wrong numbers in the Checkin Hero and Business Overview reporting suites, including inflated revenue figures.
- This specific problem will be fixed by the Data team with some engineering work, but we need to run a postmortem on how it happened and change our way of doing things to avoid **massive business problems in the future**. **I'm calling all of us to serious action to avoid more of this in the future.**
## Initial problem
- Pablo spotted a duplicate record in the DWH table `reporting_core__vr_check_in_cover` while running some data tests on the DWH. This table summarizes some details around guest journeys with checkin cover. The `VerificationRequestId` for these records is `749989`.
- The duplicate records showed all fields with same values except for `checkin_cover_limit_amount_local_curr` and `checkin_cover_limit_amount_in_gbp`.
## Root cause research
- Pablo's initial suspicion was a duplicate record in the `Payments` table causing the issue, but this was not the case.
- Pulling the thread, Pablo found out that the DWH table `int_core__check_in_cover_prices` had two records for `EUR` and `CAD`.
- This issue was causing the original problem of duplicate records in `reporting.core__vr_check_in_cover`.
- This is because `int_core__check_in_cover_prices` is expected to have only one record per currency.
- This is so because `int_core__check_in_cover_prices` builds a record per price by grouping `PaymentValidationSetToCurrency` by `CurrencyIso`, `CheckInCoverCost` and `CheckInCoverLimit`.
- So, the next step was to find out why `int_core__check_in_cover_prices` was showing the duplicate records for `EUR` and `CAD`.
- The source table for this data in the DWH is `sync_core.PaymentValidationSetToCurrency`.
- Pablo ran the following query:
```sql
SELECT *
FROM sync_core."PaymentValidationSetToCurrency" pvstc
WHERE ("CheckInCoverCost" != 11
AND "CurrencyIso" = 'EUR')
OR
("CheckInCoverCost" != 13
AND "CurrencyIso" = 'CAD')
```
Which yielded the following output:
```
"Id","Fee","Amount","Waiver","Protection","Reschedule","CreatedDate","CurrencyIso","UpdatedDate","IsFeeRefundable","CheckInCoverCost","CheckInCoverLimit","PaymentValidationSetId","DisabledValidationOptions","_airbyte_raw_id","_airbyte_extracted_at","_airbyte_meta"
29583,0.000000000,690.000000000,21.000000000,48.000000000,,2024-05-16 15:04:23.080,CAD,2024-05-16 15:04:23.080,false,690.000000000,130.000000000,3710,7,f13f779f-e054-465e-b301-aa38e88808e0,2024-05-16 18:00:12.917 +0200,"{""errors"": []}"
31053,46.000000000,930.000000000,110.000000000,760.000000000,,2024-06-13 17:47:04.143,EUR,2024-06-13 17:47:04.143,false,14.000000000,130.000000000,3894,18,"43112786-986a-4cfa-aef6-135c1a1b5067",2024-06-13 21:00:13.103 +0200,"{""errors"": []}"
31085,19.000000000,940.000000000,66.000000000,940.000000000,,2024-06-13 18:52:47.003,EUR,2024-06-13 18:52:47.003,false,14.000000000,130.000000000,3898,19,fbd15fa4-8691-41ea-a8d2-1edb82e4355f,2024-06-13 22:00:12.451 +0200,"{""errors"": []}"
```
- The results show that there are three records in `PaymentValidationSetToCurrency` that don't have the *regular* values for `EUR` and `CAD`.
- This is a major issue, because there was an established contract that, even though CheckIn Cover cost and limit figures appear in different records per `PaymentValidationSet`, the price is supposed to be a global one for all of Superhog. This data breaks the contract.
- The next question posed was: does this data look the same in Superhog's backend?
- I ran the following query in the Core database, `Live`:
```sql
SELECT Id, CurrencyIso, Amount, Fee, PaymentValidationSetId, CreatedDate, UpdatedDate, IsFeeRefundable, DisabledValidationOptions, Waiver, Protection, Reschedule, CheckInCoverCost, CheckInCoverLimit
FROM live.dbo.PaymentValidationSetToCurrency
WHERE Id = 31053 OR Id = 31085 OR Id = 29583
```
Which yielded the following output:
```
"Id","CurrencyIso","Amount","Fee","PaymentValidationSetId","CreatedDate","UpdatedDate","IsFeeRefundable","DisabledValidationOptions","Waiver","Protection","Reschedule","CheckInCoverCost","CheckInCoverLimit"
29583,CAD,690.00000,0.00000,3710,2024-05-16 15:04:23.080,2024-05-16 15:04:23.080,0,7,21.00000,48.00000,,13.00000,130.00000
31053,EUR,930.00000,46.00000,3894,2024-06-13 17:47:04.143,2024-06-13 17:47:04.143,0,18,110.00000,760.00000,,11.00000,85.00000
31085,EUR,940.00000,19.00000,3898,2024-06-13 18:52:47.003,2024-06-13 18:52:47.003,0,22,66.00000,940.00000,,11.00000,85.00000
```
- Major problem: the data does not look the same.
- The `CheckInCoverCost` values in `dwh.sync_core.PaymentValidationSetToCurrency` are `690, 14, 14`.
- The `CheckInCoverCost` values in `live.dbo.PaymentValidationSetToCurrency` are `13, 11, 11`.
- This points to an issue in the Core <> DWH integration that happens through Airbyte.
Summarizing the issues, from root to effects:
- Some faulty `live.dbo.PaymentValidationSetToCurrency` values somehow came from Core to DWH, and were afterwards changed in Core. This must have been done without respecting the `UpdatedDate` field of the table.
- The faulty values broke the intended granularity of `dwh.intermediate.int_core__check_in_cover_prices`, which propagated into `dwh.reporting.core__vr_check_in_cover`.
- The issue in `dwh.reporting.core__vr_check_in_cover` caused (and is still causing) revenue and funnel numbers to show wrong stats, basically inflating them artificially.
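To make the fan-out concrete, here is a small, hypothetical sketch (toy data and simplified column names, not the actual DWH models) of how a duplicate price row per currency duplicates joined journey records and inflates summed revenue:

```python
# Toy illustration of join fan-out caused by duplicate price rows per currency.
# Column and table names are simplified for the example.

# The intended contract: one price row per currency.
prices_ok = [{"currency": "EUR", "cover_cost": 11}]

# The faulty state observed in the incident: two price rows for the same currency.
prices_dup = [{"currency": "EUR", "cover_cost": 11},
              {"currency": "EUR", "cover_cost": 14}]

journeys = [{"vr_id": 749989, "currency": "EUR"}]

def join_revenue(journeys, prices):
    # Inner join on currency; duplicates on the price side duplicate journeys too.
    rows = [dict(j, **p) for j in journeys for p in prices
            if j["currency"] == p["currency"]]
    return rows, sum(r["cover_cost"] for r in rows)

rows_ok, rev_ok = join_revenue(journeys, prices_ok)     # 1 row, revenue 11
rows_dup, rev_dup = join_revenue(journeys, prices_dup)  # 2 rows, revenue 25
```

One journey becomes two reporting rows, and revenue is artificially inflated, which is exactly the symptom seen in `reporting.core__vr_check_in_cover`.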
## Remediation
- Short-term: I will have to run a backfill between `live.dbo.PaymentValidationSetToCurrency` and `dwh.sync_core.PaymentValidationSetToCurrency` to ensure that the data across both is the same again.
- Beyond that: we need to understand how this situation came to life and ensure it is not repeated. My (Pablo on the keyboard) **hypothesis** on what happened is the following:
- Someone modified the CheckIn Cover prices in Wilbur for some accounts, in fields that should NOT be editable yet are (Joan and Lawrence can provide more details on this issue). It could have been an AM experimenting, or trying to cater to some host needs, perhaps?
- Someone realized this happened and somehow got the necessary dev resources to fix it straight in the database. I mean to say, they literally just brought the database field values back to what they should have been. This is in contrast with simply changing the setting in Wilbur again, which wouldn't have really solved the problem, because every time these changes are made in Wilbur, a new record gets created in `live.dbo.PaymentValidationSetToCurrency`, meaning the faulty values would still remain there. Given this behaviour, I'm pretty confident whoever worked on this understands the bad implications of having multiple prices per currency in that table, and decided to make this database change consciously to avoid it.
- This was done without updating the `UpdatedDate` fields as the SQL `UPDATE` statement happened.
- Because of this, Airbyte didn't pick up the changes and never brought the new data for those records into the DWH. This is because Airbyte syncs the data of table `live.dbo.PaymentValidationSetToCurrency` incrementally, only bringing over data that was modified since the last Airbyte run. Airbyte infers whether data was modified by looking at the `UpdatedDate` field. If the field is not respected when doing updates, Core and DWH end up out of sync.
- I would like to emphasize the importance of preventing this type of issue. The errors caused by this instance were small, but this could turn into massive reporting mistakes. Furthermore, these issues are by nature very difficult to spot and troubleshoot, meaning they could live on for a long time, leading to TMT and other Managers relying on wrong reporting for their business decision-making, investor reporting, etc.
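The incremental-sync behaviour described above can be sketched with a toy model (a simplification for illustration only, not Airbyte's actual implementation; the data is made up):

```python
from datetime import datetime

# Simplified model of an incremental sync keyed on an UpdatedDate cursor.
# Rows updated in place WITHOUT bumping UpdatedDate are silently skipped.

source = {29583: {"cost": 690, "updated": datetime(2024, 5, 16, 15, 4)}}
warehouse = {29583: {"cost": 690, "updated": datetime(2024, 5, 16, 15, 4)}}
cursor = datetime(2024, 5, 17)  # time of the last successful sync

def incremental_sync(source, warehouse, cursor):
    for pk, row in source.items():
        if row["updated"] > cursor:  # only rows "touched" after the cursor move
            warehouse[pk] = dict(row)
    return max([cursor] + [r["updated"] for r in source.values()])

# A manual UPDATE fixes the cost but does not touch UpdatedDate...
source[29583]["cost"] = 13

cursor = incremental_sync(source, warehouse, cursor)
# ...so the warehouse keeps the stale value and the two systems drift apart.
assert warehouse[29583]["cost"] == 690
```

This is why respecting the `UpdatedDate` contract on every write path (manual fixes and seeding included) is essential for the Core <> DWH integration.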
## Final reflection on the mistakes that got us here
- First, we recycled the Cancellation Cover data model for the CheckIn Cover in a rushed way, resulting in Core's data model being completely out of sync with the reality of the service (the data model allows different hosts and currencies to have different CheckIn Cover prices, when the business logic around this service is that there's a single, Superhog-wide price for each currency).
- Second, we allowed the UI of Wilbur to have fields that let users modify these values on a host level, which is again completely out of sync with our business logic, because different hosts shouldn't have different prices for this service, and no user should ever change that value.
- Third, some user managed to use the UI-feature-that-shouldn't-exist the wrong way to change the values, even though this should really not be done.
- Fourth, someone modified the database to fix the third mistake, but introduced *another* mistake by failing to respect the `UpdatedDate` field in the process.
This is a long story of tech debt and bad choices bringing us to a costly mistake. We were lucky it didn't cause a big problem, but it could have. I hope we can all learn from this to avoid these issues.
**2024-06-21 update**: after a discussion with Lawrence, we found what we think is the cause of the values being “corrected” in the Core database without respecting the `UpdatedDate` field.
This data is overwritten on every migration as part of the seeding process that the team runs on deployments, replacing any existing values around the CheckIn Cover with the hardcoded seed values. The faulty values introduced by a user were most probably overwritten once the team applied a new migration to the database.
Besides that, this was an important finding since we also realized that this seeding process does not update the `UpdatedDate` fields.

# 20240621-01 - Failure of Core full-refresh Airbyte jobs
## Failure of Core full-refresh Airbyte jobs
Managed by: Pablo
## Summary
- Components involved: dbt, Airbyte, dwh-prd
- Started at: *2024-06-21 3:09AM CEST*
- Detected at: *2024-06-21 3:09AM CEST*
- Mitigated at: *2024-06-21 11:35AM CEST*
Some tests around dbt materialization performed in production by Pablo on 2024-06-20, plus a lack of proper clean-up after them, resulted in Airbyte failing to run full-refresh loads from Core because it was unable to delete the tables properly. This left the affected tables and their dependants in the DWH stale for around 9 clock hours (4 business hours).
## Impact
The following tables were not refreshed on the nightly run of 2024-06-21, leaving their data stale and outdated, reflecting the 2024-06-20 state:
| Source | Schema | Table |
| --- | --- | --- |
| Core (SQL Server - Live) | `Integration` | `Integration` |
| Core (SQL Server - Live) | `Integration` | `IntegrationType` |
| Core (SQL Server - Live) | `dbo` | `Country` |
| Core (SQL Server - Live) | `dbo` | `Currency` |
| Core (SQL Server - Live) | `dbo` | `PaymentStatus` |
| Core (SQL Server - Live) | `dbo` | `PricePlanChargedByType` |
| Core (SQL Server - Live) | `dbo` | `User` |
| Core (SQL Server - Live) | `dbo` | `UserVerificationStatus` |
| Core (SQL Server - Live) | `dbo` | `VerificationPaymentType` |
| Core (SQL Server - Live) | `dbo` | `VerificationStatus` |
## Timeline
Timezone: CEST
| Time | Event |
| --- | --- |
| 2024-06-21 03:00 | A scheduled job (ID: 4544) of the Airbyte sync `Superhog - Live - integration → dwh-prd (Full Refresh)` begins. |
| 2024-06-21 03:09 | After 5 failed attempts, job 4544 is marked as failed and a warning is sent to the Slack channel `#data-alerts` |
| 2024-06-21 06:00 | A scheduled job (ID: 4552) of the Airbyte sync `Superhog - Live -dbo → dwh-prd (Full-refresh models)` begins. |
| 2024-06-21 06:31 | After 5 failed attempts, job 4552 is marked as failed and a warning is sent to the Slack channel `#data-alerts` |
| 2024-06-21 08:00 | The regular, scheduled `dbt run` happens normally. Since it's running with the usual setting of materializing `staging` models as `table`, it destroys the dirty views that were left from the previous day. |
| 2024-06-21 09:50 | Pablo picks up the alerts and research begins. |
| 2024-06-21 11:21 | Pablo triggers the failed syncs manually. |
| 2024-06-21 11:26 | The syncs have executed successfully. |
| 2024-06-21 11:31 | Pablo triggers a `dbt run` manually. |
| 2024-06-21 11:35 | The `dbt run` finishes successfully. |
| | End of the incident. |
## Root Cause(s)
- Pablo ran some `dbt` `staging`-layer models in the DWH as views instead of tables on 2024-06-20.
- The views were left in the DWH.
- The following full-refresh Airbyte jobs on the `sync_core` schema failed upon trying to run `DROP` on `sync_core` tables that now had dependant views, for Airbyte runs `DROP`, not `DROP CASCADE`. This is not usually a problem since we materialize `staging` as `table`, which creates no dependency relationship between the `sync_core` tables and their `staging` table counterparts.
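The `DROP` vs `DROP CASCADE` behaviour can be modelled with a tiny dependency graph (an illustration of the Postgres semantics at play, not how Postgres or Airbyte are implemented; object names are examples):

```python
# Toy model of Postgres DROP vs DROP CASCADE semantics.
# deps maps each object to the list of objects that depend on it.
deps = {"sync_core.Country": ["staging.stg_core__country"]}  # a dependant view

def drop(obj, deps, cascade=False):
    dependants = deps.get(obj, [])
    if dependants and not cascade:
        # Mirrors: ERROR: cannot drop ... because other objects depend on it
        raise RuntimeError(f"cannot drop {obj}: {dependants} depend on it")
    for d in dependants:
        drop(d, deps, cascade=True)  # CASCADE removes dependants recursively
    deps.pop(obj, None)

# Airbyte's full refresh issues a plain DROP, which fails while the view exists:
try:
    drop("sync_core.Country", deps)
    drop_failed = False
except RuntimeError:
    drop_failed = True

# With staging materialized as a *table* there is no dependency edge,
# so the same plain DROP succeeds:
deps_no_view = {"sync_core.Country": []}
drop("sync_core.Country", deps_no_view)  # no error
```

The toy model shows why materializing `staging` as tables masked the problem: there was simply no dependency edge for `DROP` to trip over.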
## Resolution and recovery
- The scheduled `dbt run` job that ran on 8:00 CEST deleted the dangling views from the previous day and brought the DWH back to using tables in the `staging` layer.
- Without the views in place, it was only necessary to manually trigger the failed Airbyte syncs again to bring the `sync_core` tables that had been outdated back to being up to date.
- After that, the `dbt run` was manually triggered as well to bring all the dependant models back to being up to date.
- All jobs ran successfully and the DWH was brought back to perfect state without issues.
## **Lessons Learned**
- What went well
- Alerts made sure we picked up the problem fast
- What went badly
- The manual testing left dirt in the DWH
- Where did we get lucky
- The configuration of the `dbt` project did not change between Pablo's testing and the next morning, which allowed the upcoming scheduled `dbt run` to automatically remove all the undesired views from the DWH, making the job of resolving the incident as simple as re-running everything. But things could have gone differently and some views could have been left in the DWH, which would have made recovery more complex and error-prone.
General lesson: dont test in production.
The tests Pablo was running on 2024-06-20 were related to turning our `staging` layer in the DWH from being materialized as `table` (the current setting) to `view` instead. This incident was a small sample of how this wouldn't work as simply as initially expected, given the nature of Airbyte's full-refresh behaviour in combination with the Postgres `DROP` and `DROP CASCADE` commands.
The following GitHub issue shows other people having the same discussion: https://github.com/airbytehq/airbyte/issues/35386
That discussion led to new developments in the Postgres destination which enabled the features necessary to achieve our goal. This doc page explains them: https://docs.airbyte.com/integrations/destinations/postgres#creating-dependent-objects
It will be necessary to run version updates on Airbyte to achieve this.
## Action Items
- [ ] Design a way to easily replicate the production DWH in order to minimize the need to run tests there.
- [ ] Update Airbyte to pick up the new versions of the Postgres connector and plan, test and implement the change of the `staging` layer materialization strategy from `table` to `view` properly.
## Appendix
Logs of the failed Airbyte sync jobs.
[default_workspace_job_4552_attempt_5_txt](default_workspace_job_4552_attempt_5_txt.txt)
[default_workspace_job_4544_attempt_5_txt](default_workspace_job_4544_attempt_5_txt.txt)
Slack alerts by Airbyte:
![Untitled](Untitled%2045.png)

# 20240709 Retro
## 🙌 What went well
- **Delivery**
- Huge advancements on reporting capabilities (KPIs, Xero, Check in Hero, currency conversion)
- Important/critical data captain subjects moving forward
- **Stakeholders**
- Good organization and advancing with tasks, good feedback from outside
- Priorities setting and alignment with stakeholders is now much leaner and efficient
- Awareness campaign with engineers around breaking stuff in Core
- **Internal**
- Data Engineering super fast survival training program
- High quality approaches to tough bones: incidents, refactors, etc.
## 🌱 What needs improvement
- **Platform**
- Data drift is happening and we have no scheduled full-refreshes
- `dbt run` in production not displaying alerts
- PBI Licenses/Group permissions are an (invisible) ball of hair
- Full-refreshing in local reaches the error “could not resize shared memory segment "/PostgreSQL.4065550950" to 907743232 bytes: No space left on device”
- **Documentation**
- Documentation on business KPIs, both technical (for Data) and broadly for consumers
- We're dropping balls with some conventions (exposures documentation, keeping the data catalogue up to date, etc.)
- Exploration of tables to check which ones have incomplete, outdated or wrong data
- Not all stakeholders use the data request form yet
## 💡 Ideas for what to do differently
- Hands-on knowledge sharing by diversifying working scopes (I feel we have clear ownerships of X products)
- Capacity to focus without interruptions
- Possibility of managing power bi active directories ourselves
## ✔ Action items
- [x] Comilona soon™
- [ ] Fix dbt alerts
- [ ] Agree with Ben R. on a different way to manage permissions
- [ ] Explore local environment postgres improvements
- [ ] Create Ticket to document KPIs dbt area
- [x] Checklists for dbt repo
- [ ] and PBI repo
- [ ] Potentially, also include CI checks in dbt repo
- [ ] Make a cleaning day for Data Catalogue docs
- [ ] 90 minutes retros

# 20240718-01 - Xe.com data not retrieved
## Xexe did not retrieve the data from xe.com
Managed by: Uri
## Summary
- Components involved: [data-xexe](https://guardhog.visualstudio.com/Data/_git/data-xexe)
- Started at: 2024-07-18 07:00 (local ES time)
- Detected at: 2024-07-18 08:42
- Mitigated at: 2024-07-18 16:50
The Xe.com subscription had been suspended due to lack of payment from Superhog's side. This made the daily execution fail. Once the payment was made, and after confirmation from the xe.com team, the manual execution of the process worked well.
## Impact
Currency conversion rates for 17th July were not retrieved. This means that any reporting containing revenue with currency conversion is not displaying fully accurate figures, but rather is using the conversions from the previous available day (16th July). This only affects reports reading from the DWH that use backend conversion; Xero reporting is not affected. Specifically:
- Currency Exchange report
- Guest Payments report (Business Overview)
- Main Business KPIs (Business Overview) - only Guest Payments related metrics
- Check-in Hero Overview
- Guest Satisfaction (Guest Insights) - not really affected since there's no payment-related metric
Impact at the moment is relatively small in the sense that only one day of currency conversion is missing, but failure to fix it soon could increase the impact.
## Timeline
Timezone: CEST
| **Time** | **Event** |
| --- | --- |
| 2024-07-18 07:00:06 | Xexe starts to run on version 0.1.0 |
| 2024-07-18 07:00:09 | An error is raised by processes.py stating “Didn't find the fields of a good response” while running the healthcheck against the xe.com API. |
| 2024-07-18 07:00:13 | Xexe attempts to fetch the rates and fails to do so since the response seems empty, returning a Python `KeyError: 'from'` |
| 2024-07-18 07:00:13 | Alert is sent to #data-alerts channel |
| 2024-07-18 08:42 | Alert is spotted by the Data Team |
| 2024-07-18 08:48 | After checking the logs, the cause does not seem straightforward at first glance. It's clear that we do not have currency conversion data from yesterday, 17th of July 2024 |
| 2024-07-18 08:54 | A message is sent to the channel #data to inform that there's an ongoing incident around currency conversion |
| 2024-07-18 09:18 | At this stage it seems clear that the healthcheck performed against xe.com is the main issue. Maybe the API has been temporarily down, for whatever reason. I'm not able to see on xe.com whether there's an API availability page, so I can't confirm this is the reason. At this stage, I'll opt for a single re-run and see what happens. |
| 2024-07-18 09:20 | A re-run is launched, but fails again. The alert is correctly sent to #data-alerts channel. Same error is displayed. |
| 2024-07-18 09:33 | After discussing with Ben R, it seems the problem comes from the billing. A couple of emails have been already shared with Pablo on this subject according to Ben. Ben is going to take a look at it. At this stage, nothing else I (Uri) can do but wait. |
| 2024-07-18 09:56 | Gus forwarded me the email loop from Xe.com; indeed, it's clearly linked to the billing. |
| 2024-07-18 10:30 | Ben R confirms that the invoice has been settled now. We try a re-run. |
| 2024-07-18 10:35 | Re-run fails with the same error. Maybe the re-activation of our account needs to be done manually from xe.com side |
| 2024-07-18 11:11 | A follow up communication to #data channel has been sent with the details on the root cause and more detailed impacts |
| 2024-07-18 11:13 | A follow up e-mail is sent by Ben R to the original email loop from xe.com, asking for re-activation now that it has been paid |
| 2024-07-18 16:17 | We receive e-mail confirmation from xe.com that the account has been reinstated |
| 2024-07-18 16:43 | A new re-run of xexe process is launched, this time finished successfully |
| 2024-07-18 16:46 | Re-run of DWH to update all tables and reports |
| 2024-07-18 16:50 | A couple of checks are done to ensure data has been updated accordingly. All good, we can consider the incident as mitigated |
| 2024-07-18 16:54 | A final communication to #data channel has been sent communicating the mitigation of the incident |
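The failure mode in the timeline above (a suspended-account response missing the expected fields, surfacing as a bare `KeyError: 'from'`) suggests validating the rates response explicitly before use. A minimal sketch follows; the function and field names are hypothetical, not xexe's actual code:

```python
def validate_rates_response(payload: dict) -> dict:
    """Check an exchange-rates API response before using it.

    Raises a descriptive error instead of a bare KeyError when the
    account is suspended or the response is otherwise malformed.
    """
    required = ("from", "to")  # fields a good response is assumed to carry
    missing = [f for f in required if f not in payload]
    if missing:
        raise ValueError(
            f"Bad rates response, missing fields {missing}: {payload!r}. "
            "Possible causes: suspended subscription, API outage."
        )
    return payload

# A suspended account returning an error body now fails loudly and clearly:
try:
    validate_rates_response({"code": 7, "message": "Access restricted"})
    suspended_detected = False
except ValueError:
    suspended_detected = True

# A well-formed response passes through untouched:
ok = validate_rates_response(
    {"from": "GBP", "to": [{"quotecurrency": "EUR", "mid": 1.18}]}
)
```

A descriptive error like this would have pointed straight at the billing suspension instead of requiring log archaeology.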
## Root Cause(s)
The service was suspended due to lack of payment from our side. The email loop shows that there was communication from Xe.com on this subject on 26th June, a reminder on 8th July and a final communication on 15th July. These emails were sent to [tech@guardhog.com](mailto:tech@guardhog.com) and went unnoticed by the Data team - at least by Uri/Joaquín; the forward of this email to Pablo went unnoticed since Pablo was on holidays.
## Resolution and recovery
Billing was settled the same day the incident was raised. Once we got confirmation from xe.com that the account had been reinstated, re-running the daily process manually worked perfectly.
## **Lessons Learned**
To be filled later on
## Action Items
To be filled later on
# 20240819 Retro
## 🙌 What went well
- **Holidays reliability**
- Surviving even without the whole team
- Survived holidays without issues
- **Methodology**
- We keep on having a lot of freedom and we are using it nicely
- Quality and methodology stays high
- Capacity to investigate new tools/methodologies (CosmosDB integration, Metric Flow)
- More contact with development team
- Keeping up with our documentations and pressing other teams to do the same
- We are pushing the documentation culture and leading with example
- **Stakeholders**
- Product initiatives should be now estimated and prioritised based on Revenue with the help of the Data team
- Increased access to Business Overview for PMs
- Our customers are very happy with us and our work is appreciated
- Company is starting to appreciate that data is not the owner of invoicing
## 🌱 What needs improvement
- **Dev env, data infra**
- Capacity to run models in local and not running out of memory
- Lack of automation around tests, CI, manual stuff
- Local development keeps on being a bit of a pain in the ass
- Data platform is growing a lot of mushroom components
- **Priorities/Backlog**
- We're going to have a lot of shadow work this quarter with New Dashboard and APIs: we should make it more visible towards TMT
- **People doing crappy stuff**
- Still some shitty initiatives are happening on top of supposedly “well built projects” (Grand Welcome Invoicing, MVP launch with bugs and without documentation, issues with check-in hero, etc.)
- Lack of technical documentation from the development team, especially impacting the holidays period
- Product/Engineering has made mistakes and used bad methodologies in different ways that will cost us dearly
- Incidents go unnoticed generally on backend side
- Reduce bus factor on key projects (for instance, invoicing)
- Lack of synchrony with some initiatives: the New Dash MVP misunderstanding on deliverables, and the Revenue figures mismatch that took quite a bit of time to align with Finance
## 💡 Ideas for what to do differently
- ~~Having access to all documentation from development team (confluence)~~
- Treat Backlog/Todos columns in board with a bit more respect (bi-weekly grooming?)
- Ensure that there's a minimum description and DoD on tickets
## ✔ Action items
- [x] Run invoicing for september all together holding hands
- [x] Invite sent
- [x] Run MainKPIs training sessions for PMs/Other audiences
- [x] Set bi-weekly grooming session
- [x] Plan Data <> Tech team council of wise men on a quarterly basis
- [x] Document progress towards quarterly goals (emphasis on unplanned work)
- [x] Fix dbt alerts
- [ ] Agree with Ben R. on a different way to manage permissions PBI
- [x] Explore **local** environment postgres improvements
- [x] Create Ticket to document KPIs dbt area
- [x] Checklists for dbt repo
- [x] and pbi repo
- [ ] Potentially, also include CI checks in dbt repo
- [ ] Make a cleaning day for Data Catalogue docs
- [x] 90 minutes retros

# 20240821-01 - SQL Server connection outage
## SQL Server connection outage
Managed by: Pablo
## Summary
- Components involved: Core, Airbyte
- Started at: 2024-08-21 between 8:05 and 9:00, CEST
- Detected at: 2024-08-21 9:00 CEST
- Mitigated at: 2024-08-21 9:28 CEST
Core database had a database user named `SuperhogProductionRO` that was used by many data services to read from it. Ben R. deleted the user. Afterwards, two incremental EL runs of Airbyte failed due to the Airbyte reader jobs not being able to connect to Core. The situation was fixed by recreating the user and re-running the failed EL jobs.
## Impact
Almost none. DWH and Core drifted for ~90 minutes instead of the usual 60 minutes. Data team was unable to connect to Core for around 30 minutes.
## Timeline
All reported times are in CEST timezone.
| Time | Event |
| --- | --- |
| 2024-08-21, sometime between 08:05 and 09:00 | Ben R. deletes the `SuperhogProductionRO` user from Core. |
| 2024-08-21 09:00 | Scheduled Airbyte job with ID 8968, for connection `Superhog - Live - dbo → dwh-prd (Incremental models)`, fails with error `Failure reason: State code: S0001; Error code: 18456; Message: Login failed for user 'SuperhogProductionRO'. ClientConnectionId:142369f2-c0b1-47e0-a97b-ead406196f5f`. Logs for this job are attached below. |
| 2024-08-21 09:05 | Scheduled Airbyte job with ID 8969, for connection `Superhog - Live - survey → dwh-prd (Incremental)`, fails with error `Failure reason: State code: S0001; Error code: 18456; Message: Login failed for user 'SuperhogProductionRO'. ClientConnectionId:d353592c-421a-4bf6-bd8d-087a599d0f61`. Logs for this job are attached below. |
| 2024-08-21 09:20 | Pablo and Ben R. jump on a call to troubleshoot; Ben communicates that the user was deleted. Both agree to recreate the user with the same credentials to recover services, and Ben does that on the spot. |
| 2024-08-21 09:28 | Pablo manually triggers syncs for both failed Airbyte connections, and both run successfully. |
| | End of the incident. |
## Root Cause(s)
Ben R. deleted the `SuperhogProductionRO` user from the Core database without any notification to the Data team.
## Resolution and recovery
`SuperhogProductionRO` was recreated as it was before deletion.
The EL jobs that failed during the outage were re-executed.
## **Lessons Learned**
- What went well
- Alerts were immediate and pointed to the problem clearly.
- We responded in record time.
- What went badly
- We had a clear organizational misalignment regarding how that DB user was being used.
- Where we got lucky
- On Ben R. being available a few minutes after the issue started. Data team could not have recovered from the incident without his assistance.
## Action Items
- [ ] Develop some documentation/alignment with the tech team to document user dependencies towards the Data team, to avoid unplanned changes like this causing issues.
- [ ] Internally, in the Data team, keep documentation on which credentials are relevant and where they are used, to ease changing them. If we had had to change the credentials everywhere `SuperhogProductionRO` was being used, we would have had to basically recall from memory all those places. Having a documented list of where the user is used would have eased the job of changing the credentials, increasing speed and confidence in the recovery.
## Appendix
Failed jobs log files:
[default_workspace_job_8969_attempt_1_txt](default_workspace_job_8969_attempt_1_txt.txt)
[default_workspace_job_8968_attempt_1_txt](default_workspace_job_8968_attempt_1_txt.txt)

# 20240902-01 - Missing payment details in intermediate
## Missing payment details in intermediate
Managed by: Pablo
## Summary
- Components involved: dbt run, dbt test and airbyte
- Started at: Unknown, probably months ago
- Detected at: 2024-08-31, 08:16AM CEST
- Mitigated at: 2024-09-02, 11:57AM CEST
The simultaneous trigger of `dbt run` and Airbyte's incremental jobs every morning had been causing data integrity issues in the DWH for a long time. Our recent release of a scheduled `dbt test` just a few minutes after the scheduled `dbt run` made the problem obvious, and we finally got to fix it.
## Impact
For months, a handful of verification payments were missing money amounts, both in local currency and GBP. This might have made total revenue figures in some reports insignificantly wrong (deviations being smaller than 0.1%).
## Timeline
All reported times are in CEST timezone.
| Time | Event |
| --- | --- |
| Some time around March 2024 | Pablo implements the `int_core__verification_payments` model in the `dbt` project. |
| Some time around August 2024 | As we work on implementing scheduled `dbt` tests, we repeatedly observe failures of some `not null` tests on model `int_core__verification_payments`. The behaviour is flaky, so no special attention is paid at first. |
| 2024-08-31 08:16AM | A `dbt test` fails, showing some null value issues in money related columns in the model `int_core__verification_payments` |
| 2024-09-01 08:16AM | A `dbt test` fails, showing some null value issues in money related columns in the model `int_core__verification_payments` |
| 2024-09-02 08:16AM | A `dbt test` fails, showing some null value issues in money related columns in the model `int_core__verification_payments` |
| 2024-09-02 09:00AM | Data team notices the issue (previous alerts happened on the weekend) and starts investigating. |
| 2024-09-02 11:30AM | Pablo spots the possible issue and confirms it by triggering another `dbt run` NOT on the hour and re-running `dbt test`. |
| 2024-09-02 11:35AM | Pablo changes the schedule of `dbt run` and `dbt test` to ensure that `dbt run` doesn't clash with Airbyte jobs and that `dbt test` runs clearly after `dbt run`. |
| | End of the incident. |
## Root Cause(s)
Some Airbyte jobs were running simultaneously with the scheduled runs of our dbt project. This meant the `dbt run` executions were happening while the `sync` layer of some sources was being populated. Because of this, `dbt run` wasn't running on a consistent snapshot of the `sync` layer, which caused referential integrity issues downstream.
## Resolution and recovery
Each day, the problem fixed itself for the previous day's records, so on any given day we only suffered row issues from that same day.
The same-day recovery was achieved by simply running `dbt run` when Airbyte jobs were NOT running.
The proper resolution was achieved by re-arranging the schedule of jobs so that `dbt run` does not happen at the same time as Airbyte.
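The re-arranged schedule can be sketched as a staggered crontab. The times and paths below are illustrative assumptions, not the real production schedule; the real script names (`run_dbt.sh`, `run_tests.sh`) are the ones mentioned elsewhere in these notes:

```cron
# Illustrative crontab sketch (assumed paths and times):
# Airbyte incremental syncs first...
0 5 * * *   /opt/airbyte/trigger_syncs.sh
# ...dbt run only after the syncs are expected to have finished...
30 6 * * *  /opt/dwh/run_dbt.sh
# ...and dbt test clearly after dbt run.
30 7 * * *  /opt/dwh/run_tests.sh
```

The key property is simply that the windows never overlap; a proper orchestrator would express these as explicit dependencies instead of fixed time offsets.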
## **Lessons Learned**
- What went well
- Our `dbt test` were super useful to spot this happening every day.
- What went badly
- This issue seems to have existed for 6 months. The business impact was tiny, but how long this was alive is highly concerning.
- Where did we get lucky
- We got lucky in the impact being tiny.
More generally, this is a first symptom of our home-made bash orchestration starting to show some wrinkles. A more sophisticated orchestration engine should enable us to link Airbyte and dbt executions together, which would allow us to prevent this kind of issue and also be smarter in our ELT (lower latency, fewer redundant jobs, sensible handling downstream when something upstream fails, etc.).
No need to rush into it, but it should be taken into account.
## Action Items
- [ ] Educate the team on scheduling patterns.
- [ ] Ensure the new orchestration engine deployment gets the right priority.
## Appendix
-
# 20240913 Retro
## 🙌 What went well
- **dbt**
- Improvement of performance on the dbt project; fewer incidents when running models.
- dbt tests are working great
- DBT automatic testing to detect shitty stuff going on
- **Collaboration with teams**
- Quarterly session with Tech team was a great thing to do
- Nice discussions / alignments with TMT/Tech leads
- Very nice internal Data Team collaboration!!!!
- Nice advancements on aligning with Guest Squad (tracking, A/B testing, etc)
- Data Comilonas <3
- **Delivery**
- We're doing great in making the most out of our infra
- Very glad to have Cosmos DB integrated into DWH and the following refactors to centralise efforts
- Unleashing analytical capabilities with KPIs by Categories
- Documentation keeps on being great, we're being recognized as exemplary on it
- Probation periods successfully passed!
- Truvi logo is now readable (and cool)
## 🌱 What needs improvement
- **PBI Awareness/know how**
- Lack of knowledge of “this data is available in this report”
- Spread more knowledge on (1) what PBIs we have + (2) best ways to use them
- I suspect a lot of reports are being super-underused, but we can't monitor that easily
- PBI users need to have more knowledge as to where and how they can get the data they need
- **Data Contracts/Issues with Tech**
- Docs and comms with the Tech team are not at their best + we might need to raise this more loudly and frequently + no clear visibility every time Tech impacts data
- Shitty stuff happens on Tech deployments:
- 27th August: fake increase in Check-out Bookings and Cancelled Bookings because of PMS issues. The latter issue still persists
- New Dash MVP migration on 10th of September broke the report despite anticipating the changes needed from dev team
- Remind development not to change tables or add test data without our knowledge
- A bit stressed about not knowing when/if we'll have a Data Engineer position opening soon
## 💡 Ideas for what to do differently
- Incentivise people to use the Data Request for ad-hoc requests and asking for permissions
- Explore the possibility to have a “report” to check PBI report usage
- Have a PBI 101 class for users
- Since we rarely have incidents, we might need to cause them to grease the groove
- Open Data Comilonas with different stakeholders and colleagues from time to time (it was nice with Joan!)
## ✔ Action items
- [x] Schedule Data Comilona
- [x] Pablo schedules his Chaos Monkey role
- [x] Research if there is any better way to monitor PBI report usage
- [ ] Schedule harsh therapy session with Lou
- [ ] If the Data Engineer vacancy doesn't progress by end of September, pursue sign-off on consequences
- [ ] Think about how to make some kind of “PBI Homepage” where Superhog personnel can find all the PBIs that are available easily
- [ ] Document all the config references (URLs, DB connection strings, credentials, etc)
- [ ] Agree with Ben R. on a different way to manage PBI permissions
- [ ] Potentially, also include CI checks in dbt repo
- [ ] Make a cleaning day for Data Catalogue docs
# 20240913-01 - dbt run blocked by “not in the graph” error
Managed by: Pablo
## Summary
- Components involved: DWH, dbt
- Started at: 2024-09-13 15:24 CEST (merge of the versioned models)
- Detected at: 2024-09-13 15:25 CEST (first failed `dbt run`)
- Mitigated at: 2024-09-13 15:52 CEST
We deployed for the first time a version of our `dbt` project that used the versioning features of `dbt`. An active bug in `dbt core` prevented any `dbt` commands like `dbt run` or `dbt test` from working, because the compilation of the project would fail. The issue was resolved by applying a somewhat patchy workaround that enabled `dbt` to work properly again.
## Impact
None beyond some noise in the alerts channel and making Pablo's Friday afternoon hectic.
## Timeline
*Keeping it simple on this one, since there isn't much value in tracking things in hyper detail.*
All reported times are in CEST Timezone.
| Time | Event |
| --- | --- |
| 2024-09-13 15:24 | Pablo merges [PR #2771](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project/pullrequest/2771) in the dbt project |
| 2024-09-13 15:25 | Pablo manually triggers the `run_dbt.sh` script in production and the execution fails |
| 2024-09-13 15:25-15:52 | Pablo scrambles around trying to understand what the heck is happening. |
| 2024-09-13 15:52 | Pablo manages to get a first successful `dbt run` after applying one of the workarounds suggested by dbt labs |
| | End of the incident. |
## Root Cause(s)
The root cause is a bug in `dbt` when it attempts to parse and compile the project. The bug is triggered by adding a new version to an existing model that didn't have versions before. It is unknown at this point whether the bug is also triggered when adding an additional version to a model that is already versioned. The bug has been identified by dbt Labs and is tracked in this issue in the `dbt core` repository: https://github.com/dbt-labs/dbt-core/issues/8872
Within our platform, the issue was introduced by the following PR in our `dbt` project repository: https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project/pullrequest/2771. The PR introduced a new version for model `int_core__verification_payments`, which triggered the `dbt core` bug when we started to run `dbt` commands in production.
## Resolution and recovery
I manually modified our deployed `dbt_run.sh` script in production to include `dbt clean` and `dbt deps` commands before the `run` one. This deleted the `target` folder and fixed the issue, since this is one of the suggested workarounds. Subsequent executions of the script after this fix ran perfectly fine.
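A minimal sketch of the patched run script, with the workaround commands prepended (the structure is assumed from the notes above, not the real file, which contains more than this):

```shell
#!/usr/bin/env bash
# Sketch of the patched run script (assumed structure, not the real file).
set -euo pipefail

dbt clean   # delete target/ and dbt_packages/ -- the documented workaround
dbt deps    # reinstall packages, since clean removes them too
dbt run     # compilation now succeeds with the versioned model
```

The trade-off is a slower run (packages get reinstalled every time); the extra commands can be dropped once the upstream bug is fixed.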
## **Lessons Learned**
What went well:
- NA
What went badly
- This issue had already happened on my (Pablo's) local environment some days before, but I dismissed it as silly flaky behaviour. I guess I probably ran one of the workarounds (a `dbt clean`) by chance and accidentally fixed the issue without really understanding what had happened. The lesson here is to not dismiss quirky behaviours in `dbt` and to try to understand them fully (even reproduce them if necessary) so that we can be confident and in control at all times.
Where did we get lucky:
- The fact that the issue was already spotted and documented in the official `dbt core` repository made handling the situation much simpler. Had there been no public record of the bug's source and workarounds, we would have had a bad time fixing and understanding it, since that would have required diving into the internals of `dbt`.
Besides these lessons, I would also suggest this was a great reminder of the fact that the open source tools we rely on are by no means perfect, and that we must be alert when stuff goes south and always consider the option that they have bugs.
## Action Items
- [ ] Judge our options around Blue/Green deployments, which would enable issues like this to happen without making a single scratch in the DWH where consumers are reading from (besides an inevitable delay in the refreshing of data).
- [x] Track the `dbt` bug (https://github.com/dbt-labs/dbt-core/issues/8872) so that we can adjust our code once its fixed
## Appendix
-

# 20240919-01 - dbt test failure because wrong configuration in schema file
Managed by: Uri
## Summary
- Components involved: data-dwh-dbt-project
- Started at: 2024-09-18 12:41 CEST
- Detected at: 2024-09-19 08:43 CEST
- Mitigated at: 2024-09-19 09:01 CEST
Buggy code was committed and merged into master on the 18th of September unnoticed. In the scheduled production run on the morning of the 19th, `dbt test` failed because it couldn't compile the test. The fix was to remove the buggy configuration in the schema entry of `core__bookings`, merge, and re-run `dbt test` in prod.
## Impact
Not a massive impact: it was a single failing test on the `core__bookings` model in the reporting schema.
## Timeline
- 2024-09-18 12:41 CEST - Faulty commit [923bfa70](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project/commit/923bfa70919bf552304150e8cc3ec9af7cdbe708?refName=refs%2Fheads%2Fmaster&path=%2Fmodels%2Freporting%2Fcore%2Fschema.yml&_a=contents) is created
- 2024-09-18 16:30 CEST - Branch containing the faulty commit in pull request [!2877](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project/pullrequest/2877?_a=files&path=/models/reporting/core/schema.yml) is merged into production
- 2024-09-19 08:43 CEST - Data team sees the alert in `#data-alerts` slack channel
- 2024-09-19 08:48 CEST - Data team accesses the production logs of dbt tests to notice the failure, specifically:
> Compilation Error in test not_nullgit_core__bookings_id_booking (models/reporting/core/schema.yml)
>
- 2024-09-19 08:51 CEST - The faulty commit [923bfa70](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project/commit/923bfa70919bf552304150e8cc3ec9af7cdbe708?refName=refs%2Fheads%2Fmaster&path=%2Fmodels%2Freporting%2Fcore%2Fschema.yml&_a=contents) is spotted. Uri proceeds to create a PR to remove the issue.
- 2024-09-19 09:00 CEST - The fix is merged in production in commit [feaedb2a](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project/commit/feaedb2a06bd37555217ee0fb645c8f5a07b070d?refName=refs%2Fheads%2Fmaster)
- 2024-09-19 09:01 CEST - Successful launch of a re-run of the dbt tests with the fixes.
## Root Cause(s)
An involuntary human error modified a line of code in the schema entry of `core__bookings`, in the test section for `id_booking`, turning the `not_null` test into `not_nullgit p`. This change, in commit [923bfa70](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project/commit/923bfa70919bf552304150e8cc3ec9af7cdbe708?refName=refs%2Fheads%2Fmaster&path=%2Fmodels%2Freporting%2Fcore%2Fschema.yml&_a=contents), happened after the Data team members had reviewed and approved PR [!2877](https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project/pullrequest/2877?_a=files&path=/models/reporting/core/schema.yml), so it went unnoticed.
![image.png](image%2061.png)
## Resolution and recovery
The fix was straightforward: change `not_nullgit p` back to `not_null`, merge into prod, and re-run the dbt tests successfully.
## **Lessons Learned**
What went well:
- dbt test alerts work well and the team effectively checks the channel once an alert is raised.
What went badly
- Improper self-review and cross-review of code before merging. Personally, I didn't check the PR since it was already approved by Pablo; however, that approval came before the faulty commit. We should all be more careful/sceptical when merging into production, especially if an approval is already left on the PR.
Where did we get lucky:
- Minimal impact, it was just a single failing test in reporting schema that would have passed anyway. However, this situation could have been worse if this bug had been in place directly in a model code.
## Action Items
- Review and re-review PRs regardless of whether they are already approved.
- Check commits made after the approval.
- When merging into prod, run both the normal execution of dbt (`run_dbt.sh`) and the tests (`run_tests.sh`). This would have made this issue appear earlier
- Automate CI checks on the dbt project (try to compile the project and perhaps also run tests on every PR; block merging if it doesn't work)
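The CI check proposed above could start as simple as a compile gate. A sketch under assumptions (the pipeline wiring would need adapting to the Azure DevOps setup; commands shown are standard dbt CLI):

```shell
#!/usr/bin/env bash
# Hypothetical PR gate for the dbt repo: fail fast if the project does not
# even parse/compile -- this alone would have caught the `not_nullgit p` typo.
set -euo pipefail

dbt deps      # install package dependencies
dbt compile   # parse all models and schema.yml files; exits non-zero on errors
```

Running tests on every PR could come later; compilation alone already catches malformed schema entries before merge.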
# 20241008 Retro
## 🙌 What went well
- **Team and Jarana**
- Data Comilona + Meetup = Nice combo! +1
- We survived the period without Ben
- **Data Platform**
- No more dumps <3
- First usages of dbt versioning have worked nicely
- dbt testing is working nicely so far
- Performance and DevEx improvements in both production and dev environments have made things much better
- **Delivery**
- Hubspot integration in progress
- Data requests feedback was very positive
- KPIs sessions with PMs were quite insightful
- Q3 was closed with success and everyone is happy, plans for Q4 are clear and aligned with everyone
- Business Overview PBI app looks very good after latest changes from Joaquin
- **Stakeholders**
- New Dash/New Pricing: much better communication with Product/Tech teams
- Very happy to see that data quality improvements are being prioritised in Guest Squad (GJ completion date)
## 🌱 What needs improvement
- **DE Vacancy**
- Data engineer vacancy
- Timing for Data Engineer is looking ugly
- **New dash**
- Still, it's quite confusing and time consuming to work on New Dash/New Pricing
- New Dash/Tech/Product drama
- **Other**
- Edeposit invoicing misunderstanding
## 💡 Ideas for what to do differently
-
## ✔ Action items
- [ ] If the Data Engineer vacancy doesn't progress by end of September, pursue sign-off on consequences
- [ ] Think about how to make some kind of “PBI Homepage” where Superhog personnel can find all the PBIs that are available easily
- [ ] Document all the config references (URLs, DB connection strings, credentials, etc)
- [ ] Agree with Ben R. on a different way to manage PBI permissions
- [ ] Potentially, also include CI checks in dbt repo
- [ ] Make a cleaning day for Data Catalogue docs
- [ ] Document existing invoicing processes, not just new ones
# 20241104-01 - Booking invoicing incident due to bulk UpdatedDate change
Managed by: Pablo
## Summary
- Components involved: Superhog Backend SQL Server, DWH, PBI Reports, `sh-invoicing-exporter` tool
- Started at: 2024-10-30 12:55:07.653 UTC
- Detected at: first symptoms noticed around 2024-10-31 08:00:00 UTC, but severity was truly understood on 2024-11-04 10:57:00 UTC
- Mitigated at: 2024-11-05 10:57:00 UTC
A bulk backfill executed on the application SQL Server database to fix some not-relevant-to-this-incident column resulted in tens of thousands of `VerificationRequest` records having their `UpdatedDate` modified to when the backfill was executed.
The bulk update hit a poor assumption in the old dash invoicing logic and caused (1) the billable-bookings metrics and reports to show utterly wrong data for 6 days, and (2) a delay of ~27 hours in the delivery of the old dash invoicing exports to the Finance team for the October 24 period.
A backup of our SQL Server was restored and the incident-triggering changes were reverted in an emergency to unblock the generation of the invoicing exports. The root issue still exists and needs to be addressed.
## Impact
- On the invoicing process:
- The start of the old dash invoicing process, which is the generation of exports run by Pablo, should have started on 2024-11-04 08:00:00 UTC. Instead, it started on 2024-11-05 10:57:00 UTC, adding a delay of ~27 hours to all dependent Finance team processes.
- On reporting:
- Since 2024-10-30 12:55:07.653 UTC and until 2024-11-05 10:20:00 UTC.
- The DWH table `int_core__booking_charge_events` was displaying tens of thousands of wrong billable bookings on 2024-10-30. This propagated through other DWH tables and finally to reports, which contained a grossly wrong figure for October 24.
- The Business Overview > Main KPIs report showed a grossly wrong count of invoiceable bookings for October (2-3 orders of magnitude above what the number should have been).
- The Business Overview > Host Fees report showed inflated numbers for the Superhog-inferred billable bookings count and booking fees revenue for October 24 (2-3 orders of magnitude above what the numbers should have been).
- On the Guest Squad efforts:
- Our mitigation of reverting the changes made to `UpdatedDate` may have caused more trouble: the `PaymentValidationSetId` values of many records *were* updated on 2024-10-30 12:55:07 UTC, but the `UpdatedDate` values of those records now say otherwise, so the SQL Server is effectively *lying*. I'm not aware of how this may cause further problems, but it could.
## Timeline
All times are UTC.
| Time | Event |
| --- | --- |
| 2024-10-30 12:55:07 | The bulk update script for the `PaymentValidationSetId` column on the table `VerificationRequest` gets executed, changing the `UpdatedDate` value of tens of thousands of records. |
| 2024-10-31 06:28:00 | An automated data outlier alert gets triggered due to the wild variance in the Estimated Billable Bookings KPI. |
| 2024-10-31 07:49:00 | Uri notices the issue (leaving a note in the data alerts chat) and correctly spots the fact that there is a spike of booking charge events on the date 2024-10-30. |
| 2024-10-31 08:30:00 | The alert gets discussed during the Data Team's daily call. Pablo wrongly judges that the `UpdatedDate` data shouldn't cause an issue in invoicing and that it's just a minor KPI blip that can be fixed in the future, and the team decides that the alert is not urgent. |
| 2024-11-04 08:30:00 | The Data team discusses the topic again in the daily call. The fact that it's an invoicing exports day increases attention, and upon looking into some details, the team changes its mind and realises there might be serious implications for invoicing. |
| 2024-11-04 09:30:00 | Ben C. asks the Data team about the report in Business Overview > Host Fees > Booking Fees - Superhog showing some wildly high numbers for October. |
| 2024-11-04 10:51:00 | After some detailed research, Pablo realises that the invoicing exports are broken and starts the #invoicing-firefightning slack channel to gather stakeholders. |
| 2024-11-04 11:09:00 | Pablo and Ben R. discuss about the issue and assess the option of switching the invoicing code to rely on `LinkUsed` instead of `UpdatedDate`. They agree on Pablo examining if that would do the trick. |
| 2024-11-04 12:17:00 | Pablo concludes that the naive `LinkUsed` option won't do the trick due to how data looks in `VerificationRequest`, and comes back to Ben R. to discuss how to proceed. They agree to instead restore the original values of the `UpdatedDate` column in the records that were updated on 2024-10-30 12:55:07. |
| 2024-11-04 12:23:00 | Ben R. starts restoring a database backup to restore the records. |
| 2024-11-04 15:58:00 | Since the restore is taking longer than expected, Ben R. proposes running a simpler update by leveraging some fields in `VerificationRequest`, but Pablo points out that a partial solution won't help the Finance team, since running the exports multiple times would mean Finance's manual work is only useful after the final export. |
| 2024-11-05 08:00:00 | After a first failed restore, the second backup restore works on SQL Server. |
| 2024-11-05 9:18:00 | Ben uses the restored data to revert the `UpdatedDate` changes in the records that were modified on 2024-10-30. |
| 2024-11-05 10:20:00 | Pablo starts a backfill in Airbyte and a dbt run right after to propagate the new updated records throughout the DWH. |
| 2024-11-05 10:54:00 | Pablo confirms that the large cluster of bookings attributed to October is not there anymore, and that the downstream reporting shows correct figures again. |
| 2024-11-05 10:57:00 | Pablo triggers the export of the invoicing reports for the October period. |
| 2024-11-05 14:53:00 | The exports finish successfully and Pablo shares them with Jamie D. |
| | Incident mitigated. |
## Root Cause(s)
The root cause is a combination of:
- A poorly-chosen assumption in the old dash invoicing logic (the usage of `UpdatedDate` field in the `VerificationRequest` table to decide in which month should a booking be charged for its booking fee when it is supposed to be charged on `VerificationStartDate`).
- ([see these lines](https://guardhog.visualstudio.com/Data/_git/data-invoicing-exporter?path=/sh_invoicing/queries.py&version=GBmain&line=336&lineEnd=339&lineStartColumn=25&lineEndColumn=26&lineStyle=plain&_a=contents) in the latest release of `sh-invoicing-exporter`, which are the conceptual grandchildren of [these lines](https://guardhog.visualstudio.com/Superhog/_git/superhog-invoicing-console-app?path=/SuperhogInvoicing/SQLQueries.cs&version=GBmaster&line=159&lineEnd=165&lineStartColumn=3&lineEndColumn=6&lineStyle=plain&_a=contents) in the old C# script, to understand the faulty assumption)
- This is a conceptual problem that we still need to address if we want to prevent significant issues in future invoicing cycles. Our initial mitigation was treating symptoms, not the core issue.
- An out of BAU bulk update in the `VerificationRequest` table in the backend SQL Server. I would like to make clear that the intent of this bulk update was perfectly legitimate and its execution was also proper. Even though its side effects have been troublesome, the update itself was not an issue nor a mistake.
So, the true issue is the invoicing code's troublesome reliance on `UpdatedDate`, which is always a small problem but turns into a massive one whenever any tech squad in Superhog performs an update on the `VerificationRequest` table that goes beyond the usual activity of the application. Given that such out-of-the-ordinary operations will keep happening, it is important that we address the true issue to avoid more incidents like this one in the future.
## Resolution and recovery
The problem was mitigated by reverting the changes made in the `UpdatedDate` through the restoring of a backup of the SQL Server database and some adhoc script being run on the production database.
This allowed us to bring reporting back to normal and continue the invoicing exports, at the expense of leaving the SQL Server database in an inconsistent state.
The true solution to the problem is still unaddressed (see the root cause section).
## **Lessons Learned**
- What went well
- Automated KPI outlier tests from the Data team brought the spike of billable bookings to Uri's attention.
- The production backups of the SQL Server database allowed us to restore the original `UpdatedDate` values, providing us a fast way to unblock the invoicing process.
- What went badly
- Even though we got an early alert, Pablo wrongly triaged the unusually high number of billable bookings as a minor issue that wouldn't impact the invoicing process.
- The faulty logic/assumption used to place the invoicing of bookings in time has been sitting around for years, undocumented. We don't have any trace of why we built it this way in the first place.
- The faulty logic/assumption may have been placing booking fees in the wrong month for a long time, but the high complexity of the logic and the way record history is managed in the database make it very hard to understand the true extent of the issue.
- Obtaining a restore of the production database can take multiple hours.
- We screwed up with the consistency of data in `VerificationRequest`. Tens of thousands of `UpdatedDate` values in the `VerificationRequest` table are now wrong.
## Action Items
- [ ] Identify all business logic which now relies on the `UpdatedDate` field of the `VerificationRequest` table in the SQL Server database.
- [ ] Once the above logic is catalogued, apply changes and fixes so that `UpdatedDate` can be modified without causing incidents.
- [ ] Potentially, extend the exercise beyond `VerificationRequest`, since the same problem pattern could apply to all sorts of update-able tables in the SQL Server database.
## Appendix
- Link to first notes when we started tackling the issue: [20241101 - Invoicing UpdateDate mess up](https://www.notion.so/20241101-Invoicing-UpdateDate-mess-up-1340446ff9c980b2926fc6284572f740?pvs=21)
- Code for the bulk update script executed on 2024-10-30
```sql
DECLARE @CurrentDate AS DATETIME = GETDATE()
SELECT
[pvs].[VerificationRequestId]
, [vr].[GuestJourneyCompletedDate]
, [vr].[ExpiryDate]
, [vr].[PaymentValidationSetId]
, [pvs].[PaymentValidationSetId]
FROM
(
SELECT [pvs].[VerificationRequestId], [pvs].[PaymentValidationSetId]
FROM
(
SELECT
[vr].[Id] AS [VerificationRequestId]
, COALESCE(
[vr].[OverridePaymentValidationSetId],
[a].[PaymentValidationSetId],
[pvs_a].[Id],
[pvs_d].[Id]
) AS [PaymentValidationSetId]
FROM [dbo].[VerificationRequest] [vr]
-- Listing Override
LEFT JOIN [dbo].[Booking] [b] ON [b].[VerificationRequestId] = [vr].[Id]
LEFT JOIN [dbo].[Accommodation] [a] ON [a].[AccommodationId] = [b].[AccommodationId]
-- Account Override
LEFT JOIN [dbo].[PaymentValidationSet] [pvs_a] ON [pvs_a].[SuperhogUserId] = [vr].[CreatedByUserId] AND [pvs_a].[IsCustom] = 0 AND [pvs_a].[IsActive] = 1
-- Default
LEFT JOIN [dbo].[PaymentValidationSet] [pvs_d] ON [pvs_d].[SuperhogUserId] IS NULL AND [pvs_d].[IsCustom] = 0 AND [pvs_d].[IsActive] = 1
) [pvs]
GROUP BY [pvs].[VerificationRequestId], [pvs].[PaymentValidationSetId]
) [pvs]
LEFT JOIN [dbo].[VerificationRequest] [vr] ON [vr].[Id] = [pvs].[VerificationRequestId]
LEFT JOIN [dbo].[user] [u] ON [u].[Id] = [vr].[SuperhogUserId]
LEFT JOIN [dbo].[Country] [co] ON [co].[Id] = [u].[BillingCountryId]
LEFT JOIN [dbo].[Currency] [cu] ON [cu].[Id] = [co].[PreferredCurrencyId]
LEFT JOIN [dbo].[PaymentValidationSetToCurrency] [pvstc] ON [pvstc].[PaymentValidationSetId] = [pvs].[PaymentValidationSetId] AND [pvstc].[CurrencyIso] = [cu].[IsoCode]
--WHERE [VerificationRequestId] = 913616
WHERE [vr].[GuestJourneyCompletedDate] IS NULL
and [vr].[PaymentValidationSetId] IS NULL
and [vr].[ExpiryDate] >= GETDATE()
---and [VerificationRequestId] = 913616
BEGIN TRAN
UPDATE [vr]
SET
[PaymentValidationSetId] = [pvs].[PaymentValidationSetId]
, [UpdatedDate] = @CurrentDate
FROM
(
SELECT [pvs].[VerificationRequestId], [pvs].[PaymentValidationSetId]
FROM
(
SELECT
[vr].[Id] AS [VerificationRequestId]
, COALESCE(
[vr].[OverridePaymentValidationSetId],
[a].[PaymentValidationSetId],
[pvs_a].[Id],
[pvs_d].[Id]
) AS [PaymentValidationSetId]
FROM [dbo].[VerificationRequest] [vr]
-- Listing Override
LEFT JOIN [dbo].[Booking] [b] ON [b].[VerificationRequestId] = [vr].[Id]
LEFT JOIN [dbo].[Accommodation] [a] ON [a].[AccommodationId] = [b].[AccommodationId]
-- Account Override
LEFT JOIN [dbo].[PaymentValidationSet] [pvs_a] ON [pvs_a].[SuperhogUserId] = [vr].[CreatedByUserId] AND [pvs_a].[IsCustom] = 0 AND [pvs_a].[IsActive] = 1
-- Default
LEFT JOIN [dbo].[PaymentValidationSet] [pvs_d] ON [pvs_d].[SuperhogUserId] IS NULL AND [pvs_d].[IsCustom] = 0 AND [pvs_d].[IsActive] = 1
) [pvs]
GROUP BY [pvs].[VerificationRequestId], [pvs].[PaymentValidationSetId]
) [pvs]
LEFT JOIN [dbo].[VerificationRequest] [vr] ON [vr].[Id] = [pvs].[VerificationRequestId]
LEFT JOIN [dbo].[user] [u] ON [u].[Id] = [vr].[SuperhogUserId]
LEFT JOIN [dbo].[Country] [co] ON [co].[Id] = [u].[BillingCountryId]
LEFT JOIN [dbo].[Currency] [cu] ON [cu].[Id] = [co].[PreferredCurrencyId]
LEFT JOIN [dbo].[PaymentValidationSetToCurrency] [pvstc] ON [pvstc].[PaymentValidationSetId] = [pvs].[PaymentValidationSetId] AND [pvstc].[CurrencyIso] = [cu].[IsoCode]
WHERE [vr].[GuestJourneyCompletedDate] IS NULL
and [vr].[PaymentValidationSetId] IS NULL
and [vr].[ExpiryDate] >= GETDATE()
--and [VerificationRequestId] = 913616
SELECT
[pvs].[VerificationRequestId]
, [vr].[GuestJourneyCompletedDate]
, [vr].[ExpiryDate]
, [vr].[PaymentValidationSetId]
, [pvs].[PaymentValidationSetId]
FROM
(
SELECT [pvs].[VerificationRequestId], [pvs].[PaymentValidationSetId]
FROM
(
SELECT
[vr].[Id] AS [VerificationRequestId]
, COALESCE(
[vr].[OverridePaymentValidationSetId],
[a].[PaymentValidationSetId],
[pvs_a].[Id],
[pvs_d].[Id]
) AS [PaymentValidationSetId]
FROM [dbo].[VerificationRequest] [vr]
-- Listing Override
LEFT JOIN [dbo].[Booking] [b] ON [b].[VerificationRequestId] = [vr].[Id]
LEFT JOIN [dbo].[Accommodation] [a] ON [a].[AccommodationId] = [b].[AccommodationId]
-- Account Override
LEFT JOIN [dbo].[PaymentValidationSet] [pvs_a] ON [pvs_a].[SuperhogUserId] = [vr].[CreatedByUserId] AND [pvs_a].[IsCustom] = 0 AND [pvs_a].[IsActive] = 1
-- Default
LEFT JOIN [dbo].[PaymentValidationSet] [pvs_d] ON [pvs_d].[SuperhogUserId] IS NULL AND [pvs_d].[IsCustom] = 0 AND [pvs_d].[IsActive] = 1
) [pvs]
GROUP BY [pvs].[VerificationRequestId], [pvs].[PaymentValidationSetId]
) [pvs]
LEFT JOIN [dbo].[VerificationRequest] [vr] ON [vr].[Id] = [pvs].[VerificationRequestId]
LEFT JOIN [dbo].[user] [u] ON [u].[Id] = [vr].[SuperhogUserId]
LEFT JOIN [dbo].[Country] [co] ON [co].[Id] = [u].[BillingCountryId]
LEFT JOIN [dbo].[Currency] [cu] ON [cu].[Id] = [co].[PreferredCurrencyId]
LEFT JOIN [dbo].[PaymentValidationSetToCurrency] [pvstc] ON [pvstc].[PaymentValidationSetId] = [pvs].[PaymentValidationSetId] AND [pvstc].[CurrencyIso] = [cu].[IsoCode]
--WHERE [VerificationRequestId] = 913616
WHERE [vr].[GuestJourneyCompletedDate] IS NULL
AND [vr].[PaymentValidationSetId] IS NULL
AND [vr].[ExpiryDate] >= GETDATE()
--and [VerificationRequestId] = 913616
ROLLBACK TRAN
--COMMIT TRAN
```


View file

@ -0,0 +1,60 @@
# 20241112 Retro
## 🙌 What went well
- **Incident Mgmt**
- Outlier tests in Main KPIs work surprisingly well
- Problem detection and resolution
- We keep on spearheading incident management and doing things right
- Incidents are finally hurting enough for TMT to pay (some) attention
- **Deliveries**
- KPIs refactor, including daily modelisation +1
- Integration of Hubspot into DWH
- Account Managers report (prev. Top Losers) being extremely useful and used by RevOps teams
- Churn rate metrics computation
- Starting Guest KPIs in new KPI modelisation
- GUEST TAXES CROSSCHECK FINISHED (AT LAST)
- Last Comilona was AMAZING
- Domain Analysts advancing well +1
- GJ A/B test alignment sessions
- Guest squad is doing the Lord's work
- Not impacted by layoffs
## 🌱 What needs improvement
- **Incidents**
- Persistent bugs in BookingToProductBundle
- Old invoicing incident
- + generally a lot of incidents all over the place
- **People**
- Layoffs communication sourness
- Lou D. leaving us
- **Priorities and planning**
- Tons of unplanned work - delaying other deliverables for Q4 +2
- Q1 company priorities still mostly focus on delivering new stuff rather than on fixing core business
- + general misalignment between TMT and boots on the ground
- Data Engineer vacancy not filled by now clearly impacting Q1 +2
- Some Data Requests do not reach the channel, needs investigation
- General doomloop sourness around New Dash, with no light at end of tunnel
## 💡 Ideas for what to do differently
- New Dash retrospective with PMs/Dash Squad/Data by the EOY
- CI/CD checks on DWH complete PR button to ensure branch is up-to-date with master branch
- Modify data captain distribution
- Include Tech Team in data alerts channel and tag them
- Propose and discuss how to align with Tech team to avoid context switching and optimise time and effort
## ✔ Action items
- [ ] If Data Engineer vacancy doesn't progress by end of September, pursue sign-off on consequences
- [ ] Reassess DE plans
- [ ] Discuss and agree with Tech team on data-alerts onboarding (should they be there? who should we tag?)
- [ ] Think about how to make some kind of “PBI Homepage” where Superhog personnel can find all the PBIs that are available easily
- [ ] Document all the config references (URLs, DB connection strings, credentials, etc)
- [ ] Agree with Ben R. on a different way to manage PBI permissions
- [ ] Potentially, also include CI checks in dbt repo
- [ ] Make a cleaning day for Data Catalogue docs
- [ ] Document existing invoicing processes, not just new ones
- [ ] Azure DevOps checks on DWH complete PR button to ensure branch is up-to-date with master branch
- [ ] Discuss with Ben C. New Dash retrospective with PMs/Dash Squad/Data by the EOY


View file

@ -0,0 +1,77 @@
# 20241119-01 - CheckIn Cover multi-price problem (again)
Managed by: Pablo
## Summary
- Components involved: SQL Server, DWH, superhog-mono-app codebase
- Started at: 2024-11-18 12:49:06 UTC
- Detected at: 2024-11-19 06:34:16 UTC
- Mitigated at: 2024-11-19 17:30:00 UTC
A new stored procedure released on 2024-11-18 mistakenly added records to `live.dbo.PaymentValidationSetToCurrency` with `0` values for CIH prices and covers. This caused a dimensionality issue in the DWH, which led to duplicate records in DWH and bogus reporting for CIH in PBI, with inflated sales numbers and other affected data points. Besides that, a seeding script from the application that doesn't respect the `UpdatedDate` column of `live.dbo.PaymentValidationSetToCurrency` caused data drift between SQL Server and DWH, which increased investigation complexity and generated the need for backfills.
This incident is a very close reoccurrence of this one from June: [20240619-01 - CheckIn Cover multi-price problem](20240619-01%20-%20CheckIn%20Cover%20multi-price%20problem%20fabd174c34324292963ea52bb921203f.md). The underlying design mistakes that act as a root cause are common across both incidents.
## Impact
CIH reporting in the DWH displayed incorrect figures for about 11 hours. This includes data such as revenue totals, sales counts, funnel and conversion rate metrics, and individual sales records displaying wrong prices.
## Timeline
All times are UTC.
| Time | Event |
| --- | --- |
| Sometime before 2024-11-18 12:49:06 | A release was made on the Superhog backend, which added the migration `202411121235595_CreateCustomBundle.cs` |
| 2024-11-18 12:49:06 | Faulty records with `0` value for CIH price and cover got added to `live.dbo.PaymentValidationSetToCurrency`. We suspect they were added by the stored procedure `CreateCustomBundle`. |
| 2024-11-18 13:00:10 | One of the hourly Airbyte jobs that syncs between SQL Server and the DWH caught the faulty records and copied them over into the DWH. |
| At some unknown time between 2024-11-18 13:00:10 and 2024-11-19 06:15:00 | The seeding script for CIH prices and covers runs in SQL Server, overriding the faulty `0`-valued records with the canonical prices without stamping `UpdatedDate`. |
| 2024-11-19 06:15:00 | A `dbt run` was triggered, propagating the faulty records into downstream models and breaking the granularity of some models with duplicate records. From this point on, data in the DWH and the PBI reports reading from it was wrong. |
| 2024-11-19 06:34:16 | A data test failed due to duplicate records in `reporting.core__vr_checkin_cover` breaking the PK. The Data team starts investigating. |
| 2024-11-19 14:00:00 | Pablo realises the issue looks like a duplicate of [20240619-01 - CheckIn Cover multi-price problem](20240619-01%20-%20CheckIn%20Cover%20multi-price%20problem%20fabd174c34324292963ea52bb921203f.md). This drives him to quickly spot and confirm the data drift and the faulty records. |
| 2024-11-19 15:30:00 | Pablo discusses with Lawrence and the root cause of the issue is identified. |
| 2024-11-19 17:30:00 | An Airbyte + dbt backfill to fix the data drift and remove the faulty records finishes. From this point on, data in the DWH and PBI is correct again. |
| | Incident mitigated. |
## Root Cause(s)
The root cause is a combination of the following:
- The true, core root cause is that business logic for CIH across the company assumes that CIH has a single, global price per currency across all of Superhog. Despite this, the database actually allows for different prices per platform user. This design is not fit for our business logic and allows incidents like this to happen. Had this been redesigned to properly reflect our business logic, neither this incident nor [20240619-01 - CheckIn Cover multi-price problem](20240619-01%20-%20CheckIn%20Cover%20multi-price%20problem%20fabd174c34324292963ea52bb921203f.md) would have happened.
- In the case of this incident, the trigger was that the uniqueness of price values per currency in `live.dbo.PaymentValidationSetToCurrency` was not respected by the stored procedure `CreateCustomBundle` (added by the migration `202411121235595_CreateCustomBundle.cs`), which set the CIH price and cover values of some accounts to `0`.
- This cascaded into breaking the uniqueness of the primary key of table `dwh.intermediate.int_core__check_in_cover_prices` in the DWH, which led to duplicate records in downstream tables related to CIH, and to wrong data being displayed in PBI reports.
- Besides that, a seeding script that updates CIH price and cover values ran on top of `live.dbo.PaymentValidationSetToCurrency`, overriding prices without respecting the `UpdatedDate` column. This caused data drift across the DWH and SQL Server.
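As an illustration of the seeding gap, here is a minimal sketch of an update that only touches rows whose values actually differ and stamps `UpdatedDate` so incremental syncs can pick the change up. The `Price` column and the `seed.CanonicalPrices` table are hypothetical placeholders, not the real schema:

```sql
-- Hedged sketch: column and seed table names are illustrative only.
UPDATE [pvstc]
SET [pvstc].[Price] = [s].[Price],
    -- Stamping UpdatedDate is the key point: without it, incremental
    -- Airbyte syncs never see the change and the DWH drifts from SQL Server.
    [pvstc].[UpdatedDate] = GETDATE()
FROM [dbo].[PaymentValidationSetToCurrency] [pvstc]
INNER JOIN [seed].[CanonicalPrices] [s]
    ON [s].[CurrencyIso] = [pvstc].[CurrencyIso]
WHERE [pvstc].[Price] <> [s].[Price];
```

Restricting the `WHERE` to changed rows also keeps the seeding run from churning `UpdatedDate` on every execution, which would otherwise force needless re-syncs.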
## Resolution and recovery
The short-term mitigation consisted of:
- The wrong, `0`-valued records in `live.dbo.PaymentValidationSetToCurrency` were reverted back to their proper prices (accidentally, by the seeding script).
- Performing a backfill of the table `PaymentValidationSetToCurrency` on Airbyte so that the `sync` layer table would stop holding duplicated prices.
- Executing a `dbt run` on the DWH to propagate the fixed data.
## **Lessons Learned**
- What went well
- Automated data alerts in DWH helped us notice the incident fast.
- The post-mortem from the previous incident greatly accelerated investigation and resolution. It made it easy to understand what was happening and fix it, even though the incident is rather tricky, with many moving parts.
- What went badly
- Our inadequate design for the CIH logic in the backend keeps biting us back.
- The complexity and shared boundaries across squads are causing us to step on each other's toes (a change made by the New Dash squad changes behaviours in the domain of the Guest squad in an uncontrolled way).
- We didn't act on what we learned from the previous incident of this type back in June, and so the issues keep on appearing.
- Where did we get lucky
- The CIH prices seeding script fixed the wrong values inserted by the new migration added by the Dash Squad. The wrong values got removed by sheer luck.
## Action Items
- [ ] Fix the stored procedure `CreateCustomBundle` defined in the migration `202411121235595_CreateCustomBundle.cs` so that it stops creating `PaymentValidationSetToCurrency` records with prices different from the canonical ones.
- The exact lines that cause the issue [can be found here](https://guardhog.visualstudio.com/Superhog/_git/superhog-mono-app?path=/Guardhog.Data/StoredProcedures/CreateCustomBundle/202411121235595_CreateCustomBundle.cs&version=GBdevelop&line=170&lineEnd=171&lineStartColumn=4&lineEndColumn=29&lineStyle=plain&_a=contents)
- [ ] Modify the CIH prices seeding script so that it respects the `UpdatedDate` column, preventing future data drifts.
- [x] Add more specific data tests in the DWH to spot this issue faster (we can add a test that is still not there and that would give away that this issue is happening instantly)
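The extra data test mentioned above could start as a simple duplicate check on the price table, sketched here in T-SQL under the assumption that one row per `PaymentValidationSetId` and `CurrencyIso` is the expected grain:

```sql
-- Any row returned means the assumed uniqueness is broken,
-- i.e. the CIH multi-price issue is happening again.
SELECT
    [PaymentValidationSetId]
    , [CurrencyIso]
    , COUNT(*) AS [RecordCount]
FROM [dbo].[PaymentValidationSetToCurrency]
GROUP BY [PaymentValidationSetId], [CurrencyIso]
HAVING COUNT(*) > 1;
```

In dbt this would naturally live as a uniqueness test on the corresponding `sync`/`intermediate` model, so it fails before the bad grain propagates downstream.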
## Appendix
Link to the previous occurrence of this issue: [20240619-01 - CheckIn Cover multi-price problem](20240619-01%20-%20CheckIn%20Cover%20multi-price%20problem%20fabd174c34324292963ea52bb921203f.md)


View file

@ -0,0 +1,67 @@
# 20241210 Retro
## 🙌 What went well
- **Delivery**
- Good advance with KPIs reporting
- A/A test showed there were some improvements needed in the tracking and Guest Squad quickly fixed it
- Check-in hero & S&P integrations to DWH went smoothly
- Migration of Athena CosmosDB container went smoothly
- You guys (J&U) have been rocking it
- **Methodology**
- Quiet days are being helpful. More?
- Documentation keeps helping us a lot
- dbt docs working perfectly
- Anaxi and Infra documentation is absolutely perfect for dumb analysts like Uri to handle actual DE work effectively
- Tagging unplanned work in a dedicated epic to wrap up EOQ
- Very good collaboration with APIs and Guest Squads
- **Fun**
- In love with the cube and rollup workshop
- EOY social activities (Quiz, Escape Room, Dinner)
- Glad to see some upcoming python usage
- And notebooks are very cool (used properly)
- Domain analysts program +1
## 🌱 What needs improvement
- **Methodology**
- Huge amounts of unplanned work - not being able to fully reach Q4 objectives
- Very reactive quarter: spinning from one fire to the next
- **Keeping things tidy**
- Going crazy with PBI permissions: invisible tangle of who has access where
- Having a better way to follow up on the usage of reports to see which ones are relevant
- We are not doing great with cleaning up old stuff, tends to stick around permanently
- Change management on the Tech side for MS Server changes
- New Dash reporting is still facing data quality issues from the source and it's not prioritised
- **Cachondeito**
- Moar comilonas.
- Kind of miss the office (an office and the bars, not Norssken specially)
## 💡 Ideas for what to do differently
- Change planning and organization to be a tad less reactive, have more time to tidy things up? Shape up? Change Data Captain role?
- Kill SH legacy reporting to avoid confusion on KPIs
- More syncs with RevOps (we do tons with Product, a bit with Finance, RevOps is the long forgotten son)
- Put some order in our Notion
- Make retros 2h
## ✔ Action items
- [x] If Data Engineer vacancy doesn't progress by end of September, pursue sign-off on consequences
- [x] Reassess DE plans
- [ ] Retro with Ben C. around planning practices (centralization is not working, changing scopes too fast, etc).
- [ ] Quiet Tuesdays and Quiet Thursdays
- [ ] Move calendar recurring meetings
- [ ] Give Ben C. a heads-up
- [ ] Read Shape-up ([https://basecamp.com/shapeup/](https://basecamp.com/shapeup/)) and discuss next retro
- [ ] Fuse Comilona and Retro and schedule for Monday 13/01 and make retros loooonger (2H)
- [ ] Sketch roughly formalization of Domain Analysts programme
- [ ] Discuss and agree with Tech team on data-alerts onboarding (should they be there? who should we tag?)
- [ ] Think about how to make some kind of “PBI Homepage” where Superhog personnel can find all the PBIs that are available easily
- [ ] Document all the config references (URLs, DB connection strings, credentials, etc)
- [ ] Agree with Ben R. on a different way to manage PBI permissions
- [ ] Potentially, also include CI checks in dbt repo
- [ ] Make a cleaning day for Data Catalogue docs
- [ ] Document existing invoicing processes, not just new ones
- [ ] Azure DevOps checks on DWH complete PR button to ensure branch is up-to-date with master branch
- [ ] Discuss with Ben C. New Dash retrospective with PMs/Dash Squad/Data by the EOY


View file

@ -0,0 +1,79 @@
# 20241211-01 - DWH scheduled execution has not been launched
Managed by: Uri and Pablo
## Summary
- Components involved: Airbyte VM, Airbyte, dbt, xexe, anaxi, DWH
- Started at: 2024-12-11 05:00:00 UTC
- Detected at: 2024-12-11 07:41:00 UTC
- Mitigated at: 2024-12-11 09:48:00 UTC
Out-of-the-ordinary resource consumption by Airbyte left the Airbyte VM knocked out for 5 hours due to lack of memory. Jobs that several of our data platform components run on that machine didn't execute. We rebooted the machine and re-ran all pending work.
## Impact
The nightly loading and refreshing of data in DWH has been delayed by about 5 hours. This means reporting was stale for business users for around 3 hours during working time (assuming nobody looks at PBI reports at 6AM. Maybe Joan?).
## Timeline
All times are UTC.
| Time | Event |
| --- | --- |
| 2024-12-11 04:00:00 | The Airbyte VM jumps from having 1.5GB of free memory to almost none (~50 MB). CPU usage also picks up from ~0% to ~50% and stays stuck there.<br>A sync job for the stream SQL Server incremental to DWH starts (job ID: 19235), but communication with the worker container is lost at 04:01:16.<br>A sync job for the stream SQL Server full refresh to DWH starts (job ID: 19237), but communication with the worker container is lost at 04:01:16.<br>A sync job for the stream Stripe UK to DWH starts (job ID: 19236), but communication with the worker container is lost at 04:01:06. |
| 2024-12-11 04:01:17 | Airbyte jobs scheduled to begin from this point in time onwards do not start due to lack of resources.<br>All cron jobs on the machine after this point in time do not start due to lack of resources. This includes dbt, anaxi and xexe jobs. |
| 2024-12-11 07:41:00 | Uri notices Main KPIs are not updated with 10th December data. After checking Data Alerts, no alert has been raised. After checking Data Receipts, Uri confirms that the expected scheduled run has not been executed. |
| 2024-12-11 07:43:00 | A message in the Data channel is sent to notify users of an ongoing incident. |
| 2024-12-11 07:50:00 | Uri tries to connect to the SH Data Airbyte machine unsuccessfully. |
| 2024-12-11 07:56:00 | Uri observes that something happened around 4AM UTC, since Airbyte resource consumption has fallen to a minimum and stagnated. Checking the behavior on previous days, this looks out of the ordinary. |
| 2024-12-11 08:07:00 | It looks like a networking issue, but we cannot be 100% sure. Uri suggests restarting the Airbyte machine, but it might not be the best approach. Waiting for Pablo since he's the expert. |
| 2024-12-11 08:30:00 | Pablo comes in and looks at the situation. He identifies the lack of available RAM memory in the Airbyte VM and assumes that Airbyte has consumed all available resources and locked the VM in doing so. |
| 2024-12-11 08:47:00 | Pablo triggers a reboot of the Airbyte VM, which completes successfully in a couple of minutes. Memory gets freed as part of it and the VM and container services become reactive once again. |
| 2024-12-11 08:49:00 | Multiple Airbyte jobs start again to catch up with the missed runs. |
| 2024-12-11 09:36:00 | Pablo starts triggering missed xexe, anaxi and dbt jobs. |
| 2024-12-11 10:25:00 | All due jobs are completed and the DWH state is up to date. |
| | End of mitigation |
## Root Cause(s)
Multiple Airbyte jobs got triggered to run at 04:00:00 UTC. It seems the workload produced by the data volume on 2024-12-11 was enough to chew up all the RAM in the VM and bring it to a deadlocked state. This chained into all jobs running on the VM (Airbyte, dbt, anaxi and xexe) not working until mitigation was put in place.
## Resolution and recovery
We brought things back to normal by rebooting the Airbyte VM so that the machine would stop being deadlocked.
Some pending jobs started themselves. Others were triggered manually.
## **Lessons Learned**
- What went well
- Azure dashboards allowed us to identify the resource bottleneck easily.
- Team is alert and notices fishy behaviours fast, even when there are no alerts.
- Our logs allowed us to understand clearly what ran and what didn't.
- What went badly
- We almost forgot to re-run the xexe and anaxi jobs.
- We had no alerts. We aren't testing for freshness the right way, so the DWH can go stale without warning.
- Where did we get lucky
- We got lucky that this hadn't happened before.
## Action Items
- [x] Change current schedules in Airbyte to avoid the 04:00 AM memory usage peak.
- Stripe UK and Superhog full refresh have been shifted by a few minutes (25 and 35 minutes later than the previous schedule).
- [ ] Discuss the implementation of dbt source freshness tests.
- [ ] Research ways to prevent Airbyte from sucking up all available memory, or at least notify when it happens.
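The freshness tests discussed above could start as a simple staleness query against a synced table, sketched here in T-SQL (the table and timestamp column are illustrative assumptions; in practice this would live in dbt as a `source freshness` check):

```sql
-- Hedged sketch: flag when the most recent synced record is older than
-- 24 hours, which would have surfaced this incident within the working day.
SELECT MAX([UpdatedDate]) AS [LastLoadedAt]
FROM [sync].[VerificationRequest]
HAVING MAX([UpdatedDate]) < DATEADD(HOUR, -24, GETDATE());
```

Wiring a query like this into the existing data alerts would cover the "silent stale DWH" gap even when upstream jobs fail without raising errors.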
## Appendix
-


View file

@ -0,0 +1,49 @@
# 20241217 - Long-term Data topics with Rich
We sit to discuss on 2024-12-17.
# Recap on what the Data team does
What we should do:
- Supervise data across the org: catalogue all the data and data products we have
- Build and own company wide Data infrastructure
- Build and own stable, company wide reports, dashboards, etc.
- Provide brain power for complex analysis
- Build Data Literacy across the company
What we shouldn't do:
- Own absolutely every little report that exists in Superhog (and become a bottleneck in doing so)
- Act as a poorly designed patch/release-valve on other product shortcomings
- Miraculously overcome lack of data and poor-quality data
# Recap on our capabilities
[Check this whiteboard](https://guardhog-my.sharepoint.com/:wb:/g/personal/pablo_martin_superhog_com/ERy6GpPt0S9Ht_vhl6jF8EsBWE9fzDX8DIv7vQv3whNo7A?e=pHw8C9).
# Long-term topics
Topics:
- Grow team to sufficient size to increase bus factor
- Embed analyst roles within business functions
- Long-term capabilities that we lack:
- Enable application runtimes to access DWH data
- Embedding data products within customer-facing applications
- Advanced orchestration of workloads
- Automated, one-off, file-based reports to internal and external users
- Full end-to-end lineage from data sources to data products + tracking of data products usage
- Improve relationship with stakeholders:
- Mature data contracts approach with upstream teams
- Mature tracking and communication with end-consumers
- Improve priority setting with the business
- Keep on improving analyst experience to maximize productivity/avoid going into maintenance hell
- Transfer out accumulated responsibilities that make no sense (invoicing and other shadow-product-engineering areas)
Stuff to do to achieve the above:
- Hire a DE
- Migrate from PBI into a better tool
- Carefully add new tooling to the Data Platform
- Build SOP with other teams for upstream/downstream relationships


View file

@ -0,0 +1,32 @@
# 20241218 - Ways of working with Matt
We (Matt, Uri, Pablo) sit down to discuss how to transition after Ben C.'s departure.
Topics:
- Our challenges
- Prioritising across areas, keeping up with initiatives from other teams
- Balancing planning and doing
- Balancing maintenance, adhoc and long-term work
- Making sure maintenance is visible and timely
- Team splits
- Engineering vs Analysts
- Leads vs ICs
- Planning:
- Quarterly TMT planning
- Quarterly tech meeting
- Biweekly planning
- Daily
- Retrospectives
- Quarterly retrospectives with you?
- People mgmt topics
- Performance reviews and career planning
- Holiday planning
---
Stuff we discussed and agreed:
- Keep biweekly planning, invite stakeholders as needed when they need to come in.
- Tip the balance of long-term vs adhoc more towards adhoc
- Set weekly meetings with Matt (alternate planning with simple people catchups)


View file

@ -0,0 +1,22 @@
# 2024Q3 Data <> Tech Meeting
**Agenda:**
- How are you guys doing?
- Heads-up: we want a new Data Engineer
- Documentation
- Knowledge on data models and business context around them is key for Data execution
- We are currently struggling with this and we feel you as well
- Time spent soaking up context will only get worse as teams grow (N-to-N comms)
- How can we help improve this? (PS: we are already educating and insisting business and PMs on the importance of this)
- Integrations and Dependencies (SQL Server and Cosmos DB)
- It's been a good Q, thanks for that
- Looking forward to keeping it that way
- Current setup is very informal and lean… but it works, so let's keep it simple as long as we can
- A/B testing
- We want to give it a first shot during Q4 in the Guest Journey
- Very valuable long term but the capacity will need to be built over time
- Work with squads will have to be very tight
- Guest journey is the focus for now, but New Dash could be part of it eventually
- Data Quality capabilities
- We are progressing, some capabilities might be useful for you


View file

@ -0,0 +1,7 @@
# 2024Q3
[Q3 Data Achievements ](Q3%20Data%20Achievements%201130446ff9c9800e84e4f03750b752a1.md)
[Q3 OKRs drafting](Q3%20OKRs%20drafting%2033c62b60320849acbb01925a01f7a383.md)
[2024Q3 Data <> Tech Meeting](2024Q3%20Data%20Tech%20Meeting%209f3da234200443028fb178c882ceaf7d.md)


View file

@ -0,0 +1,7 @@
# 2024Q4
[Q4 Data Scopes proposal](Q4%20Data%20Scopes%20proposal%2075bf38ab8092471d910840ab86b0ec60.md)
[Q4 Data Achievements](Q4%20Data%20Achievements%201570446ff9c980b0a094ccfc9533bee4.md)
[2024Q4 Data <> Tech Meeting](2024Q4%20Data%20Tech%20Meeting%2017a0446ff9c9802da22be93fea285cc4.md)


View file

@ -0,0 +1,15 @@
# 2024Q4 Data <> Tech Meeting
**Agenda:**
- How are you guys doing?
- Heads-up: Pablo's pat. leave
- Team is going to be limited, engineering wise
- Uri might need support at times
- Data contracts & Dependency management
- We are becoming blockers more and more often
- We feel we need to explore ways to improve this before we hit deadlocking
- Should we improve our comms around data alerts? Should we share ownership more?
- A/B testing retro
- FX Rates are now shared and available for you
- [Evidence.dev](http://Evidence.dev), is it of interest to you?


View file

@ -0,0 +1,32 @@
# 2025-01-22 - Data Planning
### Done
- RevOps - Active PMS in New Dashboard Reporting
- KPIs - Invoiced Revenue refactor (+ data is now cut to April 2022)
- KPIs - New Dash Invoiced Revenue now available
- KPIs - New Onboarding MRR metric is available in Main KPIs
- Finance - Guesty API fees changed in November 2024, report updated
- Product - Finalised Guest Journey A/B test analysis with very good results
- Rebranding - Hubspot integration to DWH remained unaffected
### In Progress
- Bugfix - Guest KPIs reporting has a strange connectivity issue
- Finance - Check in Hero API reporting for invoicing purposes
- Finance - Screen and Protect API reporting for invoicing purposes (currently stopped since there are no clients)
- Finance - Accounting aggregations reporting → This goes in line with improving Revenue accuracy in KPIs
- Other - Excel tips and best practices documentation (low prio)
- Other - Discontinue Superhog Reporting (legacy Power BI)
### To Do (does not include critical subjects discussed last week)
- Other - Understand Booking Fees / Cancelled Bookings decay in Dec 2024
- KPIs/RevOps (Chloe) - Track Revenue Retained ratios in Main KPIs for graphical display over time
- KPIs - Propose Billable Booking KPI definition for New Dash: 1 Booking can have multiple services invoiced at different times, how do we attribute them?
- Product/RevOps - Include a user adoption funnel per service for New Dash, to identify adoption/upsell possibilities
- Finance - Update legacy (old dash) invoicing exporter to show unit price and quantity besides total price
- RevOps (Kayla) - Churn prevention: alerting system if a user had a PMS and no longer has it, or if recently created bookings per account are decreasing; display the last time the account was contacted. This could potentially be the first step towards “Account Manager KPIs” for AM teams
- RevOps (Kayla) - Churn tracking: see if we can automate manual monthly Churn reports and enhance it with other data (revenue last 12 m, etc)
- KPIs - Rework Revenue display in Main KPIs
- KPIs - OKR, target-based reporting


View file

@ -0,0 +1,43 @@
# 2025-01-29 - Data Planning
### General Updates
- Start weekly syncs with Guy / Finance on KPIs - Suzannah
- Engineering - Ingestion of backend data to billing db
- Data POV: It's supposed to be carried out on the Engineering side.
- Engineering POV: It's supposed to be carried out on the Data side.
- Pending discussion between Uri and Ben on Wed 29th to clarify
- Uri's POV: This is an Engineering architectural decision and implementation. This is a big no-no on our side, especially with Pablo not here. It would take me several days to produce a not-very-robust implementation of a project as critical as invoicing - and we have already had issues in this area. I might need your support on this.
### Done
- January invoicing incident resolution - [Incident Report](20250124-01%20-%20Booking%20invoicing%20incident%201880446ff9c9803fb830f8de24d97ebb.md)
- Bugfix - Guest KPIs reporting has a strange connectivity issue - [Incident Report](20250122-01%20-%20Power%20BI%20Main%20Guest%20KPIs%20Bug%201840446ff9c980249355f34c58c4686e.md)
- Finance - Check in Hero API reporting for invoicing purposes - [Link](https://app.powerbi.com/groups/me/apps/043c0aec-20b8-4318-9751-f7164b3634ad/reports/ca328a93-8d9d-431c-ac01-c646c81ba421/285e358d70a0c9155b23?experience=power-bi)
- KPIs/RevOps (Chloe) - Track Revenue Retained ratios in Main KPIs for graphical display over time
- Guesty Invoicing - Data quality issues misunderstanding for invoicing on Finance/APIs side
- Other - Fix Data Request workflow
- Guest Squad - A/B test mess fixed + retrospect upon - [Post mortem here](https://www.notion.so/Confusion-over-Fixed-vs-Relative-on-A-B-test-results-Incident-report-1850446ff9c9804f9fd7e004ed47d095?pvs=21)
- Data - Fixed alerts that failed on Jan 29th
### In Progress
- Guesty - Resolutions payouts analysis
- KPIs - Rework Onboarding MRR (avg per client + actual revenue expected)
- Finance - Accounting aggregations reporting → This goes in line with improving Revenue accuracy in KPIs
- Other - Understand Booking Fees / Cancelled Bookings decay in Dec 2024
- KPIs - Propose Billable Booking KPI definition for New Dash: 1 Booking can have multiple services invoiced at different times, how do we attribute them?
- Finance - Screen and Protect API reporting for invoicing purposes (currently stopped since there are no clients)
- Other - Discontinue Superhog Reporting (legacy Power BI)
- Other - Excel tips and best practices documentation (low prio)
### To Do (does not include critical subjects)
- KPIs - Implement Billable Booking KPI for new dash after definition
- KPIs - Rework Revenue display in Main KPIs
- KPIs - OKR, target-based reporting
- KPIs - New Dash vs. Old Dash category
- Product/RevOps - Include a user adoption funnel per service for New Dash, to identify adoption/upsell possibilities
- Finance - Update legacy (old dash) invoicing exporter to show unit price and quantity besides total price
- RevOps (Kayla) - Churn prevention: alerting system if a user had a PMS and no longer has it, or if recently created bookings per account are decreasing; display the last time the account was contacted. This could potentially be the first step towards “Account Manager KPIs” for AM teams
- RevOps (Kayla) - Churn tracking: see if we can automate manual monthly Churn reports and enhance it with other data (revenue last 12 m, etc)


View file

@ -0,0 +1,53 @@
# 2025-02-05 - Data Planning
### General Updates
- Started weekly syncs with Guy / Finance on KPIs - Suzannah
- Engineering - Ingestion of backend data to billing db
- The proper solution has been delayed until Pablo is back to decide. In the meantime, one-shot inputs carried out by Tech
- Other - Discontinue Superhog Reporting (legacy Power BI)
- Apparently it was more widely used than expected
- The effort has shifted towards re-implementing the necessary bits (Listings, Bookings, Payments) but reading from DWH so Data has full control
- On Waiver Payouts and Resolutions Payouts - Revenue Retained Post-Resolutions trends
![image.png](image%205.png)
- I wonder if we should invest / start a working line on:
- Resolutions claims data (still pending integration, no news)
- Understanding client price plans / programs for upsell or detect edge cases
### Done
- Finance - Run Old Dash invoicing exports for January 2025
- Guesty - Resolutions payouts analysis
- KPIs - Invoiced data is now available on the 20th of the month for the previous month
- KPIs - Rework Onboarding MRR (avg per client + actual revenue expected)
- Finance - Accounting aggregations reporting (including client MoM comparison to spot incidents) - [Report here](https://app.powerbi.com/groups/me/apps/4a019abb-880f-4184-adc9-440ebd950e00/reports/9d97fb1e-505e-4592-8a37-d28526a93f4c/7659e1cc0a39b8c3d5cd?experience=power-bi)
- Other - Understand Booking Fees / Cancelled Bookings - [First analysis completed](https://www.notion.so/2025-02-04-Booking-Fees-per-Billable-Booking-Decrease-1840446ff9c980588958c56a8b600d47?pvs=21)
- Other - Help Leo on potential future verification invoicing for historical client Operto
- Other - Investigated issues raised on New Dash reporting with Gus
- Other - Several small requests, mostly from Finance. Re-insisted on using the Data Requests workflow to avoid constant context switching
### In Progress
- Other - Re-implement Superhog Reporting reading from DWH
- KPIs - Propose Billable Booking KPI definition for New Dash: 1 Booking can have multiple services invoiced in different times, how do we attribute them?
- KPIs/New Dash/AM reporting - Exclude known test accounts to increase data quality (won't remove all of them)
- Product/RevOps - Include a user adoption funnel per service for New Dash, to identify adoption/upsell possibilities
### Stopped / No advancement
- Finance - Screen and Protect API reporting for invoicing purposes (currently stopped since there are no clients)
- Other - Excel tips and best practices documentation (no advancements, low prio)
### To Do (does not include critical subjects)
- KPIs - New Dash vs. Old Dash category
- Finance - Update legacy (old dash) invoicing exporter to show unit price and quantity besides total price + Deal id
- KPIs - Implement Billable Booking KPI for new dash after definition
- KPIs - Rework Revenue display in Main KPIs
- KPIs - OKR, target-based reporting
- KPIs - Rework Cancellation rates (attribute them to Check-out + ratio)
- KPIs - Payment Count and Average Amount per Payment Count (Waiver/Deposit Fee/etc)
- RevOps (Kayla) - Churn prevention: alerting system if a user had a PMS and no longer has it, or if recently created bookings per account are decreasing; display the last time the account was contacted. This could potentially be the first step towards an “Account Manager KPIs” view for AM teams
- RevOps (Kayla) - Churn tracking: see if we can automate the manual monthly Churn reports and enhance them with other data (revenue over the last 12 months, etc.)

# 2025-02-12 - Data Planning
### General Updates
- Joaquin off on Monday 17th + Uri off on Friday 21st
### Done
- Other - Re-implement [Superhog Reporting](https://app.powerbi.com/groups/me/apps/86bd5a07-0cd9-40ab-9e97-71816e3467e8/reports/fe54c090-ae85-4cfd-9f28-3d31ab486bc3/dfc2fe95ee1672c1bbdc?experience=power-bi) reading from DWH
- KPIs - Propose Billable Booking KPI definition for New Dash: Agreed with Suzannah on 2 metric definitions
- KPIs - Rework Cancellation rates (attribute them to Check-out + ratio)
- KPIs - [First draft of Main KPIs - Overview](https://app.powerbi.com/groups/me/apps/33e55130-3a65-4fe8-86f2-11979fb2258a/reports/5ceb1ad4-5b87-470b-806d-59ea0b8f2661/50c56def523c2003b054?experience=power-bi)
- Ad hoc requests - several completed on Finance/RevOps side
- Invoicing Incident finally closed after Post Mortem
### In Progress
- KPIs - New Dash vs. Old Dash (vs. API) category
- Product/RevOps - Include a user adoption funnel per service for New Dash, to identify adoption/upsell possibilities
- Guests - Start discussing the implementation for Guest Products and the Single Payment - Multi Service refactor. Likely work to start soon.
### Stopped / No advancement
- KPIs/New Dash/AM reporting - Exclude known test accounts to increase data quality (won't remove all of them) - Engineering to build this properly so we can exclude them cleanly
- Finance - Screen and Protect API reporting for invoicing purposes (currently stopped since there are no clients)
- Other - Excel tips and best practices documentation (no advancements, low prio)
### To Do (does not include critical subjects)
- KPIs - Implement Billable Booking KPI for new dash after definition
- KPIs - Rework Main KPIs overview (YTD+MTD)
- Might need creation of APIs KPIs (for Bookings mostly)
- Resolutions - Ingest Resolution Centre Data into DWH
- Finance - Update legacy (old dash) invoicing exporter to show unit price and quantity besides total price + Deal id
- Resolutions - DWH modelling
- Resolutions - Reporting
- Invoicing Incident - Further automation improvements: Xero (for source of truth in actual invoiced amount) + Hubspot (for churn/onboarding/AM info) + Backend (for what we should have invoiced, New Dash mostly)
- KPIs - Payment Count and Average Amount per Payment Count (Waiver/Deposit Fee/etc)
- RevOps (Kayla) - Churn prevention: alerting system if a user had a PMS and no longer has it, or if recently created bookings per account are decreasing; display the last time the account was contacted. This could potentially be the first step towards an “Account Manager KPIs” view for AM teams
- RevOps (Kayla) - Churn tracking: see if we can automate the manual monthly Churn reports and enhance them with other data (revenue over the last 12 months, etc.)
- RevOps (Alex) - Client Cohorts: explore retention + key metrics to understand if it's valuable for further client understanding

# 2025-02-19 - Data Planning
### General Updates
- Uri off on Friday 21st (reminder)
### Done
- KPIs - New Dash vs. Old Dash (vs. API) category ([Main KPIs report](https://app.powerbi.com/groups/me/apps/33e55130-3a65-4fe8-86f2-11979fb2258a/reports/5ceb1ad4-5b87-470b-806d-59ea0b8f2661/cabe954bba6d285c576f?experience=power-bi))
- KPIs - Implement Live Deals ([Main KPIs report](https://app.powerbi.com/groups/me/apps/33e55130-3a65-4fe8-86f2-11979fb2258a/reports/5ceb1ad4-5b87-470b-806d-59ea0b8f2661/cabe954bba6d285c576f?experience=power-bi))
- KPIs - Implement Billable Booking KPI for new dash after definition ([Main KPIs report](https://app.powerbi.com/groups/me/apps/33e55130-3a65-4fe8-86f2-11979fb2258a/reports/5ceb1ad4-5b87-470b-806d-59ea0b8f2661/cabe954bba6d285c576f?experience=power-bi))
- Product/RevOps - Include a user adoption funnel per service for New Dash, to identify adoption/upsell possibilities ([New Dash - Offered Services report](https://app.powerbi.com/groups/me/apps/d6a99cb6-fad1-4e92-bce1-254dcff0d9a2/reports/44d8eee3-e1e6-474a-9626-868a5756ba83/99f744c80b91c605a7a1?ctid=862842df-2998-4826-bea9-b726bc01d3a7&experience=power-bi))
- Ad hoc requests, especially on Home Team Vacations Rentals (Kayla) → Risk of losing Booking fees after the decrease from 10 USD to 6 USD
- Guests - Align for Illustrations A/B test launching next week
- Resolutions - Ingest Resolution Centre Data into DWH
- Data internal - Fixed CPU consumption
- Data internal - Fixed New Dash Reporting being down after release
### In Progress
- Finance - Screen and Protect API reporting for invoicing purposes
- Other - Excel tips and best practices documentation (reviewed) - How do you want to proceed? Share resources and/or schedule session? → Session + Resources
- KPIs - Main KPIs overview (YTD+MTD) - First partial delivery
### Stopped / No advancement
- KPIs/New Dash/AM reporting - Exclude known test accounts to increase data quality (won't remove all of them) - Engineering to build this properly so we can exclude them cleanly
- Guests - Start discussing the implementation for Guest Products and the Single Payment - Multi Service refactor. (No further news)
### To Do (does not include critical subjects)
- Report to support Pass The Keys migration to New Dash
- Resolutions - DWH modelling
- KPIs - Main KPIs overview (YTD+MTD) - Second delivery
- Creation of APIs KPIs (for Bookings mostly)
- Revenue Churn & MRR metrics
- Targets
- Guests - Adapt A/B test monitoring with specs of Illustrations A/B test
- Finance - Update legacy (old dash) invoicing exporter to show unit price and quantity besides total price + Deal id
- Resolutions - Reporting
- Invoicing Incident - Further automation improvements: Xero (for source of truth in actual invoiced amount) + Hubspot (for churn/onboarding/AM info) + Backend (for what we should have invoiced, New Dash mostly)
- KPIs - Payment Count and Average Amount per Payment Count (Waiver/Deposit Fee/etc)
- RevOps (Kayla) - Churn prevention: alerting system if a user had a PMS and no longer has it, or if recently created bookings per account are decreasing; display the last time the account was contacted. This could potentially be the first step towards an “Account Manager KPIs” view for AM teams
- RevOps (Kayla) - Churn tracking: see if we can automate the manual monthly Churn reports and enhance them with other data (revenue over the last 12 months, etc.)
- RevOps (Alex) - Client Cohorts: explore retention + key metrics to understand if it's valuable for further client understanding

# 2025-02-26 - Data Planning
### Done
- Finance - Screen and Protect API reporting for invoicing purposes ([Report here](https://app.powerbi.com/groups/me/apps/043c0aec-20b8-4318-9751-f7164b3634ad/reports/96e5c7c2-d65a-4375-b706-61255498d7ae/c1f8e5bfc0385782a6b6?experience=power-bi). Waiting for Data Quality fixes from Ray's side)
- Finance - Active PMS report now reading from DWH with small improvements ([Report here](https://app.powerbi.com/groups/me/apps/86bd5a07-0cd9-40ab-9e97-71816e3467e8/reports/244d6d40-5c0e-4c66-87b7-f040ca37bfd8/39b56ffe553b8d842f2e?experience=power-bi)).
- Other - Excel tips and best practices documentation ([Data Resources here](https://www.notion.so/Data-Resources-1520446ff9c98045b44bd670f7bf3605?pvs=21))
- Report to support Pass The Keys migration to New Dash (Improved [Payments - Details](https://app.powerbi.com/groups/me/apps/86bd5a07-0cd9-40ab-9e97-71816e3467e8/reports/992a437e-35c8-4aea-b908-5753655dc401/3376bcba0aa3617402da?experience=power-bi))
- Resolutions - DWH modelling
- KPIs - Main KPIs overview (YTD+MTD) - Second delivery
- Revenue Churn & MRR metrics
- Targets (first version)
- Cleaning & Data quality
- Remove unnecessary ID of Athena (Guesty) models
- Raise and fix issue on a client having thousands of GJ Created that were duplicated
- Tag if a New Dash Booking has a GJ
- Small fixes on S&P report aggregates
### In Progress
- KPIs - Main KPIs overview (YTD+MTD) - Second delivery
- Creation of APIs KPIs (for Bookings mostly)
- Targets Refinement
- OKRs & Business Strategy
- Guests - A/B test monitoring ongoing (launched 25th Feb)
- Resolutions - Reporting
### Stopped / No advancement
- KPIs/New Dash/AM reporting - Exclude known test accounts to increase data quality (won't remove all of them) - Engineering to build this properly so we can exclude them cleanly
- Guests - Start discussing the implementation for Guest Products and the Single Payment - Multi Service refactor. (No further news)
### To Do (does not include critical subjects)
- Excel training session (scheduled)
- Configure Guest Agreement service for New Dash after release (low effort)
- Resolutions - Automate manual tasks for Finance (waiting for specs)
- Finance - Update legacy (old dash) invoicing exporter to show unit price and quantity besides total price + Deal id
- Invoicing Incident - Further automation improvements: Xero (for source of truth in actual invoiced amount) + Hubspot (for churn/onboarding/AM info) + Backend (for what we should have invoiced, New Dash mostly)
- KPIs - Payment Count and Average Amount per Payment Count (Waiver/Deposit Fee/etc)
- RevOps (Kayla) - Churn prevention: alerting system if a user had a PMS and no longer has it, or if recently created bookings per account are decreasing; display the last time the account was contacted. This could potentially be the first step towards an “Account Manager KPIs” view for AM teams
- RevOps (Kayla) - Churn tracking: see if we can automate the manual monthly Churn reports and enhance them with other data (revenue over the last 12 months, etc.)
- RevOps (Alex) - Client Cohorts: explore retention + key metrics to understand if it's valuable for further client understanding

# 2025-03-05 - Data Planning
### Done
- Incident - Verification Bulk Update (resolved) - Available [here](20250304-01%20-%20Verification%20Bulk%20Update%201ad0446ff9c9806faa8bf7673e7ed6a5.md)
- KPIs - Main KPIs overview (YTD+MTD) - Second delivery - Available [here](https://app.powerbi.com/groups/me/apps/33e55130-3a65-4fe8-86f2-11979fb2258a/reports/5ceb1ad4-5b87-470b-806d-59ea0b8f2661/43ac2f2995c23bfb4004?experience=power-bi)
- Targets for FY 2025 hidden to avoid misconceptions
- OKRs & Business Strategy
- Resolutions - New Resolutions reporting - Available [here](https://app.powerbi.com/groups/me/apps/fc6bf877-6175-413d-98e0-da8eb03d807e/reports/87641841-181e-4b10-933e-4ad1ed465607/9d2ccab758d14dfbc8ce?experience=power-bi)
- Power BI Truvi Rebranding
- [Main KPIs](https://app.powerbi.com/groups/me/apps/33e55130-3a65-4fe8-86f2-11979fb2258a/reports/5ceb1ad4-5b87-470b-806d-59ea0b8f2661/43ac2f2995c23bfb4004?experience=power-bi)
- [Truvi Reporting](https://app.powerbi.com/groups/me/apps/86bd5a07-0cd9-40ab-9e97-71816e3467e8/reports/fe54c090-ae85-4cfd-9f28-3d31ab486bc3/dfc2fe95ee1672c1bbdc?experience=power-bi) (previously Superhog reporting)
- New Dash - Configure Guest Agreement service for New Dash after release
- Finance - Ensure proper month attribution of Hyperline invoicing for invoiced revenue reporting purposes (affects Main KPIs, AMs Reporting, Accounting Reports)
- Data Captain requests as usual
- Guests - Started discussing the implementation for Guest Products and the Single Payment - Multi Service refactor.
### In Progress
- Power BI Truvi Rebranding - rest of reports
- Guests - A/B test monitoring ongoing (launched 25th Feb)
- Excel training session (scheduled)
### Stopped / No advancement
- KPIs/New Dash/AM reporting - Exclude known test accounts to increase data quality (wont remove all of them) - Engineering to built it properly for us to exclude it properly
### To Do (does not include critical subjects)
- Guest Products - Single Payment / Multi Service refactor - Starting Tuesday 11th March
- Finance/Resolutions - Automate manual tasks for Finance (waiting for specs)
- Re-visit targets for FY2026 (need Nathan's input) → Talk to Guy / Matt / Nathan - Meeting on Monday
- Update automation project backbone data
- Finance - Update legacy (old dash) invoicing exporter to show unit price and quantity besides total price + Deal id
- Idea - Improvements on AM reports
- Rework Score so it captures projected revenue rather than last month's revenue (i.e., gain ~1 month of trend visibility). In other words - be able as of the 1st of March to detect that February was bad for HTVR, instead of the 1st of April as it is now.
- Include Price Plans / Offering - New Dash vs. Old Dash
- Include API/New Dash/Old Dash category
- Include rank and share per Revenue Retained Post Resolutions
- Add a “welcome” page per Account Manager with the main indicators / alerts for their accounts.
- + potentially other ideas available below (Churn - Kayla, Cohorts - Alex)
- Idea - Resolutions KPIs
- Blocked - We need to capture all Resolutions to be able to build proper metrics
- Modelling KPIs data
- Main KPIs exposure
- Specific reporting
- Invoicing Incident - Further automation improvements: Xero (for source of truth in actual invoiced amount) + Hubspot (for churn/onboarding/AM info) + Backend (for what we should have invoiced, New Dash mostly)
- KPIs - Payment Count and Average Amount per Payment Count (Waiver/Deposit Fee/etc)
- RevOps (Kayla) - Churn prevention: alerting system if a user had a PMS and no longer has it, or if recently created bookings per account are decreasing; display the last time the account was contacted. This could potentially be the first step towards an “Account Manager KPIs” view for AM teams
- RevOps (Kayla) - Churn tracking: see if we can automate the manual monthly Churn reports and enhance them with other data (revenue over the last 12 months, etc.)
- RevOps (Alex) - Client Cohorts: explore retention + key metrics to understand if it's valuable for further client understanding
- Idea - Guest Journey A/B test report
- Avoid manual runs on Data side
- Provide deeper level of detail

# 2025-03-12 - Data Planning
## General Updates
- Performance review - how is it going to happen?
### Done
- Guests - Further discussion on the implementation for Guest Products and Single Payment - Multi Service refactor.
- RevOps - Churn tracking - automate the manual monthly Churn reports and enhance them with other data (revenue over the last 12 months, etc.) - Report available [here](https://app.powerbi.com/groups/me/apps/bb1a782f-cccc-4427-ab1a-efc207d49b62/reports/d4955aad-1550-46c7-9549-2bdeebb99286/3555842421d87b032c4e?experience=power-bi)
- RevOps - Churn prevention - display last time account was contacted
- RevOps - Account Management - Include API/Platform Deal category
- Bugfixes - Active PMS & New Dash offered services
- Other - Slides for TMT Data deep-dive
- Other - General support
- Support Finance on Invoicing subjects
- Follow up HTVR
- Follow up data drift due to user migration old dash → new dash - We lose history on Price Plans…
- Couple of data quality alerts
- Other Data Captain ad-hoc requests as usual
### In Progress
- TMT - Targets for FY2026
- Adapt based on Financial model almost ready - no stretch. Concern on Billable Bookings/Live Deals figure
- Data quality improvements in PBI
- Deal metrics to align with RevOps KPIs
- Platform Billable Bookings (est.) - Understand if there are better ways to track this figure
- We need to adapt the metric names on PBI to align with Finance's naming. I agree, but it requires quite a bit of work.
- RevOps - Churn prevention - forecast created bookings per account to the end of the month and alert if there's a decrease. Booking projection needed for Revenue projection too.
- Product - Guest Journey New Illustrations A/B test monitoring ongoing (launched 25th Feb) - Looking very good
- Product - Guest Products - Single Payment / Multi Service refactor
- DWH modelling
- Old Dash Waiver Extracts for invoicing purposes
- Other - Excel training session (scheduled for today)
- Other - Power BI Truvi Rebranding - rest of reports (low prio)
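The booking-projection alert described in the RevOps churn-prevention item above could work roughly as follows. This is a minimal Python sketch under stated assumptions: function names, the linear extrapolation, and the 80% threshold are all illustrative, not the actual implementation.

```python
import calendar
from datetime import date

def project_month_end(bookings_so_far: int, as_of: date) -> float:
    """Linearly extrapolate month-to-date bookings to a full-month figure."""
    days_in_month = calendar.monthrange(as_of.year, as_of.month)[1]
    return bookings_so_far / as_of.day * days_in_month

def should_alert(projected: float, previous_month_total: int,
                 threshold: float = 0.8) -> bool:
    """Flag an account when the projection drops below a share of last
    month's total (threshold is an assumed illustrative value)."""
    return projected < threshold * previous_month_total
```

Under this sketch, an account with 100 bookings by 10 March projects to 310 for the month and would trigger an alert if February closed at 500. The same projection could feed the revenue projection mentioned alongside it.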
### Stopped / No advancement
- KPIs/New Dash/AM reporting - Exclude known test accounts to increase data quality (won't remove all of them) - Engineering to build this properly so we can exclude them cleanly
- RevOps - Update automation project backbone data - Need further input
- Finance/Resolutions - Automate manual tasks for Finance (waiting for specs)
### To Do (does not include critical subjects)
- ~~Finance - Update legacy (old dash) invoicing exporter to show unit price and quantity besides total price + Deal id~~ Discussed with Nathan, no longer needed.
- RevOps (Kayla) - Churn prevention: alerting system if user had a PMS and no longer has it
- Idea - Improvements on AM reports
- Rework Score so it captures projected revenue rather than last month's revenue (i.e., gain ~1 month of trend visibility). In other words - be able as of the 1st of March to detect that February was bad for HTVR, instead of the 1st of April as it is now.
- Add a “welcome” page per Account Manager with the main indicators / alerts for their accounts.
- Include Price Plans / Offering - New Dash vs. Old Dash
- Include rank and share per Revenue Retained Post Resolutions
- API KPIs
- We need at least bookings to compute Total Bookings (Platform Billable Bookings + API Bookings)
- Idea - Resolutions KPIs
- Blocked - We need to capture all Resolutions to be able to build proper metrics
- Modelling KPIs data
- Main KPIs exposure
- Specific reporting
- Invoicing Incident - Further automation improvements - Xero (for source of truth in actual invoiced amount) + Hubspot (for churn/onboarding/AM info) + Backend (for what we should have invoiced, New Dash mostly)
- KPIs - Payment Count and Average Amount per Payment Count (Waiver/Deposit Fee/etc)
- RevOps (Alex) - Client Cohorts - explore retention + key metrics to understand if it's valuable for further client understanding
- RevOps (Alex) - RevOps KPIs - Automate RevOps KPIs sheet in PBI and add additional content (revenue, etc). Similar approach as for Churn.
- Product (Daga) - New Dash - Differentiate OSL/Manual/PMS bookings
- Idea - Guest Journey A/B test report
- Avoid manual runs on Data side
- Provide deeper level of detail

# 2025-03-17 - Glad you're back, Daddy Pablo
Some things happened while you were away, so here's a summary!
# Incidents
Important one:
[20250124-01 - Booking invoicing incident](20250124-01%20-%20Booking%20invoicing%20incident%201880446ff9c9803fb830f8de24d97ebb.md)
Other incidents:
[20250304-01 - Verification Bulk Update](20250304-01%20-%20Verification%20Bulk%20Update%201ad0446ff9c9806faa8bf7673e7ed6a5.md)
[20250122-01 - Power BI Main Guest KPIs Bug](20250122-01%20-%20Power%20BI%20Main%20Guest%20KPIs%20Bug%201840446ff9c980249355f34c58c4686e.md)
# Power BI main updates
- Main KPIs now has proper KPI tracking (3 new tabs) and can compare vs. targets. Small overall improvements.
- Account Managers now has a dedicated Churn report. Small overall improvements.
- Accounting Reports now have a Finance aggregation (defined in a seed), showing a monthly aggregation and per-deal tracking. Xero data also contains Hyperline billing, and these invoices/credit notes are identified
- Truvi Reporting (previous SH reporting) is now fully reading from DWH.
- New Resolutions Report is available in a dedicated app.
- New Dashboard report now has an Offered Service tab. Small overall improvements. Be aware that the link to the app changed.
- API Reports now have the Screen and Protect invoicing report - still no clients.
- Truvi Rebranding is ongoing.
# Key Priorities
- Keep an eye on the Data infra, even though nothing has seriously broken. My knowledge is limited in this regard, but:
- We had the “issue” with CPU sitting around 100%. It happened again on the 12th of March: the Xero integration in Airbyte was stuck, but we managed to fix it manually
- I received an email from Azure on VM stuff
- We did several data integrations from Cosmos (Resolutions) and new tables from the backend (Core), might be worth double checking
- Multi-service single payment & Guest Products - ongoing. Involves DWH (due to backend) and Old Invoicing (Stripe metadata) changes
- KPIs vs. targets for Financial Year 2026 + reporting (part of business strategy)
- Churn prevention - data-based alerting system. Good opportunity to leverage KPIs
- Data drifts and data quality - we need a better way to align with tech without us being blockers. Data contracts could be a possibility. Forcing full-refreshes on incremental models once every X days could be interesting.
- We need to align on how we want to organise for Q2. A few thoughts:
- Aim for very few, key areas, really focusing on the must haves
- Aim for very few things that we want to do as a must have for Q2 internally in Data
- Allow plenty of space and capacity for flexibility - has worked very well in Q1
- Hyperline initiatives should be good for the moment on the Data side, but we're doing quite a bit of support for Finance considering the user migration from old to new dash and so on. We might need to start pulling data from Hyperline at some point (but no immediate rush).
# Other stuff
- You might need to contact Cigna for medical insurance if you want to opt for it
- We will be having a Personal Development Review (PDR) by the end of March
- We should take a look at what improvements can be made to the data in HubSpot. While working with Alex to sync our displayed deal lifecycles, we have discovered a lot of issues with the data in HubSpot.
# Domain Analyst - Batch #2
Uri gave a general presentation for TMT and leads, and there's quite a bit of interest in Domain Analysts.
Potential candidates:
- Chloe (Resolutions) - Proposed herself, motivated to learn
- Maha (Marketing) - Doing a Data Analytics course, makes tons of sense - to be discussed in depth
- Daga (Product) - Discussed in the past, would be good to have in-depth New Dash visibility
- Lisa (Finance) - Nathan explained the program, she is interested, would be good to regain a Domain Analyst in Finance
We didn't have a proper page explaining the Domain Analyst program, so I created one. Feel free to challenge: [https://www.notion.so/truvi/Domain-Analyst-Program-1b60446ff9c980e58ab1fef0e3909085](https://www.notion.so/Domain-Analyst-Program-1b60446ff9c980e58ab1fef0e3909085?pvs=21)
# Fun things
- Comilona this Wednesday.
# Pablo's own notes
- [ ] Ask about this: > You might need to contact Cigna for medical insurance if you want to opt for it
- [x] Review exporter code
- [ ] Review PBIs
- [ ] Review new seeds and schemas in dbt

# 2025-03-19 - Data Planning
### Done
- RevOps - Churn prevention report improvements
- TMT - Targets for FY2026
- Adapt based on Financial model almost ready - no stretch.
- Data quality improvements in PBI
- Deal metrics to align with RevOps KPIs
- Other - Excel training session
### In Progress
- Finance - Align the metric names Data x Finance
- TMT - Targets + Simple report
- RevOps - Alerting! - forecast created bookings per account to the end of the month and alert if there's a decrease. Booking projection needed for Revenue projection too.
- Product - Guest Journey New Illustrations A/B test monitoring ongoing (launched 25th Feb)
- Product - Guest Products - Single Payment / Multi Service refactor
- Data - Automatic CI quality checks
### To Do (does not include critical subjects)
- New Dash KPIs - Service adoption & service revenue streams
- Adoption rates of each service
- Total Revenue + per Booking - to compare actual revenue (invoiced) vs. the chargeable vs. the discounts
- Decide what to do regarding Domain Analyst program

# 2025-03-26 - Data Planning
### To discuss
- We automate or we don't
### Done
- TMT - Targets + Simple report
- Data - Automatic CI quality checks
- Support HTVR invoicing + autohost issue
- Follow up data quality issues on backend
### In Progress
- Migration Old Dash to New Dash - Update input
- New Dash - Service adoption
- RevOps - Alerting! - forecast created bookings per account to the end of the month and alert if there's a decrease. Booking projection needed for Revenue projection too.
- Product - Guest Journey New Illustrations A/B test monitoring ongoing (launched 25th Feb)
- Product - Guest Products - Single Payment / Multi Service refactor
### To Do (does not include critical subjects)
- New Dash - Revenue service streams + linked to Resolutions + protections = Booking P&L
- Finance - Align the metric names Data x Finance
- Decide what to do regarding Domain Analyst program

# 2025-04-09 - Data Planning
### To discuss
- PDRs
### Done
- Updated file for Kayla on Old Dash to New Dash migration
- New Dash services adoption
- Account Performance report
- Many data alerts due to Backend bugs
- Internal:
- CI in DWH
- Refactor KPIs in DWH
### In Progress
- Domain Analysts
- Discussions / introductions scheduled with everyone
- Training being prepared
- Exploring tooling
- Flagging project
- Pending conversations with Tech and Resolutions
- Data quality issues management w. Ben
- Account alerting / more timely information

# 2025-04-16 - Data Planning
### Done
- Flagging project
- Conversations with Tech and Resolutions
- First monitoring system implemented
- Domain Analysts
- Discussions / introductions with everyone
- Airbnb data request
- Data quality issues management w. Ben
- New board shared between Tech x Data to track and fix DQ issues
- Fix data quality issue on Revenue metrics
- Small improvements on many reports, ex: Churn Report now allows multi-month selection
- Illustration A/B test to be finished this week, no significant results
### In Progress
- Flagging project
- Pending conversation to present first results
- We need more data
- Ensure Old Dash Invoicing does not capture New Dash Bookings
- We have alignment with Tech & Finance, implementing changes
- Domain Analysts
- Training being prepared
- Exploring tooling
- Prepare launch of next A/B test - Welcome page visual changes
- Account alerting / more timely information
- Capture CIH API invoiced revenue in KPIs & Accounting reports
- Prepare for Account Managers to Customer Relations changes
- Account Manager will become obsolete → affects many reports
- Rely on new RRPR-based segmentation; discussion ongoing

# 2025-04-30 - Data Planning
- Humphrey causing confusion with Screen & Protect price structure
### Done
- Screening and Protection relationship (we'll find a better name) project
- Requested input from Data team
- First preliminary analysis conducted
- Ensure Old Dash Invoicing does not capture New Dash Bookings
- Domain Analysts
- Lisa, Chloe, Maha and Daga
- Kick off conducted, programme started
- Currently doing the first Excel levelling assessment
- Capture CIH API invoiced revenue in KPIs & Accounting reports
- Host Resolutions appearing in both Bank Transactions and Credit Notes
### In Progress
- ~~Flagging~~ project
- Tracking performance continuously until we have more data
- Gathering feedback from customer facing colleagues
- Domain Analysts
- Training being prepared continuously
- Exploring tooling
- Launch of next A/B test - Welcome page visual changes
- Keeping an eye on performance
- Account alerting / more timely information
- New Report for Account Growth and Impact currently under internal review

# 2025-05-07 - Data Planning
- Pablo off next week 12-16 May
- Joaquin off tomorrow 8 May, working remotely from Chile from 12-16. Off from 19-28 May
- Uri is around
### Done
- Data-Driven Risk Assessment (DDRA) project
- EDA on Resolution Incidents: [2025-05-02 Exploratory Data Analysis on Resolution Incidents](https://www.notion.so/2025-05-02-Exploratory-Data-Analysis-on-Resolution-Incidents-1e70446ff9c98043b263e3b2eadb79fb?pvs=21)
- Gathered feedback from business-facing teams
- Account alerting / more timely information
- New Report for Account Growth and Impact released. [Report available here](https://app.powerbi.com/groups/me/apps/bb1a782f-cccc-4427-ab1a-efc207d49b62/reports/3e1819f4-7069-49e1-8c6b-2e7527d596e3/ReportSectionddc493aece54c925670a?experience=power-bi)
- The previously existing Account Managers Overview report will be retired on May 23rd
- Improvements on Host Resolutions Payments Report with a new Account Rankings tab. [Report available here](https://app.powerbi.com/groups/me/apps/4a019abb-880f-4184-adc9-440ebd950e00/reports/86abbd2f-bfa5-4a51-adf5-4c7a3be9de07/7087b20b3e118306020e?experience=power-bi)
- Domain Analysts
- Daga, Maha, Chloe
- Excel levelling test completed, new assignment for next week
- We expect closure of Excel training by next week
- Lisa
- Excel training completed, starting SQL training
- Business as usual
- Data alerts follow up and fixes
- Data requests handling
### In Progress
- Stripe vs. Backend payments data discrepancies
- Early results ~3% missing payments in backend with respect to Stripe
- DDRA
- Continuous monitoring of New Dash Protected Bookings performance
- Start phase 2
- Domain Analysts
- Training being prepared continuously
- Exploring tooling
- Confident Stay (Guest Products + Single Payment / Multi Service)
- Resuming work
- A/B test - Welcome page visual changes
- Keeping an eye on performance
- Data team internal
- Fixing DWH CI

View file

@ -0,0 +1,46 @@
# 2025-05-14 - Data Planning
### Data Team Updates
- Pablo off, back on Monday
- Joaquin working remotely from Chile this week. Off from 19-28 May
- Uri is around
### Confident Stay
- Done:
- Sync with Guest Squad on latest state of Confident Stay
- Finished DWH internal refactor to accommodate the new incoming logic of Guest Products
- Next:
- Integrate Guest Products from the new flow into Guest Journey Payments (Check In Hero + Confident Stay)
- Continue DWH modelling on Guest Products
### Domain Analysts
- Chloe dropped the course
- Done:
- Daga + Maha: Excel training concluded successfully
- Next:
- Launching SQL training for Daga + Maha
- We'll take both Daga + Maha and Lisa while Joaquin is off
- Continuously exploring tooling
### Business as usual
- Done:
- Data alerts follow up and fixes
- Data requests handling
### Stripe vs. Backend payments data discrepancies
- On hold until Pablo is back
- Early results ~3% missing payments in backend with respect to Stripe
### Data-Driven Risk Assessment (DDRA)
- Continuous monitoring of New Dash Protected Bookings performance
- Start phase 2 (if we have time)
### A/B test - Welcome page visual changes
- Continuous monitoring

View file

@ -0,0 +1,66 @@
# 2025-05-21 - Data Planning
### Data Team Updates
- Joaquin off from 19-28 May
### On Billable Bookings emergency
- Done:
- Several analyses + support + gathering outputs:
- [Created Bookings evolution Old Dash → New Dash per account](https://www.notion.so/Created-Bookings-evolution-Old-Dash-New-Dash-per-account-1f50446ff9c9803ca922c2341bd714c2?pvs=21)
- New report: New Dash Onboarding
- To discuss:
- Our North Star: We are committed to delivering dependable screening and protection services, to build a profitable and **sustainable** business.
- We're not far away from profitability, but getting there needs to go through the **sustainable** part.
- The Billable Bookings investigation has raised several invoicing-related issues that are KEY to reaching our goal. Invoicing clients correctly is core business. We have the feeling that this is not properly supported, as we spend more time discussing new initiatives (ex: Pet Waiver) than fixing what needs to be fixed. Core business should work like a charm, and it's clearly not the case.
- Decisions:
- Onboarding process being refined; historical clients will also be reviewed
- Invoicing
- New Dash: services are being invoiced at different times, and the logic is built wrongly. From 1 June, all billing will happen at verification start
### Screen and Protect API
- Discovered today that the fees we report in PBI for the only client in S&P are wrong.
- S&P expects nightly fees. The first client is already an exception and is charged per booking.
- No one has contacted us about this, nor updated the logic in the documentation.
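The mismatch between the two pricing models can be illustrated with a minimal sketch. All rates, field names, and the flag for the exception client are hypothetical, not the actual S&P contract terms:

```python
from dataclasses import dataclass


@dataclass
class Booking:
    nights: int


def snp_fee(booking: Booking, nightly_rate: float, per_booking_rate: float,
            per_booking_exception: bool = False) -> float:
    """Return the Screen & Protect fee for one booking.

    The standard S&P model charges per night; the first client is an
    exception and is charged a flat per-booking fee instead.
    """
    if per_booking_exception:
        return per_booking_rate
    return booking.nights * nightly_rate


# A 5-night booking under the nightly model vs. the per-booking exception:
# reporting with the wrong model would misstate the fee for every booking.
standard = snp_fee(Booking(nights=5), nightly_rate=1.0, per_booking_rate=3.0)
exception = snp_fee(Booking(nights=5), nightly_rate=1.0, per_booking_rate=3.0,
                    per_booking_exception=True)
```

Until the documentation is updated, reporting logic has to carry the exception explicitly, or the per-night assumption silently produces the wrong figures.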
### Confident Stay
- Done:
- Integrate Guest Products from the new flow into Guest Journey Payments (Check In Hero + Confident Stay)
- Performance optimisations
- Next:
- Continue DWH modelling on Guest Products
- Ensure Check In Hero reporting makes it through the change
- Prepare Confident Stay reporting
### Data-Driven Risk Assessment (DDRA)
- Ongoing:
- Continuous monitoring of New Dash Protected Bookings performance
- Next:
- Start phase 2
- Would like to have started but other things on the table
### Stripe vs. Backend payments data discrepancies
- Work in progress
- But there is definitely an issue
- Having a hard time nailing it down because of other priorities
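A minimal sketch of the kind of reconciliation behind the ~3% figure: matching backend payments against Stripe charges on a shared reference id, and reporting charges the backend never recorded. The field names (`id`, `stripe_charge_id`) are assumptions, not the real schemas:

```python
def missing_in_backend(stripe_charges: list[dict],
                       backend_payments: list[dict]) -> list[dict]:
    """Return Stripe charges that have no matching backend payment.

    Matching is done on a shared reference id (hypothetical field names):
    each backend payment is assumed to store the Stripe charge id it settles.
    """
    backend_ids = {p["stripe_charge_id"] for p in backend_payments}
    return [c for c in stripe_charges if c["id"] not in backend_ids]


# Toy data: three Stripe charges, backend only recorded two of them.
stripe = [{"id": "ch_1"}, {"id": "ch_2"}, {"id": "ch_3"}]
backend = [{"stripe_charge_id": "ch_1"}, {"stripe_charge_id": "ch_3"}]

missing = missing_in_backend(stripe, backend)
missing_rate = len(missing) / len(stripe)  # share of Stripe charges unmatched
```

In practice the same anti-join runs over exported Stripe data and the backend payments table, with the unmatched rate tracked over time.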
### Domain Analysts
- Ongoing:
- SQL training for Daga + Maha + Lisa
### A/B test - London Wallpaper & visual changes
- Continuous monitoring of current London Wallpaper A/B test
- Discussion with Guest Squad on a new potential A/B test. Not comfortable with the tracking or implementation plan, put on hold.
### Business as usual
- Done:
- Data alerts follow up and fixes
- Data requests handling

View file

@ -0,0 +1,37 @@
# 2025-05-28 - Data Planning
### Off Topics
- Joaquin is back tomorrow
- Confident Stay launch is messy (it was expected to be launched on a Saturday, misalignment on New Dash/Old Dash release, etc)
- VOTC feedback
- Billable Bookings & Invoicing & Onboarding
### Confident Stay & Check In Hero
- Done
- Confident Stay available in KPIs
- Confident Stay dependents (Guest Revenue, Total Revenue, RRPR, etc) updated accordingly in KPIs
- In Progress
- Continue DWH modelling on Guest Products
- Ensure Check In Hero reporting makes it through the change
- Prepare Confident Stay reporting
### Domain Analysts
- SQL training for Daga + Maha + Lisa
- Access to DWH
### New Dash Reporting
- Improvements on New Dash Reporting Overview, including Check In date & Basic Screening Bookings
### A/B test Guest Journey
- A/B test London Wallpaper finished yesterday 27th May, results [here](https://www.notion.so/2025-05-27-Guest-Journey-London-Wallpaper-A-B-Test-Results-2000446ff9c9800d86f2d3bcfdbbec42?pvs=21)
- Mostly non-significant results, likely positive CSAT - should be re-run in the future with more cities
- A/B test Your Trip Questionnaire launched yesterday 27th May, details [here](https://www.notion.so/2025-Q2-2-Your-Trip-Questionaire-Guest-Journey-A-B-test-1f90446ff9c980a296b9ecb47cad21ef?pvs=21)
### Data-Driven Risk Assessment (DDRA)
- No news, didn't have time to work on this. Likely retaking it this week

View file

@ -0,0 +1,40 @@
# 2025-06-04 - Data Planning
### Off Topics
- Monday 9th is day off in Barcelona
- On yesterday's Q3 meeting
- Billing doesnt get enough attention
- Still prioritising new initiatives without clear impact or effort estimation while there are clear things to fix that provide value
- Respect timings
- Democratic prioritisation does not work
### Confident Stay & Check In Hero
- Confident Stay Reporting is now live in Guest Insights
- All Check In Hero reports have moved to Guest Insights
- Ensure Check In Hero reporting makes it through the change
### Data-Driven Risk Assessment (DDRA)
- Setting everything up for experimentation
- First baseline (randomly flagging 1% of bookings) set up
- First experiment ongoing
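The 1% random baseline can be sketched as a deterministic hash-based flag, so a booking's assignment stays stable across pipeline runs. The salt, field name, and flag rate are assumptions for illustration, not the actual DDRA setup:

```python
import hashlib


def flag_booking(booking_id: str, rate: float = 0.01,
                 salt: str = "ddra-baseline-v1") -> bool:
    """Deterministically flag ~`rate` of bookings as the random baseline.

    Hashing (salt + booking_id) maps each booking to a stable
    pseudo-random value in [0, 1); flag it when that value falls
    below the target rate. Changing the salt reshuffles assignments.
    """
    digest = hashlib.sha256(f"{salt}:{booking_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    return bucket < rate


# Roughly 1% of a large booking population gets flagged, and a given
# booking always receives the same decision.
flags = sum(flag_booking(f"bk-{i}") for i in range(100_000))
```

A baseline like this gives the risk model something concrete to beat: any scoring approach should catch more incidents per flagged booking than uniform-random flagging at the same rate.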
### Domain Analysts
- SQL training for Daga + Maha + Lisa
- Access to DWH
### New Dash Reporting
- New Check In Bookings tab
### Accounting Reports
- Improvements ongoing to capture accounts with higher due amounts
- Budget tracker exploration
### A/B test Guest Journey
- Continuous monitoring, no relevant results yet

View file

@ -0,0 +1,36 @@
# 2025-06-11 - Data Planning
### Off Topics
- Screen & Protect API - Pricing/Invoicing
- Q3 Data objectives planning
- We survived the first semester chaos
- We want to organise again and have room for long-term scopes
- Data Insights TMT session
- Can we have 4 monthly meetings instead of 1 weekly?
### Confident Stay & Check In Hero
- Reporting done, being reviewed after launch
- Next step: Stripe process for Waivers
### Data-Driven Risk Assessment (DDRA)
- Internal Data team kick-off tomorrow, expecting to dedicate a huge amount of work in the coming weeks
- Expecting first insights by early July
### Domain Analysts
- SQL training for Daga + Maha + Lisa
### Accounting Reports
- Budget tracker waiting for Finance team
### Data quality improvements
- Removed all test accounts and their activity (bookings, etc) from all reports
### A/B test Guest Journey
- Continuous monitoring, so far it seems that the new questions do not have a dramatic negative effect

Some files were not shown because too many files have changed in this diff