# 20241104-01 - Booking invoicing incident due to bulk UpdatedDate change # Booking invoicing incident due to bulk UpdatedDate change Managed by: Pablo ## Summary - Components involved: Superhog Backend SQL Server, DWH, PBI Reports, `sh-invoicing-exporter` tool - Started at: 2024-10-30 12:55:07.653 UTC - Detected at: first symptoms noticed around 2024-10-31 08:00:00 UTC, but severity was truly understood on 2024-11-04 10:57:00 UTC - Mitigated at: 2024-11-05 10:57:00 UTC A bulk backfill executed on the application SQL Server database to fix some not-relevant-to-this-incident column resulted in tens of thousands of `VerificationRequest` records having their `UpdatedDate` modified to when the backfill was executed. A poor assumption in the old dash invoicing logic was severely impacted and caused (1) the billable bookings metrics and reports to be showing utterly wrong data for 6 days and (2) a delay of ~27 hours in the delivery of the old dash invoicing exports to the finance team for the October ‘24 period. A backup of our SQL Server was restored and the incident-triggering changes were reverted in an emergency to unblock the generation of the invoicing exports. The root issue still exists and needs to be addressed. ## Impact - On the invoicing process: - The start of the old dash invoicing process, which is the generation of exports run by Pablo, should have started on 2024-11-04 08:00:00 UTC. Instead, it started on 2024-11-05 10:57:00 UTC, adding a delay of ~27 hours to all depending Finance team processes. - On reporting: - Since 2024-10-30 12:55:07.653 UTC and until 2024-11-05 10:20:00 UTC. - The DWH table `int_core__booking_charge_events` was displaying tens of thousands of billable bookings on 2024-10-30 that were wrong. This propagated through other DWH tables and finally to reports. contained a grossly wrong figure for October ‘24. - The Business Overview > Main KPIs report showed a grossly wrong count of invoiceable bookings for October (somewhere x2-3 orders of magnitude what the number should have been). - The Business Overview > Host Fees report show inflated numbers for the Superhog-inferred billable bookings count and booking fees revenue for October ‘24 (somewhere x2-3 orders of magnitude what the number should have been). - On the Guest Squad efforts, - Our mitigation solution of reverting the changes made on the `UpdatedDate` might have cause more troubles: the SQL Server is now *lying*, since the `PaymentValidationSetId` values of many records *were* updated on 2024-10-30 12:55:07 UTC, but the `UpdatedDate` values of those records are now effectively lying. I’m not aware of how this may cause further problems, but it could. ## Timeline All times are UTC. | Time | Event | | --- | --- | | 2024-10-30 12:55:07 | The bulk update script for the `PaymentValidationSetId` column on the table `VerificationRequest` gets executed, changing the `UpdatedDate` value of tens of thousands of records. | | 2024-10-31 06:28:00 | An automated outliers data alert gets triggered due to the wild variance in the Estimated Billable Bookings KPI. | | 2024-10-31 07:49:00 | Uri notices the issue (leaving a note in the data alerts chat) and correctly spots the fact that there is a spike of booking charge events on the date 2024-10-30. | | 2024-10-31 08:30:00 | The alert gets discussed during the Data Team’s daily call. Pablo wrongly judges that the `UpdatedDate` data shouldn’t cause an issue in invoicing and it’s just a minor KPI blip that can be fixed in the future, and the team decides that the alert is not urgent. | | 2024-11-04 08:30:00 | The Data team discusses again this topic in the daily call. The fact that it’s an invoicing exports day increases attention, and upon looking into some details, the team switches his mind and realises there might be serious implications for invoicing. | | 2024-11-04 09:30:00 | Ben C. inquires the Data team about the report in Business Overview > Host Fees > Booking Fees - Superhog is showing some wildly high numbers for October. | | 2024-11-04 10:51:00 | After some detailed research, Pablo realises that the invoicing imports are broken and starts the #invoicing-firefightning slack channel to gather stakeholders. | | 2024-11-04 11:09:00 | Pablo and Ben R. discuss about the issue and assess the option of switching the invoicing code to rely on `LinkUsed` instead of `UpdatedDate`. They agree on Pablo examining if that would do the trick. | | 2024-11-04 12:17:00 | Pablo concludes that the naive `LinkUsed` option won’t do the trick due to how data looks in `VerificationRequest`, and comes back to Ben R. to discuss how to proceed. They agree to, instead, restore the original values of the `UpdatedDate` columns in the records that were updated on 2024-10-30 12:55:07. | | 2024-11-04 12:23:00 | Ben R. starts restoring a database backup to restore the records. | | 2024-11-04 15:58:00 | Since the restore is taking longer than expected, Ben R. proposes running a simpler update by leveraging some fields in `VerificationRequest`, but Pablo points out that a partial solution won’t help the Finance team since running the exports multiple times would mean Finance’s manual work is only useful after the final export. | | 2024-11-05 08:00:00 | After a first failed restores, the second backup restore works on SQL Server. | | 2024-11-05 9:18:00 | Ben uses the restored data to revert the `UpdatedDate` changes in the records that were modified on 2024-10-30. | | 2024-11-05 10:20:00 | Pablo starts a backfill in Airbyte and a dbt run right after to propagate the new updated records throughout the DWH. | | 2024-11-05 10:54:00 | Pablo confirms that the large cluster of bookings attribute to October is not there anymore, as well as that the downstream reporting shows correct figures again. | | 2024-11-05 10:57:00 | Pablo triggers the export of the invoicing reports for the October period. | | 2024-11-05 14:53:00 | The exports finish successfully and Pablo shares them with Jamie D. | | | Incident mitigated. | ## Root Cause(s) The root cause is a combination of a: - A poorly-chosen assumption in the old dash invoicing logic (the usage of `UpdatedDate` field in the `VerificationRequest` table to decide in which month should a booking be charged for its booking fee when it is supposed to be charged on `VerificationStartDate`). - ([see these lines](https://guardhog.visualstudio.com/Data/_git/data-invoicing-exporter?path=/sh_invoicing/queries.py&version=GBmain&line=336&lineEnd=339&lineStartColumn=25&lineEndColumn=26&lineStyle=plain&_a=contents) in the latest release of `sh-invoicing-exporter`, which are the conceptual grand-childen of [these lines](https://guardhog.visualstudio.com/Superhog/_git/superhog-invoicing-console-app?path=/SuperhogInvoicing/SQLQueries.cs&version=GBmaster&line=159&lineEnd=165&lineStartColumn=3&lineEndColumn=6&lineStyle=plain&_a=contents) in the old C# sharp script, to understand the faulty assumption) - This is a conceptual problem that we still need to address if we want to prevent significant issues in future invoicing cycles. Our initial mitigation was treating symptoms, not the core issue. - An out of BAU bulk update in the `VerificationRequest` table in the backend SQL Server. I would like to make clear that the intent of this bulk update was perfectly legitimate and its execution was also proper. Even though it’s side effects have been troublesome, the update itself was not an issue nor a mistake. So, the true issue is the troublesome reliance of the invoicing code on `UpdatedDate` , which is always a tiny issue, but turns into a massive one anytime any tech squad in Superhog performs an update on the `VerificationRequest` table that goes beyond the usual activity of the application. Given that this out of the usual operations will keep on happening in the future, it is important that we address the true issue to avoid more incidents like this one in the future. ## Resolution and recovery The problem was mitigated by reverting the changes made in the `UpdatedDate` through the restoring of a backup of the SQL Server database and some adhoc script being run on the production database. This allowed us to bring reporting back to normal and continue the invoicing exports, at the expense of leaving the SQL Server database in an inconsistent state. The true solution to the problem is still unaddressed (see the root cause section). ## **Lessons Learned** *List of knowledge acquired. Typically structured as: What went well, what went badly, where did we get lucky* - What went well - Automated KPI outlier tests from the Data team brought the spike of billable bookings into Uri’s attention. - The production backups of the SQL Server database allowed us to restore the original `UpdatedDate` values, providing us a fast way to unblock the invoicing process. - What went badly - Even though we got an early alert, Pablo wrongly triaged the unusually high number of billable bookings as a minor issue that wouldn’t impact the invoicing process. - The faulty logic/assumption to place the invoicing on bookings in time has been sitting around for years, undocumented. We don’t know have any trace of why we built it this way in the first place. - The faulty logic/assumption to place the invoicing on bookings in time might have been leading to wrongly placement-in-time of bookings fees for a long time, but the high complexity of the logic and the way the history of records is managed in the database make it very hard to understand what is the true extend of the issue. - Obtaining a restore of the production database can take multiple hours. - We screwed up with the consistency of data in `VerificationRequest`. Tens of thousands of `UpdatedDate` values in the `VerificationRequest` table are now wrong. ## Action Items - [ ] Identify all business logic which is now relying on the `UpdatedDate` field of the `VerificationRequest` table in the SQL Server table. - [ ] Once the above logic is catalogued, apply changes and fixes so that `UpdatedDate` can be modified without causing incidents. - [ ] Potentially, extend the exercise beyond `VerificationRequest`, since the same problem pattern could apply to all sorts of update-able tables in the SQL Server database. ## Appendix *Miscellanea corner for anything else you might want to include* - Link to first notes when we started tackling the issue: [20241101 - Invoicing UpdateDate mess up](https://www.notion.so/20241101-Invoicing-UpdateDate-mess-up-1340446ff9c980b2926fc6284572f740?pvs=21) - Code for the bulk update script executed on 2024-10-30 ```sql DECLARE @CurrentDate AS DATETIME = GETDATE() SELECT [pvs].[VerificationRequestId] , [vr].[GuestJourneyCompletedDate] , [vr].[ExpiryDate] , [vr].[PaymentValidationSetId] , [pvs].[PaymentValidationSetId] FROM ( SELECT [pvs].[VerificationRequestId], [pvs].[PaymentValidationSetId] FROM ( SELECT [vr].[Id] AS [VerificationRequestId] , COALESCE( [vr].[OverridePaymentValidationSetId], [a].[PaymentValidationSetId], [pvs_a].[Id], [pvs_d].[Id] ) AS [PaymentValidationSetId] FROM [dbo].[VerificationRequest] [vr] -- Listing Override LEFT JOIN [dbo].[Booking] [b] ON [b].[VerificationRequestId] = [vr].[Id] LEFT JOIN [dbo].[Accommodation] [a] ON [a].[AccommodationId] = [b].[AccommodationId] -- Account Override LEFT JOIN [dbo].[PaymentValidationSet] [pvs_a] ON [pvs_a].[SuperhogUserId] = [vr].[CreatedByUserId] AND [pvs_a].[IsCustom] = 0 AND [pvs_a].[IsActive] = 1 -- Default LEFT JOIN [dbo].[PaymentValidationSet] [pvs_d] ON [pvs_d].[SuperhogUserId] IS NULL AND [pvs_d].[IsCustom] = 0 AND [pvs_d].[IsActive] = 1 ) [pvs] GROUP BY [pvs].[VerificationRequestId], [pvs].[PaymentValidationSetId] ) [pvs] LEFT JOIN [dbo].[VerificationRequest] [vr] ON [vr].[Id] = [pvs].[VerificationRequestId] LEFT JOIN [dbo].[user] [u] ON [u].[Id] = [vr].[SuperhogUserId] LEFT JOIN [dbo].[Country] [co] ON [co].[Id] = [u].[BillingCountryId] LEFT JOIN [dbo].[Currency] [cu] ON [cu].[Id] = [co].[PreferredCurrencyId] LEFT JOIN [dbo].[PaymentValidationSetToCurrency] [pvstc] ON [pvstc].[PaymentValidationSetId] = [pvs].[PaymentValidationSetId] AND [pvstc].[CurrencyIso] = [cu].[IsoCode] --WHERE [VerificationRequestId] = 913616 WHERE [vr].[GuestJourneyCompletedDate] IS NULL and [vr].[PaymentValidationSetId] IS NULL and [vr].[ExpiryDate] >= GETDATE() ---and [VerificationRequestId] = 913616 BEGIN TRAN UPDATE [vr] SET [PaymentValidationSetId] = [pvs].[PaymentValidationSetId] , [UpdatedDate] = @CurrentDate FROM ( SELECT [pvs].[VerificationRequestId], [pvs].[PaymentValidationSetId] FROM ( SELECT [vr].[Id] AS [VerificationRequestId] , COALESCE( [vr].[OverridePaymentValidationSetId], [a].[PaymentValidationSetId], [pvs_a].[Id], [pvs_d].[Id] ) AS [PaymentValidationSetId] FROM [dbo].[VerificationRequest] [vr] -- Listing Override LEFT JOIN [dbo].[Booking] [b] ON [b].[VerificationRequestId] = [vr].[Id] LEFT JOIN [dbo].[Accommodation] [a] ON [a].[AccommodationId] = [b].[AccommodationId] -- Account Override LEFT JOIN [dbo].[PaymentValidationSet] [pvs_a] ON [pvs_a].[SuperhogUserId] = [vr].[CreatedByUserId] AND [pvs_a].[IsCustom] = 0 AND [pvs_a].[IsActive] = 1 -- Default LEFT JOIN [dbo].[PaymentValidationSet] [pvs_d] ON [pvs_d].[SuperhogUserId] IS NULL AND [pvs_d].[IsCustom] = 0 AND [pvs_d].[IsActive] = 1 ) [pvs] GROUP BY [pvs].[VerificationRequestId], [pvs].[PaymentValidationSetId] ) [pvs] LEFT JOIN [dbo].[VerificationRequest] [vr] ON [vr].[Id] = [pvs].[VerificationRequestId] LEFT JOIN [dbo].[user] [u] ON [u].[Id] = [vr].[SuperhogUserId] LEFT JOIN [dbo].[Country] [co] ON [co].[Id] = [u].[BillingCountryId] LEFT JOIN [dbo].[Currency] [cu] ON [cu].[Id] = [co].[PreferredCurrencyId] LEFT JOIN [dbo].[PaymentValidationSetToCurrency] [pvstc] ON [pvstc].[PaymentValidationSetId] = [pvs].[PaymentValidationSetId] AND [pvstc].[CurrencyIso] = [cu].[IsoCode] WHERE [vr].[GuestJourneyCompletedDate] IS NULL and [vr].[PaymentValidationSetId] IS NULL and [vr].[ExpiryDate] >= GETDATE() --and [VerificationRequestId] = 913616 SELECT [pvs].[VerificationRequestId] , [vr].[GuestJourneyCompletedDate] , [vr].[ExpiryDate] , [vr].[PaymentValidationSetId] , [pvs].[PaymentValidationSetId] FROM ( SELECT [pvs].[VerificationRequestId], [pvs].[PaymentValidationSetId] FROM ( SELECT [vr].[Id] AS [VerificationRequestId] , COALESCE( [vr].[OverridePaymentValidationSetId], [a].[PaymentValidationSetId], [pvs_a].[Id], [pvs_d].[Id] ) AS [PaymentValidationSetId] FROM [dbo].[VerificationRequest] [vr] -- Listing Override LEFT JOIN [dbo].[Booking] [b] ON [b].[VerificationRequestId] = [vr].[Id] LEFT JOIN [dbo].[Accommodation] [a] ON [a].[AccommodationId] = [b].[AccommodationId] -- Account Override LEFT JOIN [dbo].[PaymentValidationSet] [pvs_a] ON [pvs_a].[SuperhogUserId] = [vr].[CreatedByUserId] AND [pvs_a].[IsCustom] = 0 AND [pvs_a].[IsActive] = 1 -- Default LEFT JOIN [dbo].[PaymentValidationSet] [pvs_d] ON [pvs_d].[SuperhogUserId] IS NULL AND [pvs_d].[IsCustom] = 0 AND [pvs_d].[IsActive] = 1 ) [pvs] GROUP BY [pvs].[VerificationRequestId], [pvs].[PaymentValidationSetId] ) [pvs] LEFT JOIN [dbo].[VerificationRequest] [vr] ON [vr].[Id] = [pvs].[VerificationRequestId] LEFT JOIN [dbo].[user] [u] ON [u].[Id] = [vr].[SuperhogUserId] LEFT JOIN [dbo].[Country] [co] ON [co].[Id] = [u].[BillingCountryId] LEFT JOIN [dbo].[Currency] [cu] ON [cu].[Id] = [co].[PreferredCurrencyId] LEFT JOIN [dbo].[PaymentValidationSetToCurrency] [pvstc] ON [pvstc].[PaymentValidationSetId] = [pvs].[PaymentValidationSetId] AND [pvstc].[CurrencyIso] = [cu].[IsoCode] --WHERE [VerificationRequestId] = 913616 WHERE [vr].[GuestJourneyCompletedDate] IS NULL and [vr].[PaymentValidationSetId] IS NULL and [vr].[ExpiryDate] >= GETDATE() --and [VerificationRequestId] = 913616 ROLLBACK TRAN --COMMIT TRAN ```