{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# A/B test monitoring\n", "\n", "## Initial setup\n", "This first section just ensures that the connection to DWH works correctly." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This script connects to a Data Warehouse (DWH) using PostgreSQL. \n", "# This should be common for all Notebooks, but you might need to adjust the path to the `dwh_utils` module.\n", "\n", "import sys\n", "import os\n", "sys.path.append(os.path.abspath(\"./utils\")) # Adjust path if needed\n", "\n", "from dwh_utils import read_credentials, create_postgres_engine, query_to_dataframe, test_connection\n", "\n", "# --- Connect to DWH ---\n", "creds = read_credentials()\n", "dwh_pg_engine = create_postgres_engine(creds)\n", "\n", "# --- Test Query ---\n", "test_connection()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "from statsmodels.stats.proportion import proportions_ztest\n", "from scipy import stats" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A/B test configuration\n", "In this section we configure the parameters for the A/B test. Likely you do NOT need to change anything else than this, unless of course you want to create new metrics and so on.\n", "\n", "The parameters to be specified are:\n", "* **ab_test_name**: this should be the name of the feature flag corresponding to the A/B test. If you don't know the name, ask Guest Squad\n", "* **var_A** and **var_B**: these correspond to the name of the variants. At this moment, we can only handle univariant testing (though updating the code to include multivariant testing should not be extremely difficult). In general, choose var_A to be the Control group." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A/B test name to measure\n", "ab_test_name = 'YourTripQuestionaire'\n", "\n", "# Define the variations in which we want to run the tests\n", "var_A = 'DontShowYourTripQuestions' # Ideally, this should be the control group\n", "var_B = 'ShowYourTripQuestions' # Ideally, this should be the study group\n", "\n", "variations = [var_A, var_B]\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Extraction\n", "In this section we extract the data from the Guest Journey monitoring within DWH by configuring which A/B test we want to measure. Here we already handle the basic aggregations that will be needed in the future, directly in SQL." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Query to extract data\n", "data_extraction_query = \"\"\"\n", "select \n", "\tgj.ab_test_name,\n", "\tgj.variation,\n", "\tmax(gj.first_appearance_date_utc) as last_update,\n", " \n", " -- SIMPLE COUNTS --\n", "\tcount(gj.id_verification_request) as guest_journeys_count,\n", "\tcount(gj.verification_started_date_utc) as guest_journey_started_count,\n", "\tcount(gj.verification_completed_date_utc) as guest_journey_completed_count,\n", "\tcount(gj.experience_rating) as guest_journey_with_responses_count,\n", "\tcount(gj.last_payment_paid_date_utc) as guest_journey_with_payment_count,\n", "\tcount(gj.guest_revenue_without_taxes_in_gbp) as guest_revenue_count,\n", "\tcount(gj.deposit_fees_without_taxes_in_gbp) as deposit_count,\n", "\tcount(gj.waiver_fees_without_taxes_in_gbp) as waiver_count,\n", "\tcount(gj.check_in_cover_fees_without_taxes_in_gbp) as check_in_cover_count,\n", " \n", " -- SIMPLE SUMS --\n", "\tsum(gj.guest_revenue_without_taxes_in_gbp) as guest_revenue_sum,\n", "\tsum(gj.deposit_fees_without_taxes_in_gbp) as deposit_sum,\n", "\tsum(gj.waiver_fees_without_taxes_in_gbp) as waiver_sum,\n", "\tsum(gj.check_in_cover_fees_without_taxes_in_gbp) as check_in_cover_sum,\n", " \n", " -- AVGs/SDVs PER GUEST JOURNEY (ANY GJ APPEARING IN THE A/B TEST) --\n", " -- NOTE THE COALESCE HERE. THIS IS IMPORTANT FOR THE T-TEST COMPUTATION --\n", " avg(coalesce(gj.guest_revenue_without_taxes_in_gbp,0)) as guest_revenue_avg_per_guest_journey,\n", " stddev(coalesce(gj.guest_revenue_without_taxes_in_gbp,0)) as guest_revenue_sdv_per_guest_journey,\n", " avg(coalesce(gj.deposit_fees_without_taxes_in_gbp,0)) as deposit_avg_per_guest_journey,\n", " stddev(coalesce(gj.deposit_fees_without_taxes_in_gbp,0)) as deposit_sdv_per_guest_journey,\n", " avg(coalesce(gj.waiver_fees_without_taxes_in_gbp,0)) as waiver_avg_per_guest_journey,\n", " stddev(coalesce(gj.waiver_fees_without_taxes_in_gbp,0)) as waiver_sdv_per_guest_journey,\n", " avg(coalesce(gj.check_in_cover_fees_without_taxes_in_gbp,0)) as check_in_cover_avg_per_guest_journey,\n", " stddev(coalesce(gj.check_in_cover_fees_without_taxes_in_gbp,0)) as check_in_cover_sdv_per_guest_journey,\n", " \n", " -- AVGs/SDVs PER GUEST JOURNEY WITH CSAT RESPONSE --\n", " -- NOTE THAT THERE'S NO COALESCE HERE. 
THIS IS IMPORTANT FOR THE T-TEST COMPUTATION --\n", " avg(gj.experience_rating) as csat_avg_per_guest_journey_with_response,\n", " stddev(gj.experience_rating) as csat_sdv_per_guest_journey_with_response\n", " \n", "from\n", "\tintermediate.int_core__ab_test_monitoring_guest_journey gj\n", "where\n", "\tgj.ab_test_name = '{}'\n", "group by\n", "\t1,2\n", "\"\"\".format(ab_test_name)\n", "\n", "# Retrieve Data from Query\n", "df = query_to_dataframe(engine=dwh_pg_engine, query=data_extraction_query)\n", "print(df.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Check A/B test Allocation to Variation" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Ensure Seaborn styling\n", "sns.set_theme(style=\"whitegrid\")\n", "\n", "# Calculate the total guest_journeys_count per variation\n", "grouped_data = df.groupby('variation')['guest_journeys_count'].sum()\n", "\n", "# Find the total count and other metadata\n", "total_count = grouped_data.sum()\n", "ab_test_name = df['ab_test_name'].iloc[0] # All rows share the same A/B test, since the query filters on it\n", "last_update = df['last_update'].max()\n", "\n", "# Create a pie chart using Seaborn styling\n", "plt.figure(figsize=(8, 6))\n", "colors = sns.color_palette(\"pastel\") # Seaborn pastel colors\n", "\n", "# Pie chart with labels inside each sector\n", "plt.pie(\n", "    grouped_data,\n", "    labels=[f\"{var}\\n{count} ({count/total_count:.1%})\" for var, count in grouped_data.items()],\n", "    autopct=None,\n", "    colors=colors,\n", "    startangle=90,\n", "    wedgeprops={'edgecolor': 'none'}, # Remove edges around sectors\n", "    pctdistance=0.70, # Only affects autopct labels (unused here, since autopct=None)\n", "    labeldistance=0.2 # Position the labels inside the sectors\n", ")\n", "\n", "# Add title\n", "plt.title(\"Guest Journey - Variation Allocation\", fontsize=16)\n", "\n", "# Add total count to the bottom-left\n", "plt.text(-1.4, -1.3, f\"Total Count: {total_count}\", fontsize=10, ha='left', color='black')\n", "\n", "# Add A/B test name and last update to the bottom-right\n", "plt.text(1.2, -1.3, f\"A/B Test: {ab_test_name}\", fontsize=8, ha='right', color='gray')\n", "plt.text(1.2, -1.4, f\"Last Update: {last_update}\", fontsize=8, ha='right', color='gray')\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Statistical Analysis\n", "In this section we compute the metrics needed for monitoring and check whether there is a statistically significant difference between the variations."
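, "\n", "\n", "For reference, the two test statistics computed in the following sections are (a summary of what the code does, with $A$/$B$ the two variations, $n$ the totals, $\\hat{p}$ the pooled proportion, and $s$ the standard deviations):\n", "\n", "$$z = \\frac{\\hat{p}_B - \\hat{p}_A}{\\sqrt{\\hat{p}\\,(1-\\hat{p})\\left(\\frac{1}{n_A} + \\frac{1}{n_B}\\right)}} \\qquad t = \\frac{\\bar{x}_B - \\bar{x}_A}{\\sqrt{\\frac{s_A^2}{n_A} + \\frac{s_B^2}{n_B}}}$$\n", "\n", "The Z-statistic is what `proportions_ztest` computes with its default pooled variance; the T-statistic uses a Welch-style standard error with a conservative $\\min(n_A, n_B) - 1$ degrees of freedom (see the T-test section below)."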
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Z-test for Proportion Metrics (Rates)\n", "This section defines the functions used to compute Z-test Proportion analysis" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generalized function to calculate Z-test for any metric\n", "def calculate_z_test(df, metric_name, variation_a, variation_b, success_counts, total_counts):\n", "\n", " # Aggregate the success counts (numerator) and total counts (denominator) for each variation\n", " success_a = df[df['variation'] == variation_a][success_counts].sum()\n", " success_b = df[df['variation'] == variation_b][success_counts].sum()\n", "\n", " total_a = df[df['variation'] == variation_a][total_counts].sum()\n", " total_b = df[df['variation'] == variation_b][total_counts].sum()\n", "\n", " # Calculate conversion rates for each variation\n", " value_A = success_a / total_a if total_a != 0 else 0\n", " value_B = success_b / total_b if total_b != 0 else 0\n", "\n", " # Absolute difference (B - A)\n", " abs_diff = value_B - value_A\n", "\n", " # Relative difference (B - A) / A\n", " rel_diff = (value_B - value_A) / value_A if value_A != 0 else 0\n", "\n", " # Perform the z-test for proportions\n", " count = [success_a, success_b] # Success counts for A and B\n", " nobs = [total_a, total_b] # Total counts for A and B\n", " \n", " # Calculate z-stat and p-value\n", " z_stat, p_value = proportions_ztest(count, nobs)\n", " \n", " # Flag for significance at 95% level (p-value < 0.05)\n", " is_significant = p_value < 0.05\n", "\n", " # Return the result as a dictionary\n", " return {\n", " 'metric': metric_name,\n", " 'variation_A_name': variation_a,\n", " 'variation_B_name': variation_b,\n", " 'variation_A_value': value_A,\n", " 'variation_B_value': value_B,\n", " 'absolute_difference': abs_diff,\n", " 'relative_difference': rel_diff,\n", " 'statistic': z_stat,\n", " 'p_value': p_value,\n", " 'is_significant_95': is_significant\n", " }\n", "\n", "# Function to run Z-tests for multiple metrics and aggregate results into a DataFrame\n", "def run_z_tests(df, z_stat_metric_definition, variations):\n", " results = []\n", " \n", " # Loop over all metrics in z_stat_metric_definition\n", " for metric_name, metric_definition in z_stat_metric_definition.items():\n", " success_counts = metric_definition['success_counts']\n", " total_counts = metric_definition['total_counts']\n", " \n", " # Run the Z-test for each metric\n", " result = calculate_z_test(df, metric_name, variation_a=variations[0], variation_b=variations[1], \n", " success_counts=success_counts, total_counts=total_counts)\n", " results.append(result)\n", " \n", " # Create a DataFrame from the results\n", " results_df = pd.DataFrame(results)\n", " \n", " return results_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### T-test for non-proportion metrics\n", "This section defines the functions used to compute T-tests for metrics outside of the proportion scope, mostly Revenue-related metrics." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "# Generalized function to calculate T-test for revenue-related metrics\n", "def calculate_t_test(df, metric_name, variation_a, variation_b, metric_avg_column, metric_sdv_column, total_counts):\n", " # Aggregate the avgs and standard deviations for each variation\n", " mean_a = df[df['variation'] == variation_a][metric_avg_column].mean() # Assuming the avg is calculated for each group\n", " mean_b = df[df['variation'] == variation_b][metric_avg_column].mean() # Assuming the avg is calculated for each group\n", " \n", " sdv_a = df[df['variation'] == variation_a][metric_sdv_column].mean() # Assuming the stddev is calculated for each group\n", " sdv_b = df[df['variation'] == variation_b][metric_sdv_column].mean() # Assuming the stddev is calculated for each group\n", " \n", " total_a = df[df['variation'] == variation_a][total_counts].sum()\n", " total_b = df[df['variation'] == variation_b][total_counts].sum()\n", "\n", " # Absolute difference (B - A)\n", " abs_diff = mean_b - mean_a\n", "\n", " # Relative difference (B - A) / A\n", " rel_diff = (mean_b - mean_a) / mean_a if mean_a != 0 else 0\n", "\n", " # Calculate the T-statistic and p-value using the formula for two-sample T-test\n", " se_a = sdv_a / (total_a ** 0.5) if total_a != 0 else 0\n", " se_b = sdv_b / (total_b ** 0.5) if total_b != 0 else 0\n", "\n", " # Standard error of the difference between the means\n", " se_diff = (se_a ** 2 + se_b ** 2) ** 0.5\n", " \n", " # T-statistic formula\n", " if se_diff != 0:\n", " t_stat = (mean_a - mean_b) / se_diff\n", " else:\n", " t_stat = 0\n", " \n", " # Degrees of freedom (for independent samples)\n", " df_degrees = min(total_a - 1, total_b - 1) # Using the smaller of the two sample sizes minus 1\n", " \n", " # P-value from the T-distribution\n", " p_value = stats.t.sf(abs(t_stat), df_degrees) * 2 # Two-tailed test\n", " \n", " # Flag for significance at 95% level (p-value < 0.05)\n", " is_significant = p_value < 0.05\n", "\n", " # Return the result as a dictionary\n", " return {\n", " 'metric': metric_name,\n", " 'variation_A_name': variation_a,\n", " 'variation_B_name': variation_b,\n", " 'variation_A_value': mean_a,\n", " 'variation_B_value': mean_b,\n", " 'absolute_difference': abs_diff,\n", " 'relative_difference': rel_diff,\n", " 'statistic': t_stat,\n", " 'p_value': p_value,\n", " 'is_significant_95': is_significant\n", " }\n", "\n", "# Function to run T-tests for multiple revenue metrics and aggregate results into a DataFrame\n", "def run_t_tests(df, t_stat_metric_definition, variations):\n", " results = []\n", " \n", " # Loop over all metrics in t_stat_metric_definition\n", " for metric_name, metric_definition in t_stat_metric_definition.items():\n", " metric_avg_column = metric_definition['metric_avg_column']\n", " metric_sdv_column = metric_definition['metric_sdv_column']\n", " total_counts = metric_definition['total_counts']\n", " \n", " # Run the T-test for each metric\n", " result = calculate_t_test(df, metric_name, variation_a=variations[0], variation_b=variations[1], \n", " metric_avg_column=metric_avg_column, metric_sdv_column=metric_sdv_column, \n", " total_counts=total_counts)\n", " results.append(result)\n", " \n", " # Create a DataFrame from the results\n", " results_df = pd.DataFrame(results)\n", " \n", " return results_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Specify the metric definition for Z-stat and T-stat tests" ] }, { "cell_type": 
"code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Define the Z-test metric definitions (with both success_counts and total_counts)\n", "z_stat_metric_definition = {\n", " 'conversion_rate': {\n", " 'success_counts': 'guest_journey_completed_count',\n", " 'total_counts': 'guest_journeys_count'\n", " },\n", " 'payment_rate': {\n", " 'success_counts': 'guest_journey_with_payment_count',\n", " 'total_counts': 'guest_journeys_count'\n", " },\n", " 'waiver_payment_rate': {\n", " 'success_counts': 'waiver_count',\n", " 'total_counts': 'guest_journeys_count'\n", " },\n", " 'deposit_payment_rate': {\n", " 'success_counts': 'deposit_count',\n", " 'total_counts': 'guest_journeys_count'\n", " },\n", " 'CIH_payment_rate': {\n", " 'success_counts': 'check_in_cover_count',\n", " 'total_counts': 'guest_journeys_count'\n", " }\n", "}\n", "\n", "# Define the T-test metric definitions (with both metric_avg_column and metric_sdv_column)\n", "t_stat_metric_definition = {\n", " 'avg_guest_revenue_per_gj': {\n", " 'metric_avg_column': 'guest_revenue_avg_per_guest_journey',\n", " 'metric_sdv_column': 'guest_revenue_sdv_per_guest_journey',\n", " 'total_counts': 'guest_journeys_count'\n", " },\n", " 'avg_waiver_revenue_per_gj': {\n", " 'metric_avg_column': 'waiver_avg_per_guest_journey',\n", " 'metric_sdv_column': 'waiver_sdv_per_guest_journey',\n", " 'total_counts': 'guest_journeys_count'\n", " },\n", " 'avg_deposit_revenue_per_gj': {\n", " 'metric_avg_column': 'deposit_avg_per_guest_journey',\n", " 'metric_sdv_column': 'deposit_sdv_per_guest_journey',\n", " 'total_counts': 'guest_journeys_count'\n", " },\n", " 'avg_CIH_revenue_per_gj': {\n", " 'metric_avg_column': 'check_in_cover_avg_per_guest_journey',\n", " 'metric_sdv_column': 'check_in_cover_sdv_per_guest_journey',\n", " 'total_counts': 'guest_journeys_count'\n", " },\n", " 'avg_csat_per_gj_with_response': {\n", " 'metric_avg_column': 'csat_avg_per_guest_journey_with_response',\n", " 'metric_sdv_column': 'csat_sdv_per_guest_journey_with_response',\n", " 'total_counts': 'guest_journey_with_responses_count'\n", " }\n", "\n", "}\n", "\n", "# Define the metrics that will be the main ones for this A/B test:\n", "main_metrics = ['avg_guest_revenue_per_gj', 'conversion_rate', 'payment_rate']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run the computation of the metrics and statistical significance" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Call the function to calculate the Z-test for each metric and aggregate the results\n", "z_test_results_df = run_z_tests(df, z_stat_metric_definition=z_stat_metric_definition, variations=variations)\n", "\n", "# Call the function to calculate the T-test for each metric and aggregate the results\n", "t_test_results_df = run_t_tests(df, t_stat_metric_definition=t_stat_metric_definition, variations=variations)\n", "\n", "# Add a new column to identify whether it's from Z-test or T-test\n", "z_test_results_df['test_type'] = 'Z-test'\n", "t_test_results_df['test_type'] = 'T-test'\n", "\n", "# Combine the dataframes after adding the 'test_type' column\n", "combined_results_df = pd.concat([z_test_results_df, t_test_results_df], ignore_index=True)\n", "\n", "# Print the main aggregated DataFrame\n", "print(combined_results_df[['metric','relative_difference','p_value']])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Results\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], 
"source": [ "print('\\n{} results (last updated at {})\\n'.format(ab_test_name, last_update))\n", "\n", "# Get main volume indicators per variation\n", "grouped_data = df.groupby('variation')[[\"guest_journeys_count\",\"guest_revenue_sum\"]].sum()\n", "\n", "# Find the totals over any variation\n", "total_count = grouped_data.sum()\n", "\n", "# Print overall indicators for volumes\n", "print('Total Guest Journeys affected by this A/B test: {} - Total Guest Revenue: {} GBP.'.format(int(total_count.loc[\"guest_journeys_count\"]), \n", " int(total_count.loc[\"guest_revenue_sum\"])))\n", "for var in variations:\n", " print(' Variation {}: Guest Journeys {} ({}%) - Guest Revenue: {} GBP ({}%).'.format(\n", " var, \n", " int(grouped_data.loc[var,'guest_journeys_count']), \n", " round(100*(grouped_data.loc[var,'guest_journeys_count']/total_count.loc[\"guest_journeys_count\"]),1),\n", " int(grouped_data.loc[var,'guest_revenue_sum']),\n", " round(100*(grouped_data.loc[var,'guest_revenue_sum']/total_count.loc[\"guest_revenue_sum\"]),1)\n", " ))\n", "\n", "# Split results whether the metrics are main metrics or not\n", "main_metrics_rows = combined_results_df[combined_results_df['metric'].isin(main_metrics)]\n", "other_metrics_rows = combined_results_df[~combined_results_df['metric'].isin(main_metrics)]\n", "\n", "def print_metrics(df, header=None):\n", " if header:\n", " print(f'\\n{header}\\n')\n", "\n", " for row in df.iterrows():\n", " metric = row[1]['metric'].upper().replace('_', ' ')\n", " if row[1]['test_type'] == 'Z-test':\n", " value_a = str(round(100 * row[1]['variation_A_value'], 1)) + '%'\n", " value_b = str(round(100 * row[1]['variation_B_value'], 1)) + '%'\n", " abs_diff = str(round(100 * row[1]['absolute_difference'], 1)) + '%'\n", " else:\n", " value_a = str(round(row[1]['variation_A_value'], 2))\n", " value_b = str(round(row[1]['variation_B_value'], 2))\n", " abs_diff = str(round(row[1]['absolute_difference'], 2))\n", " rel_diff = str(round(100 * row[1]['relative_difference'], 1)) + '%'\n", " stat_sign = row[1]['is_significant_95']\n", "\n", " if stat_sign:\n", " print(f\"{metric} - SIGNIFICANT RESULT: {value_b} vs. {value_a} ({abs_diff} ppts.| {rel_diff}).\")\n", " else:\n", " print(f\"{metric} (not significant): {value_b} vs. {value_a} ({abs_diff} ppts.| {rel_diff}).\")\n", "\n", "# Print main metrics\n", "print_metrics(main_metrics_rows, header=\"Main Metrics - Comparing {} vs. {}.\".format(var_B, var_A))\n", "\n", "# Print other metrics\n", "print_metrics(other_metrics_rows, header=\"Other Metrics\")\n" ] } ], "metadata": { "kernelspec": { "display_name": "venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.3" } }, "nbformat": 4, "nbformat_minor": 2 }