Merged PR 5371: Flagging w. random predictor + DWH connection improvements + restructure
Connecting to DWH:
* All existing notebooks (AB, Flagging & Template) now have an initial simplified block to connect to the DWH. This keeps the code DRY, as we're going to add more and more experiment notebooks very soon (we already have 4).
* This reads from a new `utils/dwh_utils.py`, in which we handle the connection and test it accordingly (an illustrative sketch follows this list).
* This also requires an optional `settings.json` path configuration to avoid warnings (not errors) when reading from `dwh_utils`.
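
For illustration, a minimal sketch of what such a helper could look like, assuming YAML credentials under `~/.superhog-dwh/` and a SQLAlchemy engine; the function names, driver and connection-string format are assumptions, not the actual contents of `utils/dwh_utils.py`:

```python
# Hypothetical sketch of a DWH connection helper; names, driver and
# URL format are assumptions, not the real utils/dwh_utils.py.
from pathlib import Path

import yaml
from sqlalchemy import create_engine, text

CREDENTIALS_PATH = Path.home() / ".superhog-dwh" / "credentials.yml"


def get_engine():
    """Build a DWH engine from the local credentials file."""
    with open(CREDENTIALS_PATH) as f:
        creds = yaml.safe_load(f)
    # The driver and URL format depend on the actual DWH (assumption).
    url = (
        f"mssql+pyodbc://{creds['user']}:{creds['password']}"
        f"@{creds['host']}/{creds['database']}"
        "?driver=ODBC+Driver+17+for+SQL+Server"
    )
    return create_engine(url)


def test_connection(engine) -> bool:
    """Run a trivial query to confirm the DWH is reachable."""
    with engine.connect() as conn:
        return conn.execute(text("SELECT 1")).scalar() == 1
```

In a notebook, the initial block would then reduce to something like `engine = get_engine()` followed by `assert test_connection(engine)`.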

Flagging:
* All flagging notebooks now live in the folder `data_driven_risk_assessment`. The already existing notebook `flagging_performance_monitoring` has also been moved there.
* There's a new `experiments` folder to store the different experiments on flagging.
* A new notebook has been added containing a straightforward baseline: a random predictor, which randomly flags bookings in a test set as risky based on the booking claim rate observed in a previous training dataset (a sketch of the idea follows this list).
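
For reference, a minimal sketch of the idea behind that baseline, assuming pandas DataFrames with a boolean `claimed` column; the column and function names are assumptions, not the notebook's actual code:

```python
import numpy as np
import pandas as pd


def random_flagging_baseline(train: pd.DataFrame, test: pd.DataFrame,
                             seed: int = 42) -> pd.Series:
    """Flag test bookings at random with probability equal to the
    claim rate observed in the training period."""
    claim_rate = train["claimed"].mean()  # observed booking claim rate
    rng = np.random.default_rng(seed)
    # One Bernoulli draw per test booking: flag with p = train claim rate.
    return pd.Series(rng.random(len(test)) < claim_rate,
                     index=test.index, name="flagged")
```

The value of such a baseline is that any real model has a floor to beat before its extra complexity is justified.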

I confirm that all existing notebooks work well after the connection changes.

Once merged (or when reviewing), you will need to re-install requirements.txt, as I added sklearn.

Related work items: #30804

Introduction

A small repository to save and share Jupyter Notebooks within the Data Team.

Getting Started

Basics

  • Pre-requisites
    • You need a Unix-like environment: Linux, macOS or WSL.
    • You need to have Python >=3.10 installed.
    • All docs will assume you are using VSCode.
    • Also install the following VSCode Python extension: ms-python.python
  • Set up
    • Create a virtual environment for the project with python3 -m venv venv.
    • It's recommended to set the new venv as your default interpreter for VSCode. To do this, press Ctrl+Shift+P, look for the Python: Select Interpreter option, and choose the new venv.
    • Ensure that VSCode is using this virtual environment. You can activate it in a terminal by running source venv/bin/activate.
    • With the virtual environment activated, run pip install -r requirements.txt (a quick sanity check you can run afterwards is sketched after this list).
  • Lastly, install the following extension so VSCode can render the notebooks: https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter
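
Once the requirements are installed, a quick sanity check along these lines (run with the venv's interpreter) should pass; the exact set of imports is an assumption based on this repository's dependencies:

```python
# Quick environment sanity check; the package list is an assumption.
import sys

assert sys.version_info >= (3, 10), "Python >=3.10 is required"

# These imports should succeed once requirements.txt is installed.
import pandas   # noqa: F401
import sklearn  # noqa: F401
import yaml     # noqa: F401

print("Environment looks OK:", sys.executable)
```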

DWH connection

In order to connect to the DWH, you will need to create a local file with your credentials. You can use credentials_example.yml as a template; remember to fill in the user and password.

Once done, save the credentials file at this local path: /home/{your_user}/.superhog-dwh/credentials.yml

Since this file contains credentials, secure it so that only your user can read it by running: chmod 600 /home/{your_user}/.superhog-dwh/credentials.yml
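
If you want to double-check the file before running any notebook, a small snippet like the following verifies the path, permissions and contents; the user/password keys are assumptions based on credentials_example.yml:

```python
# Check that the DWH credentials file exists, is private, and parses.
import stat
from pathlib import Path

import yaml

path = Path.home() / ".superhog-dwh" / "credentials.yml"
assert path.exists(), f"Missing credentials file: {path}"

mode = stat.S_IMODE(path.stat().st_mode)
assert mode == 0o600, f"Expected permissions 600, got {oct(mode)}"

creds = yaml.safe_load(path.read_text())
# 'user' and 'password' keys are an assumption from credentials_example.yml.
assert creds.get("user") and creds.get("password"), "Fill in user and password"
print("Credentials file looks good.")
```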

Once you've completed the previous steps, try running the code in template.ipynb. If it runs, everything is set up correctly. If not, check with someone in the Data Team.