Merged PR 5371: Flagging w. random predictor + DWH connection improvements + restructure
Connecting to DWH:
* All existing notebooks (AB, Flagging & Template) now have an initial simplified block to connect to the DWH. This keeps the code DRY, as we're going to add more and more experiment notebooks very soon (we already have 4).
* This reads from a new `utils/dwh_utils.py`, in which we handle the connection and test it accordingly (an illustrative sketch follows this list).
* This also requires an optional `settings.json` path configuration to avoid warnings (not errors) when reading from `dwh_utils`.
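
For illustration, a minimal sketch of what such a helper could look like, assuming YAML credentials under `~/.superhog-dwh/` and a SQLAlchemy engine; the function names, driver and connection-string format are assumptions, not the actual contents of `utils/dwh_utils.py`:

```python
# Hypothetical sketch of a DWH connection helper; names, driver and
# URL format are assumptions, not the real utils/dwh_utils.py.
from pathlib import Path

import yaml
from sqlalchemy import create_engine, text

CREDENTIALS_PATH = Path.home() / ".superhog-dwh" / "credentials.yml"


def get_engine():
    """Build a DWH engine from the local credentials file."""
    with open(CREDENTIALS_PATH) as f:
        creds = yaml.safe_load(f)
    # The driver and URL format depend on the actual DWH (assumption).
    url = (
        f"mssql+pyodbc://{creds['user']}:{creds['password']}"
        f"@{creds['host']}/{creds['database']}"
        "?driver=ODBC+Driver+17+for+SQL+Server"
    )
    return create_engine(url)


def test_connection(engine) -> bool:
    """Run a trivial query to confirm the DWH is reachable."""
    with engine.connect() as conn:
        return conn.execute(text("SELECT 1")).scalar() == 1
```

In a notebook, the initial block would then reduce to something like `engine = get_engine()` followed by `assert test_connection(engine)`.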

Flagging:
* All flagging notebooks now live in the folder `data_driven_risk_assessment`. The already existing notebook `flagging_performance_monitoring` has also been moved there.
* There's a new `experiments` folder to store the different experiments on flagging.
* A new notebook has been added containing a straightforward baseline: a random predictor, which randomly flags bookings in a test set as risky based on the booking claim rate observed in a previous training dataset (a sketch of the idea follows this list).
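
For reference, a minimal sketch of the idea behind that baseline, assuming pandas DataFrames with a boolean `claimed` column; the column and function names are assumptions, not the notebook's actual code:

```python
import numpy as np
import pandas as pd


def random_flagging_baseline(train: pd.DataFrame, test: pd.DataFrame,
                             seed: int = 42) -> pd.Series:
    """Flag test bookings at random with probability equal to the
    claim rate observed in the training period."""
    claim_rate = train["claimed"].mean()  # observed booking claim rate
    rng = np.random.default_rng(seed)
    # One Bernoulli draw per test booking: flag with p = train claim rate.
    return pd.Series(rng.random(len(test)) < claim_rate,
                     index=test.index, name="flagged")
```

The value of such a baseline is that any real model has a floor to beat before its extra complexity is justified.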

I confirm that all existing notebooks work well after the connection changes.

Once merged (or when reviewing), you will need to re-install requirements.txt, as I added sklearn.

Related work items: #30804

Introduction

A small repository to save and share Jupyter Notebooks within the Data Team.

Getting Started

Basics

  • Pre-requisites
    • You need a Unix-like environment: Linux, macOS or WSL.
    • You need to have Python >=3.10 installed.
    • All docs will assume you are using VSCode.
    • Also install the following VSCode Python extension: ms-python.python
  • Set up
    • Create a virtual environment for the project with python3 -m venv venv.
    • It's recommended to set the new venv as your default interpreter for VSCode. To do this, press Ctrl+Shift+P, look for the Python: Select Interpreter option, and choose the new venv.
    • Ensure that VSCode is using this virtual environment. You can activate it in a terminal by running source venv/bin/activate.
    • With the virtual environment activated, run pip install -r requirements.txt (a quick sanity check you can run afterwards is sketched after this list).
  • Lastly, install the following extension so VSCode can render the notebooks: https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter
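
Once the requirements are installed, a quick sanity check along these lines (run with the venv's interpreter) should pass; the exact set of imports is an assumption based on this repository's dependencies:

```python
# Quick environment sanity check; the package list is an assumption.
import sys

assert sys.version_info >= (3, 10), "Python >=3.10 is required"

# These imports should succeed once requirements.txt is installed.
import pandas   # noqa: F401
import sklearn  # noqa: F401
import yaml     # noqa: F401

print("Environment looks OK:", sys.executable)
```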

DWH connection

In order to connect to the DWH, you will need to create a local file with your credentials. You can use credentials_example.yml as a template; remember to fill in the user and password.

Once done, save the credentials file at this local path: /home/{your_user}/.superhog-dwh/credentials.yml

Since this file contains credentials, secure it so that only your user can read it by running: chmod 600 /home/{your_user}/.superhog-dwh/credentials.yml
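
If you want to double-check the file before running any notebook, a small snippet like the following verifies the path, permissions and contents; the user/password keys are assumptions based on credentials_example.yml:

```python
# Check that the DWH credentials file exists, is private, and parses.
import stat
from pathlib import Path

import yaml

path = Path.home() / ".superhog-dwh" / "credentials.yml"
assert path.exists(), f"Missing credentials file: {path}"

mode = stat.S_IMODE(path.stat().st_mode)
assert mode == 0o600, f"Expected permissions 600, got {oct(mode)}"

creds = yaml.safe_load(path.read_text())
# 'user' and 'password' keys are an assumption from credentials_example.yml.
assert creds.get("user") and creds.get("password"), "Fill in user and password"
print("Credentials file looks good.")
```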

Once you've completed the previous steps, try running the code in template.ipynb. If it runs, everything is set up correctly. If not, check with someone in the Data Team.