{
"cells": [
{
"cell_type": "markdown",
"id": "84dcd475",
"metadata": {},
"source": [
"# DDRA - Contactless (Full)\n",
"\n",
"## General Idea\n",
"\n",
"The idea is to work only with numeric features (floats, integers or booleans) that are CONTACTLESS.\n",
"\n",
"This notebook considers the FULL set of features.\n",
"\n",
"A more readable EDA is available in Notion here: [EDA Uri: Contactless](https://www.notion.so/truvi/EDA-Uri-Contactless-2170446ff9c980909624d45a6c124ec2)\n",
"\n",
"## Initial setup\n",
"This first section checks that the connection to the DWH works correctly."
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "12368ce1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"🔌 Testing connection using credentials at: /home/uri/.superhog-dwh/credentials.yml\n",
"✅ Connection successful.\n"
]
}
],
"source": [
"# This script connects to a Data Warehouse (DWH) using PostgreSQL. \n",
"# This should be common for all Notebooks, but you might need to adjust the path to the `dwh_utils` module.\n",
"\n",
"import sys\n",
"import os\n",
"sys.path.append(os.path.abspath(\"../../utils\")) # Adjust path if needed\n",
"\n",
"from dwh_utils import read_credentials, create_postgres_engine, query_to_dataframe, test_connection\n",
"\n",
"# --- Connect to DWH ---\n",
"creds = read_credentials()\n",
"dwh_pg_engine = create_postgres_engine(creds)\n",
"\n",
"# --- Test Query ---\n",
"test_connection()"
]
},
{
"cell_type": "markdown",
"id": "c86f94f1",
"metadata": {},
"source": [
"## Data Extraction\n",
"In this section we extract the data.\n",
"\n",
"The SQL query used here retrieves a clean and relevant subset of booking data for our model. It includes:\n",
"- A **unique booking ID**\n",
"- Key **numeric features**, such as the number of applied services, the lead time between booking creation and check-in, the booking duration (number of nights), listing size, and same-listing activity in the prior 30 days\n",
"- Several **categorical (boolean) features** related to service usage and to whether the booking is contactless\n",
"- A **target variable** (`has_resolution_incident`) indicating whether a resolution incident occurred\n",
"\n",
"The following filters are applied:\n",
"1. Bookings from **\"New Dash\" users** with a valid deal ID\n",
"2. Only **protected bookings**, i.e. those with a Protection or Deposit Management service\n",
"3. Bookings with a **risk flagging categorisation** (this excludes cancelled, incomplete and rejected bookings)\n",
"4. Bookings that are **already completed**\n",
"\n",
"The result is converted into a pandas DataFrame for further processing and modeling.\n"
]
},
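The least obvious features in the extraction query are the same-listing "prior to TCR" ones. The pure-Python sketch below uses hypothetical in-memory data and is not the production logic. It mirrors how `listing_check_ins_prior_to_TCR_in_30_days` and `listing_occupancy_prior_to_TCR_in_30_days` appear to be built. It takes the other bookings of the same listing that checked in during the 30 days before the target booking was created, skips cancelled ones, and collapses duplicates by check-in date, keeping the minimum number of nights. Both values are capped at 30. The `Booking` record, its field names and the `"APPROVED"` default status are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Booking:
    """Hypothetical, simplified stand-in for a row of int_booking_summary."""
    id_booking: int
    id_accommodation: int
    created: date
    check_in: date
    nights: int
    status: str = "APPROVED"  # assumed label; only "CANCELLED" matters here


def prior_to_tcr_features(target: Booking, bookings: list[Booking],
                          window_days: int = 30, cap: int = 30) -> tuple[int, int]:
    """Return (check_ins, occupancy) on the target's listing in the window_days
    before the target booking was created, both capped at `cap`."""
    window_start = target.created - timedelta(days=window_days)
    # One entry per check-in date, keeping the minimum number of nights
    # (mirrors the GROUP BY on check-in date plus min() in the query)
    min_nights_by_check_in: dict[date, int] = {}
    for b in bookings:
        if b.id_accommodation != target.id_accommodation or b.status == "CANCELLED":
            continue
        if not (window_start <= b.check_in < target.created):
            continue
        previous = min_nights_by_check_in.get(b.check_in)
        min_nights_by_check_in[b.check_in] = b.nights if previous is None else min(previous, b.nights)
    check_ins = min(len(min_nights_by_check_in), cap)
    occupancy = min(max(sum(min_nights_by_check_in.values()), 0), cap)
    return check_ins, occupancy


# Usage on made-up data for a single listing (id_accommodation=10)
target = Booking(1, 10, created=date(2024, 5, 31), check_in=date(2024, 6, 10), nights=3)
others = [
    Booking(2, 10, date(2024, 4, 1), date(2024, 5, 5), 4),
    Booking(3, 10, date(2024, 4, 2), date(2024, 5, 5), 2),   # same check-in date: min nights (2) kept
    Booking(4, 10, date(2024, 4, 3), date(2024, 5, 20), 5),
    Booking(5, 10, date(2024, 4, 4), date(2024, 5, 25), 3, "CANCELLED"),  # excluded
    Booking(6, 11, date(2024, 4, 5), date(2024, 5, 25), 3),  # different listing
    Booking(7, 10, date(2024, 3, 1), date(2024, 4, 20), 7),  # before the 30-day window
]
print(prior_to_tcr_features(target, [target, *others]))  # (2, 7)
```

A dictionary keyed by check-in date plays the role of the SQL `GROUP BY`, so duplicates collapse before counting and summing.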
{
"cell_type": "code",
"execution_count": 23,
"id": "3e3ed391",
"metadata": {},
"outputs": [],
"source": [
"# Initialise all imports needed for the Notebook\n",
"from sklearn.model_selection import (\n",
" train_test_split, \n",
" GridSearchCV\n",
")\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"import pandas as pd\n",
"import numpy as np\n",
"from datetime import date\n",
"from sklearn.metrics import (\n",
" roc_auc_score, \n",
" average_precision_score,\n",
" classification_report,\n",
" roc_curve, \n",
" auc,\n",
" precision_recall_curve,\n",
" precision_score,\n",
" recall_score,\n",
" fbeta_score,\n",
" confusion_matrix\n",
")\n",
"import matplotlib.pyplot as plt\n",
"import shap\n",
"import seaborn as sns"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "db5e3098",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total Bookings: 21,384\n"
]
}
],
"source": [
"# Query to extract data\n",
"data_extraction_query = \"\"\"\n",
"WITH \n",
"service_information AS (\n",
"\tSELECT\n",
"\t\tid_booking,\n",
"\t\tcount(DISTINCT CASE WHEN service_business_type = 'SCREENING' THEN id_booking_service_detail ELSE NULL END) AS number_of_applied_screening_services,\n",
"\t\tcount(DISTINCT CASE WHEN service_business_type = 'DEPOSIT_MANAGEMENT' THEN id_booking_service_detail ELSE NULL END) AS number_of_applied_deposit_management_services,\n",
"\t\tcount(DISTINCT CASE WHEN service_business_type = 'PROTECTION' THEN id_booking_service_detail ELSE NULL END) AS number_of_applied_protection_services,\n",
"\t\tcount(DISTINCT CASE WHEN service_name = 'WAIVER PRO' THEN id_booking ELSE NULL END)>0 AS has_waiver_pro,\n",
"\t\tcount(DISTINCT CASE WHEN service_name IN ('BASIC DAMAGE DEPOSIT','BASIC DAMAGE DEPOSIT OR BASIC WAIVER','BASIC DAMAGE DEPOSIT OR WAIVER PLUS','BASIC WAIVER','WAIVER PLUS') THEN id_booking ELSE NULL END)>0 AS has_guest_facing_waiver_or_deposit,\n",
"\t\tcount(DISTINCT CASE WHEN service_name = 'GUEST AGREEMENT' THEN id_booking ELSE NULL END)>0 AS has_guest_agreement,\n",
"\t\tcount(DISTINCT CASE WHEN service_name = 'BASIC PROTECTION' THEN id_booking ELSE NULL END)>0 AS has_basic_protection,\n",
"\t\tcount(DISTINCT CASE WHEN service_name = 'PROTECTION PLUS' THEN id_booking ELSE NULL END)>0 AS has_protection_plus,\n",
"\t\tcount(DISTINCT CASE WHEN service_name = 'PROTECTION PRO' THEN id_booking ELSE NULL END)>0 AS has_protection_pro,\n",
"\t\tcount(DISTINCT CASE WHEN service_name = 'ID VERIFICATION' THEN id_booking ELSE NULL END)>0 AS has_id_verification,\n",
"\t\tcount(DISTINCT CASE WHEN service_name = 'SCREENING PLUS' THEN id_booking ELSE NULL END)>0 AS has_screening_plus,\n",
"\t\tcount(DISTINCT CASE WHEN service_name = 'SEX OFFENDER CHECK' THEN id_booking ELSE NULL END)>0 AS has_sex_offender_check\n",
"\tFROM\n",
"\t\tintermediate.int_core__booking_service_detail\n",
"\tGROUP BY\n",
"\t\t1\n",
"),\n",
"listing_information AS (\n",
"SELECT \n",
"\tica.id_accommodation,\n",
"\t-- Defaults to 0 if null\n",
"\tCOALESCE(ica.number_of_bedrooms, 0) AS listing_number_of_bedrooms,\n",
"\t-- Defaults to 0 if null\n",
"\tCOALESCE(ica.number_of_bathrooms, 0) AS listing_number_of_bathrooms\n",
"\tFROM intermediate.int_core__accommodation ica \n",
"),\n",
"raw_bookings_checked_in_prior_to_TCR AS (\n",
"\tSELECT\n",
"\t\tb.id_booking,\n",
"\t\t-- Using group by on check-in date to remove booking duplicates\n",
"\t\tb2.booking_check_in_date_utc,\n",
"\t\t-- Using min as a conservative approach to reduce outliers\n",
"\t\tmin(b2.booking_number_of_nights) AS min_booking_number_of_nights\n",
"\tFROM\n",
"\t\tintermediate.int_booking_summary b\n",
"\t-- Note that by joining with BS we're only considering New Dash bookings\n",
"\tLEFT JOIN intermediate.int_booking_summary b2\n",
" ON\n",
"\t\tb2.id_accommodation = b.id_accommodation\n",
"\t\t-- Exclusion based on actual booking creation!\n",
"\t\tAND b2.booking_check_in_date_utc >= b.booking_created_date_utc - INTERVAL '30 days'\n",
"\t\tAND b2.booking_check_in_date_utc < b.booking_created_date_utc\n",
"\t\t-- Note that since this is based on TCR we can remove Cancelled bookings\n",
"\t\tAND b2.booking_status NOT IN ('CANCELLED')\n",
"\tGROUP BY\n",
"\t\tb.id_booking,\n",
"\t\tb2.booking_check_in_date_utc\n",
"),\n",
"bookings_checked_in_prior_to_TCR AS (\n",
"\tSELECT\n",
"\t\tid_booking,\n",
"\t\tLEAST(\n",
"\t\t\tcount(booking_check_in_date_utc),\n",
"\t\t\t30\n",
"\t\t) AS listing_check_ins_prior_to_TCR_in_30_days,\n",
"\t\t-- Capping\n",
"\t\tLEAST(\n",
"\t\t\tGREATEST(\n",
"\t\t\t\tsum(min_booking_number_of_nights),\n",
"\t\t\t\t0\n",
"\t\t\t),\n",
"\t\t\t30\n",
"\t\t) AS listing_occupancy_prior_to_TCR_in_30_days\n",
"\tFROM\n",
"\t\traw_bookings_checked_in_prior_to_TCR\n",
"\tGROUP BY\n",
"\t\t1\n",
"),\n",
"raw_known_bookings_checking_in_prior_to_TCI AS (\n",
"\tSELECT\n",
"\t\tb.id_booking,\n",
"\t\tb.booking_check_in_date_utc,\n",
"\t\t-- Using group by on check-in date to remove booking duplicates\n",
"\t\tb2.booking_check_in_date_utc AS other_bookings_check_in_date_utc,\n",
"\t\t-- Using min as a conservative approach to reduce outliers\n",
"\t\tmin(b2.booking_number_of_nights) AS min_booking_number_of_nights\n",
"\tFROM\n",
"\t\tintermediate.int_booking_summary b\n",
"\t-- Note that by joining with BS we're only considering New Dash bookings\n",
"\tLEFT JOIN intermediate.int_booking_summary b2\n",
" ON\n",
"\t\tb2.id_accommodation = b.id_accommodation\n",
"\t\t-- Exclusion based on check-in\n",
"\t\tAND b2.booking_check_in_date_utc >= b.booking_check_in_date_utc - INTERVAL '30 days'\n",
"\t\tAND b2.booking_check_in_date_utc < b.booking_check_in_date_utc\n",
"\t\t-- that are known!\n",
"\t\tAND b2.booking_created_date_utc < b.booking_created_date_utc\n",
"\t\t-- Note that since this is based on TCI we cannot remove Cancelled bookings\n",
"\tGROUP BY\n",
"\t\tb.id_booking,\n",
"\t\tb.booking_check_in_date_utc,\n",
"\t\tb2.booking_check_in_date_utc\n",
"),\n",
"known_bookings_checking_in_prior_to_TCI AS (\n",
"\tSELECT\n",
"\t\tid_booking,\n",
"\t\tLEAST(\n",
"\t\t\tcount(other_bookings_check_in_date_utc),\n",
"\t\t\t30\n",
"\t\t) AS listing_known_check_ins_prior_to_TCI_in_30_days,\n",
"\t\t-- Capping\n",
"\t\tLEAST(\n",
"\t\t\tGREATEST(\n",
"\t\t\t\tsum(min_booking_number_of_nights),\n",
"\t\t\t\t0\n",
"\t\t\t),\n",
"\t\t\t30\n",
"\t\t) AS listing_known_occupancy_prior_to_TCI_in_30_days,\n",
"\t\tCOALESCE(\n",
"\t\t\tbooking_check_in_date_utc - max(other_bookings_check_in_date_utc),\n",
"\t\t\t30\n",
"\t\t) AS lead_time_between_prior_known_check_in_to_TCI_30_days\n",
"\tFROM\n",
"\t\traw_known_bookings_checking_in_prior_to_TCI\n",
"\tGROUP BY\n",
"\t\tid_booking, \n",
"\t\tbooking_check_in_date_utc\n",
"),\n",
"incidents_prior_to_TCP AS (\n",
"\tSELECT\n",
"\t\tb.id_booking,\n",
"\t\t-- Using distinct count on check-in date to remove booking duplicates\n",
"\t\tCOUNT(DISTINCT b2.booking_check_in_date_utc) AS listing_incidents_prior_to_TCP_in_30_days\n",
"\tFROM\n",
"\t\tintermediate.int_booking_summary b\n",
"\tLEFT JOIN intermediate.int_booking_summary b2\n",
" ON\n",
"\t\tb2.id_accommodation = b.id_accommodation\n",
"\t\t-- Filter on Check Out date\n",
"\t\tAND b2.booking_completed_date_utc >= b.booking_created_date_utc - INTERVAL '30 days'\n",
"\t\tAND b2.booking_completed_date_utc < b.booking_created_date_utc\n",
"\t\tAND b2.has_resolution_incident = TRUE\n",
"\tGROUP BY\n",
"\t\tb.id_booking\n",
")\n",
"SELECT\n",
"\t-- UNIQUE BOOKING ID --\n",
"\tbooking_summary.id_booking,\n",
"\t\n",
"\t-- CONTEXTUAL SERVICE INFORMATION --\n",
"\t-- We're not including number_of_applied_services as it is perfectly correlated (correlation of 1) with upgraded services\n",
"\tbooking_summary.number_of_applied_upgraded_services,\n",
"\tbooking_summary.number_of_applied_billable_services,\n",
"\tservice_information.number_of_applied_screening_services,\n",
"\tservice_information.number_of_applied_deposit_management_services,\n",
"\tservice_information.number_of_applied_protection_services,\n",
"\tservice_information.has_waiver_pro,\n",
"\tservice_information.has_guest_facing_waiver_or_deposit,\n",
"\tservice_information.has_guest_agreement,\n",
"\tservice_information.has_basic_protection,\n",
"\tservice_information.has_protection_plus,\n",
"\tservice_information.has_protection_pro,\n",
"\tservice_information.has_id_verification,\n",
"\tservice_information.has_screening_plus,\n",
"\tservice_information.has_sex_offender_check,\n",
"\tNOT booking_summary.has_verification_request AS is_contactless_booking,\n",
"\t\n",
"\t-- CONTEXTUAL LISTING INFORMATION --\n",
"\tlisting_information.listing_number_of_bedrooms,\n",
"\tlisting_information.listing_number_of_bathrooms,\n",
"\t\n",
"\t-- CONTEXTUAL TIMELINE OF OUR BOOKING\n",
"\t-- Defaults to 0 if booking_created_date_utc > booking_check_in_date_utc\n",
"\tGREATEST(booking_summary.booking_check_in_date_utc - booking_summary.booking_created_date_utc, 0) AS booking_lead_time,\n",
"\tbooking_summary.booking_check_out_date_utc - booking_summary.booking_check_in_date_utc AS booking_duration,\n",
"\t\n",
"\t-- SAME-LISTING, OTHER BOOKING INTERACTIONS: PRIOR TO TCR\n",
"\tbookings_checked_in_prior_to_TCR.listing_check_ins_prior_to_TCR_in_30_days,\n",
"\tbookings_checked_in_prior_to_TCR.listing_occupancy_prior_to_TCR_in_30_days,\n",
"\t\n",
"\t-- SAME-LISTING, OTHER BOOKING INTERACTIONS: PRIOR TO TCI (KNOWN)\n",
"\tknown_bookings_checking_in_prior_to_TCI.listing_known_check_ins_prior_to_TCI_in_30_days,\n",
"\tknown_bookings_checking_in_prior_to_TCI.listing_known_occupancy_prior_to_TCI_in_30_days,\n",
"\tknown_bookings_checking_in_prior_to_TCI.lead_time_between_prior_known_check_in_to_TCI_30_days,\n",
"\t\n",
"\t-- SAME-LISTING, OTHER BOOKING INTERACTIONS: INCIDENTAL BOOKINGS\n",
"\tincidents_prior_to_TCP.listing_incidents_prior_to_TCP_in_30_days,\n",
"\t\n",
"\t-- TARGET (BOOLEAN) --\n",
"\tbooking_summary.has_resolution_incident\n",
"\n",
"FROM\n",
"\tintermediate.int_booking_summary booking_summary\n",
"LEFT JOIN service_information \n",
"\tON\n",
"\tbooking_summary.id_booking = service_information.id_booking\n",
"LEFT JOIN listing_information \n",
"\tON booking_summary.id_accommodation = listing_information.id_accommodation\n",
"LEFT JOIN bookings_checked_in_prior_to_TCR\n",
"\tON booking_summary.id_booking = bookings_checked_in_prior_to_TCR.id_booking\n",
"LEFT JOIN known_bookings_checking_in_prior_to_TCI\n",
"\tON booking_summary.id_booking = known_bookings_checking_in_prior_to_TCI.id_booking\n",
"LEFT JOIN incidents_prior_to_TCP\n",
"\tON booking_summary.id_booking = incidents_prior_to_TCP.id_booking\n",
"WHERE\n",
"\t-- 1. Bookings from New Dash users with Id Deal\n",
"\tbooking_summary.is_user_in_new_dash = TRUE\n",
"\tAND \n",
" booking_summary.is_missing_id_deal = FALSE\n",
"\tAND\n",
"\t-- 2. Protected Bookings with a Protection or a Deposit Management service\n",
" (\n",
"\t\tbooking_summary.has_protection_service_business_type\n",
"\t\t\tOR \n",
" booking_summary.has_deposit_management_service_business_type\n",
"\t)\n",
"\tAND\n",
"\t-- 3. Bookings with flagging categorisation (this excludes Cancelled/Incomplete/Rejected bookings)\n",
"\tbooking_summary.is_booking_flagged_as_risk IS NOT NULL\n",
"\tAND\n",
"\t-- 4. Booking is completed\n",
"\tbooking_summary.is_booking_past_completion_date = TRUE\n",
"\n",
"\n",
"\"\"\"\n",
"\n",
"# Retrieve Data from Query\n",
"df_extraction = query_to_dataframe(engine=dwh_pg_engine, query=data_extraction_query)\n",
"print(f\"Total Bookings: {len(df_extraction):,}\")\n"
]
},
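The "prior to TCI (known)" block in the query works differently from the TCR block. It looks back 30 days from the target's check-in, and it only counts other bookings that were created before the target booking, i.e. bookings that were "known" at creation time. Cancelled bookings are kept. `lead_time_between_prior_known_check_in_to_TCI_30_days` then falls back to 30 when no such booking exists. The minimal sketch below reproduces only that lead-time column. It works on plain dates, and the `(created, check_in)` tuple layout is a hypothetical simplification of the joined rows.

```python
from __future__ import annotations

from datetime import date, timedelta


def lead_time_to_prior_known_check_in(target_created: date, target_check_in: date,
                                      same_listing: list[tuple[date, date]],
                                      window_days: int = 30) -> int:
    """Days between the most recent *known* check-in on the same listing and the
    target check-in, looking back window_days; defaults to window_days if none.

    same_listing holds hypothetical (created, check_in) pairs for the other
    bookings of the listing. Cancelled bookings are deliberately NOT filtered out.
    """
    window_start = target_check_in - timedelta(days=window_days)
    prior_check_ins = [
        check_in
        for created, check_in in same_listing
        # "Known": the other booking already existed when the target was created
        if created < target_created and window_start <= check_in < target_check_in
    ]
    if not prior_check_ins:
        return window_days  # mirrors COALESCE(..., 30) in the query
    return (target_check_in - max(prior_check_ins)).days


known_candidates = [
    (date(2024, 5, 1), date(2024, 6, 10)),  # known and inside the window
    (date(2024, 6, 5), date(2024, 6, 15)),  # created after the target: not known
    (date(2024, 4, 1), date(2024, 5, 1)),   # check-in before the 30-day window
]
print(lead_time_to_prior_known_check_in(date(2024, 6, 1), date(2024, 6, 20), known_candidates))  # 10
```

Because the strict `<` on check-in excludes same-day check-ins, the smallest possible value is 1. That matches the minimum of 1.0 reported for this column in the summary statistics.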
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preprocessing\n",
"Preprocessing in this notebook is straightforward: we drop the booking ID column and split the data into features and target."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"# Drop ID column\n",
"df = df_extraction.copy().drop(columns=['id_booking'])\n",
"\n",
"# Separate features and target\n",
"target_col = 'has_resolution_incident'\n",
"X = df.drop(columns=[target_col])\n",
"y = df[target_col]\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploratory Data Analysis\n",
"In this section we explore the different features."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### EDA - Dataset Overview"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Shape: (21384, 25)\n",
"has_resolution_incident\n",
"False 98.8\n",
"True 1.2\n",
"Name: proportion, dtype: float64\n"
]
}
],
"source": [
"# Shape of the feature matrix\n",
"print(f\"Shape: {X.shape}\")\n",
"\n",
"# Target distribution (in %)\n",
"print(round(100*df[target_col].value_counts(normalize=True),2))\n"
]
},
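The target is strongly imbalanced. About 1.2% of the 21,384 bookings had a resolution incident, which is roughly 250 bookings given the rounding. The sketch below runs on hypothetical labels with the same rate, not on the notebook's data. It computes two reference points that this imbalance implies. First, always predicting "no incident" already reaches about 98.8% accuracy, so accuracy says little here. Second, an uninformative random ranking has an expected average precision close to the prevalence, which is a useful floor when reading `average_precision_score` (imported above).

```python
from __future__ import annotations


def imbalance_baselines(labels: list[bool]) -> dict[str, float]:
    """Reference points for a rare binary target such as has_resolution_incident."""
    if not labels:
        raise ValueError("labels must not be empty")
    prevalence = sum(labels) / len(labels)
    return {
        # Share of positive bookings; also roughly the average precision
        # that an uninformative (random) ranking would achieve
        "prevalence": prevalence,
        # Accuracy of always predicting the majority class
        "majority_accuracy": max(prevalence, 1 - prevalence),
    }


# Hypothetical labels with the same ~1.2% positive rate: 12 incidents in 1,000 bookings
labels = [True] * 12 + [False] * 988
print(imbalance_baselines(labels))
```

On the real data the same function could be applied to `y.tolist()`.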
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>count</th>\n",
" <th>mean</th>\n",
" <th>std</th>\n",
" <th>min</th>\n",
" <th>5%</th>\n",
" <th>25%</th>\n",
" <th>50%</th>\n",
" <th>75%</th>\n",
" <th>95%</th>\n",
" <th>99%</th>\n",
" <th>max</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>number_of_applied_upgraded_services</th>\n",
" <td>21384.0</td>\n",
" <td>2.664282</td>\n",
" <td>1.532038</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>2.0</td>\n",
" <td>4.0</td>\n",
" <td>5.0</td>\n",
" <td>6.0</td>\n",
" <td>7.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>number_of_applied_billable_services</th>\n",
" <td>21384.0</td>\n",
" <td>1.842780</td>\n",
" <td>0.946184</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>2.0</td>\n",
" <td>2.0</td>\n",
" <td>4.0</td>\n",
" <td>4.0</td>\n",
" <td>5.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>number_of_applied_screening_services</th>\n",
" <td>21384.0</td>\n",
" <td>2.007903</td>\n",
" <td>0.985649</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>2.0</td>\n",
" <td>3.0</td>\n",
" <td>4.0</td>\n",
" <td>4.0</td>\n",
" <td>4.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>number_of_applied_deposit_management_services</th>\n",
" <td>21384.0</td>\n",
" <td>0.620651</td>\n",
" <td>0.485814</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>number_of_applied_protection_services</th>\n",
" <td>21384.0</td>\n",
" <td>0.727132</td>\n",
" <td>0.445444</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>listing_number_of_bedrooms</th>\n",
" <td>21384.0</td>\n",
" <td>2.049476</td>\n",
" <td>1.755499</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>2.0</td>\n",
" <td>3.0</td>\n",
" <td>5.0</td>\n",
" <td>8.0</td>\n",
" <td>15.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>listing_number_of_bathrooms</th>\n",
" <td>21384.0</td>\n",
" <td>1.590816</td>\n",
" <td>1.312573</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>2.0</td>\n",
" <td>4.0</td>\n",
" <td>6.0</td>\n",
" <td>17.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>booking_lead_time</th>\n",
" <td>21384.0</td>\n",
" <td>18.151422</td>\n",
" <td>24.349579</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>2.0</td>\n",
" <td>9.0</td>\n",
" <td>25.0</td>\n",
" <td>69.0</td>\n",
" <td>113.0</td>\n",
" <td>220.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>booking_duration</th>\n",
" <td>21384.0</td>\n",
" <td>4.175084</td>\n",
" <td>4.851055</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>2.0</td>\n",
" <td>3.0</td>\n",
" <td>5.0</td>\n",
" <td>10.0</td>\n",
" <td>28.0</td>\n",
" <td>116.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>listing_check_ins_prior_to_tcr_in_30_days</th>\n",
" <td>21384.0</td>\n",
" <td>2.481107</td>\n",
" <td>2.804436</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>2.0</td>\n",
" <td>4.0</td>\n",
" <td>8.0</td>\n",
" <td>11.0</td>\n",
" <td>25.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>listing_occupancy_prior_to_tcr_in_30_days</th>\n",
" <td>21384.0</td>\n",
" <td>8.780817</td>\n",
" <td>9.260855</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>6.0</td>\n",
" <td>16.0</td>\n",
" <td>27.0</td>\n",
" <td>30.0</td>\n",
" <td>30.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>listing_known_check_ins_prior_to_tci_in_30_days</th>\n",
" <td>21384.0</td>\n",
" <td>2.661149</td>\n",
" <td>2.937777</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>2.0</td>\n",
" <td>4.0</td>\n",
" <td>8.0</td>\n",
" <td>12.0</td>\n",
" <td>26.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>listing_known_occupancy_prior_to_tci_in_30_days</th>\n",
" <td>21384.0</td>\n",
" <td>9.470913</td>\n",
" <td>9.715511</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>6.0</td>\n",
" <td>17.0</td>\n",
" <td>30.0</td>\n",
" <td>30.0</td>\n",
" <td>30.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lead_time_between_prior_known_check_in_to_tci_30_days</th>\n",
" <td>21384.0</td>\n",
" <td>15.287318</td>\n",
" <td>11.424657</td>\n",
" <td>1.0</td>\n",
" <td>2.0</td>\n",
" <td>5.0</td>\n",
" <td>11.0</td>\n",
" <td>30.0</td>\n",
" <td>30.0</td>\n",
" <td>30.0</td>\n",
" <td>30.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>listing_incidents_prior_to_tcp_in_30_days</th>\n",
" <td>21384.0</td>\n",
" <td>0.013468</td>\n",
" <td>0.130493</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>3.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" count mean \\\n",
"number_of_applied_upgraded_services 21384.0 2.664282 \n",
"number_of_applied_billable_services 21384.0 1.842780 \n",
"number_of_applied_screening_services 21384.0 2.007903 \n",
"number_of_applied_deposit_management_services 21384.0 0.620651 \n",
"number_of_applied_protection_services 21384.0 0.727132 \n",
"listing_number_of_bedrooms 21384.0 2.049476 \n",
"listing_number_of_bathrooms 21384.0 1.590816 \n",
"booking_lead_time 21384.0 18.151422 \n",
"booking_duration 21384.0 4.175084 \n",
"listing_check_ins_prior_to_tcr_in_30_days 21384.0 2.481107 \n",
"listing_occupancy_prior_to_tcr_in_30_days 21384.0 8.780817 \n",
"listing_known_check_ins_prior_to_tci_in_30_days 21384.0 2.661149 \n",
"listing_known_occupancy_prior_to_tci_in_30_days 21384.0 9.470913 \n",
"lead_time_between_prior_known_check_in_to_tci_3... 21384.0 15.287318 \n",
"listing_incidents_prior_to_tcp_in_30_days 21384.0 0.013468 \n",
"\n",
" std min 5% 25% \\\n",
"number_of_applied_upgraded_services 1.532038 1.0 1.0 1.0 \n",
"number_of_applied_billable_services 0.946184 0.0 1.0 1.0 \n",
"number_of_applied_screening_services 0.985649 1.0 1.0 1.0 \n",
"number_of_applied_deposit_management_services 0.485814 0.0 0.0 0.0 \n",
"number_of_applied_protection_services 0.445444 0.0 0.0 0.0 \n",
"listing_number_of_bedrooms 1.755499 0.0 0.0 1.0 \n",
"listing_number_of_bathrooms 1.312573 0.0 0.0 1.0 \n",
"booking_lead_time 24.349579 0.0 0.0 2.0 \n",
"booking_duration 4.851055 0.0 1.0 2.0 \n",
"listing_check_ins_prior_to_tcr_in_30_days 2.804436 0.0 0.0 0.0 \n",
"listing_occupancy_prior_to_tcr_in_30_days 9.260855 0.0 0.0 0.0 \n",
"listing_known_check_ins_prior_to_tci_in_30_days 2.937777 0.0 0.0 0.0 \n",
"listing_known_occupancy_prior_to_tci_in_30_days 9.715511 0.0 0.0 0.0 \n",
"lead_time_between_prior_known_check_in_to_tci_3... 11.424657 1.0 2.0 5.0 \n",
"listing_incidents_prior_to_tcp_in_30_days 0.130493 0.0 0.0 0.0 \n",
"\n",
" 50% 75% 95% 99% \\\n",
"number_of_applied_upgraded_services 2.0 4.0 5.0 6.0 \n",
"number_of_applied_billable_services 2.0 2.0 4.0 4.0 \n",
"number_of_applied_screening_services 2.0 3.0 4.0 4.0 \n",
"number_of_applied_deposit_management_services 1.0 1.0 1.0 1.0 \n",
"number_of_applied_protection_services 1.0 1.0 1.0 1.0 \n",
"listing_number_of_bedrooms 2.0 3.0 5.0 8.0 \n",
"listing_number_of_bathrooms 1.0 2.0 4.0 6.0 \n",
"booking_lead_time 9.0 25.0 69.0 113.0 \n",
"booking_duration 3.0 5.0 10.0 28.0 \n",
"listing_check_ins_prior_to_tcr_in_30_days 2.0 4.0 8.0 11.0 \n",
"listing_occupancy_prior_to_tcr_in_30_days 6.0 16.0 27.0 30.0 \n",
"listing_known_check_ins_prior_to_tci_in_30_days 2.0 4.0 8.0 12.0 \n",
"listing_known_occupancy_prior_to_tci_in_30_days 6.0 17.0 30.0 30.0 \n",
"lead_time_between_prior_known_check_in_to_tci_3... 11.0 30.0 30.0 30.0 \n",
"listing_incidents_prior_to_tcp_in_30_days 0.0 0.0 0.0 1.0 \n",
"\n",
" max \n",
"number_of_applied_upgraded_services 7.0 \n",
"number_of_applied_billable_services 5.0 \n",
"number_of_applied_screening_services 4.0 \n",
"number_of_applied_deposit_management_services 2.0 \n",
"number_of_applied_protection_services 1.0 \n",
"listing_number_of_bedrooms 15.0 \n",
"listing_number_of_bathrooms 17.0 \n",
"booking_lead_time 220.0 \n",
"booking_duration 116.0 \n",
"listing_check_ins_prior_to_tcr_in_30_days 25.0 \n",
"listing_occupancy_prior_to_tcr_in_30_days 30.0 \n",
"listing_known_check_ins_prior_to_tci_in_30_days 26.0 \n",
"listing_known_occupancy_prior_to_tci_in_30_days 30.0 \n",
"lead_time_between_prior_known_check_in_to_tci_3... 30.0 \n",
"listing_incidents_prior_to_tcp_in_30_days 3.0 "
]
},
"metadata": {},
"output_type": "display_data"
},
|
|||
|
|
{
|
|||
|
|
"data": {
|
|||
|
|
"text/html": [
|
|||
|
|
"<div>\n",
|
|||
|
|
"<style scoped>\n",
|
|||
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
|
" vertical-align: middle;\n",
|
|||
|
|
" }\n",
|
|||
|
|
"\n",
|
|||
|
|
" .dataframe tbody tr th {\n",
|
|||
|
|
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>count</th>\n",
" <th>unique</th>\n",
" <th>top</th>\n",
" <th>freq</th>\n",
" <th>freq/count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>has_waiver_pro</th>\n",
" <td>21384</td>\n",
" <td>2</td>\n",
" <td>False</td>\n",
" <td>19082</td>\n",
" <td>0.892349</td>\n",
" </tr>\n",
" <tr>\n",
" <th>has_guest_facing_waiver_or_deposit</th>\n",
" <td>21384</td>\n",
" <td>2</td>\n",
" <td>True</td>\n",
" <td>10970</td>\n",
" <td>0.513</td>\n",
" </tr>\n",
" <tr>\n",
" <th>has_guest_agreement</th>\n",
" <td>21384</td>\n",
" <td>2</td>\n",
" <td>False</td>\n",
" <td>14787</td>\n",
" <td>0.691498</td>\n",
" </tr>\n",
" <tr>\n",
" <th>has_basic_protection</th>\n",
" <td>21384</td>\n",
" <td>2</td>\n",
" <td>False</td>\n",
" <td>11894</td>\n",
" <td>0.55621</td>\n",
" </tr>\n",
" <tr>\n",
" <th>has_protection_plus</th>\n",
" <td>21384</td>\n",
" <td>2</td>\n",
" <td>False</td>\n",
" <td>20083</td>\n",
" <td>0.93916</td>\n",
" </tr>\n",
" <tr>\n",
" <th>has_protection_pro</th>\n",
" <td>21384</td>\n",
" <td>2</td>\n",
" <td>False</td>\n",
" <td>16626</td>\n",
" <td>0.777497</td>\n",
" </tr>\n",
" <tr>\n",
" <th>has_id_verification</th>\n",
" <td>21384</td>\n",
" <td>2</td>\n",
" <td>False</td>\n",
" <td>12438</td>\n",
" <td>0.58165</td>\n",
" </tr>\n",
" <tr>\n",
" <th>has_screening_plus</th>\n",
" <td>21384</td>\n",
" <td>2</td>\n",
" <td>False</td>\n",
" <td>11001</td>\n",
" <td>0.51445</td>\n",
" </tr>\n",
" <tr>\n",
" <th>has_sex_offender_check</th>\n",
" <td>21384</td>\n",
" <td>2</td>\n",
" <td>False</td>\n",
" <td>19158</td>\n",
" <td>0.895903</td>\n",
" </tr>\n",
" <tr>\n",
" <th>is_contactless_booking</th>\n",
" <td>21384</td>\n",
" <td>2</td>\n",
" <td>False</td>\n",
" <td>13185</td>\n",
" <td>0.616582</td>\n",
" </tr>\n",
" <tr>\n",
" <th>has_resolution_incident</th>\n",
" <td>21384</td>\n",
" <td>2</td>\n",
" <td>False</td>\n",
" <td>21127</td>\n",
" <td>0.987982</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" count unique top freq freq/count\n",
"has_waiver_pro 21384 2 False 19082 0.892349\n",
"has_guest_facing_waiver_or_deposit 21384 2 True 10970 0.513\n",
"has_guest_agreement 21384 2 False 14787 0.691498\n",
"has_basic_protection 21384 2 False 11894 0.55621\n",
"has_protection_plus 21384 2 False 20083 0.93916\n",
"has_protection_pro 21384 2 False 16626 0.777497\n",
"has_id_verification 21384 2 False 12438 0.58165\n",
"has_screening_plus 21384 2 False 11001 0.51445\n",
"has_sex_offender_check 21384 2 False 19158 0.895903\n",
"is_contactless_booking 21384 2 False 13185 0.616582\n",
"has_resolution_incident 21384 2 False 21127 0.987982"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Summary statistics for numerical features\n",
"display(df.describe(include=['number'], percentiles=[.05, .25, .5, .75, .95, .99]).T)\n",
"# Summary statistics for boolean features\n",
"summary = df.describe(include=['bool']).T\n",
"# Share of rows holding the most frequent value ('top') in each column\n",
"summary['freq/count'] = summary['freq']/summary['count']\n",
"display(summary)"
]
},
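One caveat when reading the boolean summary: `freq/count` is the share of the *most frequent* value, which is `False` for most flags but `True` for `has_guest_facing_waiver_or_deposit`, so the column does not read uniformly as a positive rate. A minimal sketch on a hypothetical toy frame (`df_demo`, not the notebook's data) showing that the mean of a boolean column gives the share of `True` directly:

```python
import pandas as pd

# Hypothetical toy frame standing in for the notebook's `df`.
df_demo = pd.DataFrame({
    "has_waiver_pro": [False, False, True, False],
    "has_guest_facing_waiver_or_deposit": [True, True, False, True],
    "booking_duration": [3, 5, 10, 28],
})

# Same summary as the cell above: "top" is the most frequent value, "freq" its count.
summary = df_demo.describe(include=["bool"]).T
summary["freq/count"] = summary["freq"] / summary["count"]

# The mean of a boolean column is the share of True values, whichever value is "top".
true_share = df_demo.select_dtypes(include="bool").mean()

print(summary)
print(true_share)
```

On the real `df`, `df.select_dtypes(include="bool").mean()` would, for instance, report the target's positive rate (about 0.012, i.e. 1 - 0.987982) without first checking which value is `top`.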
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABncAAAWxCAYAAABEBcfHAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzddXRUx9vA8e/GXQkxiCvuENwdSlscirfQ4g7FocW1pbRIseLubm2x4u7BIQJR4rbvH4FNluzGSn8pb5/POXsO3J2Zfe5czZ07MwqlUqlECCGEEEIIIYQQQgghhBBCfBR0CjoAIYQQQgghhBBCCCGEEEIIkXvSuCOEEEIIIYQQQgghhBBCCPERkcYdIYQQQgghhBBCCCGEEEKIj4g07gghhBBCCCGEEEIIIYQQQnxEpHFHCCGEEEIIIYQQQgghhBDiIyKNO0IIIYQQQgghhBBCCCGEEB8RadwRQgghhBBCCCGEEEIIIYT4iEjjjhBCCCGEEEIIIYQQQgghxEdEGneEEEIIIYQQQgghhBBCCCE+ItK4I4QQQgghhBDig1q5ciUKhYLHjx9/sDIfP36MQqFg5cqVH6zMj13t2rWpXbt2QYchhBBCCCEKgDTuCCGEEEIIIcRHIDAwkN69e+Ph4YGRkREWFhZUq1aNBQsWEB8fX9DhfTDr1q1j/vz5BR2Gmm7duqFQKLCwsNBY1/fv30ehUKBQKJg9e3aey3/58iUTJ07kypUrHyBaIYQQQgjxX6BX0AEIIYQQQgghhMje3r17adOmDYaGhnTp0oUSJUqQlJTEyZMnGT58ODdv3mTJkiUFHeYHsW7dOm7cuMGgQYPUlru6uhIfH4++vn6BxKWnp0dcXBy7d++mbdu2at+tXbsWIyMjEhIS8lX2y5cvmTRpEm5ubpQpUybX+Q4dOpSv3xNCCCGEEB8/adwRQgghhBBCiH+xR48e0b59e1xdXTl27BiOjo6q7/r27cuDBw/Yu3fv3/4dpVJJQkICxsbGWb5LSEjAwMAAHZ2CG/xBoVBgZGRUYL9vaGhItWrVWL9+fZbGnXXr1tGsWTO2bt36P4klLi4OExMTDAwM/ie/J4QQQggh/n1kWDYhhBBCCCGE+BebOXMmMTEx/Prrr2oNO+94eXkxcOBA1f9TUlKYMmUKnp6eGBoa4ubmxrfffktiYqJaPjc3N5o3b87BgwepUKECxsbGLF68mBMnTqBQKNiwYQNjx47F2dkZExMToqOjAfjrr79o3LgxlpaWmJiYUKtWLU6dOpXjeuzcuZNmzZrh5OSEoaEhnp6eTJkyhdTUVFWa2rVrs3fvXp48eaIa5szNzQ3QPufOsWPHqFGjBqamplhZWfHJJ59w+/ZttTQTJ05EoVDw4MEDunXrhpWVFZaWlnTv3p24uLgcY3+nY8eO7N+/n8jISNWy8+fPc//+fTp27JglfXh4OMOGDaNkyZKYmZlhYWFBkyZNuHr1qirNiRMnqFixIgDdu3dXrfe79axduzYlSpTg4sWL1KxZExMTE7799lvVd5nn3OnatStGRkZZ1r9Ro0ZYW1vz8uXLXK+rEEIIIYT4d5OeO0IIIYQQQgjxL7Z79248PDyoWrVqrtL36tWLVatW0bp1a4YOHcpff/3FtGnTuH37Ntu3b1dLe/fuXTp06EDv3r358ssv8fX1VX03ZcoUDAwMGDZsGImJiRgYGHDs2DGaNGlC+fLlmTBhAjo6OqxYsYK6devy559/UqlSJa1xrVy5EjMzM4YMGYKZmRnHjh1j/PjxREdHM2vWLADGjBlDVFQUz58/Z968eQCYmZlpLfPIkSM0adIEDw8PJk6cSHx8PD/++CPVqlXj0qVLqoahd9q2bYu7uzvTpk3j0qVLLFu2jMKFCzNjxoxc1e1nn31Gnz592LZtGz169ADSe+34+flRrly5LOkfPnzIjh07aNOmDe7u7oSEhLB48WJq1arFrVu3cHJywt/fn8mTJzN+/H
i++uoratSoAaC2vcPCwmjSpAnt27enc+fO2Nvba4xvwYIFHDt2jK5du3LmzBl0dXVZvHgxhw4d4rfffsPJySlX6ymEEEIIIT4CSiGEEEIIIYQQ/0pRUVFKQPnJJ5/kKv2VK1eUgLJXr15qy4cNG6YElMeOHVMtc3V1VQLKAwcOqKU9fvy4ElB6eHgo4+LiVMvT0tKU3t7eykaNGinT0tJUy+Pi4pTu7u7KBg0aqJatWLFCCSgfPXqklu59vXv3VpqYmCgTEhJUy5o1a6Z0dXXNkvbRo0dKQLlixQrVsjJlyigLFy6sDAsLUy27evWqUkdHR9mlSxfVsgkTJigBZY8ePdTK/PTTT5W2trZZfut9Xbt2VZqamiqVSqWydevWynr16imVSqUyNTVV6eDgoJw0aZIqvlmzZqnyJSQkKFNTU7Osh6GhoXLy5MmqZefPn8+ybu/UqlVLCSh/+eUXjd/VqlVLbdnBgweVgPK7775TPnz4UGlmZqZs1apVjusohBBCCCE+LjIsmxBCCCGEEEL8S70bCs3c3DxX6fft2wfAkCFD1JYPHToUIMvcPO7u7jRq1EhjWV27dlWbf+fKlSuq4cfCwsJ4/fo1r1+/JjY2lnr16vHHH3+QlpamNbbMZb1584bXr19To0YN4uLiuHPnTq7WL7OgoCCuXLlCt27dsLGxUS0vVaoUDRo0UNVFZn369FH7f40aNQgLC1PVc2507NiREydOEBwczLFjxwgODtY4JBukz9Pzbp6i1NRUwsLCMDMzw9fXl0uXLuX6Nw0NDenevXuu0jZs2JDevXszefJkPvvsM4yMjFi8eHGuf0sIIYQQQnwcZFg2IYQQQgghhPiXsrCwANIbQ3LjyZMn6Ojo4OXlpbbcwcEBKysrnjx5orbc3d1da1nvf3f//n0gvdFHm6ioKKytrTV+d/PmTcaOHcuxY8eyNKZERUVpLVObd+uSeSi5d/z9/Tl48CCxsbGYmpqqlru4uKilexdrRESEqq5z0rRpU8zNzdm4cSNXrlyhYsWKeHl58fjx4yxp09LSWLBgAYsWLeLRo0dq8wvZ2trm6vcAnJ2dMTAwyHX62bNns3PnTq5cucK6desoXLhwrvMKIYQQQoiPgzTuCCGEEEIIIcS/lIWFBU5OTty4cSNP+RQKRa7SZe5Nk9N373rlzJo1izJlymjMo21+nMjISGrVqoWFhQWTJ0/G09MTIyMjLl26xMiRI7Pt8fMh6erqalyuVCpzXYahoSGfffYZq1at4uHDh0ycOFFr2qlTpzJu3Dh69OjBlClTsLGxQUdHh0GDBuVpnbPbTppcvnyZ0NBQAK5fv06HDh3ylF8IIYQQQvz7SeOOEEIIIYQQQvyLNW/enCVLlnDmzBkCAgKyTevq6kpaWhr379/H399ftTwkJITIyEhcXV3zHYenpyeQ3uBUv379POU9ceIEYWFhbNu2jZo1a6qWP3r0KEva3DZMvVuXu3fvZvnuzp07FCpUSK3XzofUsWNHli9fjo6ODu3bt9eabsuWLdSpU4dff/1VbXlkZCSFChVS/T+365wbsbGxdO/enWLFilG1alVmzpzJp59+SsWKFT/YbwghhBBCiIInc+4IIYQQQgghxL/YiBEjMDU1pVevXoSEhGT5PjAwkAULFgDpQ4YBzJ8/Xy3N3LlzAWjWrFm+4yhfvjyenp7Mnj2bmJiYLN+/evVKa953PWYy95BJSkpi0aJFWdKamprmapg2R0dHypQpw6pVq4iMjFQtv3HjBocOHVLVxT+hTp06TJkyhYULF+Lg4KA1na6ubpZeQZs3b+bFixdqy941QmVej/waOXIkT58+ZdWqVcydOxc3Nze6du1KYmLi3y5bCCGEEEL8e0jPHSGEEEIIIYT4F/P09GTdunW0a9cOf39/unTpQokSJUhKSuL06dNs3ryZbt26AVC6dGm6du3KkiVLVEOhnTt3jlWrVtGqVSvq1KmT7zh0dHRYtmwZTZo0oXjx4n
Tv3h1nZ2devHjB8ePHsbCwYPfu3RrzVq1aFWtra7p27cqAAQNQKBT89ttvGodDK1++PBs3bmTIkCFUrFgRMzMzWrR
"text/plain": [
"<Figure size 1700x1300 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Correlation heatmap\n",
"plt.figure(figsize=(17, 13))\n",
"cmap = sns.diverging_palette(220, 20, as_cmap=True)\n",
"sns.heatmap(df.corr(), annot=True, cmap=cmap, fmt=\".2f\", linewidths=.5)\n",
"plt.title(\"Correlation Matrix\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Processing for modelling\n",
"Next, we split the dataset into training and test sets and display their sizes and target distributions."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training set size: 14968 rows\n",
"Test set size: 6416 rows\n",
"\n",
"Training target distribution:\n",
"has_resolution_incident\n",
"False 0.98744\n",
"True 0.01256\n",
"Name: proportion, dtype: float64\n",
"\n",
"Test target distribution:\n",
"has_resolution_incident\n",
"False 0.989246\n",
"True 0.010754\n",
"Name: proportion, dtype: float64\n"
]
}
],
"source": [
"# Split the data\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)\n",
"\n",
"print(f\"Training set size: {X_train.shape[0]} rows\")\n",
"print(f\"Test set size: {X_test.shape[0]} rows\")\n",
"\n",
"print(\"\\nTraining target distribution:\")\n",
"print(y_train.value_counts(normalize=True))\n",
"\n",
"print(\"\\nTest target distribution:\")\n",
"print(y_test.value_counts(normalize=True))"
]
},
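The split above is not stratified, so the incident rate drifts between partitions (about 1.26% in training vs 1.08% in test), which matters when positives are this rare. A minimal sketch of passing `stratify=` to scikit-learn's `train_test_split` so both partitions keep the same class balance; it uses synthetic stand-ins (`X_demo`, `y_demo`, hypothetical names) with a similar ~1.2% positive rate rather than the notebook's `X` and `y`:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for X / y with roughly the notebook's ~1.2% positive rate.
rng = np.random.default_rng(123)
n_rows = 20_000
X_demo = pd.DataFrame({"booking_lead_time": rng.integers(0, 220, size=n_rows)})
y_demo = pd.Series(rng.random(n_rows) < 0.012, name="has_resolution_incident")

# stratify=y_demo keeps the positive share (almost) identical in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=123, stratify=y_demo
)

train_rate = y_tr.mean()
test_rate = y_te.mean()
print(f"Training set size: {X_tr.shape[0]} rows, positive rate {train_rate:.4f}")
print(f"Test set size: {X_te.shape[0]} rows, positive rate {test_rate:.4f}")
```

With stratification the two rates differ only by rounding of the per-class counts, so test metrics are not skewed by a partition that happens to hold fewer incidents.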
{
"cell_type": "markdown",
"id": "d36c9276",
"metadata": {},
"source": [
"## Classification Model with Random Forest\n",
"\n",
"We define a machine learning pipeline that includes:\n",
"- **Scaling numeric features** with `StandardScaler`\n",
"- **Training a Random Forest classifier** with balanced class weights to handle the imbalanced dataset\n",
"\n",
"We then use `GridSearchCV` to perform a **grid search with 4-fold cross-validation** over 72 combinations of key hyperparameters (number of trees, maximum depth, number of features considered per split, and minimum samples per split and per leaf).\n",
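"The model is scored with **Average Precision**, which summarises the precision-recall curve and, unlike accuracy, is not inflated by the dominant negative class, making it better suited to imbalanced classification.\n",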
"\n",
"The best combination of parameters is selected, and the resulting model is used to make predictions on the test set.\n"
]
},
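A minimal sketch of the pipeline described above, assuming scikit-learn and using synthetic data from `make_classification` instead of the notebook's features. The `model` step name matches the `model__` prefixes in the CV log; the `scaler` step name and the reduced grid (8 candidates instead of the notebook's 72) are assumptions made to keep the example small and fast. Tree splits are essentially unaffected by per-feature standardisation, so `StandardScaler` is kept here only to mirror the described pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data (about 5% positives) standing in for the notebook's features.
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=123
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=123, stratify=y_demo
)

# "model" matches the `model__` prefix in the CV log; "scaler" is an assumed name.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", RandomForestClassifier(class_weight="balanced", random_state=123)),
])

# Reduced grid for speed; the notebook also searches min_samples_split / min_samples_leaf.
param_grid = {
    "model__n_estimators": [50, 100],
    "model__max_depth": [None, 10],
    "model__max_features": ["sqrt", "log2"],
}

# 4-fold CV scored by average precision; refit=True (default) retrains the best candidate.
search = GridSearchCV(pipeline, param_grid, scoring="average_precision", cv=4)
search.fit(X_tr, y_tr)

best_model = search.best_estimator_
y_pred = best_model.predict(X_te)
print("Best parameters:", search.best_params_)
print(f"Best CV average precision: {search.best_score_:.3f}")
```

As a reference point for `best_score_`, a classifier with no skill scores an average precision close to the positive rate, which is only about 0.012 for the target distribution printed above.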
{
"cell_type": "code",
"execution_count": 30,
"id": "943ef7d6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fitting 4 folds for each of 72 candidates, totalling 288 fits\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.9s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.8s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.9s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.9s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.9s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.9s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 2.6s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 2.6s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.8s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.8s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.9s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.8s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 3.5s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.8s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 3.6s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 3.7s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.8s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 3.7s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.8s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 5.8s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.9s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 5.9s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 7.3s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 7.4s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.9s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.9s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 6.1s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 6.3s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 4.4s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 4.3s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 4.3s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 4.5s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 2.7s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 7.4s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 7.5s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 2.6s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 5.7s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 5.7s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 3.8s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 3.8s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 6.2s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 3.9s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 4.1s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 6.5s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 2.2s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 2.1s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 2.2s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 2.2s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.8s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.8s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.8s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.9s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.4s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.2s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.3s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 5.3s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.4s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 5.4s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 5.4s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 5.8s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.6s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 5.0s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.7s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.7s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 5.4s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 5.5s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 5.5s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 2.0s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 3.8s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 4.0s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 3.9s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 3.9s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.7s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.9s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.7s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.2s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.9s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.9s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.3s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.8s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.3s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.9s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.3s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.9s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.7s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.7s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 2.0s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 2.1s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 3.2s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 3.2s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 3.2s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 5.6s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 3.3s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 5.6s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 5.8s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 5.6s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 4.7s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 4.6s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.6s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.6s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.1s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.1s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.6s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.6s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 5.2s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 5.3s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.4s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.4s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.7s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.7s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.7s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 4.6s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 4.6s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 4.6s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.7s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 3.1s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 3.3s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 3.3s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 5.0s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 3.3s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.7s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.6s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.9s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.9s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.8s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.8s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 5.1s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 5.0s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.1s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.6s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.5s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.5s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.5s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 4.6s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 4.8s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.9s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.9s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.9s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 4.7s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 5.6s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 3.4s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 3.5s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 3.4s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 3.4s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.8s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.7s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.8s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.9s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 5.6s[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.3s\n",
"\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 5.6s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 5.7s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 5.8s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 4.4s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.6s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.6s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.7s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 4.9s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 4.8s[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 4.3s\n",
"\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.9s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.6s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.9s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.9s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 3.0s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.5s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.5s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.4s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.5s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.8s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.4s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 5.0s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.1s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.9s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.4s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.6s[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.5s\n",
"\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 4.3s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 4.3s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 3.0s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.9s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.9s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 3.0s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 5.1s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 5.1s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 2.1s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 2.3s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 2.1s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 2.1s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 4.5s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 5.1s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 4.5s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 4.9s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 2.0s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 2.0s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 2.0s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 2.0s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.6s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.6s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.7s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.7s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.8s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.8s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.8s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.8s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 3.7s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 5.3s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 5.5s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 3.9s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 3.7s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 6.1s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 6.3s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 4.4s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 2.0s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 5.9s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 5.9s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 5.9s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.9s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.9s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 2.5s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.9s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.9s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 2.6s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 6.6s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 2.6s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 5.2s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 5.2s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 5.3s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.6s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.7s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.8s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.7s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 3.4s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 5.7s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 3.5s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 3.6s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 3.6s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.6s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.6s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.1s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.2s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.6s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.6s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.2s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 5.1s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.2s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 5.2s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 5.2s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 6.0s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.8s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.7s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.8s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 5.0s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 5.6s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 5.5s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 3.4s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 5.6s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.8s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 3.1s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 3.4s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 3.2s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.5s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.5s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.7s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.5s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.7s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.8s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 5.0s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.9s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.1s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.1s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.0s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.9s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 4.2s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 4.4s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 4.4s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.6s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.7s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 4.4s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.6s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.6s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 3.2s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 3.3s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 3.2s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 3.2s\n",
"Best hyperparameters: {'model__max_depth': 10, 'model__max_features': 'sqrt', 'model__min_samples_leaf': 2, 'model__min_samples_split': 5, 'model__n_estimators': 300}\n"
]
}
],
"source": [
"\n",
"# Define pipeline: scale features, then fit a random forest\n",
"# (tree-based models do not require feature scaling, but it is harmless)\n",
"pipeline = Pipeline([\n",
"    ('scaler', StandardScaler()),\n",
"    ('model', RandomForestClassifier(class_weight='balanced',  # Reweight classes: the dataset is imbalanced\n",
"                                     random_state=123))\n",
"])\n",
"\n",
"# Define parameter grid (3 x 3 x 2 x 2 x 2 = 72 candidate combinations)\n",
"param_grid = {\n",
"    'model__n_estimators': [100, 200, 300],\n",
"    'model__max_depth': [None, 10, 20],\n",
"    'model__min_samples_split': [2, 5],\n",
"    'model__min_samples_leaf': [1, 2],\n",
"    'model__max_features': ['sqrt', 'log2']\n",
"}\n",
"\n",
"# GridSearchCV\n",
"grid_search = GridSearchCV(\n",
"    estimator=pipeline,\n",
"    param_grid=param_grid,\n",
"    scoring='average_precision',  # Summarizes the precision-recall curve; suited to imbalanced classes\n",
"    cv=4,                         # 4-fold stratified CV (StratifiedKFold is the default for classifiers)\n",
"    n_jobs=-1,                    # Use all available cores\n",
"    verbose=2,                    # Print one line per fit to track progress\n",
"    refit=True                    # Refit the best model on the full training set (already the default)\n",
")\n",
"\n",
"# Fit the grid search on the training data\n",
"grid_search.fit(X_train, y_train)\n",
"\n",
"# Best model\n",
"best_pipeline = grid_search.best_estimator_\n",
"print(\"Best hyperparameters:\", grid_search.best_params_)\n",
"\n",
"# Predict on the test set\n",
"y_pred_proba = best_pipeline.predict_proba(X_test)[:, 1]\n",
"y_pred = best_pipeline.predict(X_test)\n"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>mean_fit_time</th>\n",
" <th>std_fit_time</th>\n",
" <th>mean_score_time</th>\n",
" <th>std_score_time</th>\n",
" <th>param_model__max_depth</th>\n",
" <th>param_model__max_features</th>\n",
" <th>param_model__min_samples_leaf</th>\n",
" <th>param_model__min_samples_split</th>\n",
" <th>param_model__n_estimators</th>\n",
" <th>params</th>\n",
" <th>split0_test_score</th>\n",
" <th>split1_test_score</th>\n",
" <th>split2_test_score</th>\n",
" <th>split3_test_score</th>\n",
" <th>mean_test_score</th>\n",
" <th>std_test_score</th>\n",
" <th>rank_test_score</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>35</th>\n",
" <td>5.492363</td>\n",
" <td>0.074103</td>\n",
" <td>0.193978</td>\n",
" <td>0.016560</td>\n",
" <td>10</td>\n",
" <td>sqrt</td>\n",
" <td>2</td>\n",
" <td>5</td>\n",
" <td>300</td>\n",
" <td>{'model__max_depth': 10, 'model__max_features'...</td>\n",
" <td>0.041262</td>\n",
" <td>0.021222</td>\n",
" <td>0.028958</td>\n",
" <td>0.058779</td>\n",
" <td>0.037555</td>\n",
" <td>0.014185</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>3.078427</td>\n",
" <td>0.037090</td>\n",
" <td>0.129033</td>\n",
" <td>0.003915</td>\n",
" <td>None</td>\n",
" <td>log2</td>\n",
" <td>2</td>\n",
" <td>5</td>\n",
" <td>200</td>\n",
" <td>{'model__max_depth': None, 'model__max_feature...</td>\n",
" <td>0.046899</td>\n",
" <td>0.023721</td>\n",
" <td>0.029079</td>\n",
" <td>0.049230</td>\n",
" <td>0.037232</td>\n",
" <td>0.011028</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>54</th>\n",
" <td>1.725934</td>\n",
" <td>0.030368</td>\n",
" <td>0.065814</td>\n",
" <td>0.002268</td>\n",
" <td>20</td>\n",
" <td>sqrt</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>100</td>\n",
" <td>{'model__max_depth': 20, 'model__max_features'...</td>\n",
" <td>0.046455</td>\n",
" <td>0.021084</td>\n",
|
|||
|
|
" <td>0.030397</td>\n",
|
|||
|
|
" <td>0.050986</td>\n",
|
|||
|
|
" <td>0.037230</td>\n",
|
|||
|
|
" <td>0.012059</td>\n",
|
|||
|
|
" <td>3</td>\n",
|
|||
|
|
" </tr>\n",
|
|||
|
|
" <tr>\n",
|
|||
|
|
" <th>23</th>\n",
|
|||
|
|
" <td>4.754896</td>\n",
|
|||
|
|
" <td>0.284760</td>\n",
|
|||
|
|
" <td>0.197159</td>\n",
|
|||
|
|
" <td>0.010598</td>\n",
|
|||
|
|
" <td>None</td>\n",
|
|||
|
|
" <td>log2</td>\n",
|
|||
|
|
" <td>2</td>\n",
|
|||
|
|
" <td>5</td>\n",
|
|||
|
|
" <td>300</td>\n",
|
|||
|
|
" <td>{'model__max_depth': None, 'model__max_feature...</td>\n",
|
|||
|
|
" <td>0.045281</td>\n",
|
|||
|
|
" <td>0.024624</td>\n",
|
|||
|
|
" <td>0.028884</td>\n",
|
|||
|
|
" <td>0.049424</td>\n",
|
|||
|
|
" <td>0.037053</td>\n",
|
|||
|
|
" <td>0.010511</td>\n",
|
|||
|
|
" <td>4</td>\n",
|
|||
|
|
" </tr>\n",
|
|||
|
|
" <tr>\n",
|
|||
|
|
" <th>64</th>\n",
|
|||
|
|
" <td>3.150147</td>\n",
|
|||
|
|
" <td>0.123393</td>\n",
|
|||
|
|
" <td>0.133204</td>\n",
|
|||
|
|
" <td>0.010875</td>\n",
|
|||
|
|
" <td>20</td>\n",
|
|||
|
|
" <td>log2</td>\n",
|
|||
|
|
" <td>1</td>\n",
|
|||
|
|
" <td>5</td>\n",
|
|||
|
|
" <td>200</td>\n",
|
|||
|
|
" <td>{'model__max_depth': 20, 'model__max_features'...</td>\n",
|
|||
|
|
" <td>0.048786</td>\n",
|
|||
|
|
" <td>0.021536</td>\n",
|
|||
|
|
" <td>0.031982</td>\n",
|
|||
|
|
" <td>0.045861</td>\n",
|
|||
|
|
" <td>0.037041</td>\n",
|
|||
|
|
" <td>0.010974</td>\n",
|
|||
|
|
" <td>5</td>\n",
|
|||
|
|
" </tr>\n",
|
|||
|
|
" <tr>\n",
|
|||
|
|
" <th>...</th>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" </tr>\n",
|
|||
|
|
" <tr>\n",
|
|||
|
|
" <th>1</th>\n",
|
|||
|
|
" <td>3.655133</td>\n",
|
|||
|
|
" <td>0.052994</td>\n",
|
|||
|
|
" <td>0.141072</td>\n",
|
|||
|
|
" <td>0.002776</td>\n",
|
|||
|
|
" <td>None</td>\n",
|
|||
|
|
" <td>sqrt</td>\n",
|
|||
|
|
" <td>1</td>\n",
|
|||
|
|
" <td>2</td>\n",
|
|||
|
|
" <td>200</td>\n",
|
|||
|
|
" <td>{'model__max_depth': None, 'model__max_feature...</td>\n",
|
|||
|
|
" <td>0.044698</td>\n",
|
|||
|
|
" <td>0.019424</td>\n",
|
|||
|
|
" <td>0.026336</td>\n",
|
|||
|
|
" <td>0.041751</td>\n",
|
|||
|
|
" <td>0.033052</td>\n",
|
|||
|
|
" <td>0.010513</td>\n",
|
|||
|
|
" <td>68</td>\n",
|
|||
|
|
" </tr>\n",
|
|||
|
|
" <tr>\n",
|
|||
|
|
" <th>49</th>\n",
|
|||
|
|
" <td>3.499403</td>\n",
|
|||
|
|
" <td>0.044126</td>\n",
|
|||
|
|
" <td>0.146713</td>\n",
|
|||
|
|
" <td>0.003312</td>\n",
|
|||
|
|
" <td>20</td>\n",
|
|||
|
|
" <td>sqrt</td>\n",
|
|||
|
|
" <td>1</td>\n",
|
|||
|
|
" <td>2</td>\n",
|
|||
|
|
" <td>200</td>\n",
|
|||
|
|
" <td>{'model__max_depth': 20, 'model__max_features'...</td>\n",
|
|||
|
|
" <td>0.043488</td>\n",
|
|||
|
|
" <td>0.019535</td>\n",
|
|||
|
|
" <td>0.026128</td>\n",
|
|||
|
|
" <td>0.041667</td>\n",
|
|||
|
|
" <td>0.032705</td>\n",
|
|||
|
|
" <td>0.010165</td>\n",
|
|||
|
|
" <td>69</td>\n",
|
|||
|
|
" </tr>\n",
|
|||
|
|
" <tr>\n",
|
|||
|
|
" <th>48</th>\n",
|
|||
|
|
" <td>2.029998</td>\n",
|
|||
|
|
" <td>0.085049</td>\n",
|
|||
|
|
" <td>0.118226</td>\n",
|
|||
|
|
" <td>0.019632</td>\n",
|
|||
|
|
" <td>20</td>\n",
|
|||
|
|
" <td>sqrt</td>\n",
|
|||
|
|
" <td>1</td>\n",
|
|||
|
|
" <td>2</td>\n",
|
|||
|
|
" <td>100</td>\n",
|
|||
|
|
" <td>{'model__max_depth': 20, 'model__max_features'...</td>\n",
|
|||
|
|
" <td>0.040683</td>\n",
|
|||
|
|
" <td>0.018370</td>\n",
|
|||
|
|
" <td>0.026502</td>\n",
|
|||
|
|
" <td>0.038585</td>\n",
|
|||
|
|
" <td>0.031035</td>\n",
|
|||
|
|
" <td>0.009097</td>\n",
|
|||
|
|
" <td>70</td>\n",
|
|||
|
|
" </tr>\n",
|
|||
|
|
" <tr>\n",
|
|||
|
|
" <th>12</th>\n",
|
|||
|
|
" <td>2.102099</td>\n",
|
|||
|
|
" <td>0.029990</td>\n",
|
|||
|
|
" <td>0.092719</td>\n",
|
|||
|
|
" <td>0.007638</td>\n",
|
|||
|
|
" <td>None</td>\n",
|
|||
|
|
" <td>log2</td>\n",
|
|||
|
|
" <td>1</td>\n",
|
|||
|
|
" <td>2</td>\n",
|
|||
|
|
" <td>100</td>\n",
|
|||
|
|
" <td>{'model__max_depth': None, 'model__max_feature...</td>\n",
|
|||
|
|
" <td>0.035229</td>\n",
|
|||
|
|
" <td>0.020518</td>\n",
|
|||
|
|
" <td>0.024970</td>\n",
|
|||
|
|
" <td>0.039950</td>\n",
|
|||
|
|
" <td>0.030167</td>\n",
|
|||
|
|
" <td>0.007769</td>\n",
|
|||
|
|
" <td>71</td>\n",
|
|||
|
|
" </tr>\n",
|
|||
|
|
" <tr>\n",
|
|||
|
|
" <th>0</th>\n",
|
|||
|
|
" <td>1.983677</td>\n",
|
|||
|
|
" <td>0.277025</td>\n",
|
|||
|
|
" <td>0.091703</td>\n",
|
|||
|
|
" <td>0.020498</td>\n",
|
|||
|
|
" <td>None</td>\n",
|
|||
|
|
" <td>sqrt</td>\n",
|
|||
|
|
" <td>1</td>\n",
|
|||
|
|
" <td>2</td>\n",
|
|||
|
|
" <td>100</td>\n",
|
|||
|
|
" <td>{'model__max_depth': None, 'model__max_feature...</td>\n",
|
|||
|
|
" <td>0.037104</td>\n",
|
|||
|
|
" <td>0.016652</td>\n",
|
|||
|
|
" <td>0.023631</td>\n",
|
|||
|
|
" <td>0.034512</td>\n",
|
|||
|
|
" <td>0.027975</td>\n",
|
|||
|
|
" <td>0.008264</td>\n",
|
|||
|
|
" <td>72</td>\n",
|
|||
|
|
" </tr>\n",
|
|||
|
|
" </tbody>\n",
|
|||
|
|
"</table>\n",
|
|||
|
|
"<p>72 rows × 17 columns</p>\n",
|
|||
|
|
"</div>"
|
|||
|
|
],
|
|||
|
|
"text/plain": [
|
|||
|
|
" mean_fit_time std_fit_time mean_score_time std_score_time \\\n",
|
|||
|
|
"35 5.492363 0.074103 0.193978 0.016560 \n",
|
|||
|
|
"22 3.078427 0.037090 0.129033 0.003915 \n",
|
|||
|
|
"54 1.725934 0.030368 0.065814 0.002268 \n",
|
|||
|
|
"23 4.754896 0.284760 0.197159 0.010598 \n",
|
|||
|
|
"64 3.150147 0.123393 0.133204 0.010875 \n",
|
|||
|
|
".. ... ... ... ... \n",
|
|||
|
|
"1 3.655133 0.052994 0.141072 0.002776 \n",
|
|||
|
|
"49 3.499403 0.044126 0.146713 0.003312 \n",
|
|||
|
|
"48 2.029998 0.085049 0.118226 0.019632 \n",
|
|||
|
|
"12 2.102099 0.029990 0.092719 0.007638 \n",
|
|||
|
|
"0 1.983677 0.277025 0.091703 0.020498 \n",
|
|||
|
|
"\n",
|
|||
|
|
" param_model__max_depth param_model__max_features \\\n",
|
|||
|
|
"35 10 sqrt \n",
|
|||
|
|
"22 None log2 \n",
|
|||
|
|
"54 20 sqrt \n",
|
|||
|
|
"23 None log2 \n",
|
|||
|
|
"64 20 log2 \n",
|
|||
|
|
".. ... ... \n",
|
|||
|
|
"1 None sqrt \n",
|
|||
|
|
"49 20 sqrt \n",
|
|||
|
|
"48 20 sqrt \n",
|
|||
|
|
"12 None log2 \n",
|
|||
|
|
"0 None sqrt \n",
|
|||
|
|
"\n",
|
|||
|
|
" param_model__min_samples_leaf param_model__min_samples_split \\\n",
|
|||
|
|
"35 2 5 \n",
|
|||
|
|
"22 2 5 \n",
|
|||
|
|
"54 2 2 \n",
|
|||
|
|
"23 2 5 \n",
|
|||
|
|
"64 1 5 \n",
|
|||
|
|
".. ... ... \n",
|
|||
|
|
"1 1 2 \n",
|
|||
|
|
"49 1 2 \n",
|
|||
|
|
"48 1 2 \n",
|
|||
|
|
"12 1 2 \n",
|
|||
|
|
"0 1 2 \n",
|
|||
|
|
"\n",
|
|||
|
|
" param_model__n_estimators \\\n",
|
|||
|
|
"35 300 \n",
|
|||
|
|
"22 200 \n",
|
|||
|
|
"54 100 \n",
|
|||
|
|
"23 300 \n",
|
|||
|
|
"64 200 \n",
|
|||
|
|
".. ... \n",
|
|||
|
|
"1 200 \n",
|
|||
|
|
"49 200 \n",
|
|||
|
|
"48 100 \n",
|
|||
|
|
"12 100 \n",
|
|||
|
|
"0 100 \n",
|
|||
|
|
"\n",
|
|||
|
|
" params split0_test_score \\\n",
|
|||
|
|
"35 {'model__max_depth': 10, 'model__max_features'... 0.041262 \n",
|
|||
|
|
"22 {'model__max_depth': None, 'model__max_feature... 0.046899 \n",
|
|||
|
|
"54 {'model__max_depth': 20, 'model__max_features'... 0.046455 \n",
|
|||
|
|
"23 {'model__max_depth': None, 'model__max_feature... 0.045281 \n",
|
|||
|
|
"64 {'model__max_depth': 20, 'model__max_features'... 0.048786 \n",
|
|||
|
|
".. ... ... \n",
|
|||
|
|
"1 {'model__max_depth': None, 'model__max_feature... 0.044698 \n",
|
|||
|
|
"49 {'model__max_depth': 20, 'model__max_features'... 0.043488 \n",
|
|||
|
|
"48 {'model__max_depth': 20, 'model__max_features'... 0.040683 \n",
|
|||
|
|
"12 {'model__max_depth': None, 'model__max_feature... 0.035229 \n",
|
|||
|
|
"0 {'model__max_depth': None, 'model__max_feature... 0.037104 \n",
|
|||
|
|
"\n",
|
|||
|
|
" split1_test_score split2_test_score split3_test_score mean_test_score \\\n",
|
|||
|
|
"35 0.021222 0.028958 0.058779 0.037555 \n",
|
|||
|
|
"22 0.023721 0.029079 0.049230 0.037232 \n",
|
|||
|
|
"54 0.021084 0.030397 0.050986 0.037230 \n",
|
|||
|
|
"23 0.024624 0.028884 0.049424 0.037053 \n",
|
|||
|
|
"64 0.021536 0.031982 0.045861 0.037041 \n",
|
|||
|
|
".. ... ... ... ... \n",
|
|||
|
|
"1 0.019424 0.026336 0.041751 0.033052 \n",
|
|||
|
|
"49 0.019535 0.026128 0.041667 0.032705 \n",
|
|||
|
|
"48 0.018370 0.026502 0.038585 0.031035 \n",
|
|||
|
|
"12 0.020518 0.024970 0.039950 0.030167 \n",
|
|||
|
|
"0 0.016652 0.023631 0.034512 0.027975 \n",
|
|||
|
|
"\n",
|
|||
|
|
" std_test_score rank_test_score \n",
|
|||
|
|
"35 0.014185 1 \n",
|
|||
|
|
"22 0.011028 2 \n",
|
|||
|
|
"54 0.012059 3 \n",
|
|||
|
|
"23 0.010511 4 \n",
|
|||
|
|
"64 0.010974 5 \n",
|
|||
|
|
".. ... ... \n",
|
|||
|
|
"1 0.010513 68 \n",
|
|||
|
|
"49 0.010165 69 \n",
|
|||
|
|
"48 0.009097 70 \n",
|
|||
|
|
"12 0.007769 71 \n",
|
|||
|
|
"0 0.008264 72 \n",
|
|||
|
|
"\n",
|
|||
|
|
"[72 rows x 17 columns]"
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
"execution_count": 31,
|
|||
|
|
"metadata": {},
|
|||
|
|
"output_type": "execute_result"
|
|||
|
|
}
|
|||
|
|
],
|
|||
|
|
"source": [
|
|||
|
|
"# Retrieve the cross-validation results, best mean test score first\n",
|
|||
|
|
"pd.DataFrame(grid_search.cv_results_).sort_values(by='mean_test_score', ascending=False)"
|
|||
|
|
]
|
|||
|
|
},
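{
"cell_type": "markdown",
"metadata": {},
"source": [
"The full `cv_results_` table is wide. A more compact view keeps only the tuned `param_*` columns and the aggregate scores. The sketch below runs on a small synthetic dataset so that it is self-contained; the dataset, the grid and `scoring='average_precision'` are assumptions for the demo, not this notebook's setup. The same column selection can be applied to `grid_search.cv_results_`.\n",
"\n",
"Note also that in the table above the cross-validation standard deviations (roughly 0.01) are much larger than the gaps between the top-ranked configurations (well under 0.001), so those configurations are effectively tied."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.model_selection import GridSearchCV\n",
"\n",
"# Small synthetic, imbalanced dataset (about 5% positives); illustrative only\n",
"X_demo, y_demo = make_classification(n_samples=2000, weights=[0.95], random_state=0)\n",
"demo_search = GridSearchCV(\n",
"    RandomForestClassifier(random_state=0),\n",
"    param_grid={'max_depth': [5, None], 'n_estimators': [50, 100]},\n",
"    scoring='average_precision',  # assumed scorer for the demo only\n",
"    cv=4,\n",
")\n",
"demo_search.fit(X_demo, y_demo)\n",
"\n",
"demo_results = pd.DataFrame(demo_search.cv_results_)\n",
"# Keep the tuned parameters plus the aggregate CV scores, best first\n",
"param_cols = [c for c in demo_results.columns if c.startswith('param_')]\n",
"summary_cols = param_cols + ['mean_test_score', 'std_test_score', 'rank_test_score']\n",
"print(demo_results[summary_cols].sort_values('rank_test_score').to_string(index=False))"
]
},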
|
|||
|
|
{
|
|||
|
|
"cell_type": "markdown",
|
|||
|
|
"metadata": {},
|
|||
|
|
"source": [
|
|||
|
|
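"Instead of using the default 0.5 decision threshold, we scan candidate thresholds and keep the one that maximises the F2 score (which weights recall more heavily than precision).\n",
"\n",
"Note that the threshold is selected on the test set itself, so the reported F2 is an optimistic estimate; choosing it on a separate validation split (or on cross-validated training predictions) would give an unbiased figure."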
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
{
|
|||
|
|
"cell_type": "code",
|
|||
|
|
"execution_count": 32,
|
|||
|
|
"metadata": {},
|
|||
|
|
"outputs": [],
|
|||
|
|
"source": [
|
|||
|
|
"# Find the threshold that maximises the F-beta score (F2 by default)\n",
"\n",
"def find_best_threshold(y_true, y_proba, beta=2.0):\n",
"    \"\"\"Scan 200 evenly spaced thresholds in [0, 1] and return (best_threshold, best_score).\"\"\"\n",
"    thresholds = np.linspace(0, 1, 200)\n",
"    scores = []\n",
"\n",
"    for t in thresholds:\n",
"        preds = (y_proba >= t).astype(int)\n",
"        # zero_division=0 silences warnings when no sample is predicted positive\n",
"        score = fbeta_score(y_true, preds, beta=beta, zero_division=0)\n",
"        scores.append(score)\n",
"\n",
"    best_index = np.argmax(scores)\n",
"    return thresholds[best_index], scores[best_index]"
|
|||
|
|
]
|
|||
|
|
},
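{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an alternative sketch (not the method used in this notebook), the same search can be vectorised with scikit-learn's `precision_recall_curve`, which evaluates precision and recall at every distinct predicted score instead of a fixed grid of 200 thresholds. The helper name `find_best_threshold_pr` is hypothetical; it reuses `y_test` and `y_pred_proba` from this notebook, and its result may differ slightly from the grid search because it considers more candidate thresholds."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.metrics import precision_recall_curve\n",
"\n",
"def find_best_threshold_pr(y_true, y_proba, beta=2.0):\n",
"    \"\"\"Hypothetical helper: return (threshold, F-beta) maximising F-beta over all distinct scores.\"\"\"\n",
"    precision, recall, thresholds = precision_recall_curve(y_true, y_proba)\n",
"    # precision/recall carry one extra trailing point (recall = 0) with no threshold; drop it\n",
"    precision, recall = precision[:-1], recall[:-1]\n",
"    b2 = beta ** 2\n",
"    denom = b2 * precision + recall\n",
"    # F-beta = (1 + beta^2) * P * R / (beta^2 * P + R), set to 0 where the denominator is 0\n",
"    fbeta = np.divide((1 + b2) * precision * recall, denom,\n",
"                      out=np.zeros_like(denom), where=denom > 0)\n",
"    best = np.argmax(fbeta)\n",
"    return thresholds[best], fbeta[best]\n",
"\n",
"# Usage on the notebook's test-set probabilities\n",
"thresh_pr, f2_pr = find_best_threshold_pr(y_test, y_pred_proba, beta=2.0)\n",
"print(f'Best threshold: {100*thresh_pr:.1f}% - F2 score: {100*f2_pr:.2f}%')"
]
},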
|
|||
|
|
{
|
|||
|
|
"cell_type": "code",
|
|||
|
|
"execution_count": 33,
|
|||
|
|
"metadata": {},
|
|||
|
|
"outputs": [
|
|||
|
|
{
|
|||
|
|
"name": "stdout",
|
|||
|
|
"output_type": "stream",
|
|||
|
|
"text": [
|
|||
|
|
"Best threshold: 38.2% — F2 score: 15.31%\n"
|
|||
|
|
]
|
|||
|
|
}
|
|||
|
|
],
|
|||
|
|
"source": [
|
|||
|
|
"# Predict probabilities\n",
|
|||
|
|
"y_pred_proba = best_pipeline.predict_proba(X_test)[:, 1]\n",
|
|||
|
|
"\n",
|
|||
|
|
"# Find best threshold for F2\n",
|
|||
|
|
"best_thresh, best_f2 = find_best_threshold(y_test, y_pred_proba, beta=2.0)\n",
|
|||
|
|
"print(f\"Best threshold: {100*best_thresh:.1f}% — F2 score: {100*best_f2:.2f}%\")\n",
|
|||
|
|
"\n",
|
|||
|
|
"# Use that threshold for final classification\n",
|
|||
|
|
"y_pred_opt = (y_pred_proba >= best_thresh).astype(int)"
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
{
|
|||
|
|
"cell_type": "markdown",
|
|||
|
|
"id": "fc2fcc89",
|
|||
|
|
"metadata": {},
|
|||
|
|
"source": [
|
|||
|
|
"## Evaluation\n",
|
|||
|
|
"This section evaluates how well the new model predicts the actual Resolution Incidents.\n",
"\n",
"We start by computing and displaying the classification report, the ROC curve and the Precision-Recall (PR) curve, together with their respective Area Under the Curve (AUC)."
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
{
|
|||
|
|
"cell_type": "code",
|
|||
|
|
"execution_count": 34,
|
|||
|
|
"id": "30786f7c",
|
|||
|
|
"metadata": {},
|
|||
|
|
"outputs": [
|
|||
|
|
{
|
|||
|
|
"name": "stdout",
|
|||
|
|
"output_type": "stream",
|
|||
|
|
"text": [
|
|||
|
|
" precision recall f1-score support\n",
|
|||
|
|
"\n",
|
|||
|
|
" No Incident 0.99 0.89 0.94 6347\n",
|
|||
|
|
" Incident 0.04 0.43 0.08 69\n",
|
|||
|
|
"\n",
|
|||
|
|
" accuracy 0.89 6416\n",
|
|||
|
|
" macro avg 0.52 0.66 0.51 6416\n",
|
|||
|
|
"weighted avg 0.98 0.89 0.93 6416\n",
|
|||
|
|
"\n"
|
|||
|
|
]
|
|||
|
|
}
|
|||
|
|
],
|
|||
|
|
"source": [
|
|||
|
|
"# Print classification report\n",
|
|||
|
|
"print(classification_report(y_test, y_pred_opt, target_names=['No Incident', 'Incident']))"
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
{
|
|||
|
|
"cell_type": "markdown",
|
|||
|
|
"metadata": {},
|
|||
|
|
"source": [
|
|||
|
|
"### Interpreting the Classification Report\n",
|
|||
|
|
"\n",
|
|||
|
|
"The **Classification Report** provides key metrics to evaluate how well the model performed on each class.\n",
|
|||
|
|
"\n",
|
|||
|
|
"It includes the following metrics for each class (0 and 1):\n",
|
|||
|
|
"* Precision: Out of all predicted positives, how many were actually positive?\n",
|
|||
|
|
"* Recall: Out of all actual positives, how many did we correctly identify?\n",
|
|||
|
|
"* F1-score: Harmonic mean of precision and recall (balances both)\n",
|
|||
|
|
"* Support: Number of true samples of that class in the test data\n",
|
|||
|
|
"\n",
|
|||
|
|
"Interpretation:\n",
|
|||
|
|
"* Class 0 = No incident\n",
|
|||
|
|
"* Class 1 = Has resolution incident (rare, but important!)\n",
|
|||
|
|
"\n",
|
|||
|
|
"A few explanatory cases:\n",
|
|||
|
|
"* A high recall for class 1 means we're catching most incidents.\n",
|
|||
|
|
"* A high precision for class 1 means when we predict an incident, we're often correct.\n",
|
|||
|
|
"* The F1-score combines precision and recall into a single number, which is far more informative than accuracy on imbalanced data.\n",
|
|||
|
|
"\n",
|
|||
|
|
"Special note for imbalanced data:\n",
|
|||
|
|
"Since class 1 (i.e. True) is rare (about 1% in our case), the metrics for that class matter most.\n",
"We want to maximize recall to catch as many real incidents as possible, without letting precision drop too low (to avoid too many false alarms)."
|
|||
|
|
]
|
|||
|
|
},
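{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the definitions above concrete, here is a small worked example on toy labels (made up for illustration, not the notebook's data). It computes class-1 precision, recall and F1 by hand from the confusion-matrix counts and checks them against scikit-learn. Variables carry a `_toy` suffix so they do not overwrite the notebook's own results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score\n",
"\n",
"# Toy labels: 4 actual incidents (1) out of 10 samples; illustrative only\n",
"y_true_toy = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]\n",
"y_pred_toy = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]\n",
"\n",
"# sklearn's binary confusion matrix is [[TN, FP], [FN, TP]]\n",
"tn_toy, fp_toy, fn_toy, tp_toy = confusion_matrix(y_true_toy, y_pred_toy).ravel()  # 3, 3, 1, 3\n",
"\n",
"precision_toy = tp_toy / (tp_toy + fp_toy)  # 3 / 6 = 0.5\n",
"recall_toy = tp_toy / (tp_toy + fn_toy)     # 3 / 4 = 0.75\n",
"f1_toy = 2 * precision_toy * recall_toy / (precision_toy + recall_toy)  # 0.75 / 1.25 = 0.6\n",
"\n",
"# The hand-computed values should match scikit-learn's class-1 metrics\n",
"assert abs(precision_toy - precision_score(y_true_toy, y_pred_toy)) < 1e-12\n",
"assert abs(recall_toy - recall_score(y_true_toy, y_pred_toy)) < 1e-12\n",
"assert abs(f1_toy - f1_score(y_true_toy, y_pred_toy)) < 1e-12\n",
"print(precision_toy, recall_toy, f1_toy)"
]
},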
|
|||
|
|
{
|
|||
|
|
"cell_type": "code",
|
|||
|
|
"execution_count": 35,
|
|||
|
|
"id": "4b4da914",
|
|||
|
|
"metadata": {},
|
|||
|
|
"outputs": [
|
|||
|
|
{
|
|||
|
|
"data": {
|
|||
|
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAhgAAAHWCAYAAAA1jvBJAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABw10lEQVR4nO3dd1gU1/oH8O/uwtKLiEgRRRBiF3vsDUWNvQEaRZOYpom/eE2iibGkaG5MjLmJNyYaY4kCgj1WNNHYol4VS1QsiA1QuaiAlF12z+8PLxuRRVmcZRb4fp6HR/bsmZl3jwv7MvPOOQohhAARERGRhJRyB0BERESVDxMMIiIikhwTDCIiIpIcEwwiIiKSHBMMIiIikhwTDCIiIpIcEwwiIiKSHBMMIiIikhwTDCIiIpIcEwwiIiKSHBMMoipg2bJlUCgUhi8rKyv4+Phg7NixuHnzptFthBBYuXIlOnfuDFdXV9jb26NJkyb4+OOP8eDBgxKPtX79evTp0wfu7u5Qq9Xw9vbGiBEj8Ntvv5Uq1ry8PHz99ddo27YtXFxcYGtri6CgIEycOBEXLlwo0+snovKn4FokRJXfsmXLMG7cOHz88ceoW7cu8vLy8Oeff2LZsmXw8/PDmTNnYGtra+iv0+kwcuRIrFmzBp06dcKQIUNgb2+Pffv2YfXq1WjYsCF27dqFmjVrGrYRQuCll17CsmXL0Lx5cwwbNgyenp5ITU3F+vXrcezYMRw4cADt27cvMc709HT07t0bx44dQ79+/RASEgJHR0ckJiYiOjoaaWlp0Gg0Zh0rIpKIIKJK7+effxYAxNGjR4u0v//++wKAiImJKdI+Z84cAUBMmTKl2L42bdoklEql6N27d5H2efPmCQDi//7v/4Rery+23YoVK8Thw4efGOcLL7wglEqliIuLK/ZcXl6e+Mc//vHE7UtLq9WK/Px8SfZFRMYxwSCqAkpKMH799VcBQMyZM8fQlpOTI6pVqyaCgoKEVqs1ur9x48YJAOLQoUOGbdzc3ET9+vVFQUFBmWL8888/BQAxfvz4UvXv0qWL6NKlS7H2yMhIUadOHcPjK1euCABi3rx54uuvvxb+/v5CqVSKP//8U6hUKjFr1qxi+zh//rwAIL799ltD2927d8WkSZNErVq1hFqtFgEBAeLzzz8XOp3O5NdKVBWwBoOoCktOTgYAVKtWzdC2f/9+3L17FyNHjoSVlZXR7caMGQMA+PXXXw3bZGRkYOTIkVCpVGWKZdOmTQCA0aNHl2n7p/n555/x7bff4tVXX8VXX30FLy8vdOnSBWvWrCnWNyYmBiqVCsOHDwcA5OTkoEuXLvjll18wZswY/Otf/0KHDh0wbdo0TJ482SzxElV0xn97EFGldP/+faSnpyMvLw+HDx/G7NmzYWNjg379+hn6nD17FgDQrFmzEvdT+Ny5c+eK/NukSZMyxybFPp7kxo0buHTpEmrUqGFoCwsLw2uvvYYzZ86gcePGhvaYmBh06dLFUGMyf/58XL58GSdOnEBgYCAA4LXXXoO3tzfmzZuHf/zjH/D19TVL3EQVFc9gEFUhISEhqFGjBnx9fTFs2DA4ODhg06ZNqFWrlqFPVlYWAMDJyanE/RQ+l5mZWeTfJ23zNFLs40mGDh1aJLkAgCFDhsDKygoxMTGGtjNnzuDs2bMICwsztMXGxqJTp06oVq0a0tPTDV8hISHQ6XT4448/zBIzUUXGMxhEVcjChQsRFBSE+/fvY+nSpfjjjz9gY2NTpE/hB3xhomHM40mIs7PzU7d5mkf34erqWub9lKRu3brF2tzd3dGjRw+sWbMGn3zyCYCHZy+srKwwZMgQQ7+LFy/i1KlTxRKUQrdv35Y8XqKKjgkGURXSpk0btGrVCgAwaNAgdOzYESNHjkRiYiIcHR0BAA0aNAAAnDp1CoMGDTK6n1OnTgEAGjZsCACoX78+AOD06dMlbvM0j+6jU6dOT+2vUCggjN
xlr9PpjPa3s7Mz2h4eHo5x48YhISEBwcHBWLNmDXr06AF3d3dDH71ej549e+K9994zuo+goKCnxktU1fASCVEVpVKpMHfuXKSkpOC7774ztHfs2BGurq5YvXp1iR/WK1asAABD7UbHjh1RrVo1REVFlbjN0/Tv3x8A8Msvv5Sqf7Vq1XDv3r1i7VevXjXpuIMGDYJarUZMTAwSEhJw4cIFhIeHF+kTEBCA7OxshISEGP2qXbu2ScckqgqYYBBVYV27dkWbNm2wYMEC5OXlAQDs7e0xZcoUJCYm4sMPPyy2zZYtW7Bs2TKEhobi+eefN2zz/vvv49y5c3j//feNnln45ZdfcOTIkRJjadeuHXr37o0lS5Zgw4YNxZ7XaDSYMmWK4XFAQADOnz+PO3fuGNpOnjyJAwcOlPr1A4CrqytCQ0OxZs0aREdHQ61WFzsLM2LECBw6dAg7duwotv29e/dQUFBg0jGJqgLO5ElUBRTO5Hn06FHDJZJCcXFxGD58OL7//nu8/vrrAB5eZggLC8PatWvRuXNnDB06FHZ2dti/fz9++eUXNGjQALt37y4yk6der8fYsWOxcuVKtGjRwjCTZ1paGjZs2IAjR47g4MGDaNeuXYlx3rlzB7169cLJkyfRv39/9OjRAw4ODrh48SKio6ORmpqK/Px8AA/vOmncuDGaNWuGl19+Gbdv38aiRYtQs2ZNZGZmGm7BTU5ORt26dTFv3rwiCcqjVq1ahRdffBFOTk7o2rWr4ZbZQjk5OejUqRNOnTqFsWPHomXLlnjw4AFOnz6NuLg4JCcnF7mkQkTgTJ5EVUFJE20JIYROpxMBAQEiICCgyCRZOp1O/Pzzz6JDhw7C2dlZ2NraikaNGonZs2eL7OzsEo8VFxcnevXqJdzc3ISVlZXw8vISYWFhYs+ePaWKNScnR3z55ZeidevWwtHRUajVahEYGCjeeustcenSpSJ9f/nlF+Hv7y/UarUIDg4WO3bseOJEWyXJzMwUdnZ2AoD45ZdfjPbJysoS06ZNE/Xq1RNqtVq4u7uL9u3biy+//FJoNJpSvTaiqoRnMIiIiEhyrMEgIiIiyTHBICIiIskxwSAiIiLJMcEgIiIiyTHBICIiIskxwSAiIiLJVbm1SPR6PVJSUuDk5ASFQiF3OERERBWGEAJZWVnw9vaGUvnkcxRVLsFISUmBr6+v3GEQERFVWNevX0etWrWe2KfKJRiFy0tfv37dsDz0s9Jqtdi5cyd69eoFa2trSfZZ1XFMpccxlRbHU3ocU2mZYzwzMzPh6+tr+Cx9kiqXYBReFnF2dpY0wbC3t4ezszN/KCTCMZUex1RaHE/pcUylZc7xLE2JAYs8iYiISHJMMIiIiEhyTDCIiIhIckwwiIiISHJMMIiIiEhyTDCIiIhIckwwiIiISHJMMIiIiEhyTDCIiIhIckwwiIiISHKyJhh//PEH+vfvD29vbygUCmzYsOGp2+zZswctWrSAjY0N6tWrh2XLlpk9TiIiIjKNrAnGgwcP0KxZMyxcuLBU/a9cuYIXXngB3bp1Q0JCAv7v//4Pr7zyCnbs2GHmSImIiMgUsi521qdPH/Tp06fU/RctWoS6deviq6++AgA0aNAA+/fvx9dff43Q0FBzhUlEREQmqlCrqR46dAghISFF2kJDQ/F///d/JW6Tn5+P/Px8w+PMzEwAD1eZ02q1ksRVuB+p9kccU3PgmEqL4ym9yjqmcXEKzJ6tQnZ2+R1TqdTByysZly/3Qu3aShw+LO3nXWlUqAQjLS0NNWvWLNJWs2ZNZGZmIjc3F3Z2dsW2mTt3LmbPnl2sfefOnbC3t5c0vvj4eEn3RxxTc+CYSovjKb3KNqbvvdcdN244ldvx7O1zMGLEGtSpcxVRURG4ds0XW7fulGTfOTk5pe5boRKMspg2bRomT55seJyZmQlfX1/06tULzs7OkhxDq9UiPj4ePXv2hLW1tST7rOo4ptLjmEqL4ym9yjqmQjz8qF
UqBby8zHusatVuIzQ0Gk5O96DR2MDRUQsHBzX69u0ryf4LrwKURoVKMDw9PXHr1q0ibbdu3YKzs7PRsxcAYGNjAxs
|
|||
|
|
"text/plain": [
|
|||
|
|
"<Figure size 600x500 with 1 Axes>"
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
"metadata": {},
|
|||
|
|
"output_type": "display_data"
|
|||
|
|
}
|
|||
|
|
],
|
|||
|
|
"source": [
|
|||
|
|
"# ROC Curve\n",
|
|||
|
|
"fpr, tpr, _ = roc_curve(y_test, y_pred_proba)\n",
|
|||
|
|
"roc_auc = auc(fpr, tpr)\n",
|
|||
|
|
"\n",
|
|||
|
|
"plt.figure(figsize=(6, 5))\n",
|
|||
|
|
"plt.plot(fpr, tpr, color='blue', lw=2, label=f'ROC curve (AUC = {roc_auc:.4f})')\n",
|
|||
|
|
"plt.plot([0, 1], [0, 1], color='gray', linestyle='--')\n",
|
|||
|
|
"plt.xlabel('False Positive Rate')\n",
|
|||
|
|
"plt.ylabel('True Positive Rate')\n",
|
|||
|
|
"plt.title('ROC Curve')\n",
|
|||
|
|
"plt.legend(loc='lower right')\n",
|
|||
|
|
"plt.grid(True)\n",
|
|||
|
|
"plt.show()"
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
{
|
|||
|
|
"cell_type": "markdown",
|
|||
|
|
"metadata": {},
|
|||
|
|
"source": [
|
|||
|
|
"### Interpreting the ROC Curve\n",
|
|||
|
|
"\n",
|
|||
|
|
"The **Receiver Operating Characteristic (ROC) curve** shows how well the model distinguishes between the positive and negative classes across all decision thresholds.\n",
|
|||
|
|
"\n",
|
|||
|
|
"A quick reminder of the definitions:\n",
|
|||
|
|
"* True Positive Rate (TPR) = Recall\n",
|
|||
|
|
"* False Positive Rate (FPR) = Proportion of negatives wrongly classified as positives\n",
|
|||
|
|
"\n",
|
|||
|
|
"What we display in this plot is:\n",
|
|||
|
|
"* The x-axis is False Positive Rate\n",
|
|||
|
|
"* The y-axis is True Positive Rate\n",
|
|||
|
|
"\n",
|
|||
|
|
"The curve shows how TPR and FPR change as the decision threshold varies.\n",
|
|||
|
|
"\n",
|
|||
|
|
"It's important to note that:\n",
|
|||
|
|
"* A model with no skill will produce a diagonal line (AUC = 0.5)\n",
|
|||
|
|
"* A model with perfect discrimination will hug the top-left corner (AUC = 1.0)\n",
|
|||
|
|
"\n",
|
|||
|
|
"The Area Under the Curve (ROC AUC) gives a single performance score:\n",
|
|||
|
|
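"* It equals the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one (ties count as half), so closer to 1 means better ranking of positives above negatives\n",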
|
|||
|
|
"\n",
|
|||
|
|
"**Important!**\n",
|
|||
|
|
"\n",
|
|||
|
|
"While useful, the ROC curve can overstate performance when the dataset is imbalanced. The False Positive Rate is divided by the number of actual negatives, which dominate in our case (around 99%!), so even a large absolute number of false alarms barely moves it. That’s why we also MUST check the Precision-Recall curve."
|
|||
|
|
]
|
|||
|
|
},
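{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ranking interpretation of ROC AUC can be checked numerically. In this sketch on synthetic, imbalanced scores (not the notebook's data), the fraction of (positive, negative) pairs in which the positive is scored higher, counting ties as half, matches `roc_auc_score`. It is meant for intuition only."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.metrics import roc_auc_score\n",
"\n",
"rng = np.random.default_rng(0)\n",
"# Synthetic scores: 990 negatives and 10 positives (about 1% positives); illustrative only\n",
"y_syn = np.r_[np.zeros(990, dtype=int), np.ones(10, dtype=int)]\n",
"scores_syn = np.r_[rng.normal(0.0, 1.0, 990), rng.normal(1.0, 1.0, 10)]\n",
"\n",
"pos, neg = scores_syn[y_syn == 1], scores_syn[y_syn == 0]\n",
"# Compare every positive with every negative; a tie counts as half a win\n",
"diff = pos[:, None] - neg[None, :]\n",
"pairwise_auc = (diff > 0).mean() + 0.5 * (diff == 0).mean()\n",
"\n",
"print(f'Pairwise estimate: {pairwise_auc:.4f}')\n",
"print(f'roc_auc_score:     {roc_auc_score(y_syn, scores_syn):.4f}')"
]
},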
|
|||
|
|
{
|
|||
|
|
"cell_type": "code",
|
|||
|
|
"execution_count": 36,
|
|||
|
|
"id": "6790d41d",
|
|||
|
|
"metadata": {},
|
|||
|
|
"outputs": [
|
|||
|
|
{
|
|||
|
|
"data": {
|
|||
|
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAhgAAAHWCAYAAAA1jvBJAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABYBUlEQVR4nO3deVxU5f4H8M/sgICgbIIoKhqWa7iEZmqyqGnZptddS7OU+zNp00zRFklTc8mlvLncezW3yixXRMm1xQUr9wXFDQSVRZaZYeb5/cFlcpxBAR8Y0c+7l6+cZ87ynS/gfDjnOWcUQggBIiIiIomUji6AiIiIHjwMGERERCQdAwYRERFJx4BBRERE0jFgEBERkXQMGERERCQdAwYRERFJx4BBRERE0jFgEBERkXQMGERV1JAhQxAUFFSmdRITE6FQKJCYmFghNVV1nTp1QqdOnSyPz507B4VCgaVLlzqsJqKqigGDqJSWLl0KhUJh+ePk5IRGjRohOjoaaWlpji7vvlf8Zl38R6lUokaNGujWrRv27dvn6PKkSEtLw9tvv42QkBC4uLigWrVqCA0Nxccff4zMzExHl0dUqdSOLoCoqvnwww9Rr149FBQUYPfu3ViwYAE2btyIv/76Cy4uLpVWx6JFi2A2m8u0zlNPPYX8/HxotdoKquru+vbti+7du8NkMuHkyZOYP38+OnfujN9//x1NmzZ1WF336vfff0f37t1x8+ZNDBgwAKGhoQCA/fv349NPP8XOnTuxdetWB1dJVHkYMIjKqFu3bmjVqhUAYNiwYahZsyZmzpyJH374AX379rW7Tm5uLqpVqya1Do1GU+Z1lEolnJycpNZRVo8//jgGDBhgedyhQwd069YNCxYswPz58x1YWfllZmbi+eefh0qlwqFDhxASEmL1/CeffIJFixZJ2VdFfC8RVQSeIiG6R08//TQAIDk5GUDR3AhXV1ecOXMG3bt3h5ubG/r37w8AMJvNmDVrFh577DE4OTnB19cXI0aMwI0bN2y2u2nTJnTs2BFubm5wd3dH69atsWLFCsvz9uZgrFy5EqGhoZZ1mjZtitmzZ1ueL2kOxpo1axAaGgpnZ2d4eXlhwIABuHTpktUyxa/r0qVL6NWrF1xdXeHt7Y23334bJpOp3P3r0KEDAODMmTNW45mZmXjzzTcRGBgInU6H4OBgTJ061eaojdlsxuzZs9G0aVM4OTnB29sbXbt2xf79+y3LLFmyBE8//TR8fHyg0+nw6KOPYsGCBeWu+XZffvklLl26hJkzZ9qECwDw9fXFBx98YHmsUCgwadIkm+WCgoIwZMgQy+Pi03I///wzRo4cCR8fH9SuXRtr1661jNurRaFQ4K+//rKMHT9+HC+99BJq1KgBJycntGrVCuvXr7+3F010FzyCQXSPit8Ya9asaRkrLCxEVFQUnnzySUyfPt1y6mTEiBFYunQphg4div/7v/9DcnIyvvjiCxw6dAh79uyxHJVYunQpXnnlFTz22GMYN24cPDw8cOjQIWzevBn9+vWzW0d8fDz69u2LLl26YOrUqQCAY8eOYc+ePRg9enSJ9RfX07p1a8TFxSEtLQ2zZ8/Gnj17cOjQIXh4eFiWNZlMiIqKQtu2bTF9+nRs27YNM2bMQIMGDfDGG2+Uq3/nzp0DAHh6elrG8vLy0LFjR1y6dAkjRoxAnTp1sHfvXowbNw5XrlzBrFmzLMu++uqrWLp0Kbp164Zhw4ahsLAQu3btwi+//GI50rRgwQI89thjePbZZ6FWq/Hjjz9i5MiRMJvNGDVqVLnqvtX69evh7OyMl1566Z63Zc/IkSPh7e2NiRMnIjc3F8888wxcXV2xevVqdOzY0WrZVatW4bHHHkOTJk0AAEeOHEH79u0REBCAsWPHolq1ali9ejV69eqFb7/9Fs8//3yF1EwEQUSlsmTJEgFAbNu2TaSnp4sLFy6IlStXipo1awpnZ2dx8eJFIYQQgw
cPFgDE2LFjrdbftWuXACCWL19uNb5582ar8czMTOHm5ibatm0r8vPzrZY1m82Wvw8ePFjUrVvX8nj06NHC3d1dFBYWlvgaduzYIQCIHTt2CCGEMBgMwsfHRzRp0sRqXz/99JMAICZOnGi1PwDiww8/tNpmy5YtRWhoaIn7LJacnCwAiMmTJ4v09HSRmpoqdu3aJVq3bi0AiDVr1liW/eijj0S1atXEyZMnrbYxduxYoVKpREpKihBCiO3btwsA4v/+7/9s9ndrr/Ly8myej4qKEvXr17ca69ixo+jYsaNNzUuWLLnja/P09BTNmze/4zK3AiBiY2NtxuvWrSsGDx5seVz8Pffkk0/afF379u0rfHx8rMavXLkilEql1deoS5cuomnTpqKgoMAyZjabRbt27UTDhg1LXTNRWfEUCVEZhYeHw9vbG4GBgfjHP/4BV1dXfP/99wgICLBa7vbf6NesWYPq1asjIiICGRkZlj+hoaFwdXXFjh07ABQdicjJycHYsWNt5ksoFIoS6/Lw8EBubi7i4+NL/Vr279+Pq1evYuTIkVb7euaZZxASEoINGzbYrPP6669bPe7QoQPOnj1b6n3GxsbC29sbfn5+6NChA44dO4YZM2ZY/fa/Zs0adOjQAZ6enla9Cg8Ph8lkws6dOwEA3377LRQKBWJjY232c2uvnJ2dLX/PyspCRkYGOnbsiLNnzyIrK6vUtZckOzsbbm5u97ydkgwfPhwqlcpqrE+fPrh69arV6a61a9fCbDajT58+AIDr169j+/bt6N27N3Jycix9vHbtGqKionDq1CmbU2FEsvAUCVEZzZs3D40aNYJarYavry8eeeQRKJXWWV2tVqN27dpWY6dOnUJWVhZ8fHzsbvfq1asA/j7lUnyIu7RGjhyJ1atXo1u3bggICEBkZCR69+6Nrl27lrjO+fPnAQCPPPKIzXMhISHYvXu31VjxHIdbeXp6Ws0hSU9Pt5qT4erqCldXV8vj1157DS+//DIKCgqwfft2zJkzx2YOx6lTp/DHH3/Y7KvYrb3y9/dHjRo1SnyNALBnzx7ExsZi3759yMvLs3ouKysL1atXv+P6d+Pu7o6cnJx72sad1KtXz2asa9euqF69OlatWoUuXboAKDo90qJFCzRq1AgAcPr0aQghMGHCBEyYMMHutq9evWoTjolkYMAgKqM2bdpYzu2XRKfT2YQOs9kMHx8fLF++3O46Jb2ZlpaPjw+SkpKwZcsWbNq0CZs2bcKSJUswaNAgLFu27J62Xez236Ltad26tSW4AEVHLG6d0NiwYUOEh4cDAHr06AGVSoWxY8eic+fOlr6azWZERETg3XfftbuP4jfQ0jhz5gy6dOmCkJAQzJw5E4GBgdBqtdi4cSM+//zzMl/qa09ISAiSkpJgMBju6RLgkibL3noEpphOp0OvXr3w/fffY/78+UhLS8OePXswZcoUyzLFr+3tt99GVFSU3W0HBweXu16iO2HAIKokDRo0wLZt29C+fXu7bxi3LgcAf/31V5n/8ddqtejZsyd69uwJs9mMkSNH4ssvv8SECRPsbqtu3boAgBMnTliuhil24sQJy/NlsXz5cuTn51se169f/47Ljx8/HosWLcIHH3yAzZs3Ayjqwc2bNy1BpCQNGjTAli1bcP369RKPYvz444/Q6/VYv3496tSpYxkvPiUlQ8+ePbFv3z58++23JV6qfCtPT0+bG28ZDAZcuXKlTPvt06cPli1bhoSEBBw7dgxCCMvpEeDv3ms0mrv2kkg2zsEgqiS9e/eGyWTCRx99ZPNcYWGh5Q0nMjISbm5uiIuLQ0FBgdVyQogSt3/t2jWrx0qlEs2aNQMA6PV6u+u0atUKPj4+WLhwodUymzZtwrFjx/DMM8+U6rXdqn379ggPD7f8uVvA8PDwwIgRI7BlyxYkJSUBKOrVvn37sGXLFpvlMzMzUVhYCAB48cUXIYTA5MmTbZYr7lXxUZdbe5eVlYUlS5aU+b
WV5PXXX0etWrXw1ltv4eTJkzbPX716FR9//LHlcYMGDSzzSIp99dVXZb7cNzw8HDVq1MCqVauwatUqtGnTxup0io+
|
|||
|
|
"text/plain": [
|
|||
|
|
"<Figure size 600x500 with 1 Axes>"
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
"metadata": {},
|
|||
|
|
"output_type": "display_data"
|
|||
|
|
}
|
|||
|
|
],
|
|||
|
|
"source": [
|
|||
|
|
"# PR Curve\n",
|
|||
|
|
"precision, recall, _ = precision_recall_curve(y_test, y_pred_proba)\n",
|
|||
|
|
"pr_auc = average_precision_score(y_test, y_pred_proba)\n",
|
|||
|
|
"\n",
|
|||
|
|
"plt.figure(figsize=(6, 5))\n",
|
|||
|
|
"plt.plot(recall, precision, color='green', lw=2, label=f'PR curve (AUC = {pr_auc:.4f})')\n",
|
|||
|
|
"plt.xlabel('Recall')\n",
|
|||
|
|
"plt.ylabel('Precision')\n",
|
|||
|
|
"plt.title('Precision-Recall Curve')\n",
|
|||
|
|
"plt.legend(loc='lower left')\n",
|
|||
|
|
"plt.grid(True)\n",
|
|||
|
|
"plt.show()"
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
{
|
|||
|
|
"cell_type": "markdown",
|
|||
|
|
"metadata": {},
|
|||
|
|
"source": [
|
|||
|
|
"### Interpreting the Precision-Recall (PR) Curve\n",
|
|||
|
|
"\n",
|
|||
|
|
"The **Precision-Recall (PR) curve** helps evaluate model performance, especially on imbalanced datasets like ours (where positive cases are rare).\n",
|
|||
|
|
"\n",
|
|||
|
|
"A quick reminder of the definitions:\n",
|
|||
|
|
"* Precision = How many of the predicted positives are actually positive\n",
|
|||
|
|
"* Recall = How many of the actual positives the model correctly identifies\n",
|
|||
|
|
"\n",
|
|||
|
|
"What we display in this plot is:\n",
|
|||
|
|
"* The x-axis is Recall\n",
"* The y-axis is Precision\n",
|
|||
|
|
"\n",
|
|||
|
|
"The curve shows the trade-off between them at different decision thresholds.\n",
|
|||
|
|
"\n",
|
|||
|
|
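"In imbalanced datasets, accuracy can be misleading; the PR curve focuses only on the positive class, which makes it much more meaningful here:\n",
"* A higher curve means better performance\n",
"* The area under the curve (PR AUC) summarizes this: closer to 1 is better. In the code above it is computed as the average precision (`average_precision_score`), and a classifier that ranks at random would score roughly the positive rate (about 1% here), not 0.5"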
|
|||
|
|
]
|
|||
|
|
},
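{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check of how the baselines of the two curves differ, on synthetic data with about 1% positives scored completely at random (not the notebook's data): ROC AUC lands near 0.5, while average precision lands near the positive rate. This is why a PR AUC that looks small in absolute terms should be compared with the positive rate rather than with 0.5."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.metrics import average_precision_score, roc_auc_score\n",
"\n",
"rng = np.random.default_rng(42)\n",
"# Synthetic labels with ~1% positives and scores that carry no information\n",
"y_rand = (rng.random(100_000) < 0.01).astype(int)\n",
"scores_rand = rng.random(100_000)\n",
"\n",
"print(f'Positive rate:              {y_rand.mean():.4f}')\n",
"print(f'ROC AUC (random):           {roc_auc_score(y_rand, scores_rand):.3f}')  # close to 0.5\n",
"print(f'Average precision (random): {average_precision_score(y_rand, scores_rand):.4f}')  # close to the positive rate"
]
},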
|
|||
|
|
{
|
|||
|
|
"cell_type": "code",
|
|||
|
|
"execution_count": 37,
|
|||
|
|
"metadata": {},
|
|||
|
|
"outputs": [],
|
|||
|
|
"source": [
|
|||
|
|
"# Compute the confusion matrix; sklearn lays it out as [[TN, FP], [FN, TP]], which ravel() flattens in that order\n",
|
|||
|
|
"tn, fp, fn, tp = confusion_matrix(y_test, y_pred_opt).ravel()\n",
|
|||
|
|
"\n",
|
|||
|
|
"# Total predictions\n",
|
|||
|
|
"total = tp + tn + fp + fn\n",
|
|||
|
|
"\n",
|
|||
|
|
"# Threshold-dependent metrics, computed at the F2-optimised threshold\n",
|
|||
|
|
"recall = recall_score(y_test, y_pred_opt)\n",
|
|||
|
|
"precision = precision_score(y_test, y_pred_opt)\n",
|
|||
|
|
"f1 = fbeta_score(y_test, y_pred_opt, beta=1)\n",
|
|||
|
|
"f2 = fbeta_score(y_test, y_pred_opt, beta=2)\n",
|
|||
|
|
"f3 = fbeta_score(y_test, y_pred_opt, beta=3)\n",
|
|||
|
|
"fpr = fp / (fp + tn) if (fp + tn) != 0 else 0\n",
|
|||
|
|
"\n",
|
|||
|
|
"# Scores relative to total\n",
|
|||
|
|
"tp_score = tp / total\n",
|
|||
|
|
"tn_score = tn / total\n",
|
|||
|
|
"fp_score = fp / total\n",
|
|||
|
|
"fn_score = fn / total\n",
|
|||
|
|
"\n",
|
|||
|
|
"# Create DataFrame\n",
|
|||
|
|
"summary_df = pd.DataFrame([{\n",
|
|||
|
|
" \"flagging_analysis_type\": \"RISK_VS_CLAIM\",\n",
|
|||
|
|
" \"count_total\": total,\n",
|
|||
|
|
" \"count_true_positive\": tp,\n",
|
|||
|
|
" \"count_true_negative\": tn,\n",
|
|||
|
|
" \"count_false_positive\": fp,\n",
|
|||
|
|
" \"count_false_negative\": fn,\n",
|
|||
|
|
" \"true_positive_score\": tp_score,\n",
|
|||
|
|
" \"true_negative_score\": tn_score,\n",
|
|||
|
|
" \"false_positive_score\": fp_score,\n",
|
|||
|
|
" \"false_negative_score\": fn_score,\n",
|
|||
|
|
" \"recall_score\": recall,\n",
|
|||
|
|
" \"precision_score\": precision,\n",
|
|||
|
|
" \"false_positive_rate_score\": fpr,\n",
|
|||
|
|
" \"f1_score\": f1,\n",
|
|||
|
|
" \"f2_score\": f2,\n",
|
|||
|
|
" \"f3_score\": f3,\n",
|
|||
|
|
" \"roc_auc_score\": roc_auc,\n",
|
|||
|
|
" \"pr_auc_score\": pr_auc\n",
|
|||
|
|
"}])"
|
|||
|
|
]
|
|||
|
|
},
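{
"cell_type": "markdown",
"metadata": {},
"source": [
"The F1, F2 and F3 values above are all instances of the F-beta score, $F_\\beta = \\frac{(1 + \\beta^2)\\, P R}{\\beta^2 P + R}$, where a larger $\\beta$ puts more weight on recall $R$ than on precision $P$. The toy example below (illustrative labels and a hypothetical helper `fbeta_from_pr`) recomputes it from precision and recall and compares it with `fbeta_score`; because recall exceeds precision here, the score rises as beta grows."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.metrics import fbeta_score, precision_score, recall_score\n",
"\n",
"def fbeta_from_pr(p, r, beta):\n",
"    \"\"\"Hypothetical helper: F-beta from precision p and recall r (0 if both are 0).\"\"\"\n",
"    b2 = beta ** 2\n",
"    denom = b2 * p + r\n",
"    return 0.0 if denom == 0 else (1 + b2) * p * r / denom\n",
"\n",
"# Toy labels: 4 actual positives; illustrative only\n",
"y_true_toy = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]\n",
"y_pred_toy = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0]\n",
"p_toy = precision_score(y_true_toy, y_pred_toy)  # 3 / 6 = 0.5\n",
"r_toy = recall_score(y_true_toy, y_pred_toy)     # 3 / 4 = 0.75\n",
"\n",
"for beta in (1, 2, 3):\n",
"    manual = fbeta_from_pr(p_toy, r_toy, beta)\n",
"    library = fbeta_score(y_true_toy, y_pred_toy, beta=beta)\n",
"    print(f'F{beta}: manual={manual:.4f}  sklearn={library:.4f}')"
]
},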
|
|||
|
|
{
|
|||
|
|
"cell_type": "code",
|
|||
|
|
"execution_count": 38,
|
|||
|
|
"metadata": {},
|
|||
|
|
"outputs": [],
|
|||
|
|
"source": [
|
|||
|
|
"def plot_confusion_matrix_from_df(df, flagging_analysis_type, name_of_the_experiment=\"\"):\n",
|
|||
|
|
"\n",
|
|||
|
|
"    # Select the single row matching the requested flagging_analysis_type\n",
|
|||
|
|
" row = df[df['flagging_analysis_type'] == flagging_analysis_type].iloc[0]\n",
|
|||
|
|
"\n",
|
|||
|
|
" # Define custom x-axis labels and wording\n",
|
|||
|
|
" if flagging_analysis_type == 'RISK_VS_CLAIM':\n",
|
|||
|
|
" x_labels = ['With Submitted Claim', 'Without Submitted Claim']\n",
|
|||
|
|
" outcome_label = \"submitted claim\"\n",
|
|||
|
|
" elif flagging_analysis_type == 'RISK_VS_SUBMITTED_PAYOUT':\n",
|
|||
|
|
" x_labels = ['With Submitted Payout', 'Without Submitted Payout']\n",
|
|||
|
|
" outcome_label = \"submitted payout\"\n",
|
|||
|
|
" else:\n",
|
|||
|
|
" x_labels = ['Actual Positive', 'Actual Negative'] \n",
|
|||
|
|
" outcome_label = \"outcome\"\n",
|
|||
|
|
"\n",
|
|||
|
|
" # Confusion matrix structure\n",
|
|||
|
|
" cm = np.array([\n",
|
|||
|
|
" [row['count_true_positive'], row['count_false_positive']],\n",
|
|||
|
|
" [row['count_false_negative'], row['count_true_negative']]\n",
|
|||
|
|
" ])\n",
|
|||
|
|
"\n",
|
|||
|
|
" # Create annotations for the confusion matrix\n",
|
|||
|
|
" labels = [['True Positives', 'False Positives'], ['False Negatives', 'True Negatives']]\n",
|
|||
|
|
" counts = [[f\"{v:,}\" for v in [row['count_true_positive'], row['count_false_positive']]],\n",
|
|||
|
|
" [f\"{v:,}\" for v in [row['count_false_negative'], row['count_true_negative']]]]\n",
|
|||
|
|
" percentages = [[f\"{round(100*v,2):,}\" for v in [row['true_positive_score'], row['false_positive_score']]],\n",
|
|||
|
|
" [f\"{round(100*v,2):,}\" for v in [row['false_negative_score'], row['true_negative_score']]]]\n",
|
|||
|
|
" annot = [[f\"{labels[i][j]}\\n{counts[i][j]} ({percentages[i][j]}%)\" for j in range(2)] for i in range(2)]\n",
|
|||
|
|
"\n",
|
|||
|
|
" # Scores formatted as percentages\n",
|
|||
|
|
" recall = row['recall_score'] * 100\n",
|
|||
|
|
" precision = row['precision_score'] * 100\n",
|
|||
|
|
" f1 = row['f1_score'] * 100\n",
|
|||
|
|
" f2 = row['f2_score'] * 100\n",
|
|||
|
|
" f3 = row['f3_score'] * 100\n",
|
|||
|
|
" roc_auc = row['roc_auc_score'] * 100\n",
|
|||
|
|
" pr_auc = row['pr_auc_score'] * 100\n",
|
|||
|
|
"\n",
|
|||
|
|
" # Set up figure and axes manually for precise control\n",
|
|||
|
|
" fig = plt.figure(figsize=(9, 8))\n",
|
|||
|
|
" grid = fig.add_gridspec(nrows=3, height_ratios=[1, 15, 2])\n",
|
|||
|
|
"\n",
|
|||
|
|
" \n",
|
|||
|
|
" ax_main_title = fig.add_subplot(grid[0])\n",
|
|||
|
|
" ax_main_title.axis('off')\n",
|
|||
|
|
" ax_main_title.set_title(f\"{name_of_the_experiment} - Flagged as Risk vs. {outcome_label.title()}\", fontsize=14, weight='bold')\n",
|
|||
|
|
"\n",
|
|||
|
|
" # Heatmap\n",
|
|||
|
|
" ax_heatmap = fig.add_subplot(grid[1])\n",
|
|||
|
|
" ax_heatmap.set_title(f\"Confusion Matrix – Risk vs. {outcome_label.title()}\", fontsize=12, weight='bold', ha='center', va='center', wrap=False)\n",
|
|||
|
|
"\n",
|
|||
|
|
" cmap = sns.light_palette(\"#A73A52\", as_cmap=True)\n",
|
|||
|
|
"\n",
|
|||
|
|
" sns.heatmap(cm, annot=annot, fmt='', cmap=cmap, cbar=False,\n",
|
|||
|
|
" xticklabels=x_labels,\n",
|
|||
|
|
" yticklabels=['Flagged as Risk', 'Flagged as No Risk'],\n",
|
|||
|
|
" ax=ax_heatmap,\n",
|
|||
|
|
" linewidths=1.0,\n",
|
|||
|
|
" annot_kws={'fontsize': 10, 'linespacing': 1.2})\n",
|
|||
|
|
" ax_heatmap.set_xlabel(\"Resolution Outcome (Actual)\", fontsize=11, labelpad=10)\n",
|
|||
|
|
" ax_heatmap.set_ylabel(\"Flagging (Prediction)\", fontsize=11, labelpad=10)\n",
|
|||
|
|
" \n",
|
|||
|
|
" # Make borders visible\n",
|
|||
|
|
" for _, spine in ax_heatmap.spines.items():\n",
|
|||
|
|
" spine.set_visible(True)\n",
|
|||
|
|
"\n",
|
|||
|
|
" # Footer with metrics and date\n",
|
|||
|
|
" ax_footer = fig.add_subplot(grid[2])\n",
|
|||
|
|
" ax_footer.axis('off')\n",
|
|||
|
|
" metrics_text = f\"Total Booking Count: {row['count_total']} | Recall: {recall:.2f}% | Precision: {precision:.2f}% | F1 Score: {f1:.2f}% | F2 Score: {f2:.2f}% | ROC AUC: {roc_auc:.2f}% | PR AUC: {pr_auc:.2f}%\"\n",
|
|||
|
|
" date_text = f\"Generated on {date.today().strftime('%B %d, %Y')}\"\n",
|
|||
|
|
" ax_footer.text(0.5, 0.7, metrics_text, ha='center', fontsize=9)\n",
|
|||
|
|
" ax_footer.text(0.5, 0.1, date_text, ha='center', fontsize=8, color='gray')\n",
|
|||
|
|
"\n",
|
|||
|
|
" plt.tight_layout()\n",
|
|||
|
|
" plt.show()"
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
{
|
|||
|
|
"cell_type": "code",
|
|||
|
|
"execution_count": 39,
|
|||
|
|
"metadata": {},
|
|||
|
|
"outputs": [
|
|||
|
|
{
|
|||
|
|
"data": {
|
|||
|
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA5wAAAMVCAYAAAAbDfvBAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdd3QUVRvH8d+mB1JIAqGTAIHQe++9FwUBAZUigoIgTbFRBREQEEEElSYiVXoHBQEpgiBIb6H3UEIvybx/5M2YJZWEZYl8P+fsYffOnZlnZyfLPHvv3GsxDMMQAAAAAABPmYO9AwAAAAAA/DeRcAIAAAAAbIKEEwAAAABgEyScAAAAAACbIOEEAAAAANgECScAAAAAwCZIOAEAAAAANkHCCQAAAACwCRJOAAAAAIBNkHACSJYTJ07IYrGYj/Xr19s7pBRh6tSpVsftRbd+/Xqr43HixAl7h2RTAwYMMN9rYGBgkrfDeWQbT+vzSa6U+vk+67jbtm1r7qtKlSrJ3l5gYKC5vQEDBiR7e8CLjoQTL6yLFy/qs88+U+XKlZU+fXq5uLgoderUyp8/v958802tWLFChmHYJbbn5eKbZDJxol+cxveYOnWqvUPFU/T4RXXUw8nJSX5+fipTpowGDx6sGzdu2DvU/4xZs2apdu3aSp8+vZydneXt7a3s2bOrSpUqeu+997Rq1Sp7h/jMJDapS4nfQY8ePdKsWbPUvHlz5ciRQx4eHnJxcVGWLFlUv359jRs3TteuXbN3mAASycneAQD2MH78ePXq1Uv37t2zKn/48KH279+v/fv3a/LkyQoJCbHrr9sAUp7w8HBdvXpV27Zt07Zt2zRjxgz9+eef8vT0NOvUqlVLHh4ekiRvb297hZqivPHGG5o+fbpVWVhYmMLCwnTixAn9/vvvOnnypGrXrm2nCJ++kiVLasSIEfYO45nau3evWrRoof3798dYdvbsWZ09e1bLly/XlStXbNb6+Mknn5g/FJUrV84m+wBeJCSceOEMHz5cffr0MV87Ojqqfv36Kl68uCwWi44ePapVq1bp4sWLdowSKdnHH38sHx+fGOUlS5a0QzR4Vt5++23lzJlToaGhmjVrltkz4eDBg5oyZYq6detm1i1XrhwXsk9g5cqVVslm8eLFVbt2bXl4eOjy5cvauXOntmzZYscIbSN//vzKnz+/vcN4Zg4ePKjKlSvr6tWrZlmBAgVUp04d+fr66tKlS9q4caP++usvm8bx1ltv2XT7wAvHAF4g+/btMxwdHQ1JhiTD39/f2LlzZ4x6Dx48ML777jvj4sWLVuVnzpwxevfubRQoUMBInTq14erqagQEBBitW7c2tm3bFmM7/fv3N/cVEBBgXL9+3ejdu7eRLVs2w9nZ2ciePbsxZMgQIyIiwlwnqn5cjzZt2hiGYRgPHz40Pv30U6Nu3bpGjhw5DG9vb8PJycnw9fU1KlSoYHz99dfGgwcPYj0Op0+fNj744AOjSJEihqenp+Hq6mpkzZrVaNy4sbF69WrDMAwjICAg3jgqV65sGIZhhISEWJWvW7cuxv4WL15sNGrUyMiQIYPh7OxspEmTxqhatarx008/Wb33KBs2bDBeeuklI1OmTIazs7OROnVqIyAgwKhTp47Rv39/4/r162bdW7duGQMHDjSKFi1qeHh4GE5OTka6dOmMwoULGx06dDBWrFgR6zF4mqJ/zpKMkJCQBNeZMmWK1TrRrVu3zmjfvr1RtGhRI0OGDIaLi4vh7u5u5MyZ02jbtq2xZ8+eWLd54sQJo2XLloavr6+ROnVqo2LFisavv/4a774MwzD27NljNGjQwPD09DQ8PT2NOnXqGLt27Ypx/j7uxo0bxueff26UKlXK8PLyMpydnY2sWbMabdq0Mfbu3RtrjFeuXDE6depk+Pv7G25ubkbx4sWNWbNmGevWrXviY2gYhjF//nzjtddeMwoWLGj4+/ub50vevHmNLl26xL
qdy5cvG7169TLy5ctnpEqVynB2djbSp09vlCxZ0ujSpYuxZcuWRO378eMa/dw/cOCA1bJOnTpZrRvfsT1x4oTRsWNHIygoyHBzczNcXV2NTJkyGeXKlTN69Ohh7N+/P84YonvvvffMcgcHB2PSpElxvpfw8HAjW7ZsZv3+/fvHqPPBBx+Yy3PlymWW79mzx2jdurUREBBguLi4GG5ubkbWrFmNqlWrGh9++KFx5syZRBzN+PXo0cPcd1BQkPHo0aMYdW7cuGFs2rTJqiy+4xzfd9fj64WFhRk9e/Y0smTJYri6uhp58+Y1xo4dG+P7q02bNlbfkYcOHTJeeuklw8vLy/Dx8TFatmxpXLhwwTAMw1i7dq1RoUIFw93d3UibNq3Rvn174+rVq1bbi+3zfTzu2B79+/c3KleuHG+dx4/HhQsXjI8++sgoXLiw4eHhYbi6uho5c+Y0OnfubJw8eTLWz+XEiRPGq6++avj4+BipUqUyKlasaKxZsybB75y4lC1b1mq9zz//PNb/I3bs2GEsWrQozuMe3aRJk4xmzZoZefLkMfz8/AwnJyfD09PTKFy4sPHBBx8Yly9fjrH96P//Rf9bePx76uDBg0a/fv2MbNmyGe7u7kbJkiXN/3MuXbpktG/f3kibNq3h5uZmlC9f3tiwYUOijwXwX0LCiRfK22+/bfWfxS+//JLodX///XfDx8cnzv+8HRwcjJEjR1qtE/2ixc/Pz8ibN2+s6/bt29dcJ7EJ582bNxOsW6NGjRgXZsuWLTM8PT3jXOe9994zDOPpJJzh4eHG66+/Hu92mjVrZhXj2rVrrX4UiO1x4MABs36VKlXirduiRYtEf8ZJ9bQTzl69esX7nlxcXIw1a9ZYrRMSEmJkyJAh1vOyfv36ce5r+/bthoeHR4z13NzcjJo1a8Z5cXr48GEjMDAwzhhdXV2NOXPmWK1z7do1I0+ePLHWfzzGxCacTZs2jfdYeXl5WSXod+/eNYKDg+Ndp0+fPonad3wJZ1hYmNWyTz75xGrduBKhixcvGunSpYs3vm+//TbOGKK8//77Zpmjo6MxY8aMBN9P3759zXVy585ttSwiIsIqIf38888Nw4j8ES9VqlTxxvs0fvTp2rWrub20adMaR48eTdR6TyPhTJ8+vVGiRIlY31vXrl2tthk98cmePXus/2cEBwcbP/74o+Hg4BBjWaVKlay296wSzs2bNxtp06aNs663t3eMZCmu7xyLxWLUq1cvzu+cuGzdutVqnYYNGyZqvceP++MJZ/HixeM9DpkzZzbOnj1rtU5iE87Ytu3g4GDMmjXLyJ49e4xlrq6uVj8YAS8KutTihfLrr7+az318fPTSSy8lar3r16+rSZMm5iAF7u7uateunby8vDRz5kydPHlSERER6t27t4oXL67KlSvH2EZoaKiuXbumN954Q5kyZdIPP/ygK1euSJLGjBmjTz/9VC4uLhoxYoSOHTumCRMmmOtG76JZoEABSZEDQeTIkUNlypRR5syZ5ePjo4cPH+rgwYOaO3euHj16pLVr1+qXX35R8+bNJUknT55Us2bNdOfOHXMbjRo1UpEiRXT58mX99ttv5j4/+eQTnThxQp9//rlZFtVlUJKyZs2a4HEbPny42Q3OYrGoadOmKly4sEJCQjR9+nQ9fPhQc+fOVZEiRfTxxx9Lkr777juFh4dLkvLkyaNmzZrJyclJp06d0t9//62dO3ea2z9w4IA5kJGDg4PeeOMN5c6dW1euXFFISIjdBjn6/vvvY+1S27t370Stnzp1alWuXFkFCxaUr6+v3N3dFRoaqmXLlunAgQN68OCBunXrZnWP07vvvqsLFy6Yr+vVq6fixYtr2bJlWrZsWaz7MQxD7du3161bt8yyli1bKkeOHJozZ47WrFkT63rh4eF6+eWXzS6j6dKlU6tWreTr66tVq1Zp8+bNun//vt544w0VL15cOXLkkCR9+umnOnjwoLmdypUrq3Llyvrjjz
/ijDEhadKkUa1atZQ3b175+PjIxcVFFy9e1IIFC3Tq1CmFhYWpT58+Wr58uSRp3bp1OnTokCTJzc1Nb775pjJnzqw
"text/plain": [
"<Figure size 900x800 with 3 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Plot confusion matrix for claim scenario\n",
"plot_confusion_matrix_from_df(summary_df, 'RISK_VS_CLAIM', 'Contactless')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Feature Importance\n",
"Understanding what drives the predictions is useful for future experiments and for business knowledge. Here we look at both the native feature importances of the trees and a more computationally heavy SHAP value analysis.\n",
"\n",
"Important! Be aware that the SHAP analysis can take quite a bit of time (the permutation explainer below took about 45 minutes on the test set)."
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "d66ffe2c",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAxkAAAMWCAYAAACdtUsqAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdeVyN6f/48deJ9lVISSQqZcqeJRQyWccyI4yR7MYYDLL8ZoxkN7IvYzBlN8Y2liwxasiWpcaSkCUzE2bsMUPq/P7o2/1xFJ1yyPJ+Ph7n8ejcy3W/r6v71P2+r+u6j0qtVqsRQgghhBBCCB3RK+wAhBBCCCGEEO8WSTKEEEIIIYQQOiVJhhBCCCGEEEKnJMkQQgghhBBC6JQkGUIIIYQQQgidkiRDCCGEEEIIoVOSZAghhBBCCCF0SpIMIYQQQgghhE5JkiGEEEIIIYTQKUkyhBBCCCGEEDolSYYQQgghAIiIiEClUuX6Gjly5Cs55oEDBwgJCeHOnTuvpPyXkd0eR48eLexQCmz+/PlEREQUdhjiPVS0sAMQQgghxJslNDSU8uXLayz74IMPXsmxDhw4wNixYwkKCsLKyuqVHON9Nn/+fEqUKEFQUFBhhyLeM5JkCCGEEEJD8+bNqVmzZmGH8VIePHiAqalpYYdRaB4+fIiJiUlhhyHeYzJcSgghhBD5sn37dho0aICpqSnm5ua0bNmS06dPa2zz+++/ExQUhJOTE0ZGRtja2tKjRw9u3rypbBMSEkJwcDAA5cuXV4ZmXb58mcuXL6NSqXId6qNSqQgJCdEoR6VScebMGT799FOKFStG/fr1lfUrVqygRo0aGBsbY21tTadOnbh69WqB6h4UFISZmRkpKSm0atUKMzMz7O3tmTdvHgAnT56kcePGmJqaUq5cOVatWqWxf/YQrN9++42+fftSvHhxLCwsCAwM5Pbt2zmON3/+fCpXroyhoSGlS5fmiy++yDG0zNfXlw8++IBjx47RsGFDTExM+H//7//h6OjI6dOniYmJUdrW19cXgFu3bjFs2DA8PDwwMzPDwsKC5s2bk5CQoFF2dHQ0KpWKtWvXMmHCBMqUKYORkRFNmjThwoULOeI9fPgwLVq0oFixYpiamuLp6cmsWbM0tjl79iyffPIJ1tbWGBkZUbNmTTZv3qyxTXp6OmPHjsXZ2RkjIyOKFy9O/fr1iYqK0ur3JAqf9GQIIYQQQsPdu3f5559/NJaVKFECgOXLl9OtWzf8/f2ZMmUKDx8+ZMGCBdSvX58TJ07g6OgIQFRUFBcvXqR79+7Y2tpy+vRpfvjhB06fPs2hQ4dQqVS0b9+ec+fOsXr1ambMmKEco2TJkvz999/5jrtDhw44OzszceJE1Go1ABMmTGD06NEEBATQq1cv/v77b+bMmUPDhg05ceJEgYZoZWRk0Lx5cxo2bMjUqVNZuXIlAwYMwNTUlK+//pouXbrQvn17vv/+ewIDA6lbt26O4WcDBgzAysqKkJAQkpKSWLBgAVeuXFEu6iEreRo7dix+fn58/vnnynZxcXHExsair6+vlHfz5k2aN29Op06d+OyzzyhVqhS+vr58+eWXmJmZ8fXXXwNQqlQpAC5evMimTZvo0KED5cuX5/r16yxcuBAfHx/OnDlD6dKlNeKdPHkyenp6DBs2jLt37zJ16lS6dOnC4cOHlW2ioqJo1aoVdnZ2DBo0CFtbWxITE9m6dSuDBg0C4PTp03h7e2Nvb8/IkSMxNTVl7dq1tG3blvXr19OuXTul7pMmTaJXr154eXlx7949jh49yvHjx2natGm+f2eiEKiFEEIIIdRqdXh4uBrI9aVWq9X3799XW1lZqXv37q2x37Vr19SWlpYayx8+fJij/NWrV6sB9W+//aYs++6779SA+tKlSxrbXrp0SQ2ow8PDc5QDqMeMGaO8HzNmjBpQd+7cWWO7y5cvq4sUKaKeMGGCxvKTJ0+qixYtmmP589ojLi
5OWdatWzc1oJ44caKy7Pbt22pjY2O1SqVSr1mzRll+9uzZHLFml1mjRg3148ePleVTp05VA+pffvlFrVar1Tdu3FAbGBioP/zwQ3VGRoay3dy5c9WA+scff1SW+fj4qAH1999/n6MOlStXVvv4+ORY/t9//2mUq1ZntbmhoaE6NDRUWbZ37141oHZzc1M/evRIWT5r1iw1oD558qRarVarnzx5oi5fvry6XLly6tu3b2uUm5mZqfzcpEkTtYeHh/q///7TWF+vXj21s7OzsqxKlSrqli1b5ohbvD1kuJQQQgghNMybN4+oqCiNF2Tdqb5z5w6dO3fmn3/+UV5FihShdu3a7N27VynD2NhY+fm///7jn3/+oU6dOgAcP378lcTdr18/jfcbNmwgMzOTgIAAjXhtbW1xdnbWiDe/evXqpfxsZWWFq6srpqamBAQEKMtdXV2xsrLi4sWLOfbv06ePRk/E559/TtGiRYmMjARg9+7dPH78mMGDB6On97/Ltd69e2NhYcG2bds0yjM0NKR79+5ax29oaKiUm5GRwc2bNzEzM8PV1TXX30/37t0xMDBQ3jdo0ABAqduJEye4dOkSgwcPztE7lN0zc+vWLX799VcCAgK4f/++8vu4efMm/v7+nD9/nj///BPIatPTp09z/vx5resk3iwyXEoIIYQQGry8vHKd+J19wde4ceNc97OwsFB+vnXrFmPHjmXNmjXcuHFDY7u7d+/qMNr/eXZI0vnz51Gr1Tg7O+e6/dMX+flhZGREyZIlNZZZWlpSpkwZ5YL66eW5zbV4NiYzMzPs7Oy4fPkyAFeuXAGyEpWnGRgY4OTkpKzPZm9vr5EE5CUzM5NZs2Yxf/58Ll26REZGhrKuePHiObYvW7asxvtixYoBKHVLTk4GXvwUsgsXLqBWqxk9ejSjR4/OdZsbN25gb29PaGgobdq0wcXFhQ8++IBmzZrRtWtXPD09ta6jKFySZAghhBBCK5mZmUDWvAxbW9sc64sW/d9lRUBAAAcOHCA4OJiqVatiZmZGZmYmzZo1U8p5kWcv1rM9fTH8rKd7T7LjValUbN++nSJFiuTY3szMLM84cpNbWS9arv6/+SGv0rN1z8vEiRMZPXo0PXr0YNy4cVhbW6Onp8fgwYNz/f3oom7Z5Q4bNgx/f/9ct6lYsSIADRs2JDk5mV9++YVdu3axePFiZsyYwffff6/RiyTeXJJkCCGEEEIrFSpUAMDGxgY/P7/nbnf79m327NnD2LFj+fbbb5XluQ19eV4ykX2n/NknKT17Bz+veNVqNeXLl8fFxUXr/V6H8+fP06hRI+V9WloaqamptGjRAoBy5coBkJSUhJOTk7Ld48ePuXTp0gvb/2nPa99169bRqFEjlixZorH8zp07ygT8/Mg+N06dOvXc2LLroa+vr1X81tbWdO/ene7du5OWlkbDhg0JCQmRJOMtIXMyhBBCCKEVf39/LCwsmDhxIunp6TnWZz8RKvuu97N3uWfOnJljn+zvsng2mbCwsKBEiRL89ttvGsvnz5+vdbzt27enSJEijB07NkcsarVa43G6r9sPP/yg0YYLFizgyZMnNG/eHAA/Pz8MDAyYPXu2RuxLlizh7t27tGzZUqvjmJqa5vpt6kWKFMnRJj///LMyJyK/qlevTvny5Zk5c2aO42Ufx8bGBl9fXxYuXEhqamqOMp5+otizvxszMzMqVqzIo0ePChSfeP2kJ0MIIYQQWrGwsGDBggV07dqV6tWr06lTJ0qWLElKSgrbtm3D29ubuXPnYmFhoTzeNT09HXt7e3bt2sWlS5dylFmjRg0Avv76azp16oS+vj6tW7fG1NSUXr16MXnyZHr16kXNmjX57bffOHfunNbxVqhQgfHjxzNq1CguX75M27ZtMTc359KlS2zcuJE+ffowbNgwnbVPfjx+/JgmTZoQEBBAUlIS8+fPp379+nz00UdA1mN8R40axdixY2nWrBkfffSRsl2tWrX47L
PPtDpOjRo1WLBgAePHj6dixYrY2NjQuHFjWrVqRWhoKN27d6devXqcPHmSlStXavSa5Ieenh4LFiygdevWVK1ale7
"text/plain": [
"<Figure size 800x800 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"## BUILT-IN\n",
"\n",
"# Get feature importances from the model\n",
"importances = best_pipeline.named_steps['model'].feature_importances_\n",
"features = X.columns\n",
"\n",
"# Create a Series and sort\n",
"feat_series = pd.Series(importances, index=features).sort_values(ascending=True) # ascending so the most important feature is drawn at the top of the horizontal bar plot\n",
"\n",
"# Plot Feature Importances\n",
"plt.figure(figsize=(8, 8))\n",
"feat_series.plot(kind='barh', color='skyblue')\n",
"plt.title('Feature Importances')\n",
"plt.xlabel('Importance')\n",
"plt.grid(axis='x')\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interpreting the Feature Importance Plot\n",
"The **feature importance plot** shows how much each feature contributes to the model’s overall decision-making.\n",
"\n",
"For tree-based models such as Random Forest, the built-in importance is typically impurity-based: it measures how much the splits on a feature reduce the split criterion (e.g., Gini impurity), averaged across all trees.\n",
"A higher score means the feature contributes more to those splits on the training data; it is not a direct measure of accuracy on unseen data.\n",
"\n",
"In the graph you will see that:\n",
"* Features are ranked from most important (top) to least important (bottom).\n",
"* The values are relative and model-specific: they are not directly interpretable as weights or probabilities.\n",
"\n",
"This helps us identify which features the model relies on most when making predictions.\n",
"\n",
"**Important!**\n",
"Unlike SHAP values, native importance does not show how a feature affects predictions (e.g., whether high values raise or lower the predicted risk), only how useful the feature is to the model overall. For deeper interpretability (direction and context), SHAP is better, but it takes more time to run."
]
},
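{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because impurity-based importances come from the training splits, a common complement is permutation importance on held-out data: shuffle one feature at a time and measure how much a chosen score drops. scikit-learn provides this as `sklearn.inspection.permutation_importance`. The sketch below uses a synthetic dataset and a plain `RandomForestClassifier` as stand-ins; applying it here would mean passing `best_pipeline`, `X_test` and `y_test` instead, which assumes a `y_test` variable exists in this notebook.\n",
"\n",
"```python\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.inspection import permutation_importance\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Synthetic stand-in data: 5 features; with shuffle=False only the first 2 are informative\n",
"X_demo, y_demo = make_classification(n_samples=2000, n_features=5, n_informative=2,\n",
"                                     n_redundant=0, shuffle=False, random_state=0)\n",
"X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)\n",
"\n",
"model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)\n",
"\n",
"# Shuffle each feature of the held-out set 10 times and record the mean drop in ROC AUC\n",
"result = permutation_importance(model, X_te, y_te, scoring='roc_auc',\n",
"                                n_repeats=10, random_state=0)\n",
"ranking = result.importances_mean.argsort()[::-1]\n",
"for i in ranking:\n",
"    print(f'feature_{i}: {result.importances_mean[i]:.3f} +/- {result.importances_std[i]:.3f}')\n",
"```\n",
"\n",
"Features whose shuffling barely changes the score add little on unseen data, even if they rank high in the impurity-based plot."
]
},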
{
"cell_type": "code",
"execution_count": 41,
"id": "e2197cea",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"PermutationExplainer explainer: 6417it [45:34, 2.34it/s] \n"
]
}
],
"source": [
"## SHAP VALUES\n",
"\n",
"# SHAP requires that all features passed to Explainer be numeric (floats/ints)\n",
"X_test_shap = X_test.copy()\n",
"X_test_shap = X_test_shap.astype(float)\n",
"\n",
"# Function that returns the probability of the positive class\n",
"def model_predict(data):\n",
"    return best_pipeline.predict_proba(data)[:, 1]\n",
"\n",
"# Create SHAP explainer\n",
"explainer = shap.Explainer(model_predict, X_test_shap)\n",
"\n",
"# Compute SHAP values (slow: the permutation explainer evaluates the model many times for each row of X_test_shap)\n",
"shap_values = explainer(X_test_shap)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "9cae1a51",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_881/3711913411.py:2: FutureWarning: The NumPy global RNG was seeded by calling `np.random.seed`. In a future version this function will no longer use the global RNG. Pass `rng` explicitly to opt-in to the new behaviour and silence this warning.\n",
" shap.summary_plot(shap_values.values, X_test_shap)\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAzoAAAOsCAYAAACCjsPqAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdd3gU1dfA8e9sS28kEEJCr1L8AQbpRbr0LoKCoAjSbKDYQfFVFBGpIiIdCdXQFdHQlC4WpPcEAiQhvW2Z949ll2w2gSQkNM/nefLATu7M3Jmd3dwz99w7iqqqKkIIIYQQQgjxENHc6woIIYQQQgghRGGTQEcIIYQQQgjx0JFARwghhBBCCPHQkUBHCCGEEEII8dCRQEcIIYQQQgjx0JFARwghhBBCCPHQkUBHCCGEEEII8dCRQEcIIYQQQgjx0JFARwghhBBCCPHQkUBHCCGEEEKIh9z48ePx9PS87e/OnTuHoiisWrUqX9sv6HpFSXevKyCEEEIIIYS4PwQFBfH7779TpUqVe12VOyaBjhBCCCGEEAIAFxcXGjRocK+rUSgkdU0IIYQQQggB5JyClpmZyejRoylWrBi+vr4MHTqUZcuWoSgK586dc1g/PT2dkSNH4ufnR1BQEGPGjMFkMt3lo7CSQEcIIYQQQoj/CJPJ5PRjsVhuuc64ceOYM2cOb775JmFhYVgsFsaNG5dj2XfeeQeNRsOKFSsYNmwYX3zxBd9++21RHMptSeqaEEIIIYQQ/wEpKSno9focf+fh4ZHj8ri4OGbPns27777Lm2++CUC7du1o3bo1Fy9edCpfv359pk2bBkCbNm349ddfWbVqFcOGDSuko8g7CXSEEEIIIQRGo5H58+cDMGjQoFwbxOIeUHrkvay6Jtdfubm5sWPHDqfl33zzDcuWLctxnb///pv09HS6dOnisLxr165s27bNqXzbtm0dXlevXp1ffvklLzUvdBLoCCGEEEII8R+g0WgIDQ11Wr5hw4Zc17l8+TIAxYsXd1heokSJHMv7+vo6vDYYDKSnp+ezpoVDxugIIYQQQgghchQUFATAtWvXHJZfvXr1XlQnXyTQEUIIIYQQ4r6m5OOncNWsWRNXV1fCw8Mdlv/www+Fvq/CJqlrQgghhBBCiBz5+/vz0ksv8fHHH+Pq6krt2rVZuXIlJ06cAKzpcPer+7dmQgghhBBCiHvu008/5cUXX+STTz6hd+/eGI1G+/TSPj4+97h2uVNUVVXvdSWEEEIIIcS9JbOu3ceUnnkvq64uunpk8eyzz7Jr1y7Onj17V/ZXEJK6JoQQQgghxH2t8Mfe5Mf27dvZvXs3jz32GBaLhQ0bNrB06VKmTJlyT+t1OxLoCCGEEEIIIXLl6enJhg0bmDRpEmlpaZQvX54pU6bwyiuv3Ouq3ZIEOkIIIYQQQohcPfbYY/z222/3uhr5JoGOEEIIIYQQ97V7m7r2oJJZ14QQQgghhBAPHQl0hBBCCCGEEA8dCXSEEEIIIYQQDx0ZoyOEEEIIIcR9TcboFIT06AghhBBCCCEeOhLoCCGEEEIIIR46EugIIYQQQgghHjoS6AghhBBCCCEeOhLoCCGEEEIIIR46MuuaEEIIIYQQ9zWZda0gpEdHCCGEEEII8dCRQEcIIYQQQgjx0JFARwghhBBCFJ7YJEjPdFz293nY/o/zciGKkIzREUIIIYQQd+5MNPzvNUhOv3W54e1h5ot3p04PDRmjUxDSoyOEEEIIIe5MQgpUHH77IAdg1hb48Y+ir5P4z5NARwghhBDiIWCyqNRfbEI32YT3Vya+/9d893beYWL+yr/8bdHUQ4gsJHVNCCGEEOIh4DHVTKbF+v8kI/TbpFLK00zzMtqi3fGPf8Bvx/O3zvHLcC0BivsUTZ0eOpK6VhDSoyOEEEII8YDbdNJkD3Ky6hmuFu2Odx2F9h8VbN2G4wq3LkJkI4GOEEIIIcQDyKKqTD1gImC6iY7hOZeJzS
iCHadnwtV4mLAcmr5T8O2cvgKvfVdo1RIiO0ldE0IIIYR4AHVcbWbLuduXm3bQxOjHCqHJZzZDmRfh0vU735bNlxsgxB9e61p423woSepaQUiPjhBCCCHEAyY5w5KnIAfg5V8LaaflhhVukGPz+kJ4dV7hb1f850mgI4QQQgjxgHhuk3VWNa/pOQzIuYVz8fkrn6PI2DvfRm6mboSd/xbd9sV/kqSuCSGEEELchz7fa+KD3yDTDOV94EwCFDRceXeHhSVd7uD+9vA5BV83r5q9C49Xgr2fFf2+xH+C9Og8JC5dukRoaChz5tyFL6K7YOXKlfTs2ZOGDRsSGhrKpUuX7nWV8uXAgQOEhoayfv16+7KifI/mzJlT6OcpNDSU8ePHF9r27rYHvf5C5Nf69esJDQ3lwIED97oqohAETjfxxk5IM4MZOHUHQQ7AxrN3WKHZP97hBvJo3yl4Ycbd2dcDRcnHj7CRHh1x3zlw4ACTJk2iefPmDBw4EJ1Oh5+f372ulngIzJkzh6pVq9KiRYt7XRVxj1y6dIn169fTokULqlateq+rI4STt3eY+GRf4W833ngHK2feycoFMO8XOBoFuz+5u/sVDx0JdMR9Z+/evQC8//77+Pg8PA8SCwoKYvfu3Wi1RfzgNgGQ47meO3cunTp1kkDnP+zSpUvMnTuXUqVKPXSBTocOHWjbti16vf5eV0XkweVklV8vWHDVqSw8AmcT4EjMnfXa3E7bFSZ+6lOApp93v8KvzO38dhzafAAb3gUXuaZFwUigIwrEZDJhNptxcXEp9G3HxMQAPFRBDoCiKEVyvkTO5FyLB1l6ejo6nQ6dLu9/prVardxIuU9dT1fZddHCoSsq607DH9egiB/jmaOtF8BkUdFp8pHeVGwAZJiLrlK38vPf4PoUtKgBvxbwoaQPDUlJKwgJdPJh/fr1TJgwgdmzZ3Ps2DFWrVrF1atXCQoKYvDgwXTq1Amw3jHs0qULQ4YMYejQoQ7bmDNnDnPnzmXdunWUKlUKgPHjx7NhwwZ+/vlnpk6dys6dOzEajdSrV4+33nqLgIAA1qxZw7Jly7h06RJBQUGMGjUq17vSW7ZsYcGCBVy4cAE/Pz+6dOnC888/7/QHMyYmhrlz57Jr1y5iY2Px9fWladOmvPTSSxQrVsypzmFhYYSHh/Pzzz8TExPDrFmzCA0NzfP5i4iIYNGiRZw4cQJFUahcuTIDBgywH4ftvNnYtl23bl2++eabPO3j2rVrLFmyhP3793P58mUyMjIIDg6mY8eOPPvssw6NANv7OXPmTA4fPsz69euJjY2lbNmyDBo0iHbt2jlsu3PnzgQFBfHaa68xdepUjhw5gl6vp2nTprz88ssO5ywnt7oufvrpJ8LCwjh58iRms5lKlSrx7LPP0rp1a4dyFouFhQsXsnbtWmJiYggJCWHQoEF5Oje5OX36NFOnTuWPP/7AYDDQqFEjXnvttVzL57WuoaGhdOrUiSeffJLZs2dz8uRJPD09adOmDcOHD8fd3d3p/MyePZu9e/eSlJREiRIlaNu2Lc8//zyurq72cgkJCXz77bfs2LGDa9eu4ebmRlBQEG3btmXAgAFO+x8/frzDtbVhwwY2bNhgL5ef8Qy2a2DMmDFMnTqVv//+G1dXVzp06MCoUaMwm83Mnj2bH3/8kYSEBGrUqMHbb79N+fLl7dtISUlh4cKF7N27l8jISFJTUwkMDKRVq1YMGTLE4VgPHDjAsGHD+OCDD1BVlSVLlnDx4kX8/f3p3bs3AwcOdKjfnj17CA8P599//yUmJga9Xk+NGjUYPHgwjz32mNPxbNu2jW+//Zbz58/j5+dH165d+d///seIESP44IMP6Ny5s71sZmYmS5YsYcuWLURGRmIwGKhTpw5Dhw6lWrVqOdY5PT2d77//nujoaEqXLs3IkSNp2rQpp06d4quvvuKvv/5Cp9PRvn17Xn31VafvqAsXLjB37lz27dtHQkICxYsXp3Xr1r
z44ou4ubnZy9m+QyMiIpg+fTq//PILKSkpVKtWjddee42aNWsCNz/zABMmTLD/Pz/fMWC9hlasWMGFCxcwmUz4+/t
"text/plain": [
"<Figure size 800x950 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Plot summary\n",
"shap.summary_plot(shap_values.values, X_test_shap)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interpreting the SHAP Summary Plot\n",
"\n",
"Each row corresponds to a feature, and each point on a row is that feature's SHAP value for a single prediction.\n",
"The x-axis shows how much the feature pushed that prediction up or down:\n",
"* Right (positive SHAP value): pushes the prediction toward the positive class (i.e., higher chance of incident).\n",
"* Left (negative SHAP value): pushes the prediction toward the negative class (i.e., lower chance of incident).\n",
"\n",
"Color shows the actual feature value for that point:\n",
"* Red = high value\n",
"* Blue = low value\n",
"\n",
"In other words:\n",
"* The position tells you the impact.\n",
"* The color tells you the feature value.\n",
"* The vertical pile-up of dots shows how many instances have SHAP values in that range."
]
},
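{
"cell_type": "markdown",
"metadata": {},
"source": [
"What makes the summary and waterfall plots readable is additivity: for each prediction, the base value plus the sum of that row's SHAP values equals the model output. The brute-force sketch below computes exact Shapley values for a small hypothetical function `f` by enumerating every feature coalition, replacing 'absent' features with a single baseline row. This is a simplification: `shap` uses a background dataset, and the permutation explainer approximates these values by iterating over feature permutations.\n",
"\n",
"```python\n",
"from itertools import combinations\n",
"from math import factorial\n",
"\n",
"\n",
"def f(x):\n",
"    # Hypothetical model: a linear term plus an interaction between features 1 and 2\n",
"    return 2.0 * x[0] + x[1] * x[2]\n",
"\n",
"\n",
"def shapley_values(f, x, baseline):\n",
"    # Exact Shapley values; features outside the coalition take their baseline value\n",
"    n = len(x)\n",
"    phi = [0.0] * n\n",
"    for i in range(n):\n",
"        others = [j for j in range(n) if j != i]\n",
"        for size in range(n):\n",
"            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size\n",
"            weight = factorial(size) * factorial(n - size - 1) / factorial(n)\n",
"            for subset in combinations(others, size):\n",
"                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]\n",
"                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]\n",
"                phi[i] += weight * (f(with_i) - f(without_i))\n",
"    return phi\n",
"\n",
"\n",
"x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]\n",
"phi = shapley_values(f, x, baseline)\n",
"print(phi)                           # per-feature contributions\n",
"print(f(baseline) + sum(phi), f(x))  # base value + sum of contributions vs. model output\n",
"```\n",
"\n",
"Here the linear term is credited entirely to feature 0 (2.0), and the interaction term's 6.0 is split equally between features 1 and 2 (3.0 each), so base value plus contributions reproduces the prediction of 8.0."
]
},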
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABLwAAAPZCAYAAAAbQTNdAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdd1hT1xsH8G+AsJcMARERFREXbhx14N5VsW5rVVptta3VOlpbW7dtHbXaqtVW6x6I+lOrKO4KdW8RJ8pQEUT25vz+SBOJCRCWUfL9PE+elnPPvefNzU2EN+e8VyKEECAiIiIiIiIiIion9LQdABERERERERERUWliwouIiIiIiIiIiMoVJryIiIiIiIiIiKhcYcKLiIiIiIiIiIjKFSa8iIiIiIiIiIioXGHCi4iIiIiIiIiIyhUmvIiIiIiIiIiIqFxhwouIiIiIiIiIiMoVJryIiIiIiIiIiKhcYcKLiIiIiIiIiIjKFSa8iIiIiIiIiIioXGHCi4iIiIiIiIhIA7m5uZgzZw6qV68OqVSK6tWr48cff0StWrWQm5tb5OOtXLkSVapUQUZGRhlEq9skQgih7SCIiIiIiIiIiN50y5cvx2effYZJkyahfv36sLKywsiRI7Fw4UKMHDmyyMdLT09H1apV8fXXX+Ozzz4rg4hV5eTk4Pz587hz5w4yMjJgY2ODpk2bonLlyhofIzY2FufPn8fTp0+RnZ0NS0tLeHp6om7dukUe5/nz57hw4QJiY2ORmpoKAwMDVKhQAV5eXnB1dS3282TCi4iIiIiIiIhIA40bN4adnR0CAwMBAD///DO+++47PH36FMbGxsU65tSpU7Ft2zY8ePAAEomkNMNV68iRI7h//z7q1asHKysr3L59GzExMejVqxccHR0L3T8yMhIHDx6EnZ0dqlWrBqlUisTERAgh0Lx58yKP8+jRI1y/fh0ODg4wNTVFdnY2Hjx4gCdPnqB169bw9PQs1vNkwouIiIiIiIiIqBDp6ekwNzfHzJkzMX36dACAl5cX6tevjw0bNhT7uBcuXECTJk1w5MgRtG/fvrTCVSsmJga7d++Gt7c3vLy8AADZ2dnw9/eHiYkJ3n333QL3z8zMxLZt2+Dg4IBOnTrlm6Ar6Ti5ubnYtWsXsrOzMXDgwGI8U9bwIiIiIiIiIiIq0OjRo2FiYoKcnBx88803kEgkcHJywtWrV9GxY0eV/lFRUTA2NsaoUaOU2oOCgiCVSvHFF18o2ho3bgwbGxvs2bOnzJ/H/fv3IZFIlGZNGRgYwMPDA0+fPkVycnKB+9+9exdpaWlo2rQpJBIJsrKyoG4eVUnH0dPTg5mZGTIzM4v4DF8yKPaeREREREREREQ6YOjQoZBKpVi1ahWWLl0KGxsb3Lt3D99//z0aNWqk0t/Z2Rl+fn74/fff8d1338HV1RW3bt3Ce++9h27dumHRokVK/Rs1aoTTp08XGENubq7GCSAjIyO1s6/i4uJgZWUFQ0NDpfaKFSsqtpubm+d73KioKEilUqSkpODQoUNISEiAgYEB3N3d0aJFCxgYGBR7nKysLOTk5CAzMxPh4eGIiIhA9erVNXq+6jDhRURERERERERUgPbt2+PIkSMwMzPD+PHjoaenh2+//RYA4Obmpnafr776CmvWrMEPP/yA2bNno2fPnqhatSq2bNkCPT3lBXfVqlUrdFnkkydPsG/fPo3iHTx4MCwsLFTaU1NTYWpqqtIub0tJSSnwuAkJCRBC4NChQ/Dw8ECzZs0QHR2NGzduIDMzEx06dCj2OP/++y9CQ0MBABKJBFWrVkWrVq0Keab5Y8KLiIiIiIiIiKgQV69eRZ06dRTJqri4OBgYGOQ7I8rZ2RkffvghVq9ejYsXLyItLQ0nTpyAmZmZSt8KFSogLS0t30QRANja2qJ79+4axWpiYqK2PTs7G/r6+irt8racnJwCj5uVlYXs7Gx4enoqklFubm7Izc1FaGgomj
RpAisrq2KNU69ePbi5uSE1NRX379+HEKLQeArChBcRERERERERUSGuXLmCLl26FGmfL7/8EsuXL8fVq1dx6tQpODs7q+0nr4NV0F0ajYyMULly5SKN/yoDAwO1SSR5m7ok1av7A0CNGjWU2mvUqIHQ0FA8ffoUVlZWxRrH2toa1tbWAICaNWti//79CAwMRJ8+fYp190oWrSciIiIiIiIiKsCLFy8QERGBevXqKdpsbW2RnZ2NpKSkfPebO3cuANnMKhsbm3z7xcfHw9TUNN+ZWYAsWZSamqrRIzc3V+0xTE1NkZqaqtIub1M3++zV/QHVGWTynzMyMkplHEC2zPPZs2dISEgotK86nOFFRERERERERFSAq1evAgDq16+vaKtVqxYA4MGDB0rtcj/99BPWrFmD5cuXY/LkyZg7dy7WrFmj9vgPHjxQuqOhOk+fPi1xDS9bW1tER0cjMzNTqaB8TEyMYntB7O3tERUVhZSUFMVsLOBlTS554quk4wCyJCGAYt+pkQkvIiIiIiIiIqICXLlyBYBywqtFixYAgPPnz6skvHbv3o1p06Zh9uzZGDduHO7cuYPffvsN06dPV1vk/uLFixg6dGiBMZRGDa9q1arh6tWrCA0NhZeXFwDZzLGwsDBUrFhRUY8sOzsbycnJMDY2hrGxsdL+ly9fRlhYmNLyzFu3bkEikcDJyalI4wBAWlqaSry5ubm4c+cO9PX1UaFCBY2e86uY8CIiIiIiIiIiKsDVq1fh7OystCyxWrVqqFu3LoKCgjBq1ChF+4ULFzB06FAMHToU06dPBwBMmTIFK1euVDvL68KFC3j+/DnefffdAmMojRpeFStWRLVq1XD27FmkpaXBysoKt2/fRlJSEtq2bavoFxMTg3379qFRo0Zo0qSJot3Ozg4eHh4ICwtDbm4unJyc8PjxY9y/fx8NGjRQLFXUdBwAOHXqFDIzM+Hk5AQzMzOkpqbi7t27ePHiBZo3bw6pVFqs58qEFxERERERERFRAa5evap22eKoUaMwY8YMxSylyMhI9OrVCw0bNsTq1asV/SpVqoRRo0ZhzZo1KrO8duzYgSpVqqB9+/av5bm0a9cO5ubmuHPnDjIzM2FjY4OuXbsqZmcVpnXr1jA3N0dYWBjCw8Nhbm6OFi1aKNU3K8o41apVQ1hYGG7evIn09HQYGhrCzs4OzZo1Q9WqVYv9PCVCfisAIiIiIiIiIiLSWEJCAqpVq4Yff/wRo0ePLvL+GRkZqFq1KqZNm4bPP/+8DCLUXbxLIxERERERERFRMVhZWWHKlCn46aef8r0zYkHWrl0LqVSKsWPHlkF0uo0zvIiIiIiIiIiIqFzhDC8iIiIiIiIiIipXmPAiIiIiIiIiIqJyhQkvIiIiIiIiIiIqV5jwIiIiIiIiIiKicoUJLyIiIiIiIiIiKleY8CIiIiIiIiIiKiWJiYlo164dEhMTtR2KTmPCi4iIiIiIiIiolCQmJuLEiRNMeGkZE15ERERERERERFSuMOFFRERERERERETlChNeRERERERERERUrjDhRURERERERERUSiwtLdGyZUtYWlpqOxSdJhFCCG0HQURERERERERUXly+fBkNGjTQdhg6jTO8iIiIiIiIiIioXOEMLyIiIiIiIiKiUpSeng5jY2Nth6HTOMOLiIiIiIiIiKgURUVFaTsEnceEFxERERERERFRKUpKStJ2CDqPCS8iIiIiIiIiolJkZGSk7RB0Hmt4ERERERERERGVopycHOjr62s7DJ3GGV5ERERERERERKXo2rVr2g5B5zHhRURERERERERE5QoTXkREREREREREpahixYraDkHnMeFFRERERERERFSKjI2NtR2CzmPCi4iIiIiIiIioFD169EjbIeg8JryIiIiIiIiIiKhckQghhLaDICIiIiIiIiIqL1JTU2FqaqrtMHQaZ3gREREREREREZWip0+fajsEnceEFxERERERERFRKUpISNB2CDqPCS
8iIiIiIiIiolIklUq1HYLOYw0vIiIiIiIiIiIqVzjDi4iIiIiIiIioFF2+fFnbIeg8JryIiIiIiIiIiKhcYcKLiIi
"text/plain": [
"<Figure size 800x1150 with 3 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Show the explanation for the instance with the highest predicted probability\n",
"highest_pred_index = np.argmax(model_predict(X_test_shap))\n",
"\n",
"# Use waterfall plot for a single instance\n",
"shap.plots.waterfall(shap_values[highest_pred_index], max_display=20)"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABNsAAAPZCAYAAAAoeixUAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdd1hT59sH8O8JhLBlykZwATLELe69Zyu4W7cdjlfraK3Wra22ttVOrauuqqh1r+KsWkFREReKExVERBmyc94/+CUSEzAgEMDv57q4NM95znnuc3ISyJ1nCKIoiiAiIiIiIiIiIqK3JtF1AERERERERERERBUFk21ERERERERERETFhMk2IiIiIiIiIiKiYsJkGxERERERERERUTFhso2IiIiIiIiIiKiYMNlGRERERERERERUTJhsIyIiIiIiIiIiKiZMthERERERERERERUTJtuIiIiIiIiIiIiKCZNtRERERERERERExYTJNiIiIiIiIiIiomLCZBsRERERFRu5XI558+ahWrVqkEqlqFatGhYtWgRPT0/I5fJCH++3336Dq6srMjIySiBaIiIiouIniKIo6joIIiIiIqoYfvrpJ4wbNw6fffYZ/Pz8UKlSJQwdOhTffvsthg4dWujjpaenw83NDdOmTcO4ceNKIGJ1OTk5OHfuHG7evImMjAxYWVmhQYMGcHZ2LrH9w8PDce7cOVhaWiIwMFBt+4sXLxAWFoa4uDikp6fD1NQU1atXR+3ataGvr1/kcyUiIqLix55tRERERFRsVq9ejfbt22Px4sUYPHgwbt++jezsbPTv379IxzM0NMSHH36IJUuWoLS+Iz527BgiIiJQvXp1NGnSBBKJBPv370dsbGyJ7J+SkoKLFy/mmzRLSUnBjh078OTJE3h7e6NJkyaws7PD+fPnERISUuTzJCIiopLBZBsRERERFYv09HRcunQJLVq0UJatXr0aPXr0gKGhYZGPGxQUhHv37uHo0aPFEWaBnjx5gujoaDRs2BCNGzeGl5cXunbtCjMzM5w9e7ZE9v/vv/9QuXJl2Nraatx+8+ZNZGZmolOnTvD394eXlxdatWqFGjVq4N69exxiS0REVMYw2UZEREREb2348OEwMjJCTk4Opk+fDkEQ4ODggIiICLRr106t/sOHD2FoaIhhw4aplP/zzz+QSqWYMGGCsqxevXqwsrLCzp07S/w8bt++DUEQ4OXlpSzT19eHh4cH4uLikJKSUqz7P378GHfu3EGTJk3yPWZmZiYAwNjYWKXc2NgYgiBAIuGf9ERERGUJfzMTERER0VsbOHAgRo8eDQD48ccfsW7dOnz00UcAgLp166rVd3JywogRI7B+/Xrcu3cPAHD9+nUEBgaic+fO+O6771Tq161bF6dOnSowBrlcjvT0dK1+8huSmpCQgEqVKsHAwEClvHLlysrtBSnM/nK5HKdOnYKnpyesrKzyPaajoyMA4Pjx43j69ClSUlIQHR2Nq1evwtvbG1KptMCYiIiIqHRxNlUiIiIiemtt2rRBSEgITExMMGbMGEgkEsyYMQMA4O7urnGfL774An/88Qe++eYbzJ07F926dYObmxs2bdqk1luratWqWLduXYExxMbGYs+ePVrF279/f5iZmamVv3z5Uq0HGfCqV1lqamqBxy3M/teuXUNKSgq6du1a4DFdXFxQv359XLhwQZmYBIA6deqgQYMGBe5LREREpY/JNiIiIiIqFhEREfD29lYmyhISEqCvrw9TU1ON9Z2cnDBy5EisWLEC4eHhSEtLw/Hjx2FiYqJW19LSEmlpafkmswDA2toaXbp00SpWIyMjjeXZ2dnQ09NTK1eU5eTkFHhcbfdPT0/HuXPnULdu3XxjycvMzAwODg5wd3eHoaEh7t+/jwsXLsDIyAg+Pj5v3J+IiIhKD5NtRERERFQsLl26hI4dOxZqn0mTJuGnn35CREQETp48CScnJ431FMM+BUHI91
gymQzOzs6Fav91+vr6GhNqijJNibSi7B8WFgaZTAZvb+83xnTr1i2cOHECffv2VSYu3d3dIYoiQkNDUb169bdagIKIiIiKF5NtRERERPTWnj9/jgcPHsDX11dZZm1tjezsbCQnJ2scsgkA8+fPB5DbI6ygecsSExNhbGxcYC+wnJwcrVfmNDQ01LiwgLGxscahoi9fvgQAjb3uCrv/ixcvcP36dQQEBCjLFfHL5XIkJydDKpUqE2hXr16FjY2NWg/BKlWqICoqCk+fPn3rJCMREREVHybbiIiIiOitRUREAAD8/PyUZZ6engCAO3fuqJQrLF68GH/88Qd++uknTJ48GfPnz8cff/yh8fh37txRWeFTk7i4uLees83a2hqPHj1CZmamyiIHT548UW4viDb7JyUlQRRFnD59GqdPn1Y7xqZNm+Dj46NcoTQtLQ0ymUytnlwuB4B8F3sgIiIi3WCyjYiIiIje2qVLlwCoJtsCAgIAAOfOnVNLtv3999/4/PPPMXfuXHz66ae4efMmfvnlF3z55ZcaF1QIDw/HwIEDC4yhOOZsq1q1KiIiInDt2jXUrl0bQG6Psxs3bqBy5crK3mXZ2dlISUmBoaGhyhBObfbX19dHhw4d1NoOCwtDVlYWmjRpAnNzc2V5pUqVEBMTg+fPn8PCwkJZHh0dDUEQCuwRSERERKWPyTYiIiIiemsRERFwcnJSSfxUrVoVPj4++OeffzBs2DBl+fnz5zFw4EAMHDgQX375JQBgypQp+O233zT2bjt//jyePXuGnj17FhhDcczZVrlyZVStWhWhoaFIS0tDpUqVEBUVheTkZLRs2VJZ78mTJ9izZw/q1q2L+vXrF2p/Q0NDuLm5qbV9+fJlAFDbVrt2bTx48AC7d++Gt7c3ZDIZ7t+/jwcPHsDT0/ONQ1uJiIiodKlPVEFEREREVEgREREah4oOGzYMu3fvRlpaGgAgJiYG3bt3R506dbBixQplPUdHRwwbNgx//vkn7ty5o3KMrVu3wtXVFW3atCnZk/ifVq1awdfXFzdv3sTp06chl8vRqVMnODg4lMr+r3NwcEDPnj1hY2ODK1eu4MyZM0hKSkKDBg3QrFmzIh2TiIiISo4gcpIHIiIiIiohL168QNWqVbFo0SIMHz680PtnZGTAzc0Nn3/+OcaPH18CERIREREVL/ZsIyIiIqISU6lSJUyZMgWLFy9WTuhfGKtXr4ZUKsVHH31UAtERERERFT/2bCMiIiIiIiIiIiom7NlGRERERERERERUTJhsIyIiIiIiIiIiKiZMthERERERERERERUTJtuIiIiIiIiIiIiKCZNtRERERERERERExYTJNiIiIiIqVUlJSWjVqhWSkpJ0HQoRERFRsWOyjYiIiIhKVVJSEo4fP85kGxEREVVITLYREREREREREREVEybbiIiIiIiIiIiIigmTbURERERERERERMWEyTYiIiIiKlXm5uZo0qQJzM3NdR0KERERUbETRFEUdR0EEREREb1bLl68CH9/f12HQURERFTs2LONiIiIiIiIiIiomLBnGxERERGVuvT0dBgaGuo6DCIiIqJix55tRERERFTqHj58qOsQiIiIiEoEk21EREREVOqSk5N1HQIRERFRiWCyjYiIiIhKnUwm03UIRERERCWCc7YRERERUanLycmBnp6ersMgIiIiKnbs2UZEREREpe7y5cu6DoGIiIioRDDZRkREREREREREVEyYbCMiIiKiUle5cmVdh0BERERUIphsIyIiIqJSZ2hoqOsQiIiIiEoEk21EREREVOru37+v6xCIiIiISgSTbURERERERERERMVEEEVR1HUQRERERPRuefnyJYyNjXUdBhEREVGxY882IiIiIip1cXFxug6BiIiIqEQw2UZEREREpe7Fixe6DoGIiIioRDDZRkRERESlTiqV6joEIiIiohLBOduIiIiIiIiIiIiKCXu2EREREVGpu3jxoq5DICIiIioRTL
YREREREREREREVEybbiIiIiKjU2djY6DoEIiIiohLBZBsRERERlTpTU1Ndh0BERERUIphsIyIiIqJSd/fuXV2HQER
"text/plain": [
"<Figure size 800x1150 with 3 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Show the explanation for the instance with the lowest predicted probability\n",
"lowest_pred_index = np.argmin(model_predict(X_test_shap))\n",
"\n",
"# Use waterfall plot for a single instance\n",
"shap.plots.waterfall(shap_values[lowest_pred_index], max_display=20)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}