{
"cells": [
{
"cell_type": "markdown",
"id": "84dcd475",
"metadata": {},
"source": [
"# DDRA - 001 - Basic Booking Attributes\n",
"\n",
"## General Idea\n",
"\n",
"The idea is to start with a very simple model based on basic Booking attributes. This should give us a first understanding of which attributes bring value to the data-driven risk assessment of New Dash protected bookings.\n",
"\n",
"## Initial setup\n",
"\n",
"This first section ensures that the connection to the DWH works correctly."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "12368ce1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"🔌 Testing connection using credentials at: /home/uri/.superhog-dwh/credentials.yml\n",
"✅ Connection successful.\n"
]
}
],
"source": [
"# This cell connects to the Data Warehouse (DWH), a PostgreSQL database.\n",
"# It should be common to all notebooks, but you might need to adjust the path to the `dwh_utils` module.\n",
"\n",
"import sys\n",
"import os\n",
"sys.path.append(os.path.abspath(\"../../utils\")) # Adjust path if needed\n",
"\n",
"from dwh_utils import read_credentials, create_postgres_engine, query_to_dataframe, test_connection\n",
"\n",
"# --- Connect to DWH ---\n",
"creds = read_credentials()\n",
"dwh_pg_engine = create_postgres_engine(creds)\n",
"\n",
"# --- Test Query ---\n",
"test_connection()"
]
},
{
"cell_type": "markdown",
"id": "c86f94f1",
"metadata": {},
"source": [
"## Data Extraction\n",
"In this section we extract the data for our first modelling attempt based on Basic Booking Attributes.\n",
"\n",
"This SQL query retrieves a clean and relevant subset of booking data for our model. It includes:\n",
"- A **unique booking ID**\n",
"- Key **numeric features**: number of applied services (total, upgraded and billable), days between booking creation and check-in, and number of nights\n",
"- Several **categorical (boolean) features** related to service usage\n",
"- A **target variable** (`has_resolution_incident`) indicating whether a resolution incident occurred\n",
"\n",
"The following filters are applied:\n",
"1. Bookings from **\"New Dash\" users** with a valid deal ID\n",
"2. Only **protected bookings**, i.e., those with a Protection or Deposit Management service\n",
"3. Bookings with a **risk-flagging categorisation** (this excludes cancelled, incomplete and rejected bookings)\n",
"4. Bookings whose **completion date has already passed**\n",
"\n",
"The result is loaded into a pandas DataFrame for further processing and modelling.\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "3e3ed391",
"metadata": {},
"outputs": [],
"source": [
"# Initialise all imports needed for the Notebook\n",
"from sklearn.model_selection import (\n",
" train_test_split, \n",
" GridSearchCV\n",
")\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"import pandas as pd\n",
"import numpy as np\n",
"from datetime import date\n",
"from sklearn.metrics import (\n",
" roc_auc_score, \n",
" average_precision_score,\n",
" classification_report,\n",
" roc_curve, \n",
" auc,\n",
" precision_recall_curve\n",
")\n",
"import matplotlib.pyplot as plt\n",
"import shap"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "db5e3098",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" id_booking number_of_applied_services \\\n",
"0 919656 3 \n",
"1 926634 3 \n",
"2 931082 2 \n",
"3 931086 2 \n",
"4 931096 2 \n",
"\n",
" number_of_applied_upgraded_services number_of_applied_billable_services \\\n",
"0 2 2 \n",
"1 2 2 \n",
"2 1 1 \n",
"3 1 1 \n",
"4 1 1 \n",
"\n",
" booking_days_to_check_in booking_number_of_nights \\\n",
"0 87 4 \n",
"1 109 3 \n",
"2 50 7 \n",
"3 15 3 \n",
"4 8 5 \n",
"\n",
" has_verification_request has_billable_services \\\n",
"0 False True \n",
"1 False True \n",
"2 False True \n",
"3 False True \n",
"4 False True \n",
"\n",
" has_upgraded_screening_service_business_type \\\n",
"0 False \n",
"1 False \n",
"2 False \n",
"3 False \n",
"4 False \n",
"\n",
" has_deposit_management_service_business_type \\\n",
"0 True \n",
"1 True \n",
"2 False \n",
"3 False \n",
"4 False \n",
"\n",
" has_protection_service_business_type has_resolution_incident \n",
"0 True False \n",
"1 True False \n",
"2 True False \n",
"3 True False \n",
"4 True False \n",
"Total Bookings: 16,193\n"
]
}
],
"source": [
"# Query to extract data\n",
"data_extraction_query = \"\"\"\n",
"select \n",
" -- Unique ID --\n",
" ibs.id_booking,\n",
" -- Numeric Features --\n",
" ibs.number_of_applied_services,\n",
" ibs.number_of_applied_upgraded_services,\n",
" ibs.number_of_applied_billable_services,\n",
" ibs.booking_check_in_date_utc - ibs.booking_created_date_utc as booking_days_to_check_in,\n",
" ibs.booking_number_of_nights,\n",
" -- Categorical (Boolean) Features --\n",
" ibs.has_verification_request,\n",
" ibs.has_billable_services,\n",
" ibs.has_upgraded_screening_service_business_type,\n",
" ibs.has_deposit_management_service_business_type,\n",
" ibs.has_protection_service_business_type,\n",
" -- Target (Boolean) --\n",
" ibs.has_resolution_incident\n",
"from intermediate.int_booking_summary ibs\n",
"where \n",
" -- 1. Bookings from New Dash users with Id Deal\n",
" ibs.is_user_in_new_dash = True and \n",
" ibs.is_missing_id_deal = False and\n",
" -- 2. Protected Bookings with a Protection or a Deposit Management service\n",
" (ibs.has_protection_service_business_type or \n",
" ibs.has_deposit_management_service_business_type) and\n",
" -- 3. Bookings with flagging categorisation (this excludes Cancelled/Incomplete/Rejected bookings)\n",
" ibs.is_booking_flagged_as_risk is not null and \n",
" -- 4. Booking is completed\n",
" ibs.is_booking_past_completion_date = True \n",
"\"\"\"\n",
"\n",
"# Retrieve Data from Query\n",
"df_extraction = query_to_dataframe(engine=dwh_pg_engine, query=data_extraction_query)\n",
"print(df_extraction.head())\n",
"print(f\"Total Bookings: {len(df_extraction):,}\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Processing\n",
"Processing in this notebook is straightforward: we drop the booking ID, separate the features from the target, and cast the numeric features to float. The scaling itself is applied later, inside the model pipeline.\n",
"Afterwards, we perform a stratified 70/30 train/test split and display the size and target distribution of each set."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training set size: 11335 rows\n",
"Test set size: 4858 rows\n",
"\n",
"Training target distribution:\n",
"has_resolution_incident\n",
"False 0.988619\n",
"True 0.011381\n",
"Name: proportion, dtype: float64\n",
"\n",
"Test target distribution:\n",
"has_resolution_incident\n",
"False 0.988473\n",
"True 0.011527\n",
"Name: proportion, dtype: float64\n"
]
}
],
"source": [
"# Drop ID column\n",
"df = df_extraction.copy().drop(columns=['id_booking'])\n",
"\n",
"# Separate features and target\n",
"X = df.drop(columns=['has_resolution_incident'])\n",
"y = df['has_resolution_incident']\n",
"\n",
"# Cast numeric features to float (scaling happens later, inside the model pipeline)\n",
"numeric_features = ['number_of_applied_services', \n",
" 'booking_number_of_nights', \n",
" 'number_of_applied_upgraded_services',\n",
" 'number_of_applied_billable_services',\n",
" 'booking_days_to_check_in']\n",
"X[numeric_features] = X[numeric_features].astype(float)\n",
"\n",
"# Stratified 70/30 split: keeps the incident rate nearly identical in train and test\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.3, random_state=123)\n",
"\n",
"print(f\"Training set size: {X_train.shape[0]} rows\")\n",
"print(f\"Test set size: {X_test.shape[0]} rows\")\n",
"\n",
"print(\"\\nTraining target distribution:\")\n",
"print(y_train.value_counts(normalize=True))\n",
"\n",
"print(\"\\nTest target distribution:\")\n",
"print(y_test.value_counts(normalize=True))"
]
},
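{
"cell_type": "markdown",
"metadata": {},
"source": [
"The split sizes above follow from `test_size=0.3`. For a float `test_size`, scikit-learn rounds the test share up, so the test set gets `ceil(0.3 × 16,193) = 4,858` rows and the training set keeps the remaining 11,335. The cell below is a minimal sketch on **synthetic labels** (not the DWH extract). It uses the same number of bookings and the same number of incidents, 185. That count is derived from the printed proportions (0.011381 × 11,335 + 0.011527 × 4,858 ≈ 185). The names `y_demo`, `X_demo`, `y_tr` and `y_te` are illustrative only. The sketch reproduces the split sizes and shows that `stratify=y` keeps the incident rate almost identical in both sets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"import numpy as np\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Synthetic stand-in for the extract: same number of bookings and incidents\n",
"n_bookings, n_incidents = 16_193, 185\n",
"y_demo = np.zeros(n_bookings, dtype=bool)\n",
"y_demo[:n_incidents] = True\n",
"X_demo = np.arange(n_bookings).reshape(-1, 1)  # dummy single feature\n",
"\n",
"# Same split settings as the processing cell above\n",
"X_tr, X_te, y_tr, y_te = train_test_split(\n",
"    X_demo, y_demo, stratify=y_demo, test_size=0.3, random_state=123\n",
")\n",
"\n",
"# A float test_size is rounded up with ceil(); training gets the remainder\n",
"expected_test = math.ceil(0.3 * n_bookings)\n",
"print(f'Train rows: {len(y_tr)} (expected {n_bookings - expected_test})')\n",
"print(f'Test rows:  {len(y_te)} (expected {expected_test})')\n",
"print(f'Incident rate: train {y_tr.mean():.6f}, test {y_te.mean():.6f}, overall {y_demo.mean():.6f}')"
]
},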
{
"cell_type": "markdown",
"id": "d36c9276",
"metadata": {},
"source": [
"## Classification Model with Random Forest\n",
"\n",
"We define a machine learning pipeline that includes:\n",
"- **Scaling numeric features** with `StandardScaler`\n",
"- **Training a Random Forest classifier** with balanced class weights to handle the imbalanced dataset (only about 1.1% of bookings have a resolution incident)\n",
"\n",
"We then use `GridSearchCV` to perform a **grid search with 5-fold cross-validation** over key hyperparameters: number of trees, maximum depth, number of features considered per split, and the minimum number of samples required to split a node and to form a leaf.\n",
"Candidate models are scored with **Average Precision**, a summary of the precision-recall curve that is generally better suited than accuracy or ROC AUC to highly imbalanced classification tasks.\n",
"\n",
"The best combination of parameters is selected, and the resulting model is used to make predictions on the test set.\n"
]
},
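{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch on **synthetic data** (not the booking extract) illustrates the preference for Average Precision. It assumes a positive rate of about 1.14%, close to the incident rate seen in the processing step. An uninformative model that outputs random scores still reaches a ROC AUC of about 0.5, whereas its Average Precision drops to roughly the positive rate. A perfect ranking reaches an AP of 1.0. AP's no-skill baseline is the class prevalence rather than 0.5, so improvements on the rare incident class stand out more clearly. All variable names below are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.metrics import average_precision_score, roc_auc_score\n",
"\n",
"synth_rng = np.random.default_rng(0)\n",
"n_synth = 100_000\n",
"prevalence = 0.0114  # assumed, close to the incident rate of the extract\n",
"\n",
"# Synthetic binary labels with about 1.14% positives\n",
"y_synth = (synth_rng.random(n_synth) < prevalence).astype(int)\n",
"\n",
"# Uninformative model: scores are pure noise, independent of the label\n",
"random_scores = synth_rng.random(n_synth)\n",
"# Perfect model: every positive scores above every negative\n",
"perfect_scores = y_synth.astype(float)\n",
"\n",
"roc_random = roc_auc_score(y_synth, random_scores)\n",
"ap_random = average_precision_score(y_synth, random_scores)\n",
"ap_perfect = average_precision_score(y_synth, perfect_scores)\n",
"\n",
"print(f'Positive rate:  {y_synth.mean():.4f}')\n",
"print(f'Random ROC AUC: {roc_random:.3f}')\n",
"print(f'Random AP:      {ap_random:.4f}')\n",
"print(f'Perfect AP:     {ap_perfect:.3f}')"
]
},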
{
"cell_type": "code",
"execution_count": 5,
"id": "943ef7d6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fitting 5 folds for each of 72 candidates, totalling 360 fits\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.6s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.6s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.1s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.1s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.3s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.3s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.9s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 4.4s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 4.9s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 3.3s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 4.4s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.2s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 3.8s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 2.1s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.2s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.3s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 6.9s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.6s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 6.9s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.7s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 6.8s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.9s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.0s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.9s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.5s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.5s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 3.7s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 3.7s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 2.3s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 4.3s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.6s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 3.0s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 3.0s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 4.2s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 3.6s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 2.4s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 4.5s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 4.1s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 4.4s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 4.5s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 4.6s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.7s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.7s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 7.2s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.1s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.2s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.9s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.5s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.5s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.2s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.5s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.5s\n",
"[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 6.3s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 3.9s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 4.3s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 4.0s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.9s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.8s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.9s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.5s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 2.2s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 4.1s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.3s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.1s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 6.3s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 6.3s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 3.9s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.3s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.5s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.7s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.6s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.6s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.9s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.5s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.0s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.5s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 3.8s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 3.9s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 2.6s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 2.6s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.6s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.9s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.9s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.9s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 4.5s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 3.2s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.2s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 2.2s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 2.2s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 3.7s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 6.8s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 4.8s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 7.1s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.6s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 4.2s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.4s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 4.8s\n",
"[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 4.0s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.1s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 3.0s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.2s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.9s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.9s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 3.7s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 3.6s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.3s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.3s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 3.9s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.1s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.2s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.2s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 2.0s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 5.4s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 5.6s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 3.8s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.3s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 3.7s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 3.7s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 3.7s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.3s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.7s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.1s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 5.2s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 3.3s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 3.5s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.6s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 3.7s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.3s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.2s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.2s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.9s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 5.4s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 5.3s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 3.5s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 3.7s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 3.7s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 3.8s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 3.8s\n",
"[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 3.8s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.3s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.4s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.4s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.2s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 2.0s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.3s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 3.7s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 3.7s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.6s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.4s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.6s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.3s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.2s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.2s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.2s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.2s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 5.2s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 4.9s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 3.5s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 4.8s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 3.6s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 3.6s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 3.6s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 3.7s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.4s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.5s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.7s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.8s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.2s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.7s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 3.6s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 3.8s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 3.9s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.6s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.4s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 4.5s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 4.4s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.5s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 3.5s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 3.7s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 4.3s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 3.8s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.4s\n",
"[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 4.4s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.4s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.7s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.7s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.7s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.4s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 3.9s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.4s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.4s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 3.8s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 3.9s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.6s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 4.6s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.7s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 4.9s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 3.2s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.2s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 3.6s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 3.9s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 3.9s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 3.9s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.0s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.2s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.6s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.6s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.7s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.3s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 3.2s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 3.7s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 3.6s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 3.7s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.5s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.6s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.7s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.6s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.7s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 5.0s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 5.3s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.5s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time= 1.4s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 4.0s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 4.0s\n",
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 4.1s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.6s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.7s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.6s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.7s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time= 2.7s\n",
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.4s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 4.8s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 4.9s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 3.6s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 3.6s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 3.5s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 4.4s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.2s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time= 4.4s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.6s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.5s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.6s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.6s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time= 2.7s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.2s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.2s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.2s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time= 1.4s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 3.9s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 3.9s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.1s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.5s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.4s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.6s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.5s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time= 4.3s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.5s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.6s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time= 2.9s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.4s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.3s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time= 1.2s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 3.8s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 3.6s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 3.8s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 3.6s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time= 3.5s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.2s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.3s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.3s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.3s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time= 2.2s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 2.8s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 2.8s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 2.6s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 2.5s\n",
|
|||
|
|
"[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time= 2.3s\n",
|
|||
|
|
"Best hyperparameters: {'model__max_depth': 10, 'model__max_features': 'sqrt', 'model__min_samples_leaf': 2, 'model__min_samples_split': 2, 'model__n_estimators': 100}\n"
|
|||
|
|
]
|
|||
|
|
}
|
|||
|
|
],
|
|||
|
|
"source": [
|
|||
|
|
"\n",
"# Define pipeline. Note: StandardScaler is applied to every column of X_train;\n",
"# random forests are insensitive to feature scaling, so this step is harmless but optional.\n",
"pipeline = Pipeline([\n",
"    ('scaler', StandardScaler()),\n",
"    ('model', RandomForestClassifier(class_weight='balanced',  # We have an imbalanced dataset\n",
"                                     random_state=123))\n",
"])\n",
"\n",
"# Define parameter grid (3 x 3 x 2 x 2 x 2 = 72 candidate configurations)\n",
"param_grid = {\n",
"    'model__n_estimators': [100, 200, 300],\n",
"    'model__max_depth': [None, 10, 20],\n",
"    'model__min_samples_split': [2, 5],\n",
"    'model__min_samples_leaf': [1, 2],\n",
"    'model__max_features': ['sqrt', 'log2']\n",
"}\n",
"\n",
"# GridSearchCV\n",
"grid_search = GridSearchCV(\n",
"    estimator=pipeline,\n",
"    param_grid=param_grid,\n",
"    scoring='average_precision',  # Summarizes the precision-recall curve; suited to imbalanced classes\n",
"    cv=5,                         # 5-fold cross-validation (stratified by default for classifiers)\n",
"    n_jobs=-1,                    # Use all available cores\n",
"    verbose=2                     # Verbose output for progress tracking\n",
")\n",
"\n",
"# Fit the grid search on training data\n",
"grid_search.fit(X_train, y_train)\n",
"\n",
"# Best model (refit on the full training set with the best hyperparameters)\n",
"best_pipeline = grid_search.best_estimator_\n",
"print(\"Best hyperparameters:\", grid_search.best_params_)\n",
"\n",
"# Predict on test set\n",
"y_pred_proba = best_pipeline.predict_proba(X_test)[:, 1]  # Predicted probability of the positive class\n",
"y_pred = best_pipeline.predict(X_test)  # Hard labels (positive when that probability exceeds 0.5)\n"
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
{
|
|||
|
|
"cell_type": "code",
|
|||
|
|
"execution_count": 6,
|
|||
|
|
"metadata": {},
|
|||
|
|
"outputs": [
|
|||
|
|
{
|
|||
|
|
"data": {
|
|||
|
|
"text/html": [
|
|||
|
|
"<div>\n",
|
|||
|
|
"<style scoped>\n",
|
|||
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
|
" vertical-align: middle;\n",
|
|||
|
|
" }\n",
|
|||
|
|
"\n",
|
|||
|
|
" .dataframe tbody tr th {\n",
|
|||
|
|
" vertical-align: top;\n",
|
|||
|
|
" }\n",
|
|||
|
|
"\n",
|
|||
|
|
" .dataframe thead th {\n",
|
|||
|
|
" text-align: right;\n",
|
|||
|
|
" }\n",
|
|||
|
|
"</style>\n",
|
|||
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
|
" <thead>\n",
|
|||
|
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
|
" <th></th>\n",
|
|||
|
|
" <th>mean_fit_time</th>\n",
|
|||
|
|
" <th>std_fit_time</th>\n",
|
|||
|
|
" <th>mean_score_time</th>\n",
|
|||
|
|
" <th>std_score_time</th>\n",
|
|||
|
|
" <th>param_model__max_depth</th>\n",
|
|||
|
|
" <th>param_model__max_features</th>\n",
|
|||
|
|
" <th>param_model__min_samples_leaf</th>\n",
|
|||
|
|
" <th>param_model__min_samples_split</th>\n",
|
|||
|
|
" <th>param_model__n_estimators</th>\n",
|
|||
|
|
" <th>params</th>\n",
|
|||
|
|
" <th>split0_test_score</th>\n",
|
|||
|
|
" <th>split1_test_score</th>\n",
|
|||
|
|
" <th>split2_test_score</th>\n",
|
|||
|
|
" <th>split3_test_score</th>\n",
|
|||
|
|
" <th>split4_test_score</th>\n",
|
|||
|
|
" <th>mean_test_score</th>\n",
|
|||
|
|
" <th>std_test_score</th>\n",
|
|||
|
|
" <th>rank_test_score</th>\n",
|
|||
|
|
" </tr>\n",
|
|||
|
|
" </thead>\n",
|
|||
|
|
" <tbody>\n",
|
|||
|
|
" <tr>\n",
|
|||
|
|
" <th>42</th>\n",
|
|||
|
|
" <td>1.191664</td>\n",
|
|||
|
|
" <td>0.060865</td>\n",
|
|||
|
|
" <td>0.060239</td>\n",
|
|||
|
|
" <td>0.003913</td>\n",
|
|||
|
|
" <td>10</td>\n",
|
|||
|
|
" <td>log2</td>\n",
|
|||
|
|
" <td>2</td>\n",
|
|||
|
|
" <td>2</td>\n",
|
|||
|
|
" <td>100</td>\n",
|
|||
|
|
" <td>{'model__max_depth': 10, 'model__max_features'...</td>\n",
|
|||
|
|
" <td>0.035431</td>\n",
|
|||
|
|
" <td>0.023902</td>\n",
|
|||
|
|
" <td>0.019452</td>\n",
|
|||
|
|
" <td>0.022538</td>\n",
|
|||
|
|
" <td>0.026337</td>\n",
|
|||
|
|
" <td>0.025532</td>\n",
|
|||
|
|
" <td>0.005426</td>\n",
|
|||
|
|
" <td>1</td>\n",
|
|||
|
|
" </tr>\n",
|
|||
|
|
" <tr>\n",
|
|||
|
|
" <th>30</th>\n",
|
|||
|
|
" <td>1.295314</td>\n",
|
|||
|
|
" <td>0.295965</td>\n",
|
|||
|
|
" <td>0.071769</td>\n",
|
|||
|
|
" <td>0.019185</td>\n",
|
|||
|
|
" <td>10</td>\n",
|
|||
|
|
" <td>sqrt</td>\n",
|
|||
|
|
" <td>2</td>\n",
|
|||
|
|
" <td>2</td>\n",
|
|||
|
|
" <td>100</td>\n",
|
|||
|
|
" <td>{'model__max_depth': 10, 'model__max_features'...</td>\n",
|
|||
|
|
" <td>0.035431</td>\n",
|
|||
|
|
" <td>0.023902</td>\n",
|
|||
|
|
" <td>0.019452</td>\n",
|
|||
|
|
" <td>0.022538</td>\n",
|
|||
|
|
" <td>0.026337</td>\n",
|
|||
|
|
" <td>0.025532</td>\n",
|
|||
|
|
" <td>0.005426</td>\n",
|
|||
|
|
" <td>1</td>\n",
|
|||
|
|
" </tr>\n",
|
|||
|
|
" <tr>\n",
|
|||
|
|
" <th>31</th>\n",
|
|||
|
|
" <td>2.318125</td>\n",
|
|||
|
|
" <td>0.101894</td>\n",
|
|||
|
|
" <td>0.105294</td>\n",
|
|||
|
|
" <td>0.009273</td>\n",
|
|||
|
|
" <td>10</td>\n",
|
|||
|
|
" <td>sqrt</td>\n",
|
|||
|
|
" <td>2</td>\n",
|
|||
|
|
" <td>2</td>\n",
|
|||
|
|
" <td>200</td>\n",
|
|||
|
|
" <td>{'model__max_depth': 10, 'model__max_features'...</td>\n",
|
|||
|
|
" <td>0.037634</td>\n",
|
|||
|
|
" <td>0.021405</td>\n",
|
|||
|
|
" <td>0.018878</td>\n",
|
|||
|
|
" <td>0.022386</td>\n",
|
|||
|
|
" <td>0.025625</td>\n",
|
|||
|
|
" <td>0.025186</td>\n",
|
|||
|
|
" <td>0.006589</td>\n",
|
|||
|
|
" <td>3</td>\n",
|
|||
|
|
" </tr>\n",
|
|||
|
|
" <tr>\n",
|
|||
|
|
" <th>43</th>\n",
|
|||
|
|
" <td>2.513033</td>\n",
|
|||
|
|
" <td>0.161350</td>\n",
|
|||
|
|
" <td>0.120259</td>\n",
|
|||
|
|
" <td>0.020841</td>\n",
|
|||
|
|
" <td>10</td>\n",
|
|||
|
|
" <td>log2</td>\n",
|
|||
|
|
" <td>2</td>\n",
|
|||
|
|
" <td>2</td>\n",
|
|||
|
|
" <td>200</td>\n",
|
|||
|
|
" <td>{'model__max_depth': 10, 'model__max_features'...</td>\n",
|
|||
|
|
" <td>0.037634</td>\n",
|
|||
|
|
" <td>0.021405</td>\n",
|
|||
|
|
" <td>0.018878</td>\n",
|
|||
|
|
" <td>0.022386</td>\n",
|
|||
|
|
" <td>0.025625</td>\n",
|
|||
|
|
" <td>0.025186</td>\n",
|
|||
|
|
" <td>0.006589</td>\n",
|
|||
|
|
" <td>3</td>\n",
|
|||
|
|
" </tr>\n",
|
|||
|
|
" <tr>\n",
|
|||
|
|
" <th>44</th>\n",
|
|||
|
|
" <td>3.862008</td>\n",
|
|||
|
|
" <td>0.369737</td>\n",
|
|||
|
|
" <td>0.170743</td>\n",
|
|||
|
|
" <td>0.029734</td>\n",
|
|||
|
|
" <td>10</td>\n",
|
|||
|
|
" <td>log2</td>\n",
|
|||
|
|
" <td>2</td>\n",
|
|||
|
|
" <td>2</td>\n",
|
|||
|
|
" <td>300</td>\n",
|
|||
|
|
" <td>{'model__max_depth': 10, 'model__max_features'...</td>\n",
|
|||
|
|
" <td>0.034515</td>\n",
|
|||
|
|
" <td>0.021561</td>\n",
|
|||
|
|
" <td>0.019028</td>\n",
|
|||
|
|
" <td>0.023610</td>\n",
|
|||
|
|
" <td>0.024728</td>\n",
|
|||
|
|
" <td>0.024688</td>\n",
|
|||
|
|
" <td>0.005283</td>\n",
|
|||
|
|
" <td>5</td>\n",
|
|||
|
|
" </tr>\n",
|
|||
|
|
" <tr>\n",
|
|||
|
|
" <th>...</th>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" <td>...</td>\n",
|
|||
|
|
" </tr>\n",
|
|||
|
|
" <tr>\n",
|
|||
|
|
" <th>14</th>\n",
|
|||
|
|
" <td>4.705051</td>\n",
|
|||
|
|
" <td>1.009530</td>\n",
|
|||
|
|
" <td>0.263226</td>\n",
|
|||
|
|
" <td>0.106331</td>\n",
|
|||
|
|
" <td>None</td>\n",
|
|||
|
|
" <td>log2</td>\n",
|
|||
|
|
" <td>1</td>\n",
|
|||
|
|
" <td>2</td>\n",
|
|||
|
|
" <td>300</td>\n",
|
|||
|
|
" <td>{'model__max_depth': None, 'model__max_feature...</td>\n",
|
|||
|
|
" <td>0.028740</td>\n",
|
|||
|
|
" <td>0.015051</td>\n",
|
|||
|
|
" <td>0.015244</td>\n",
|
|||
|
|
" <td>0.018043</td>\n",
|
|||
|
|
" <td>0.012987</td>\n",
|
|||
|
|
" <td>0.018013</td>\n",
|
|||
|
|
" <td>0.005599</td>\n",
|
|||
|
|
" <td>67</td>\n",
|
|||
|
|
" </tr>\n",
|
|||
|
|
" <tr>\n",
|
|||
|
|
" <th>13</th>\n",
|
|||
|
|
" <td>2.778192</td>\n",
|
|||
|
|
" <td>0.175340</td>\n",
|
|||
|
|
" <td>0.121770</td>\n",
|
|||
|
|
" <td>0.012860</td>\n",
|
|||
|
|
" <td>None</td>\n",
|
|||
|
|
" <td>log2</td>\n",
|
|||
|
|
" <td>1</td>\n",
|
|||
|
|
" <td>2</td>\n",
|
|||
|
|
" <td>200</td>\n",
|
|||
|
|
" <td>{'model__max_depth': None, 'model__max_feature...</td>\n",
|
|||
|
|
" <td>0.030543</td>\n",
|
|||
|
|
" <td>0.013419</td>\n",
|
|||
|
|
" <td>0.014527</td>\n",
|
|||
|
|
" <td>0.016448</td>\n",
|
|||
|
|
" <td>0.012857</td>\n",
|
|||
|
|
" <td>0.017559</td>\n",
|
|||
|
|
" <td>0.006607</td>\n",
|
|||
|
|
" <td>69</td>\n",
|
|||
|
|
" </tr>\n",
|
|||
|
|
" <tr>\n",
|
|||
|
|
" <th>1</th>\n",
|
|||
|
|
" <td>3.294891</td>\n",
|
|||
|
|
" <td>0.485518</td>\n",
|
|||
|
|
" <td>0.134053</td>\n",
|
|||
|
|
" <td>0.017547</td>\n",
|
|||
|
|
" <td>None</td>\n",
|
|||
|
|
" <td>sqrt</td>\n",
|
|||
|
|
" <td>1</td>\n",
|
|||
|
|
" <td>2</td>\n",
|
|||
|
|
" <td>200</td>\n",
|
|||
|
|
" <td>{'model__max_depth': None, 'model__max_feature...</td>\n",
|
|||
|
|
" <td>0.030543</td>\n",
|
|||
|
|
" <td>0.013419</td>\n",
|
|||
|
|
" <td>0.014527</td>\n",
|
|||
|
|
" <td>0.016448</td>\n",
|
|||
|
|
" <td>0.012857</td>\n",
|
|||
|
|
" <td>0.017559</td>\n",
|
|||
|
|
" <td>0.006607</td>\n",
|
|||
|
|
" <td>69</td>\n",
|
|||
|
|
" </tr>\n",
|
|||
|
|
" <tr>\n",
|
|||
|
|
" <th>0</th>\n",
|
|||
|
|
" <td>1.316659</td>\n",
|
|||
|
|
" <td>0.108668</td>\n",
|
|||
|
|
" <td>0.064057</td>\n",
|
|||
|
|
" <td>0.006920</td>\n",
|
|||
|
|
" <td>None</td>\n",
|
|||
|
|
" <td>sqrt</td>\n",
|
|||
|
|
" <td>1</td>\n",
|
|||
|
|
" <td>2</td>\n",
|
|||
|
|
" <td>100</td>\n",
|
|||
|
|
" <td>{'model__max_depth': None, 'model__max_feature...</td>\n",
|
|||
|
|
" <td>0.026317</td>\n",
|
|||
|
|
" <td>0.014495</td>\n",
|
|||
|
|
" <td>0.013819</td>\n",
|
|||
|
|
" <td>0.014843</td>\n",
|
|||
|
|
" <td>0.012623</td>\n",
|
|||
|
|
" <td>0.016419</td>\n",
|
|||
|
|
" <td>0.005007</td>\n",
|
|||
|
|
" <td>71</td>\n",
|
|||
|
|
" </tr>\n",
|
|||
|
|
" <tr>\n",
|
|||
|
|
" <th>12</th>\n",
|
|||
|
|
" <td>1.497623</td>\n",
|
|||
|
|
" <td>0.385128</td>\n",
|
|||
|
|
" <td>0.083825</td>\n",
|
|||
|
|
" <td>0.028476</td>\n",
|
|||
|
|
" <td>None</td>\n",
|
|||
|
|
" <td>log2</td>\n",
|
|||
|
|
" <td>1</td>\n",
|
|||
|
|
" <td>2</td>\n",
|
|||
|
|
" <td>100</td>\n",
|
|||
|
|
" <td>{'model__max_depth': None, 'model__max_feature...</td>\n",
|
|||
|
|
" <td>0.026317</td>\n",
|
|||
|
|
" <td>0.014495</td>\n",
|
|||
|
|
" <td>0.013819</td>\n",
|
|||
|
|
" <td>0.014843</td>\n",
|
|||
|
|
" <td>0.012623</td>\n",
|
|||
|
|
" <td>0.016419</td>\n",
|
|||
|
|
" <td>0.005007</td>\n",
|
|||
|
|
" <td>71</td>\n",
|
|||
|
|
" </tr>\n",
|
|||
|
|
" </tbody>\n",
|
|||
|
|
"</table>\n",
|
|||
|
|
"<p>72 rows × 18 columns</p>\n",
|
|||
|
|
"</div>"
|
|||
|
|
],
|
|||
|
|
"text/plain": [
|
|||
|
|
" mean_fit_time std_fit_time mean_score_time std_score_time \\\n",
|
|||
|
|
"42 1.191664 0.060865 0.060239 0.003913 \n",
|
|||
|
|
"30 1.295314 0.295965 0.071769 0.019185 \n",
|
|||
|
|
"31 2.318125 0.101894 0.105294 0.009273 \n",
|
|||
|
|
"43 2.513033 0.161350 0.120259 0.020841 \n",
|
|||
|
|
"44 3.862008 0.369737 0.170743 0.029734 \n",
|
|||
|
|
".. ... ... ... ... \n",
|
|||
|
|
"14 4.705051 1.009530 0.263226 0.106331 \n",
|
|||
|
|
"13 2.778192 0.175340 0.121770 0.012860 \n",
|
|||
|
|
"1 3.294891 0.485518 0.134053 0.017547 \n",
|
|||
|
|
"0 1.316659 0.108668 0.064057 0.006920 \n",
|
|||
|
|
"12 1.497623 0.385128 0.083825 0.028476 \n",
|
|||
|
|
"\n",
|
|||
|
|
" param_model__max_depth param_model__max_features \\\n",
|
|||
|
|
"42 10 log2 \n",
|
|||
|
|
"30 10 sqrt \n",
|
|||
|
|
"31 10 sqrt \n",
|
|||
|
|
"43 10 log2 \n",
|
|||
|
|
"44 10 log2 \n",
|
|||
|
|
".. ... ... \n",
|
|||
|
|
"14 None log2 \n",
|
|||
|
|
"13 None log2 \n",
|
|||
|
|
"1 None sqrt \n",
|
|||
|
|
"0 None sqrt \n",
|
|||
|
|
"12 None log2 \n",
|
|||
|
|
"\n",
|
|||
|
|
" param_model__min_samples_leaf param_model__min_samples_split \\\n",
|
|||
|
|
"42 2 2 \n",
|
|||
|
|
"30 2 2 \n",
|
|||
|
|
"31 2 2 \n",
|
|||
|
|
"43 2 2 \n",
|
|||
|
|
"44 2 2 \n",
|
|||
|
|
".. ... ... \n",
|
|||
|
|
"14 1 2 \n",
|
|||
|
|
"13 1 2 \n",
|
|||
|
|
"1 1 2 \n",
|
|||
|
|
"0 1 2 \n",
|
|||
|
|
"12 1 2 \n",
|
|||
|
|
"\n",
|
|||
|
|
" param_model__n_estimators \\\n",
|
|||
|
|
"42 100 \n",
|
|||
|
|
"30 100 \n",
|
|||
|
|
"31 200 \n",
|
|||
|
|
"43 200 \n",
|
|||
|
|
"44 300 \n",
|
|||
|
|
".. ... \n",
|
|||
|
|
"14 300 \n",
|
|||
|
|
"13 200 \n",
|
|||
|
|
"1 200 \n",
|
|||
|
|
"0 100 \n",
|
|||
|
|
"12 100 \n",
|
|||
|
|
"\n",
|
|||
|
|
" params split0_test_score \\\n",
|
|||
|
|
"42 {'model__max_depth': 10, 'model__max_features'... 0.035431 \n",
|
|||
|
|
"30 {'model__max_depth': 10, 'model__max_features'... 0.035431 \n",
|
|||
|
|
"31 {'model__max_depth': 10, 'model__max_features'... 0.037634 \n",
|
|||
|
|
"43 {'model__max_depth': 10, 'model__max_features'... 0.037634 \n",
|
|||
|
|
"44 {'model__max_depth': 10, 'model__max_features'... 0.034515 \n",
|
|||
|
|
".. ... ... \n",
|
|||
|
|
"14 {'model__max_depth': None, 'model__max_feature... 0.028740 \n",
|
|||
|
|
"13 {'model__max_depth': None, 'model__max_feature... 0.030543 \n",
|
|||
|
|
"1 {'model__max_depth': None, 'model__max_feature... 0.030543 \n",
|
|||
|
|
"0 {'model__max_depth': None, 'model__max_feature... 0.026317 \n",
|
|||
|
|
"12 {'model__max_depth': None, 'model__max_feature... 0.026317 \n",
|
|||
|
|
"\n",
|
|||
|
|
" split1_test_score split2_test_score split3_test_score \\\n",
|
|||
|
|
"42 0.023902 0.019452 0.022538 \n",
|
|||
|
|
"30 0.023902 0.019452 0.022538 \n",
|
|||
|
|
"31 0.021405 0.018878 0.022386 \n",
|
|||
|
|
"43 0.021405 0.018878 0.022386 \n",
|
|||
|
|
"44 0.021561 0.019028 0.023610 \n",
|
|||
|
|
".. ... ... ... \n",
|
|||
|
|
"14 0.015051 0.015244 0.018043 \n",
|
|||
|
|
"13 0.013419 0.014527 0.016448 \n",
|
|||
|
|
"1 0.013419 0.014527 0.016448 \n",
|
|||
|
|
"0 0.014495 0.013819 0.014843 \n",
|
|||
|
|
"12 0.014495 0.013819 0.014843 \n",
|
|||
|
|
"\n",
|
|||
|
|
" split4_test_score mean_test_score std_test_score rank_test_score \n",
|
|||
|
|
"42 0.026337 0.025532 0.005426 1 \n",
|
|||
|
|
"30 0.026337 0.025532 0.005426 1 \n",
|
|||
|
|
"31 0.025625 0.025186 0.006589 3 \n",
|
|||
|
|
"43 0.025625 0.025186 0.006589 3 \n",
|
|||
|
|
"44 0.024728 0.024688 0.005283 5 \n",
|
|||
|
|
".. ... ... ... ... \n",
|
|||
|
|
"14 0.012987 0.018013 0.005599 67 \n",
|
|||
|
|
"13 0.012857 0.017559 0.006607 69 \n",
|
|||
|
|
"1 0.012857 0.017559 0.006607 69 \n",
|
|||
|
|
"0 0.012623 0.016419 0.005007 71 \n",
|
|||
|
|
"12 0.012623 0.016419 0.005007 71 \n",
|
|||
|
|
"\n",
|
|||
|
|
"[72 rows x 18 columns]"
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
"execution_count": 6,
|
|||
|
|
"metadata": {},
|
|||
|
|
"output_type": "execute_result"
|
|||
|
|
}
|
|||
|
|
],
|
|||
|
|
"source": [
|
|||
|
|
"# Retrieve the cross-validation results, sorted from best to worst mean test score\n",
"pd.DataFrame(grid_search.cv_results_).sort_values(by='mean_test_score', ascending=False)"
|
|||
|
|
]
|
|||
|
|
},
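{
"cell_type": "markdown",
"metadata": {},
"source": [
"Several configurations in the table above share exactly the same fold scores (e.g. rows 42 and 30, which differ only in `max_features`: `log2` vs `sqrt`; likewise rows 31 and 43, 13 and 1, 0 and 12). A likely explanation, which we have not verified here, is that scikit-learn converts both options to an integer number of candidate features per split (`max(1, int(sqrt(n)))` and `max(1, int(log2(n)))`), and for some feature counts `n` these give the same value; with a fixed `random_state` the two forests are then identical. Among tied candidates, `GridSearchCV` reports the first one in grid order as `best_params_` (index 30, `sqrt`, comes before index 42, `log2`), which is consistent with the printed best hyperparameters.\n",
"\n",
"The minimal sketch below lists the feature counts for which the two rules coincide. The helper `resolved_max_features` is hypothetical: it only mirrors that integer conversion and does not call scikit-learn. In this notebook, compare the result with `X_train.shape[1]`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"def resolved_max_features(n_features, rule):\n",
"    # Hypothetical helper: mirrors how 'sqrt' / 'log2' become an integer feature count\n",
"    if rule == 'sqrt':\n",
"        return max(1, int(math.sqrt(n_features)))\n",
"    if rule == 'log2':\n",
"        return max(1, int(math.log2(n_features)))\n",
"    raise ValueError(f'unknown rule: {rule}')\n",
"\n",
"# Feature counts (1 to 40) for which both rules pick the same number of features per split\n",
"same = [n for n in range(1, 41)\n",
"        if resolved_max_features(n, 'sqrt') == resolved_max_features(n, 'log2')]\n",
"print(same)\n",
"\n",
"# In this notebook, check where the actual feature count falls:\n",
"# print(X_train.shape[1], X_train.shape[1] in same)"
]
},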
|
|||
|
|
{
|
|||
|
|
"cell_type": "markdown",
|
|||
|
|
"id": "fc2fcc89",
|
|||
|
|
"metadata": {},
|
|||
|
|
"source": [
|
|||
|
|
"## Evaluation\n",
"\n",
"This section evaluates how well the new model predicts the actual Resolution Incidents in the test set.\n",
"\n",
"We start by computing and displaying the classification report, the ROC curve, the Precision-Recall (PR) curve and the respective Area Under the Curve (AUC) for each curve."
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
{
|
|||
|
|
"cell_type": "code",
|
|||
|
|
"execution_count": 7,
|
|||
|
|
"id": "30786f7c",
|
|||
|
|
"metadata": {},
|
|||
|
|
"outputs": [
|
|||
|
|
{
|
|||
|
|
"name": "stdout",
|
|||
|
|
"output_type": "stream",
|
|||
|
|
"text": [
|
|||
|
|
" precision recall f1-score support\n",
|
|||
|
|
"\n",
|
|||
|
|
" False 0.99 0.92 0.95 4802\n",
|
|||
|
|
" True 0.02 0.16 0.04 56\n",
|
|||
|
|
"\n",
|
|||
|
|
" accuracy 0.91 4858\n",
|
|||
|
|
" macro avg 0.51 0.54 0.49 4858\n",
|
|||
|
|
"weighted avg 0.98 0.91 0.94 4858\n",
|
|||
|
|
"\n"
|
|||
|
|
]
|
|||
|
|
}
|
|||
|
|
],
|
|||
|
|
"source": [
|
|||
|
|
"# Print classification report\n",
|
|||
|
|
"print(classification_report(y_test, y_pred))"
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
{
|
|||
|
|
"cell_type": "markdown",
|
|||
|
|
"metadata": {},
|
|||
|
|
"source": [
|
|||
|
|
"### Interpreting the Classification Report\n",
"\n",
"The **Classification Report** provides key metrics to evaluate how well the model performs on each class.\n",
"\n",
"It includes the following metrics for each class (`False`/0 and `True`/1):\n",
"\n",
"| Metric | Meaning |\n",
"|---|---|\n",
"| Precision | Out of all predicted positives, how many were actually positive? |\n",
"| Recall | Out of all actual positives, how many did we correctly identify? |\n",
"| F1-score | Harmonic mean of precision and recall (balances both) |\n",
"| Support | Number of true samples of that class in the test data |\n",
"\n",
"Interpretation:\n",
"* Class 0 (`False`) = No resolution incident\n",
"* Class 1 (`True`) = Has a resolution incident (rare, but important!)\n",
"\n",
"A few explanatory cases:\n",
"* A high recall for class 1 means we are catching most incidents.\n",
"* A high precision for class 1 means that when we predict an incident, we are often correct.\n",
"* The F1-score gives a single measure that balances precision and recall, which is more informative than accuracy on imbalanced data.\n",
"\n",
"**Special note for imbalanced data:**\n",
"Since class 1 (`True`) is rare (about 1% in our case: 56 out of 4,858 bookings in the test set), the metrics for that class are the most critical.\n",
"We want to maximize recall to catch as many real incidents as possible, without letting precision drop too low (to avoid too many false alarms)."
|
|||
|
|
]
|
|||
|
|
},
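{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the definitions above concrete, here is a minimal sketch that computes precision, recall and F1 for the positive class by hand on a hypothetical toy example (8 bookings without an incident, 2 with one). The function `precision_recall_f1` and the `*_toy` variables are illustrative only and are not part of the model; for the same labels, the numbers should agree with scikit-learn's `precision_score`, `recall_score` and `f1_score` using `pos_label=True`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def precision_recall_f1(y_true, y_pred, positive=True):\n",
"    # Count the confusion-matrix cells for the positive class\n",
"    pairs = list(zip(y_true, y_pred))\n",
"    tp = sum(1 for t, p in pairs if t == positive and p == positive)\n",
"    fp = sum(1 for t, p in pairs if t != positive and p == positive)\n",
"    fn = sum(1 for t, p in pairs if t == positive and p != positive)\n",
"    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted incidents that are real\n",
"    recall = tp / (tp + fn) if tp + fn else 0.0     # share of real incidents that we catch\n",
"    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0\n",
"    return precision, recall, f1\n",
"\n",
"# Hypothetical toy labels: 8 bookings without an incident, 2 with an incident\n",
"y_true_toy = [False] * 8 + [True] * 2\n",
"# The model flags 3 bookings: 2 false alarms and 1 real incident; it misses the other incident\n",
"y_pred_toy = [False] * 6 + [True, True] + [True, False]\n",
"p_toy, r_toy, f1_toy = precision_recall_f1(y_true_toy, y_pred_toy)\n",
"print(f'precision={p_toy:.2f}, recall={r_toy:.2f}, f1={f1_toy:.2f}')"
]
},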
|
|||
|
|
{
|
|||
|
|
"cell_type": "code",
|
|||
|
|
"execution_count": 8,
|
|||
|
|
"id": "4b4da914",
|
|||
|
|
"metadata": {},
|
|||
|
|
"outputs": [
|
|||
|
|
{
|
|||
|
|
"data": {
|
|||
|
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAhgAAAHWCAYAAAA1jvBJAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAB4k0lEQVR4nO3deVhUZf8G8HsGhn0TEVlEEcTc9yX3DUXLLTfA3co2Ld/8WWmLZotWltlblqWZWgqIW5q7lvuairu4IG6AyouKrLM9vz+IgQlQBs9wBub+XBeXZ86cc+Y7jwPcnPOc51EIIQSIiIiIJKSUuwAiIiKqfBgwiIiISHIMGERERCQ5BgwiIiKSHAMGERERSY4Bg4iIiCTHgEFERESSY8AgIiIiyTFgEBERkeQYMIiIiEhyDBhEVmDJkiVQKBSGL1tbW/j7+2Ps2LG4detWsfsIIfDrr7+ic+fO8PDwgJOTExo3boyPPvoImZmZJb7W2rVr0adPH3h5ecHOzg5+fn4YNmwY/vzzz1LVmpOTg6+//hpt27aFu7s7HBwcULduXUycOBEXL14s0/snovKn4FwkRJXfkiVLMG7cOHz00UeoXbs2cnJycOjQISxZsgSBgYE4c+YMHBwcDNvrdDoMHz4cK1euRKdOnTBo0CA4OTlh7969WLFiBRo0aIAdO3agevXqhn2EEHj++eexZMkSNG/eHEOGDIGPjw+Sk5Oxdu1aHDt2DPv370f79u1LrDM1NRW9e/fGsWPH0LdvX4SGhsLFxQXx8fGIjo5GSkoK1Gq1WduKiCQiiKjS++WXXwQAcfToUaP177zzjgAgYmJijNbPmjVLABBTpkwpcqz169cLpVIpevfubbR+zpw5AoD4z3/+I/R6fZH9li1bJg4fPvzIOp999lmhVCrFqlWrijyXk5Mj/u///u+R+5eWRqMRubm5khyLiIrHgEFkBUoKGH/88YcAIGbNmmVYl5WVJapUqSLq1q0rNBpNsccbN26cACAOHjxo2MfT01PUq1dPaLXaMtV46NAhAUCMHz++VNt36dJFdOnSpcj6MWPGiFq1ahkeX716VQAQc+bMEV9//bUICgoSSqVSHDp0SNjY2IgPP/ywyDEuXLggAIhvv/3WsO7evXti0qRJokaNGsLOzk4EBweLzz77TOh0OpPfK5E1YB8MIiuWmJgIAKhSpYph3b59+3Dv3j0MHz4ctra2xe43evRoAMAff/xh2CctLQ3Dhw+HjY1NmWpZv349AGDUqFFl2v9xfvnlF3z77bd46aWX8NVXX8HX1xddunTBypUri2wbExMDGxsbDB06FACQlZWFLl264LfffsPo0aPx3//+Fx06dMC0adMwefJks9RLVNEV/9ODiCqlBw8eIDU1FTk5OTh8+DBmzpwJe3t79O3b17DNuXPnAABNmzYt8Tj5z50/f97o38aNG5e5NimO8Sg3b97E5cuXUa1aNcO68PBwvPzyyzhz5gwaNWpkWB8TE4MuXboY+pjMnTsXV65cwYkTJxASEgIAePnll+Hn54c5c+bg//7v/xAQEGCWuokqKp7BILIioaGhqFatGgICAjBkyBA4Oztj/fr1qFGjhmGbhw8fAgBcXV1LPE7+c+np6Ub/Pmqfx5HiGI8yePBgo3ABAIMGDYKtrS1iYmIM686cOYNz584hPDzcsC42NhadOnVClSpVkJqaavgKDQ2FTqfDnj17zFIzUUXGMxhEVmT+/PmoW7cuHjx4gMWLF2PPnj2wt7c32ib/F3x+0CjOv0OIm5vbY/d5nMLH8PDwKPNxSlK7du0i67y8vNCjRw+sXLkSH3/8MYC8sxe2trYYNGiQYbtLly7h1KlTRQJKvjt37kheL1FFx4BBZEXatGmDVq1aAQAGDhyIjh07Yvjw4YiPj4eLiwsAoH79+gCAU6dOYeDAgcUe59SpUwCABg0aAADq1asHADh9+nSJ+zxO4WN06tTpsdsrFAqIYu6y1+
l0xW7v6OhY7PqIiAiMGzcOcXFxaNasGVauXIkePXrAy8vLsI1er0fPnj3x9ttvF3uMunXrPrZeImvDSyREVsrGxgazZ89GUlISvvvuO8P6jh07wsPDAytWrCjxl/WyZcsAwNB3o2PHjqhSpQqioqJK3Odx+vXrBwD47bffSrV9lSpVcP/+/SLrr127ZtLrDhw4EHZ2doiJiUFcXBwuXryIiIgIo22Cg4ORkZGB0NDQYr9q1qxp0msSWQMGDCIr1rVrV7Rp0wbz5s1DTk4OAMDJyQlTpkxBfHw83nvvvSL7bNy4EUuWLEFYWBiefvppwz7vvPMOzp8/j3feeafYMwu//fYbjhw5UmIt7dq1Q+/evbFo0SKsW7euyPNqtRpTpkwxPA4ODsaFCxdw9+5dw7qTJ09i//79pX7/AODh4YGwsDCsXLkS0dHRsLOzK3IWZtiwYTh48CC2bt1aZP/79+9Dq9Wa9JpE1oAjeRJZgfyRPI8ePWq4RJJv1apVGDp0KH744Qe88sorAPIuM4SHh2P16tXo3LkzBg8eDEdHR+zbtw+//fYb6tevj507dxqN5KnX6zF27Fj8+uuvaNGihWEkz5SUFKxbtw5HjhzBgQMH0K5duxLrvHv3Lnr16oWTJ0+iX79+6NGjB5ydnXHp0iVER0cjOTkZubm5APLuOmnUqBGaNm2KF154AXfu3MGCBQtQvXp1pKenG27BTUxMRO3atTFnzhyjgFLY8uXLMXLkSLi6uqJr166GW2bzZWVloVOnTjh16hTGjh2Lli1bIjMzE6dPn8aqVauQmJhodEmFiMCRPImsQUkDbQkhhE6nE8HBwSI4ONhokCydTid++eUX0aFDB+Hm5iYcHBxEw4YNxcyZM0VGRkaJr7Vq1SrRq1cv4enpKWxtbYWvr68IDw8Xu3btKlWtWVlZ4ssvvxStW7cWLi4uws7OToSEhIjXX39dXL582Wjb3377TQQFBQk7OzvRrFkzsXXr1kcOtFWS9PR04ejoKACI3377rdhtHj58KKZNmybq1Kkj7OzshJeXl2jfvr348ssvhVqtLtV7I7ImPINBREREkmMfDCIiIpIcAwYRERFJjgGDiIiIJMeAQURERJJjwCAiIiLJMWAQERGR5KxuLhK9Xo+kpCS4urpCoVDIXQ4REVGFIYTAw4cP4efnB6Xy0ecorC5gJCUlISAgQO4yiIiIKqwbN26gRo0aj9zG6gJG/vTSN27cMEwP/aQ0Gg22bduGXr16QaVSSXJMa8c2lR7bVFpsT+mxTaVljvZMT09HQECA4Xfpo1hdwMi/LOLm5iZpwHBycoKbmxu/KSTCNpUe21RabE/psU2lZc72LE0XA3byJCIiIskxYBAREZHkGDCIiIhIcgwYREREJDkGDCIiIpIcAwYRERFJjgGDiIiIJMeAQURERJJjwCAiIiLJMWAQERGR5GQNGHv27EG/fv3g5+cHhUKBdevWPXafXbt2oUWLFrC3t0edOnWwZMkSs9dJREREppE1YGRmZqJp06aYP39+qba/evUqnn32WXTr1g1xcXH4z3/+gxdffBFbt241c6VERERkClknO+vTpw/69OlT6u0XLFiA2rVr46uvvgIA1K9fH/v27cPXX3+NsLAwc5VJREQWQAggIQHIyZG7kopBowGuX3fF7dvAY2ZWN4sKNZvqwYMHERoaarQuLCwM//nPf0rcJzc3F7m5uYbH6enpAPJmmdNoNJLUlX8cqY5HbFNzYJtKi+0pvZLaVAhg40YFPvzQBqdOPX4WTwJsbHQIDLyGK1e6IzFRg88+k/b3XWlUqICRkpKC6tWrG62rXr060tPTkZ2dDUdHxyL7zJ49GzNnziyyftu2bXBycpK0vu3bt0t6PGKbmgPbVFpsT+nlt6kQQFxcNaxYUR+XLlWRuaqKw8kpC8OGrUStWtcQFRWJxEQFNm06J8mxs7KySr1thQoYZTFt2jRMnjzZ8Dg9PR0BAQHo1asX3NzcJHkNjU
aD7du3o2fPnlCpVJIc09qxTaXHNpUW21N6hdv04EE7fPihEvv2GXcVbNFCj6ZNZSqwAlCp7sDbOwa2tveh19ujadO
|
|||
|
|
"text/plain": [
|
|||
|
|
"<Figure size 600x500 with 1 Axes>"
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
"metadata": {},
|
|||
|
|
"output_type": "display_data"
|
|||
|
|
}
|
|||
|
|
],
|
|||
|
|
"source": [
|
|||
|
|
"# ROC Curve\n",
|
|||
|
|
"fpr, tpr, _ = roc_curve(y_test, y_pred_proba)\n",
|
|||
|
|
"roc_auc = auc(fpr, tpr)\n",
|
|||
|
|
"\n",
|
|||
|
|
"plt.figure(figsize=(6, 5))\n",
|
|||
|
|
"plt.plot(fpr, tpr, color='blue', lw=2, label=f'ROC curve (AUC = {roc_auc:.4f})')\n",
|
|||
|
|
"plt.plot([0, 1], [0, 1], color='gray', linestyle='--')\n",
|
|||
|
|
"plt.xlabel('False Positive Rate')\n",
|
|||
|
|
"plt.ylabel('True Positive Rate')\n",
|
|||
|
|
"plt.title('ROC Curve')\n",
|
|||
|
|
"plt.legend(loc='lower right')\n",
|
|||
|
|
"plt.grid(True)\n",
|
|||
|
|
"plt.show()"
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
{
|
|||
|
|
"cell_type": "markdown",
|
|||
|
|
"metadata": {},
|
|||
|
|
"source": [
|
|||
|
|
"### Interpreting the ROC Curve\n",
"\n",
"The **Receiver Operating Characteristic (ROC) curve** shows how well the model distinguishes between the positive and negative classes across all decision thresholds.\n",
"\n",
"A quick reminder of the definitions:\n",
"* True Positive Rate (TPR) = Recall = TP / (TP + FN)\n",
"* False Positive Rate (FPR) = Proportion of negatives wrongly classified as positives = FP / (FP + TN)\n",
"\n",
"In this plot:\n",
"* The x-axis is the False Positive Rate\n",
"* The y-axis is the True Positive Rate\n",
"\n",
"Each point on the curve corresponds to one decision threshold on the predicted probability; the curve shows how TPR and FPR change as that threshold varies.\n",
"\n",
"It's important to note that:\n",
"* A model with no skill produces a diagonal line (AUC = 0.5)\n",
"* A model with perfect discrimination hugs the top-left corner (AUC = 1.0)\n",
"\n",
"The Area Under the Curve (ROC AUC) summarizes performance in a single score:\n",
"* It equals the probability that a randomly chosen positive booking receives a higher score than a randomly chosen negative one (ties counting half), so values closer to 1 mean positives are ranked above negatives more consistently\n",
"\n",
"**Important!**\n",
"\n",
"While useful, the ROC curve can overestimate performance when the dataset is imbalanced. Because FPR is computed over all negatives (around 99% of our bookings!), even a large absolute number of false positives yields a small FPR. That's why we also MUST check the Precision-Recall curve."
|
|||
|
|
]
|
|||
|
|
},
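{
"cell_type": "markdown",
"id": "a2c47e90",
"metadata": {},
"source": [
"The two claims above about ROC AUC can be checked numerically. The cell below is a minimal sketch on **synthetic** data and does not touch our model or bookings. First, it confirms that `roc_auc_score` equals the fraction of (positive, negative) pairs in which the positive case gets the higher score. Second, it uses a hypothetical helper, `precision_from_rates`, to translate an ROC point into the precision it implies.\n",
"\n",
"The simulated scores and the example rates (TPR = 0.80, FPR = 0.05, 1% positives) are assumptions chosen for illustration. With those rates, only about 14% of the flagged cases would be actual positives, even though such a point looks good on an ROC plot."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f9a1c72",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.metrics import roc_auc_score\n",
"\n",
"# Synthetic data only (assumption): ~1% positives whose scores are shifted upwards\n",
"rng = np.random.default_rng(42)\n",
"y_sim = (rng.random(5_000) < 0.01).astype(int)\n",
"scores_sim = rng.normal(loc=1.5 * y_sim, scale=1.0)\n",
"\n",
"# 1) ROC AUC as a ranking probability: share of (positive, negative) pairs\n",
"#    where the positive scores higher (ties would count as half)\n",
"pos, neg = scores_sim[y_sim == 1], scores_sim[y_sim == 0]\n",
"diff = pos[:, None] - neg[None, :]\n",
"pairwise_auc = (diff > 0).mean() + 0.5 * (diff == 0).mean()\n",
"print(f'roc_auc_score: {roc_auc_score(y_sim, scores_sim):.4f}, pairwise: {pairwise_auc:.4f}')\n",
"\n",
"# 2) Hypothetical helper: expected precision implied by an ROC point\n",
"def precision_from_rates(tpr, fpr, prevalence):\n",
"    true_pos = tpr * prevalence          # share of all cases that are true positives\n",
"    false_pos = fpr * (1 - prevalence)   # share of all cases that are false positives\n",
"    return true_pos / (true_pos + false_pos)\n",
"\n",
"print(f'Precision at TPR=0.80, FPR=0.05, 1% positives: {precision_from_rates(0.80, 0.05, 0.01):.4f}')"
]
},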
{
"cell_type": "code",
"execution_count": 9,
"id": "6790d41d",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAhgAAAHWCAYAAAA1jvBJAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABSaElEQVR4nO3deVxU5f4H8M/sgICgbIoormFpaiiGZmqiJKnXbqWpuZWmKV2VrDQXMks090wlvbncX5qmpVmSiqjlQrdSsXvLXRRTQTHZlxlmnt8f3JkcGRDwGUbk8369eMmc85wz3/kyMB/Pec6MQgghQERERCSR0tEFEBER0YOHAYOIiIikY8AgIiIi6RgwiIiISDoGDCIiIpKOAYOIiIikY8AgIiIi6RgwiIiISDoGDCIiIpKOAYOomhoxYgQCAwMrtM2BAwegUChw4MABu9RU3XXr1g3dunWz3L548SIUCgXWrVvnsJqIqisGDKJyWrduHRQKheXLyckJLVq0QGRkJNLS0hxd3n3P/GJt/lIqlahTpw569+6NxMRER5cnRVpaGiZPnoygoCC4uLigVq1aCA4Oxvvvv4+MjAxHl0dUpdSOLoCounnvvffQuHFjFBQU4NChQ1i5ciXi4uLw3//+Fy4uLlVWx+rVq2EymSq0zZNPPon8/HxotVo7VXV3gwYNQkREBIxGI86cOYMVK1age/fu+Pnnn9G6dWuH1XWvfv75Z0RERCAnJwcvvfQSgoODAQC//PIL5s6dix9++AF79uxxcJVEVYcBg6iCevfujfbt2wMARo0ahbp162LRokX4+uuvMWjQIJvb5ObmolatWlLr0Gg0Fd5GqVTCyclJah0V9dhjj+Gll16y3O7SpQt69+6NlStXYsWKFQ6srPIyMjLw7LPPQqVS4fjx4wgKCrJa/8EHH2D16tVS7ssezyUie+ApEqJ79NRTTwEAkpOTARTPjXB1dcX58+cREREBNzc3DBkyBABgMpmwZMkSPPLII3BycoKvry/GjBmDW7duldjvd999h65du8LNzQ3u7u7o0KEDNm7caFlvaw7Gpk2bEBwcbNmmdevWWLp0qWV9aXMwtmzZguDgYDg7O8PLywsvvfQSrly5YjXG/LiuXLmC/v37w9XVFd7e3pg8eTKMRmOl+9elSxcAwPnz562WZ2RkYOLEiQgICIBOp0OzZs0wb968EkdtTCYTli5ditatW8PJyQne3t54+umn8csvv1jGrF27Fk899RR8fHyg0+nw8MMPY+XKlZWu+U6ffPIJrly5gkWLFpUIFwDg6+uL6dOnW24rFAq8++67JcYFBgZixIgRltvm03Lff/89xo0bBx8fHzRo0ABbt261LLdVi0KhwH//+1/LslOnTuH5559HnTp14OTkhPbt22PHjh339qCJ7oJHMIjukfmFsW7dupZlRUVFCA8PxxNPPIEFCxZYTp2MGTMG69atw8iRI/GPf/wDycnJ+Pjjj3H8+HEcPnzYclRi3bp1ePnll/HII49g6tSp8PDwwPHjx7Fr1y4MHjzYZh3x8fEYNGgQevTogXnz5gEATp48icOHD2PChAml1m+up0OHDoiJiUFaWhqWLl2Kw4cP4/jx4/Dw8LCMNRqNCA8PR8eOHbFgwQLs3bsXCxcuRNOmTfHaa69Vqn8XL14EAHh6elqW5eXloWvXrrhy5QrGjBmDhg0b4siRI5g6dSquXbuGJUuWWMa+8sorWLduHXr37o1Ro0ahqKgIBw8exI8//mg50rRy5Uo88sgj6NevH9RqNb755huMGzcOJpMJ48ePr1Tdt9uxYwecnZ3x/PPP3/O+bBk3bhy8vb0xc+ZM5Obm4plnnoGrqyu++OILdO3a1Wrs5s2b8cgjj6BVq1YAgN9++w2dO3eGv78/pkyZglq1auGLL75A//798eWXX+LZZ5+1S81EEERULmvXrhUAxN69e8WNGzfE5cuXxaZNm0TdunWFs7Oz+OOPP4
QQQgwfPlwAEFOmTLHa/uDBgwKA2LBhg9XyXbt2WS3PyMgQbm5uomPHjiI/P99qrMlksnw/fPhw0ahRI8vtCRMmCHd3d1FUVFTqY9i/f78AIPbv3y+EEEKv1wsfHx/RqlUrq/v69ttvBQAxc+ZMq/sDIN577z2rfbZr104EBweXep9mycnJAoCYNWuWuHHjhkhNTRUHDx4UHTp0EADEli1bLGNnz54tatWqJc6cOWO1jylTpgiVSiVSUlKEEELs27dPABD/+Mc/Stzf7b3Ky8srsT48PFw0adLEalnXrl1F165dS9S8du3aMh+bp6enaNOmTZljbgdAREdHl1jeqFEjMXz4cMtt83PuiSeeKPFzHTRokPDx8bFafu3aNaFUKq1+Rj169BCtW7cWBQUFlmUmk0l06tRJNG/evNw1E1UUT5EQVVBYWBi8vb0REBCAF198Ea6urti2bRv8/f2txt35P/otW7agdu3a6NmzJ9LT0y1fwcHBcHV1xf79+wEUH4nIzs7GlClTSsyXUCgUpdbl4eGB3NxcxMfHl/ux/PLLL7h+/TrGjRtndV/PPPMMgoKCsHPnzhLbjB071up2ly5dcOHChXLfZ3R0NLy9veHn54cuXbrg5MmTWLhwodX//rds2YIuXbrA09PTqldhYWEwGo344YcfAABffvklFAoFoqOjS9zP7b1ydna2fJ+ZmYn09HR07doVFy5cQGZmZrlrL01WVhbc3NzueT+lGT16NFQqldWygQMH4vr161anu7Zu3QqTyYSBAwcCAP7880/s27cPAwYMQHZ2tqWPN2/eRHh4OM6ePVviVBiRLDxFQlRBy5cvR4sWLaBWq+Hr64uHHnoISqV1Vler1WjQoIHVsrNnzyIzMxM+Pj4293v9+nUAf51yMR/iLq9x48bhiy++QO/eveHv749evXphwIABePrpp0vd5tKlSwCAhx56qMS6oKAgHDp0yGqZeY7D7Tw9Pa3mkNy4ccNqToarqytcXV0tt1999VW88MILKCgowL59+/DRRx+VmMNx9uxZ/PrrryXuy+z2XtWvXx916tQp9TECwOHDhxEdHY3ExETk5eVZrcvMzETt2rXL3P5u3N3dkZ2dfU/7KEvjxo1LLHv66adRu3ZtbN68GT169ABQfHqkbdu2aNGiBQDg3LlzEEJgxowZmDFjhs19X79+vUQ4JpKBAYOogkJCQizn9kuj0+lKhA6TyQQfHx9s2LDB5jalvZiWl4+PD5KSkrB792589913+O6777B27VoMGzYM69evv6d9m935v2hbOnToYAkuQPERi9snNDZv3hxhYWEAgD59+kClUmHKlCno3r27pa8mkwk9e/bEW2+9ZfM+zC+g5XH+/Hn06NEDQUFBWLRoEQICAqDVahEXF4fFixdX+FJfW4KCgpCUlAS9Xn9PlwCXNln29iMwZjqdDv3798e2bduwYsUKpKWl4fDhw5gzZ45ljPmxTZ48GeHh4Tb33axZs0rXS1QWBgyiKtK0aVPs3bsXnTt3tvmCcfs4APjvf/9b4T/+Wq0Wffv2Rd++fWEymTBu3Dh88sknmDFjhs19NWrUCABw+vRpy9UwZqdPn7asr4gNGzYgPz/fcrtJkyZljp82bRpWr16N6dOnY9euXQCKe5CTk2MJIqVp2rQpdu/ejT///LPUoxjffPMNCgsLsWPHDjRs2NCy3HxKSoa+ffsiMTERX375ZamXKt/O09OzxBtv6fV6XLt2rUL3O3DgQKxfvx4JCQk4efIkhBCW0yPAX73XaDR37SWRbJyDQVRFBgwYAKPRiNmzZ5dYV1RUZHnB6dWrF9zc3BATE4OCggKrcUKIUvd/8+ZNq9tKpRKPPvooAKCwsNDmNu3bt4ePjw9iY2Otxnz33Xc4efIknnnmmXI9ttt17twZYWFhlq+7BQwPDw+MGTMGu3fvRlJSEoDiXiUmJmL37t0lxmdkZKCoqAgA8Nxzz0EIgVmzZpUYZ+6V+ajL7b3LzM
zE2rVrK/zYSjN27FjUq1cPb7zxBs6cOVNi/fXr1/H+++9bbjdt2tQyj8Rs1apVFb7cNywsDHXq1MHmzZuxefNmhIS
"text/plain": [
"<Figure size 600x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# PR Curve\n",
"precision, recall, _ = precision_recall_curve(y_test, y_pred_proba)\n",
"# Average precision (AP) summarizes the PR curve; it is not the trapezoidal area under it\n",
"pr_auc = average_precision_score(y_test, y_pred_proba)\n",
"\n",
"plt.figure(figsize=(6, 5))\n",
"plt.plot(recall, precision, color='green', lw=2, label=f'PR curve (AP = {pr_auc:.4f})')\n",
"# No-skill baseline: a random model's precision equals the positive rate\n",
"plt.axhline(y_test.mean(), color='gray', linestyle='--', label=f'No skill (positive rate = {y_test.mean():.4f})')\n",
"plt.xlabel('Recall')\n",
"plt.ylabel('Precision')\n",
"plt.title('Precision-Recall Curve')\n",
"plt.legend(loc='lower left')\n",
"plt.grid(True)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interpreting the Precision-Recall (PR) Curve\n",
"\n",
"The **Precision-Recall (PR) curve** helps evaluate model performance, especially on imbalanced datasets like ours, where positive cases are rare.\n",
"\n",
"A quick reminder of the definitions:\n",
"* Precision = share of the predicted positives that are actually positive\n",
"* Recall = share of the actual positives that the model correctly identifies\n",
"\n",
"What we display in this plot:\n",
"* The x-axis is Recall\n",
"* The y-axis is Precision\n",
"\n",
"The curve shows the trade-off between the two at different decision thresholds.\n",
"\n",
"In imbalanced datasets, accuracy can be misleading. Neither precision nor recall uses the true negatives, so the PR curve focuses on how well the model finds the rare positive class:\n",
"* A higher curve means better performance\n",
"* The area under the curve summarizes it in a single number. Here we report it as average precision (AP), computed with `average_precision_score`, and closer to 1 is better\n",
"* Unlike ROC AUC, the no-skill baseline is not 0.5: a random model scores roughly the positive rate (around 1% in our case), so AP should be compared against that value"
]
},
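{
"cell_type": "markdown",
"id": "d91f3b58",
"metadata": {},
"source": [
"On imbalanced data, ROC AUC and average precision have very different no-skill levels. The cell below is a minimal sketch on **synthetic** data, not our bookings. It assigns purely random scores to a population with about 1% positives, an imbalance assumed here to mirror ours. It then compares the resulting average precision with the positive rate, and the ROC AUC with 0.5.\n",
"\n",
"The same random scores give a ROC AUC around 0.5 but an AP around 0.01. This is why the two metrics must be read against different baselines."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7e20d41",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.metrics import average_precision_score, roc_auc_score\n",
"\n",
"# Synthetic no-skill model (assumption): ~1% positives, scores unrelated to the labels\n",
"rng = np.random.default_rng(0)\n",
"n = 100_000\n",
"y_sim = (rng.random(n) < 0.01).astype(int)\n",
"random_scores = rng.random(n)\n",
"\n",
"prevalence = y_sim.mean()\n",
"ap_random = average_precision_score(y_sim, random_scores)\n",
"auc_random = roc_auc_score(y_sim, random_scores)\n",
"\n",
"print(f'Positive rate:             {prevalence:.4f}')\n",
"print(f'Average precision, random: {ap_random:.4f}')   # close to the positive rate\n",
"print(f'ROC AUC, random:           {auc_random:.4f}')   # close to 0.5"
]
},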
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Feature Importance\n",
"Understanding what drives the predictions is useful both for future experiments and for business knowledge. Here we look at both the native feature importances of the tree-based model and a more computationally expensive SHAP analysis.\n",
"\n",
"Important! Be aware that the SHAP analysis can take a long time to run (over 9 minutes on the test set in the last run)."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "d66ffe2c",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAxkAAAHqCAYAAABoeoNhAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAC6cklEQVR4nOzdeVxO6f8/8NddabvblFQSiUqitKBkiZiEaCxZmkkmEppsCV/Sgiwj+1hmfKb4TMTYZ4pBI0O2RNmSNJL5yDQfS6axpfv+/eHX+bi1c5Oa1/PxuB+P7nOuc533dc6pzvs+13XdIqlUKgUREREREZGcKNR1AERERERE1LAwySAiIiIiIrlikkFERERERHLFJIOIiIiIiOSKSQYREREREckVkwwiIiIiIpIrJhlERERERCRXTDKIiIiIiEiumGQQEREREZFcMckgIiIiIiK5YpJBRET0EYiLi4NIJKrwNXv27Peyz1OnTiEiIgKPHj16L/W/i7Ljcf78+boO5a2tX78ecXFxdR0GUZ1QqusAiIiI6H+ioqLQqlUrmWXt27d/L/s6deoUIiMj4efnBx0dnfeyj3+y9evXo0mTJvDz86vrUIg+OCYZREREHxEPDw84OjrWdRjv5O+//4ZYLK7rMOrMkydPoK6uXtdhENUpdpciIiKqRw4ePIju3btDLBZDU1MTAwYMwNWrV2XKXLp0CX5+fjAzM4OqqioMDQ3xxRdf4P79+0KZiIgIzJw5EwDQqlUroWtWXl4e8vLyIBKJKuzqIxKJEBERIVOPSCTCtWvXMHr0aDRu3BjdunUT1n///fdwcHCAmpoadHV1MXLkSNy5c+et2u7n5wcNDQ3k5+dj4MCB0NDQgLGxMb7++msAwOXLl9G7d2+IxWK0bNkS27Ztk9m+rAvWr7/+igkTJkBPTw9aWlrw9fXFw4cPy+1v/fr1sLa2hoqKCpo1a4bJkyeX61rm6uqK9u3bIz09HT169IC6ujr+7//+D6amprh69SqOHz8uHFtXV1cAwIMHDxASEoIOHTpAQ0MDWlpa8PDwQGZmpkzdKSkpEIlE2LlzJxYtWoTmzZtDVVUVbm5uuHnzZrl4z549i/79+6Nx48YQi8WwsbHB6tWrZcpcv34dw4YNg66uLlRVVeHo6IgDBw7IlCkpKUFkZCTMzc2hqqoKPT09dOvWDUeOHKnReSIC+CSDiIjoo1JUVIT//ve/MsuaNGkCAPj3v/+NMWPGwN3dHUuXLsWTJ0+wYcMGdOvWDRcvXoSpqSkA4MiRI/jtt98wduxYGBoa4urVq/jmm29w9epVnDlzBiKRCEOGDMGNGzewfft2rFy5UtiHvr4+/vzzz1rHPXz4cJibmyM6OhpSqRQAsGjRIoSFhcHb2xvjxo3Dn3/+ibVr16JHjx64ePHiW3XRKi0thYeHB3r06IFly5YhPj4eQUFBEIvFmDt3Lnx8fDBkyBBs3LgRvr6+cHZ2Ltf9LCgoCDo6OoiIiEB2djY2bNiA27dvCzf1wKvkKTIyEn369MHEiROFcmlpaUhNTUWjRo2E+u7fvw8PDw+MHDkSn332GQwMDODq6oovv/wSGhoamDt3LgDAwMAAAPDbb79h3759GD58OFq1aoU//vgDmzZtQs+ePXHt2jU0a9ZMJt4lS5ZAQUEBISEhKCoqwrJly+Dj44OzZ88KZY4cOYKBAwfCyMgIU6ZMgaGhIbKysvDTTz9hypQpAICrV6/CxcUFxsbGmD17NsRiMXbu3AkvLy/s3r0bn376qdD2xYsXY9y4cejcuTMeP36M8+fP48KFC+jbt2+tzxn9Q0mJiIiozsXGxkoBVPiSSqXSv/76S6qjoyMdP368zHb37t2Tamtryyx/8uRJufq3b98uBSD99ddfhWVfffWVFID01q1bMmVv3bolBSCNjY0tVw8AaXh4uPA+PDxcCkA6atQomXJ5eXlSRUVF6aJFi2SWX758WaqkpFRueWXHIy
0tTVg2ZswYKQBpdHS0sOzhw4dSNTU1qUgkkiYkJAjLr1+/Xi7WsjodHBykL168EJYvW7ZMCkC6f/9+qVQqlRYWFkqVlZWln3zyibS0tFQot27dOikA6XfffScs69mzpxSAdOPGjeXaYG1tLe3Zs2e55c+ePZOpVyp9dcxVVFSkUVFRwrJjx45JAUitrKykz58/F5avXr1aCkB6+fJlqVQqlb58+VLaqlUracuWLaUPHz6UqVcikQg/u7m5STt06CB99uyZzPquXbtKzc3NhWW2trbSAQMGlIubqDbYXYqIiOgj8vXXX+PIkSMyL+DVJ9WPHj3CqFGj8N///ld4KSoqokuXLjh27JhQh5qamvDzs2fP8N///hdOTk4AgAsXLryXuAMDA2Xe79mzBxKJBN7e3jLxGhoawtzcXCbe2ho3bpzws46ODiwtLSEWi+Ht7S0st7S0hI6ODn777bdy2wcEBMg8iZg4cSKUlJSQlJQEADh69ChevHiBqVOnQkHhf7dK48ePh5aWFhITE2XqU1FRwdixY2scv4qKilBvaWkp7t+/Dw0NDVhaWlZ4fsaOHQtlZWXhfffu3QFAaNvFixdx69YtTJ06tdzTobInMw8ePMAvv/wCb29v/PXXX8L5uH//Ptzd3ZGTk4P//Oc/AF4d06tXryInJ6fGbSJ6E7tLERERfUQ6d+5c4cDvshu+3r17V7idlpaW8PODBw8QGRmJhIQEFBYWypQrKiqSY7T/82aXpJycHEilUpibm1dY/vWb/NpQVVWFvr6+zDJtbW00b95cuKF+fXlFYy3ejElDQwNGRkbIy8sDANy+fRvAq0TldcrKyjAzMxPWlzE2NpZJAqojkUiwevVqrF+/Hrdu3UJpaamwTk9Pr1z5Fi1ayLxv3LgxAAhty83NBVD1LGQ3b96EVCpFWFgYwsLCKixTWFgIY2NjREVFYfDgwbCwsED79u3Rr18/fP7557CxsalxG4mYZBAREdUDEokEwKtxGYaGhuXWKyn971+6t7c3Tp06hZkzZ6Jjx47Q0NCARCJBv379hHqq8ubNepnXb4bf9PrTk7J4RSIRDh48CEVFxXLlNTQ0qo2jIhXVVdVy6f8fH/I+vdn26kRHRyMsLAxffPEFFixYAF1dXSgoKGDq1KkVnh95tK2s3pCQELi7u1dYpk2bNgCAHj16IDc3F/v378fhw4exefNmrFy5Ehs3bpR5ikRUFSYZRERE9UDr1q0BAE2bNkWfPn0qLffw4UMkJycjMjIS8+fPF5ZX1PWlsmSi7JPyN2dSevMT/OrilUqlaNWqFSwsLGq83YeQk5ODXr16Ce+Li4tRUFCA/v37AwBatmwJAMjOzoaZmZlQ7sWLF7h161aVx/91lR3fXbt2oVevXvjXv/4ls/zRo0fCAPzaKLs2rly5UmlsZe1o1KhRjeLX1dXF2LFjMXbsWBQXF6NHjx6IiIhgkkE1xjEZRERE9YC7uzu0tLQQHR2NkpKScuvLZoQq+9T7zU+5V61aVW6bsu+yeDOZ0NLSQpMmTfDrr7/KLF+/fn2N4x0yZAgUFRURGRlZLhapVCozne6H9s0338gcww0bNuDly5fw8PAAAPTp0wfKyspYs2aNTOz/+te/UFRUhAEDBtRoP2KxuMJvU1dUVCx3TH744QdhTERt2dvbo1WrVli1alW5/ZXtp2nTpnB1dcWmTZtQUFBQro7XZxR789xoaGigTZs2eP78+VvFR/9MfJJBRERUD2hpaWHDhg34/PPPYW9vj5EjR0JfXx/5+flITEyEi4sL1q1bBy0tLWF615KSEhgbG+Pw4cO4detWuTodHBwAAHPnzsXIkSPRqFEjeHp6QiwWY9y4cViyZAnGjRsHR0dH/Prrr7hx40aN423dujUWLlyIOXPmIC8vD15eXtDU1MStW7ewd+9eBAQEICQkRG7HpzZevHgBNzc3eHt7Izs7G+vXr0e3bt0waNAgAK+m8Z0zZw4iIyPRr18/DBo0SC
jXqVMnfPbZZzXaj4ODAzZs2ICFCxeiTZs2aNq0KXr37o2BAwciKioKY8eORdeuXXH58mXEx8fLPDWpDQUFBWzYsAG
"text/plain": [
"<Figure size 800x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"## BUILT-IN\n",
"\n",
"# Get feature importances from the model\n",
"importances = best_pipeline.named_steps['model'].feature_importances_\n",
"features = X.columns\n",
"\n",
"# Create a Series and sort\n",
"feat_series = pd.Series(importances, index=features).sort_values(ascending=True) # ascending=True for horizontal plot\n",
"\n",
"# Plot Feature Importances\n",
"plt.figure(figsize=(8, 5))\n",
"feat_series.plot(kind='barh', color='skyblue')\n",
"plt.title('Feature Importances')\n",
"plt.xlabel('Importance')\n",
"plt.grid(axis='x')\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interpreting the Feature Importance Plot\n",
"The **feature importance plot** shows how much each feature contributes to the model’s overall decision-making.\n",
"\n",
"If the model is a scikit-learn tree ensemble such as Random Forest, the built-in `feature_importances_` is the mean decrease in impurity (MDI). Every time a feature is used to split a node, the resulting drop in impurity (weighted by the share of samples reaching that node) is credited to that feature. These credits are averaged over all trees and normalized so that all importances add up to 1.\n",
"A higher score means the feature does more of the work of separating the classes in the training data.\n",
"\n",
"In the graph you will see that:\n",
"* Features are ranked from most important (top) to least important (bottom).\n",
"* The values are relative and model-specific. They are not directly interpretable as weights or probabilities.\n",
"\n",
"This helps us identify which features the model relies on most when making predictions.\n",
"\n",
"**Important!**\n",
"Unlike SHAP values, native importance doesn't show in which direction a feature pushes the predictions, only how useful it is to the model overall. It is also computed on the training data and tends to favor continuous or high-cardinality features. For deeper interpretability (e.g., direction and context), SHAP is better, but it takes more time to run."
]
},
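{
"cell_type": "markdown",
"id": "6b0e2f7c",
"metadata": {},
"source": [
"The built-in importances are computed while the model is fitted, i.e. on the training data. As a complementary check, the cell below compares them with permutation importance measured on held-out data, using `sklearn.inspection.permutation_importance`.\n",
"\n",
"Note that permutation importance is a different technique from the one used above. The sketch also runs on **synthetic** data, not on our bookings or `best_pipeline`: it has one informative column and one pure-noise column. The data-generating process, the `RandomForestClassifier` settings and the `roc_auc` scoring are all assumptions.\n",
"\n",
"On data like this, the noise column typically still gets a non-zero built-in importance, because fully grown trees also split on it. Its permutation importance, by contrast, stays close to zero."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5c8e6a19",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.inspection import permutation_importance\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Synthetic data (assumption): column 0 carries signal, column 1 is pure noise\n",
"rng = np.random.default_rng(0)\n",
"X_demo = rng.normal(size=(2_000, 2))\n",
"y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=2_000) > 0).astype(int)\n",
"\n",
"X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)\n",
"rf_demo = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)\n",
"\n",
"# Built-in (MDI) importances, learned from the training data\n",
"mdi = rf_demo.feature_importances_\n",
"\n",
"# Permutation importance on held-out data: mean drop in ROC AUC when a column is shuffled\n",
"perm = permutation_importance(rf_demo, X_te, y_te, scoring='roc_auc', n_repeats=10, random_state=0)\n",
"\n",
"for name, m, p in zip(['signal', 'noise'], mdi, perm.importances_mean):\n",
"    print(f'{name:>6}: built-in (MDI) = {m:.3f}, permutation = {p:.3f}')"
]
},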
{
"cell_type": "code",
"execution_count": 11,
"id": "e2197cea",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"ExactExplainer explainer: 4859it [09:15, 8.73it/s] \n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAyoAAAIcCAYAAAAZnVrDAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdd3gUVdvA4d9sS+8FQkJC6F2EIII0pUp9aYqgICioFDv2hq9+iq8iYkFEBBGQDqGqoICA9Kb0GggJIaT3bJvvjyWbLJuQhBbA576uvWBnz8ycmZ2dnGdOU1RVVRFCCCGEEEKIW4imojMghBBCCCGEEJeTQEUIIYQQQghxy5FARQghhBBCCHHLkUBFCCGEEEIIccuRQEUIIYQQQghxy5FARQghhBBCCHHLkUBFCCGEEEIIccuRQEUIIYQQQghxy5FARQghhBBCCHHLkUBFCCGEEEKIW9x7772Hp6dnqZ/FxMSgKAqLFi0q1/avdr0bSVfRGRBCCCGEEEJcHyEhIWzdupXatWtXdFaumQQqQgghhBBC3CFcXFy49957Kzob14U0/RJCCCGEEOIOUVwTLqPRyLPPPou/vz++vr489dRTzJ07F0VRiImJcVg/Ly+PMWPG4OfnR0hICC+//DJms/kmH4WNBCpCCCGEEELcJsxms9PLarVecZ3XXnuNqVOn8uqrrzJ//nysViuvvfZasWnffPNNNBoNCxYs4Omnn+azzz7j+++/vxGHUipp+iWEEEIIIcRtIDs7G71eX+xnHh4exS5PSUlhypQpvPXWW7z66qsAdOnShY4dOxIbG+uUvkWLFkyePBmATp06sX79ehYtWsTTTz99nY6i7CRQEUIIIYS4CiaTiRkzZgAwbNiwEguQQhRL6eu8TF1yxVXc3Nz4888/nZZ/9913zJ07t9h1/vnnH/Ly8ujVq5fD8t69e/P77787pe/cubPD+/r16/PHH39cMV83igQqQgghhBBC3AY0Gg1RUVFOy1euXFniOufPnwcgKCjIYXlwcHCx6X19fR3eGwwG8vLyypnT60P6qAghhBBCCHHTKcW8rr+QkBAALl686LA8MTHxhuzvepJARQghhBBCiDtUw4YNcXV1JTo62mH5smXLKiZD5SBNv4QQQgghhLjpbkwNyuUCAgJ45pln+PDDD3F1daVJkyYsXLiQY8eOAbbmZLeqWzdnQgghhBBC3LFuTtMvgI8//piRI0fy0UcfMWDAAEwmk314Yh8fnxu232ulqKqqVnQmhBBCCCFuNzLql7gmygDnZerCm7b7xx57jM2bN3P69Ombts/ykqZfQgghhBBC3HQ3p+kXwMaNG9myZQvNmjXDarWycuVK5syZw8SJE29aHq6GBCpCCCGEEELcwTw9PVm5ciUTJkwgNzeXyMhIJk6cyPPPP1/RWbsiCVSEEEIIIYS4gzVr1oy//vqrorNRbtKZXgghhBBCCHHLkRoVIYQQQgghbrqb10fldiU1KkIIIYQQQohbjtSoCCGEEEIIcdNJjUpppEZFCCGEEEIIccuRGhUhhBBCCCFuOqlRKY0EKkIIIYQQQtx0EqiURpp+CSGEEEIIIW45UqMihBBCCCHETSc1KqWRGhUhhBBCCCHELUdqVIQQQgghhLjppEalNFKjIoQQQgghhLjlSI2KEEIIIYQQN5laTI2K1LE4khoVIYQQQgghxC1HAhUhhBBCiFvN4q1w1wtw32uw7WhF50aICiFNv4QQQgghbiVt3oDNRwrft3wdRnSE70ZVXJ7EDSANvUojNSpCCCGEuK2pqsofZy2sP2ut6KxcnbhkaPkqaPqCtp9jkFJg2jq4kHbTsyZERZIaFSGEEELctmYdMDP0l4J3Khqs7HoM7q50mxRxrFaoOxay8mzvVbXktAu2wNjuNydf4oaTzvSlkxoVIYQQQtyWrKpaJEi5tAyI+qlCsnN11u0vDFJKk5ELf/wNC/+ChNQbmy8hbgG3yeMGIY
QQQghHr260FLvcCmw+Z6V12G3wPPaLFWVP+9bcy973g/8Ovr75ETeR1J+U5jb4BQshhBBCONt4tuTPDiXdBv1VjsTC6n1Xv/4Hi+FkwnXLjhC3GglUhBBCCHFbsaoq3Rea2ZlYcpqB9W6Dp9VNX772bfT++Nq3ISqIUsxLFCVNv4QQQghxW2kyw8I/KVdOM3W/itFqYUwTBR/XW+y5bFyybW6UXNO1b+vouWvfhhC3KAlUhBBCCHFbKS1IAXjlTwCVtzarLOkN3asrZBnB360Cn1rvPgn3vw2ZZew8Xxbm26CJmyhWcaN+CUcSqAghhBDilpaWZyU2ExoEKldVtOsbXViY99TBgWFaInxuYiHRaLJN2rjn1I3Z/vF4qFXlxmxbiAokgYoQQgghbiqLVeVcpkqol4JOc+WAIXSKmfjswveNA69t31lmqPuDhdwXblIRyGKB8BFwIePG7aPPx/D7+/DFSqhdBYa0B80t1txNFENqVEpT7qt4xYoVREVFsWvXrhuRnxuqZ8+ejBw5sqKzcdVu9/wLUV67du0iKiqKFSvKMXznDRIfH09UVBRTp06tsDyMHDmSnj17Vtj+hbge5h02o5toodo0K/qJFip9ZSYjv/jmS41/cAxSAP5OuvY85FnAZLmBTaYyc6DSMFD6gm7AjQ1SAA6eg8rD4aMlMOwr0PaHlMwbu09xzdRiXsKR1Kjc5ubOnYuXl5cUXv7FMjMzmTt3Ls2aNSMqKqqisyPEVTl69CgbNmygZ8+eVKkiTVjuBKqq8upGC1P2Qa4ZqvvCjAcVHlnlmC4xD3y+tAJWIrzgQo4tkHDVQN4NjCWy8q34ud+gWgffx8BawcXO8BGQNa9i8yDENZJA5TayePFiFMWxmvDnn38mJCREApV/sczMTKZNmwZwxwUqTZs2ZcuWLeh0cqsC+Prrr1HVO/OZ27Fjx5g2bRrNmjWTQOUWsuaUhT/OqHSO1OCqszVUuXwSxZ8Pm3l9E/i5wIIeCrUCtGw7Z6blZWXk42nQ+ucrX79nilQC3MggBcDDcAOa3SzZBo99UfFBCkC2EQKHwBMd4P8eBa22onMknEjTr9LIX//biMFgqOgsCHHVzGYzFosFFxeXMq+j0WjKlf5Op9frKzoL4jaVa1KZe9jKkuMqOWZoFwY/HYKYDFuZWgHuDobmlWF9LKTnQWJuYVOUT3cXjRqs1PSGHIut9sNyKdEZoPYMFTDf1GO7Wi6TVFwUMyYVAlxhcW+Iz9Yw+6CF6rv/oaclgQ4j7yYjrDI/7cxl68Y4PHQwum9lQqp48s2fWUzfbaL6uXvocXI3pya+Re2jx2+tomdyFnwSbXu1rge9m9v6rwT7woIt8P06qBkCHw8Gb4+Kzq0QThS1nI/nVqxYwfjx45kyZQpHjhxh0aJFJCYmEhISwvDhw+nRo4c97W+//caaNWs4duwYKSkpuLu706RJE55++mlq1arlsN39+/czffp0jh49SmZmJj4+PtSqVYsRI0bQqFGjch1UQkICkyZNYuvWrYDtqexLL73EM888Q0hICN99951D+u3btzNr1iwOHjyI0WgkPDyc/v37079/f4d0PXv2JCQkhBdffJFJkyZx8OBB9Ho9bdq04bnnnsPf398hfVpaGlOnTuXPP/8kOTmZgIAA2rZty1NPPYWvr689XX5+PjNnzuTXX3/lwoUL6PV6KlWqRKtWrXjuueec9l+Q/5Keni9fvrzMTyRHjhzJ+fPnmTp1KhMnTmTXrl0oikK7du145ZVXcHV1ZebMmSxbtoykpCQiIyMZN24cTZo0sW/DarUyY8YMtm3bxtmzZ0lPTycgIIDWrVvzzDPPOBxrfHw8vXr1YsSIEdSvX59p06Zx4sQJvLy86NatG6NHj3Z4en7gwAEWLVrE33//zYULF9BqtdSsWZPHHnuM+++/3+l4du/ezVdffcWxY8fw9PSkU6dO9OnTh4cffpgRI0bw1FNP2dOqqsrixYtZtmwZp0+fRq
PRUL9+fUaMGOFwbovmuXr16syYMYMzZ84QFBTE8OHD6dWrFwkJCfbzZzabadeuHa+99hoeHo43/qSkJKZNm8bmzZt
"text/plain": [
"<Figure size 800x550 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"## SHAP VALUES\n",
"\n",
"# SHAP requires that all features passed to the Explainer be numeric (floats/ints)\n",
"X_test_shap = X_test.astype(float)\n",
"\n",
"# Function that returns the predicted probability of the positive class\n",
"def model_predict(data):\n",
"    return best_pipeline.predict_proba(data)[:, 1]\n",
"\n",
"# Create the SHAP explainer; X_test_shap is also used as background data for the masker\n",
"explainer = shap.Explainer(model_predict, X_test_shap)\n",
"\n",
"# Compute SHAP values (one explanation per test row, in probability units)\n",
"shap_values = explainer(X_test_shap)\n",
"\n",
"# Plot summary\n",
"shap.summary_plot(shap_values.values, X_test_shap)"
]
},
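{
"cell_type": "markdown",
"id": "c35a8d16",
"metadata": {},
"source": [
"Before reading the summary plot, it helps to recall what a SHAP value is. For one booking, the SHAP values of all features add up to the difference between the model's prediction for that booking and the average prediction over the background data.\n",
"\n",
"The cell below is a small brute-force sketch of that property. It computes exact Shapley values for a hypothetical three-feature function, `demo_model` (not our pipeline), on synthetic data. Features outside a coalition are filled in from background rows, which mirrors, in simplified form, how an explainer with a background dataset handles missing features. The sketch then checks that the base value plus the SHAP values reproduces the prediction. Because it enumerates every feature subset, it is only practical for a handful of features.\n",
"\n",
"For the real explanation above, a similar check would compare `shap_values.base_values + shap_values.values.sum(axis=1)` with `model_predict(X_test_shap)`. We would expect them to match closely, though this is an assumption we have not verified on our data."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e41d7b03",
"metadata": {},
"outputs": [],
"source": [
"from itertools import combinations\n",
"from math import factorial\n",
"\n",
"import numpy as np\n",
"\n",
"def demo_model(X):\n",
"    # Hypothetical model: logistic score with an interaction between features 0 and 2\n",
"    z = 1.5 * X[:, 0] - 2.0 * X[:, 1] + X[:, 0] * X[:, 2]\n",
"    return 1.0 / (1.0 + np.exp(-z))\n",
"\n",
"def coalition_value(model, x_row, background, subset):\n",
"    # v(S): average prediction when the features in S take x_row's values\n",
"    # and all other features come from the background rows\n",
"    Z = background.copy()\n",
"    cols = list(subset)\n",
"    Z[:, cols] = x_row[cols]\n",
"    return model(Z).mean()\n",
"\n",
"def exact_shapley(model, x_row, background):\n",
"    m = x_row.shape[0]\n",
"    phi = np.zeros(m)\n",
"    for i in range(m):\n",
"        others = [j for j in range(m) if j != i]\n",
"        for size in range(m):\n",
"            # Shapley weight for coalitions of this size\n",
"            weight = factorial(size) * factorial(m - size - 1) / factorial(m)\n",
"            for subset in combinations(others, size):\n",
"                with_i = coalition_value(model, x_row, background, subset + (i,))\n",
"                without_i = coalition_value(model, x_row, background, subset)\n",
"                phi[i] += weight * (with_i - without_i)\n",
"    return phi\n",
"\n",
"rng = np.random.default_rng(0)\n",
"background = rng.normal(size=(200, 3))   # synthetic background data\n",
"x_row = np.array([1.0, -0.5, 2.0])       # the single case we explain\n",
"\n",
"phi = exact_shapley(demo_model, x_row, background)\n",
"base_value = demo_model(background).mean()\n",
"prediction = demo_model(x_row[None, :])[0]\n",
"\n",
"print('SHAP values:', np.round(phi, 4))\n",
"print(f'base value + sum of SHAP values = {base_value + phi.sum():.6f}')\n",
"print(f'model prediction                = {prediction:.6f}')"
]
},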
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interpreting the SHAP Summary Plot\n",
"\n",
"Each row corresponds to one feature, and each point on a row is that feature's SHAP value for a single booking in the test set. Features are sorted from top to bottom by their mean absolute SHAP value, i.e. by overall impact.\n",
"The x-axis shows how much the feature pushed that booking's predicted probability of an incident up or down, relative to the average prediction. Since we explain `predict_proba`, SHAP values are in probability units.\n",
"* Right (positive SHAP value): pushes the prediction towards the positive class (i.e., higher chance of incident).\n",
"* Left (negative SHAP value): pushes the prediction towards the negative class (i.e., lower chance of incident).\n",
"\n",
"Color shows the actual feature value for that point:\n",
"* Red = high value\n",
"* Blue = low value\n",
"\n",
"In other words:\n",
"* The position tells you the impact.\n",
"* The color tells you the feature value.\n",
"* Where many bookings have similar SHAP values, the points pile up vertically, so thicker regions show where most of the data lies."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}