data-jupyter-notebooks/data_driven_risk_assessment/experiments/ddra_joaquin_weighted.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "84dcd475",
   "metadata": {},
   "source": [
    "# DDRA Joaquin\n",
    "\n",
    "## General Idea\n",
    "The idea is to start with a very simple model with basic Booking attributes. This should serve as a first understanding of what can bring value in the data-driven risk assessment of new dash protected bookings.\n",
    "\n",
    "## Initial setup\n",
    "This first section just ensures that the connection to DWH works correctly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "12368ce1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "🔌 Testing connection using credentials at: /home/joaquin/.superhog-dwh/credentials.yml\n",
      "✅ Connection successful.\n"
     ]
    }
   ],
   "source": [
    "# This script connects to a Data Warehouse (DWH) using PostgreSQL. \n",
    "# This should be common for all Notebooks, but you might need to adjust the path to the `dwh_utils` module.\n",
    "\n",
    "import sys\n",
    "import os\n",
    "sys.path.append(os.path.abspath(\"../../utils\"))  # Adjust path if needed\n",
    "\n",
    "from dwh_utils import read_credentials, create_postgres_engine, query_to_dataframe, test_connection\n",
    "\n",
    "# --- Connect to DWH ---\n",
    "creds = read_credentials()\n",
    "dwh_pg_engine = create_postgres_engine(creds)\n",
    "\n",
    "# --- Test Query ---\n",
    "test_connection()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c86f94f1",
   "metadata": {},
   "source": [
    "## Data Extraction\n",
    "In this section we extract the data for our first attempt on Basic Booking Attributes modelling.\n",
    "\n",
    "This SQL query retrieves a clean and relevant subset of booking data for our model. It includes:\n",
    "- A **unique booking ID**\n",
    "- Key **numeric features** such as number of services, time between booking creation and check-in, and number of nights\n",
    "- Several **categorical (boolean) features** related to service usage\n",
    "- A **target variable** (`has_resolution_incident`) indicating whether a resolution incident occurred\n",
    "\n",
    "Filters applied being:\n",
    "1. Bookings from **\"New Dash\" users** with a valid deal ID\n",
    "2. Only **protected bookings**, i.e., those with Protection or Deposit Management services\n",
    "3. Bookings flagged for **risk categorisation** (excluding incomplete/rejected ones)\n",
    "4. Bookings that are **already completed**\n",
    "\n",
    "The result is converted into a pandas DataFrame for further processing and modeling.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3e3ed391",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialise all imports needed for the Notebook\n",
    "from sklearn.model_selection import (\n",
    "    train_test_split, \n",
    "    GridSearchCV\n",
    ")\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.pipeline import Pipeline\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from sklearn.feature_selection import RFE\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.utils.class_weight import compute_class_weight\n",
    "from sklearn.feature_selection import SelectKBest, f_classif\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from datetime import date\n",
    "from sklearn.metrics import (\n",
    "    roc_auc_score, \n",
    "    average_precision_score,\n",
    "    classification_report,\n",
    "    roc_curve, \n",
    "    auc,\n",
    "    precision_recall_curve,\n",
    "    precision_score,\n",
    "    recall_score,\n",
    "    fbeta_score,\n",
    "    confusion_matrix\n",
    ")\n",
    "import matplotlib.pyplot as plt\n",
    "import shap\n",
    "import math"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "db5e3098",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   id_booking  days_from_booking_creation_to_check_in  number_of_nights  \\\n",
      "0      919656                                    26.0               4.0   \n",
      "1      926634                                    17.0               3.0   \n",
      "2      931082                                    20.0               7.0   \n",
      "3      931086                                    15.0               3.0   \n",
      "4      931096                                     8.0               5.0   \n",
      "\n",
      "    host_town    host_country host_postcode  host_age  host_months_with_truvi  \\\n",
      "0  Madison CT   United States         06443     125.0                     8.0   \n",
      "1  Madison CT   United States         06443     125.0                     8.0   \n",
      "2      London  United Kingdom       N16 6DD     125.0                     8.0   \n",
      "3      London  United Kingdom       N16 6DD     125.0                     8.0   \n",
      "4      London  United Kingdom       N16 6DD     125.0                     8.0   \n",
      "\n",
      "                   host_account_type host_active_pms_list  ...  \\\n",
      "0                               Host             Hostaway  ...   \n",
      "1                               Host             Hostaway  ...   \n",
      "2  PMC - Property Management Company              Hostify  ...   \n",
      "3  PMC - Property Management Company              Hostify  ...   \n",
      "4  PMC - Property Management Company              Hostify  ...   \n",
      "\n",
      "   number_of_applied_upgraded_services  number_of_applied_billable_services  \\\n",
      "0                                    2                                    2   \n",
      "1                                    2                                    2   \n",
      "2                                    1                                    1   \n",
      "3                                    1                                    1   \n",
      "4                                    1                                    1   \n",
      "\n",
      "   booking_days_to_check_in booking_number_of_nights has_verification_request  \\\n",
      "0                        87                        4                    False   \n",
      "1                       109                        3                    False   \n",
      "2                        50                        7                    False   \n",
      "3                        15                        3                    False   \n",
      "4                         8                        5                    False   \n",
      "\n",
      "  has_billable_services  has_upgraded_screening_service_business_type  \\\n",
      "0                  True                                         False   \n",
      "1                  True                                         False   \n",
      "2                  True                                         False   \n",
      "3                  True                                         False   \n",
      "4                  True                                         False   \n",
      "\n",
      "   has_deposit_management_service_business_type  \\\n",
      "0                                          True   \n",
      "1                                          True   \n",
      "2                                         False   \n",
      "3                                         False   \n",
      "4                                         False   \n",
      "\n",
      "   has_protection_service_business_type  has_resolution_incident  \n",
      "0                                  True                    False  \n",
      "1                                  True                    False  \n",
      "2                                  True                    False  \n",
      "3                                  True                    False  \n",
      "4                                  True                    False  \n",
      "\n",
      "[5 rows x 64 columns]\n",
      "Total Bookings: 21,307\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/tmp/ipykernel_48568/805553034.py:455: DtypeWarning: Columns (50) have mixed types. Specify dtype option on import or set low_memory=False.\n",
      "  df_extraction = pd.read_csv(\"/home/joaquin/data-jupyter-notebooks/data_driven_risk_assessment/experiments/data.csv\")\n"
     ]
    }
   ],
   "source": [
    "# Query to extract data\n",
    "data_extraction_query = \"\"\"\n",
    "with\n",
    "    int_core__verification_requests as (\n",
    "        select *\n",
    "        from intermediate.int_core__verification_requests\n",
    "        where created_date_utc >= '2024-10-21'\n",
    "    ),\n",
    "    int_core__bookings as (\n",
    "        select *\n",
    "        from intermediate.int_core__bookings\n",
    "        where created_date_utc >= '2024-10-21'\n",
    "    ),\n",
    "    stg_core__verification as (\n",
    "        select *\n",
    "        from staging.stg_core__verification\n",
    "        where created_date_utc >= '2024-10-21'\n",
    "    ),\n",
    "    int_core__guest_journey_payments as (\n",
    "        select *\n",
    "        from intermediate.int_core__guest_journey_payments\n",
    "        where payment_due_date_utc >= '2024-10-21'\n",
    "    ),\n",
    "    filtered_bookings as (\n",
    "        select *\n",
    "        from intermediate.int_booking_summary\n",
    "        where\n",
    "            is_user_in_new_dash = true\n",
    "            and is_missing_id_deal = false\n",
    "            and (\n",
    "                has_protection_service_business_type\n",
    "                or has_deposit_management_service_business_type\n",
    "            )\n",
    "            and is_booking_flagged_as_risk is not null\n",
    "            and is_booking_past_completion_date = true\n",
    "            and booking_created_date_utc < '2025-06-25'\n",
    "    ),\n",
    "    previous_booking_counts as (\n",
    "        select\n",
    "            id_booking,\n",
    "            id_accommodation,\n",
    "            id_user_guest,\n",
    "            booking_check_in_date_utc,\n",
    "            booking_check_out_date_utc,\n",
    "            count(*) over (\n",
    "                partition by id_accommodation\n",
    "                order by booking_check_in_date_utc\n",
    "                rows between unbounded preceding and 1 preceding\n",
    "            ) as previous_bookings_in_listing_count,\n",
    "            count(*) over (\n",
    "                partition by id_user_guest\n",
    "                order by booking_check_in_date_utc\n",
    "                rows between unbounded preceding and 1 preceding\n",
    "            ) as previous_guest_bookings_count\n",
    "        from filtered_bookings\n",
    "    ),\n",
    "    listing_info as (\n",
    "        select\n",
    "            id_accommodation,\n",
    "            address_line_1 as listing_address,\n",
    "            town as listing_town,\n",
    "            country_name as listing_country,\n",
    "            postcode as listing_postcode,\n",
    "            number_of_bedrooms,\n",
    "            number_of_bathrooms,\n",
    "            friendly_name as listing_description,\n",
    "            id_user_host\n",
    "        from intermediate.int_core__accommodation\n",
    "    ),\n",
    "    host_info as (\n",
    "        select\n",
    "            scu.id_user as id_user_host,\n",
    "            icuh.account_type,\n",
    "            icuh.active_pms_list,\n",
    "            scc.country_name,\n",
    "            scu.billing_town,\n",
    "            scu.billing_postcode,\n",
    "            scu.id_billing_country,\n",
    "            extract(year from age(current_date, scu.date_of_birth)) as host_age,\n",
    "            extract(\n",
    "                month from age(current_date, scu.joined_date_utc)\n",
    "            ) as host_months_with_truvi\n",
    "        from staging.stg_core__user scu\n",
    "        left join\n",
    "            staging.stg_core__country scc on scu.id_billing_country = scc.id_country\n",
    "        left join\n",
    "            intermediate.int_core__user_host icuh on icuh.id_user_host = scu.id_user\n",
    "    ),\n",
    "    guest_info as (\n",
    "        select\n",
    "            scu.id_user as id_user_guest,\n",
    "            scc.country_name,\n",
    "            scu.billing_town,\n",
    "            scu.billing_postcode,\n",
    "            scu.id_billing_country,\n",
    "            extract(year from age(current_date, scu.date_of_birth)) as guest_age,\n",
    "            scu.email,\n",
    "            scu.phone_number\n",
    "        from staging.stg_core__user scu\n",
    "        left join\n",
    "            staging.stg_core__country scc on scu.id_billing_country = scc.id_country\n",
    "    ),\n",
    "    host_listing_counts as (\n",
    "        select id_user_host, count(*) as number_of_listings_of_host\n",
    "        from intermediate.int_core__accommodation\n",
    "        where is_active = true\n",
    "        group by id_user_host\n",
    "    ),\n",
    "    listing_incident_counts as (\n",
    "        select\n",
    "            i.created_date_utc::date as date_day,\n",
    "            i.id_accommodation,\n",
    "            count(*) over (\n",
    "                partition by i.id_accommodation\n",
    "                order by i.created_date_utc::date\n",
    "                rows between unbounded preceding and current row\n",
    "            ) as number_of_previous_incidents_in_listing,\n",
    "            count(i.calculated_payout_amount_in_txn_currency) over (\n",
    "                partition by i.id_accommodation\n",
    "                order by i.created_date_utc::date\n",
    "                rows between unbounded preceding and current row\n",
    "            ) as number_of_previous_payouts_in_listing\n",
    "        from intermediate.int_resolutions__incidents i\n",
    "        where\n",
    "            i.id_accommodation is not null\n",
    "            and i.created_date_utc::date between '2024-10-21' and current_date\n",
    "        order by i.id_accommodation, date_day\n",
    "    ),\n",
    "    guest_incident_counts as (\n",
    "        select\n",
    "            i.created_date_utc::date as date_day,\n",
    "            i.id_user_guest,\n",
    "            count(*) over (\n",
    "                partition by i.id_user_guest\n",
    "                order by i.created_date_utc::date\n",
    "                rows between unbounded preceding and current row\n",
    "            ) as number_of_previous_incidents_of_guest\n",
    "        from intermediate.int_resolutions__incidents i\n",
    "        where\n",
    "            i.id_user_guest is not null\n",
    "            and i.created_date_utc::date between '2024-10-21' and current_date\n",
    "        order by i.id_user_guest, date_day\n",
    "    ),\n",
    "    host_incident_counts as (\n",
    "        select\n",
    "            i.created_date_utc::date as date_day,\n",
    "            i.id_user_host,\n",
    "            count(*) over (\n",
    "                partition by i.id_user_host\n",
    "                order by i.created_date_utc::date\n",
    "                rows between unbounded preceding and current row\n",
    "            ) as number_of_previous_incidents_of_host,\n",
    "            count(i.calculated_payout_amount_in_txn_currency) over (\n",
    "                partition by i.id_user_host\n",
    "                order by i.created_date_utc::date\n",
    "                rows between unbounded preceding and current row\n",
    "            ) as number_of_previous_payouts_of_host\n",
    "        from intermediate.int_resolutions__incidents i\n",
    "        where\n",
    "            i.id_user_host is not null\n",
    "            and i.created_date_utc::date between '2024-10-21' and current_date\n",
    "        order by i.id_user_host, date_day\n",
    "    ),\n",
    "    verification_requests as (\n",
    "        select\n",
    "            icvr.id_verification_request,\n",
    "            extract(\n",
    "                day\n",
    "                from\n",
    "                    age(\n",
    "                        icvr.verification_estimated_started_date_utc,\n",
    "                        icb.created_date_utc\n",
    "                    )\n",
    "            ) as days_to_start_verification,\n",
    "            extract(\n",
    "                day\n",
    "                from\n",
    "                    age(\n",
    "                        icvr.verification_estimated_completed_date_utc,\n",
    "                        icvr.verification_estimated_started_date_utc\n",
    "                    )\n",
    "            ) as days_to_complete_verification,\n",
    "            -- CSAT Results\n",
    "            gsr.experience_rating as guest_csat_score,\n",
    "            gsr.guest_comments as guest_csat_comments,\n",
    "            -- GUEST_PRODUCT fields\n",
    "            max(\n",
    "                case\n",
    "                    when guest_journey_product_type = 'GUEST_PRODUCT' then product_name\n",
    "                end\n",
    "            ) as guest_product_name,\n",
    "            max(\n",
    "                case when guest_journey_product_type = 'GUEST_PRODUCT' then currency end\n",
    "            ) as guest_currency,\n",
    "            max(\n",
    "                case\n",
    "                    when guest_journey_product_type = 'GUEST_PRODUCT'\n",
    "                    then total_amount_in_txn_currency\n",
    "                end\n",
    "            ) as guest_total_amount,\n",
    "            -- VERIFICATION_PRODUCT fields\n",
    "            max(\n",
    "                case\n",
    "                    when guest_journey_product_type = 'VERIFICATION_PRODUCT'\n",
    "                    then product_name\n",
    "                end\n",
    "            ) as verification_product_name,\n",
    "            max(\n",
    "                case\n",
    "                    when guest_journey_product_type = 'VERIFICATION_PRODUCT'\n",
    "                    then currency\n",
    "                end\n",
    "            ) as verification_currency,\n",
    "            max(\n",
    "                case\n",
    "                    when guest_journey_product_type = 'VERIFICATION_PRODUCT'\n",
    "                    then total_amount_in_txn_currency\n",
    "                end\n",
    "            ) as verification_total_amount,\n",
    "            -- Verification Results\n",
    "            max(\n",
    "                case when scv.verification = 'Screening' then id_verification_status end\n",
    "            ) as screening_status,\n",
    "            max(\n",
    "                case\n",
    "                    when scv.verification = 'GovernmentId' then id_verification_status\n",
    "                end\n",
    "            ) as government_id_status,\n",
    "            max(\n",
    "                case when scv.verification = 'Contract' then id_verification_status end\n",
    "            ) as contract_status,\n",
    "            max(\n",
    "                case\n",
    "                    when scv.verification = 'SelfieConfidenceScore'\n",
    "                    then id_verification_status\n",
    "                end\n",
    "            ) as selfie_confidence_score_status,\n",
    "            max(\n",
    "                case\n",
    "                    when scv.verification = 'PaymentValidation'\n",
    "                    then id_verification_status\n",
    "                end\n",
    "            ) as payment_validation_status,\n",
    "            max(\n",
    "                case when scv.verification = 'FirstName' then id_verification_status end\n",
    "            ) as first_name_status,\n",
    "            max(\n",
    "                case\n",
    "                    when scv.verification = 'DateOfBirth' then id_verification_status\n",
    "                end\n",
    "            ) as date_of_birth_status,\n",
    "            max(\n",
    "                case when scv.verification = 'LastName' then id_verification_status end\n",
    "            ) as last_name_status,\n",
    "            max(\n",
    "                case\n",
    "                    when scv.verification = 'AutohostPartner'\n",
    "                    then id_verification_status\n",
    "                end\n",
    "            ) as autohost_partner_status,\n",
    "            max(\n",
    "                case\n",
    "                    when scv.verification = 'CriminalRecord' then id_verification_status\n",
    "                end\n",
    "            ) as criminal_record_status\n",
    "        from int_core__verification_requests icvr\n",
    "        left join\n",
    "            int_core__bookings icb\n",
    "            on icb.id_verification_request = icvr.id_verification_request\n",
    "        left join\n",
    "            stg_core__verification scv\n",
    "            on scv.id_verification_request = icvr.id_verification_request\n",
    "        left join\n",
    "            int_core__guest_journey_payments gjp\n",
    "            on gjp.id_verification_request = icb.id_verification_request\n",
    "        left join\n",
    "            intermediate.int_core__guest_satisfaction_responses gsr\n",
    "            on gsr.id_verification_request = icvr.id_verification_request\n",
    "            and scv.verification in (\n",
    "                'Screening',\n",
    "                'GovernmentId',\n",
    "                'Contract',\n",
    "                'SelfieConfidenceScore',\n",
    "                'PaymentValidation',\n",
    "                'FirstName',\n",
    "                'DateOfBirth',\n",
    "                'LastName',\n",
    "                'AutohostPartner',\n",
    "                'CriminalRecord'\n",
    "            )\n",
    "        group by 1, 2, 3, 4, 5\n",
    "    )\n",
    "select\n",
    "    fb.id_booking,\n",
    "    extract(day from age(fb.booking_check_in_date_utc, fb.booking_created_date_utc)) as days_from_booking_creation_to_check_in,\n",
    "    extract(day from age(fb.booking_check_out_date_utc, fb.booking_check_in_date_utc)) as number_of_nights,\n",
    "    -- Host Info\n",
    "    hi.billing_town as host_town,\n",
    "    hi.country_name as host_country,\n",
    "    hi.billing_postcode as host_postcode,\n",
    "    hi.host_age,\n",
    "    hi.host_months_with_truvi,\n",
    "    hi.account_type as host_account_type,\n",
    "    hi.active_pms_list as host_active_pms_list,\n",
    "    coalesce(hlc.number_of_listings_of_host, 0) as number_of_listings_of_host,\n",
    "    coalesce(\n",
    "        hic.number_of_previous_incidents_of_host, 0\n",
    "    ) as number_of_previous_incidents_of_host,\n",
    "    coalesce(\n",
    "        hic.number_of_previous_payouts_of_host, 0\n",
    "    ) as number_of_previous_payouts_of_host,\n",
    "    -- Guest Info\n",
    "    gi.billing_town as guest_town,\n",
    "    gi.country_name as guest_country,\n",
    "    gi.billing_postcode as guest_postcode,\n",
    "    gi.guest_age,\n",
    "    coalesce(\n",
    "        pbc.previous_guest_bookings_count, 0\n",
    "    ) as number_of_previous_bookings_of_guest,\n",
    "    coalesce(\n",
    "        gic.number_of_previous_incidents_of_guest, 0\n",
    "    ) as number_of_previous_incidents_of_guest,\n",
    "    case\n",
    "        when pbc.previous_bookings_in_listing_count > 0 then true else false\n",
    "    end as has_guest_previously_booked_same_listing,\n",
    "    -- Listing Info\n",
    "    li.listing_address,\n",
    "    li.listing_town,\n",
    "    li.listing_country,\n",
    "    li.listing_postcode,\n",
    "    li.number_of_bedrooms as listing_number_of_bedrooms,\n",
    "    li.number_of_bathrooms as listing_number_of_bathrooms,\n",
    "    li.listing_description,\n",
    "    coalesce(pbc.previous_bookings_in_listing_count, 0) as previous_bookings_in_listing_count,\n",
    "    coalesce(lic.number_of_previous_incidents_in_listing, 0) as number_of_previous_incidents_in_listing,\n",
    "    coalesce(lic.number_of_previous_payouts_in_listing, 0) as number_of_previous_payouts_in_listing,\n",
    "    -- Verification Info\n",
    "    case\n",
    "        when fb.id_verification_request is null then 0\n",
    "        else vr.days_to_start_verification\n",
    "    end as days_to_start_verification,\n",
    "    case \n",
    "        when vr.id_verification_request is null then 0\n",
    "        else vr.days_to_complete_verification\n",
    "    end as days_to_complete_verification,\n",
    "    vr.screening_status,\n",
    "    vr.government_id_status,\n",
    "    vr.contract_status,\n",
    "    vr.selfie_confidence_score_status,\n",
    "    vr.payment_validation_status,\n",
    "    vr.first_name_status,\n",
    "    vr.date_of_birth_status,\n",
    "    vr.last_name_status,\n",
    "    vr.autohost_partner_status,\n",
    "    vr.criminal_record_status,\n",
    "    vr.guest_csat_score,\n",
    "    vr.guest_csat_comments,\n",
    "    -- Boolean features\n",
    "    gi.email is not null as guest_has_email,\n",
    "    gi.phone_number is not null as guest_has_phone_number,\n",
    "    case \n",
    "        when gi.billing_town is null or li.listing_town is null then null \n",
    "        when gi.billing_town = li.listing_town \n",
    "        then true else false \n",
    "    end as is_guest_from_listing_town,\n",
    "    case \n",
    "        when gi.country_name is null or li.listing_country is null then null\n",
    "        when gi.country_name = li.listing_country \n",
    "        then true else false \n",
    "    end as is_guest_from_listing_country,\n",
    "    case \n",
    "        when gi.billing_postcode is null or li.listing_postcode is null then null\n",
    "        when gi.billing_postcode = li.listing_postcode \n",
    "        then true else false \n",
    "    end as is_guest_from_listing_postcode,\n",
    "    case \n",
    "        when hi.billing_town is null or li.listing_town is null then null\n",
    "        when hi.billing_town = li.listing_town \n",
    "        then true else false \n",
    "    end as is_host_from_listing_town,\n",
    "    case \n",
    "        when hi.country_name is null or li.listing_country is null then null\n",
    "        when hi.country_name = li.listing_country \n",
    "        then true else false \n",
    "    end as is_host_from_listing_country,\n",
    "    case \n",
    "        when hi.billing_postcode is null or li.listing_postcode is null then null\n",
    "        when hi.billing_postcode = li.listing_postcode \n",
    "        then true else false \n",
    "    end as is_host_from_listing_postcode,\n",
    "    case\n",
    "        when vr.days_to_complete_verification is null then false\n",
    "        else true\n",
    "    end as has_completed_verification,\n",
    "    -- Numeric features\n",
    "    fb.number_of_applied_services,\n",
    "    fb.number_of_applied_upgraded_services,\n",
    "    fb.number_of_applied_billable_services,\n",
    "    fb.booking_check_in_date_utc\n",
    "    - fb.booking_created_date_utc as booking_days_to_check_in,\n",
    "    fb.booking_number_of_nights,\n",
    "    -- Categorical features\n",
    "    fb.has_verification_request,\n",
    "    fb.has_billable_services,\n",
    "    fb.has_upgraded_screening_service_business_type,\n",
    "    fb.has_deposit_management_service_business_type,\n",
    "    fb.has_protection_service_business_type,\n",
    "    -- Target\n",
    "    fb.has_resolution_incident\n",
    "from filtered_bookings fb\n",
    "left join previous_booking_counts pbc on fb.id_booking = pbc.id_booking\n",
    "left join listing_info li on li.id_accommodation = fb.id_accommodation\n",
    "left join host_info hi on hi.id_user_host = fb.id_user_host\n",
    "left join guest_info gi on gi.id_user_guest = fb.id_user_guest\n",
    "left join host_listing_counts hlc on li.id_user_host = hlc.id_user_host\n",
    "left join\n",
    "    lateral(\n",
    "        select *\n",
    "        from listing_incident_counts lic\n",
    "        where\n",
    "            lic.id_accommodation = fb.id_accommodation\n",
    "            and lic.date_day <= fb.booking_check_in_date_utc\n",
    "        order by lic.date_day desc\n",
    "        limit 1\n",
    "    ) lic\n",
    "    on true\n",
    "left join\n",
    "    lateral(\n",
    "        select *\n",
    "        from guest_incident_counts gic\n",
    "        where\n",
    "            gic.id_user_guest = fb.id_user_guest\n",
    "            and gic.date_day <= fb.booking_check_in_date_utc\n",
    "        order by gic.date_day desc\n",
    "        limit 1\n",
    "    ) gic\n",
    "    on true\n",
    "left join\n",
    "    lateral(\n",
    "        select *\n",
    "        from host_incident_counts hic\n",
    "        where\n",
    "            hic.id_user_host = fb.id_user_host\n",
    "            and hic.date_day <= fb.booking_check_in_date_utc\n",
    "        order by hic.date_day desc\n",
    "        limit 1\n",
    "    ) hic\n",
    "    on true\n",
    "left join\n",
    "    verification_requests vr on vr.id_verification_request = fb.id_verification_request\n",
    "\"\"\"\n",
    "\n",
    "# Retrieve Data from Query\n",
    "# df_extraction = query_to_dataframe(engine=dwh_pg_engine, query=data_extraction_query)\n",
    "df_extraction = pd.read_csv(\"/home/joaquin/data-jupyter-notebooks/data_driven_risk_assessment/experiments/data.csv\")\n",
    "print(df_extraction.head())\n",
    "print(f\"Total Bookings: {len(df_extraction):,}\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "b56a8530",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id_booking</th>\n",
       "      <th>days_from_booking_creation_to_check_in</th>\n",
       "      <th>number_of_nights</th>\n",
       "      <th>host_town</th>\n",
       "      <th>host_country</th>\n",
       "      <th>host_postcode</th>\n",
       "      <th>host_age</th>\n",
       "      <th>host_months_with_truvi</th>\n",
       "      <th>host_account_type</th>\n",
       "      <th>host_active_pms_list</th>\n",
       "      <th>...</th>\n",
       "      <th>number_of_applied_upgraded_services</th>\n",
       "      <th>number_of_applied_billable_services</th>\n",
       "      <th>booking_days_to_check_in</th>\n",
       "      <th>booking_number_of_nights</th>\n",
       "      <th>has_verification_request</th>\n",
       "      <th>has_billable_services</th>\n",
       "      <th>has_upgraded_screening_service_business_type</th>\n",
       "      <th>has_deposit_management_service_business_type</th>\n",
       "      <th>has_protection_service_business_type</th>\n",
       "      <th>has_resolution_incident</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>919656</td>\n",
       "      <td>26.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Madison CT</td>\n",
       "      <td>United States</td>\n",
       "      <td>06443</td>\n",
       "      <td>125.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>Host</td>\n",
       "      <td>Hostaway</td>\n",
       "      <td>...</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>87</td>\n",
       "      <td>4</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>926634</td>\n",
       "      <td>17.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Madison CT</td>\n",
       "      <td>United States</td>\n",
       "      <td>06443</td>\n",
       "      <td>125.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>Host</td>\n",
       "      <td>Hostaway</td>\n",
       "      <td>...</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>109</td>\n",
       "      <td>3</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>931082</td>\n",
       "      <td>20.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>N16 6DD</td>\n",
       "      <td>125.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>PMC - Property Management Company</td>\n",
       "      <td>Hostify</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>50</td>\n",
       "      <td>7</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>931086</td>\n",
       "      <td>15.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>N16 6DD</td>\n",
       "      <td>125.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>PMC - Property Management Company</td>\n",
       "      <td>Hostify</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>15</td>\n",
       "      <td>3</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>931096</td>\n",
       "      <td>8.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>N16 6DD</td>\n",
       "      <td>125.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>PMC - Property Management Company</td>\n",
       "      <td>Hostify</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>8</td>\n",
       "      <td>5</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 64 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   id_booking  days_from_booking_creation_to_check_in  number_of_nights  \\\n",
       "0      919656                                    26.0               4.0   \n",
       "1      926634                                    17.0               3.0   \n",
       "2      931082                                    20.0               7.0   \n",
       "3      931086                                    15.0               3.0   \n",
       "4      931096                                     8.0               5.0   \n",
       "\n",
       "    host_town    host_country host_postcode  host_age  host_months_with_truvi  \\\n",
       "0  Madison CT   United States         06443     125.0                     8.0   \n",
       "1  Madison CT   United States         06443     125.0                     8.0   \n",
       "2      London  United Kingdom       N16 6DD     125.0                     8.0   \n",
       "3      London  United Kingdom       N16 6DD     125.0                     8.0   \n",
       "4      London  United Kingdom       N16 6DD     125.0                     8.0   \n",
       "\n",
       "                   host_account_type host_active_pms_list  ...  \\\n",
       "0                               Host             Hostaway  ...   \n",
       "1                               Host             Hostaway  ...   \n",
       "2  PMC - Property Management Company              Hostify  ...   \n",
       "3  PMC - Property Management Company              Hostify  ...   \n",
       "4  PMC - Property Management Company              Hostify  ...   \n",
       "\n",
       "   number_of_applied_upgraded_services  number_of_applied_billable_services  \\\n",
       "0                                    2                                    2   \n",
       "1                                    2                                    2   \n",
       "2                                    1                                    1   \n",
       "3                                    1                                    1   \n",
       "4                                    1                                    1   \n",
       "\n",
       "   booking_days_to_check_in booking_number_of_nights has_verification_request  \\\n",
       "0                        87                        4                    False   \n",
       "1                       109                        3                    False   \n",
       "2                        50                        7                    False   \n",
       "3                        15                        3                    False   \n",
       "4                         8                        5                    False   \n",
       "\n",
       "  has_billable_services  has_upgraded_screening_service_business_type  \\\n",
       "0                  True                                         False   \n",
       "1                  True                                         False   \n",
       "2                  True                                         False   \n",
       "3                  True                                         False   \n",
       "4                  True                                         False   \n",
       "\n",
       "   has_deposit_management_service_business_type  \\\n",
       "0                                          True   \n",
       "1                                          True   \n",
       "2                                         False   \n",
       "3                                         False   \n",
       "4                                         False   \n",
       "\n",
       "   has_protection_service_business_type  has_resolution_incident  \n",
       "0                                  True                    False  \n",
       "1                                  True                    False  \n",
       "2                                  True                    False  \n",
       "3                                  True                    False  \n",
       "4                                  True                    False  \n",
       "\n",
       "[5 rows x 64 columns]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_extraction.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e9a9da26",
   "metadata": {},
   "source": [
    "## Exploratory Data Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "f4545e95",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Dataset size: 21,307 rows and 63 columns\n"
     ]
    }
   ],
   "source": [
    "# Drop the id_booking identifier; drop() returns a new DataFrame, so df_extraction is left untouched\n",
    "df = df_extraction.drop(columns=['id_booking'])\n",
    "\n",
    "# Check size of the dataset\n",
    "print(f\"Dataset size: {df.shape[0]:,} rows and {df.shape[1]:,} columns\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "de574969",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>days_from_booking_creation_to_check_in</th>\n",
       "      <th>number_of_nights</th>\n",
       "      <th>host_town</th>\n",
       "      <th>host_country</th>\n",
       "      <th>host_postcode</th>\n",
       "      <th>host_age</th>\n",
       "      <th>host_months_with_truvi</th>\n",
       "      <th>host_account_type</th>\n",
       "      <th>host_active_pms_list</th>\n",
       "      <th>number_of_listings_of_host</th>\n",
       "      <th>number_of_previous_incidents_of_host</th>\n",
       "      <th>number_of_previous_payouts_of_host</th>\n",
       "      <th>guest_town</th>\n",
       "      <th>guest_country</th>\n",
       "      <th>guest_postcode</th>\n",
       "      <th>guest_age</th>\n",
       "      <th>number_of_previous_bookings_of_guest</th>\n",
       "      <th>number_of_previous_incidents_of_guest</th>\n",
       "      <th>has_guest_previously_booked_same_listing</th>\n",
       "      <th>listing_address</th>\n",
       "      <th>listing_town</th>\n",
       "      <th>listing_country</th>\n",
       "      <th>listing_postcode</th>\n",
       "      <th>listing_number_of_bedrooms</th>\n",
       "      <th>listing_number_of_bathrooms</th>\n",
       "      <th>listing_description</th>\n",
       "      <th>previous_bookings_in_listing_count</th>\n",
       "      <th>number_of_previous_incidents_in_listing</th>\n",
       "      <th>number_of_previous_payouts_in_listing</th>\n",
       "      <th>days_to_start_verification</th>\n",
       "      <th>days_to_complete_verification</th>\n",
       "      <th>screening_status</th>\n",
       "      <th>government_id_status</th>\n",
       "      <th>contract_status</th>\n",
       "      <th>selfie_confidence_score_status</th>\n",
       "      <th>payment_validation_status</th>\n",
       "      <th>first_name_status</th>\n",
       "      <th>date_of_birth_status</th>\n",
       "      <th>last_name_status</th>\n",
       "      <th>autohost_partner_status</th>\n",
       "      <th>criminal_record_status</th>\n",
       "      <th>guest_csat_score</th>\n",
       "      <th>guest_csat_comments</th>\n",
       "      <th>guest_has_email</th>\n",
       "      <th>guest_has_phone_number</th>\n",
       "      <th>is_guest_from_listing_town</th>\n",
       "      <th>is_guest_from_listing_country</th>\n",
       "      <th>is_guest_from_listing_postcode</th>\n",
       "      <th>is_host_from_listing_town</th>\n",
       "      <th>is_host_from_listing_country</th>\n",
       "      <th>is_host_from_listing_postcode</th>\n",
       "      <th>has_completed_verification</th>\n",
       "      <th>number_of_applied_services</th>\n",
       "      <th>number_of_applied_upgraded_services</th>\n",
       "      <th>number_of_applied_billable_services</th>\n",
       "      <th>booking_days_to_check_in</th>\n",
       "      <th>booking_number_of_nights</th>\n",
       "      <th>has_verification_request</th>\n",
       "      <th>has_billable_services</th>\n",
       "      <th>has_upgraded_screening_service_business_type</th>\n",
       "      <th>has_deposit_management_service_business_type</th>\n",
       "      <th>has_protection_service_business_type</th>\n",
       "      <th>has_resolution_incident</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>26.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Madison CT</td>\n",
       "      <td>United States</td>\n",
       "      <td>06443</td>\n",
       "      <td>125.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>Host</td>\n",
       "      <td>Hostaway</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1032</td>\n",
       "      <td>0</td>\n",
       "      <td>True</td>\n",
       "      <td>389 Mountain View Dr, Jeffersonville, VT 05464...</td>\n",
       "      <td>Cambridge</td>\n",
       "      <td>United States</td>\n",
       "      <td>05464</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>Mountain Life Retreat at Smuggler's Notch Resort</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>87</td>\n",
       "      <td>4</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>17.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Madison CT</td>\n",
       "      <td>United States</td>\n",
       "      <td>06443</td>\n",
       "      <td>125.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>Host</td>\n",
       "      <td>Hostaway</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1900</td>\n",
       "      <td>0</td>\n",
       "      <td>True</td>\n",
       "      <td>389 Mountain View Dr, Jeffersonville, VT 05464...</td>\n",
       "      <td>Cambridge</td>\n",
       "      <td>United States</td>\n",
       "      <td>05464</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>Mountain Life Retreat at Smuggler's Notch Resort</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>109</td>\n",
       "      <td>3</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>20.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>N16 6DD</td>\n",
       "      <td>125.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>PMC - Property Management Company</td>\n",
       "      <td>Hostify</td>\n",
       "      <td>467</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>610</td>\n",
       "      <td>0</td>\n",
       "      <td>True</td>\n",
       "      <td>Tudor Grange Hotel, 31 Gervis Road</td>\n",
       "      <td>Dorset</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>BH1 3EE</td>\n",
       "      <td>12.0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>Mansion by the Sea, 12BR/12BA, Perfect for Events</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>50</td>\n",
       "      <td>7</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>15.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>N16 6DD</td>\n",
       "      <td>125.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>PMC - Property Management Company</td>\n",
       "      <td>Hostify</td>\n",
       "      <td>467</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>136</td>\n",
       "      <td>0</td>\n",
       "      <td>True</td>\n",
       "      <td>Tudor Grange Hotel, 31 Gervis Road</td>\n",
       "      <td>Dorset</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>BH1 3EE</td>\n",
       "      <td>12.0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>Mansion by the Sea, 12BR/12BA, Perfect for Events</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>15</td>\n",
       "      <td>3</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>8.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>N16 6DD</td>\n",
       "      <td>125.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>PMC - Property Management Company</td>\n",
       "      <td>Hostify</td>\n",
       "      <td>467</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>73</td>\n",
       "      <td>0</td>\n",
       "      <td>False</td>\n",
       "      <td>Aird House, 15 Wellesley Ct, Rockingham Street</td>\n",
       "      <td>Greater London</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>SE1 6PD</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>Your London Home: 2BR Flat with Modern Amenities</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>8</td>\n",
       "      <td>5</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   days_from_booking_creation_to_check_in  number_of_nights   host_town  \\\n",
       "0                                    26.0               4.0  Madison CT   \n",
       "1                                    17.0               3.0  Madison CT   \n",
       "2                                    20.0               7.0      London   \n",
       "3                                    15.0               3.0      London   \n",
       "4                                     8.0               5.0      London   \n",
       "\n",
       "     host_country host_postcode  host_age  host_months_with_truvi  \\\n",
       "0   United States         06443     125.0                     8.0   \n",
       "1   United States         06443     125.0                     8.0   \n",
       "2  United Kingdom       N16 6DD     125.0                     8.0   \n",
       "3  United Kingdom       N16 6DD     125.0                     8.0   \n",
       "4  United Kingdom       N16 6DD     125.0                     8.0   \n",
       "\n",
       "                   host_account_type host_active_pms_list  \\\n",
       "0                               Host             Hostaway   \n",
       "1                               Host             Hostaway   \n",
       "2  PMC - Property Management Company              Hostify   \n",
       "3  PMC - Property Management Company              Hostify   \n",
       "4  PMC - Property Management Company              Hostify   \n",
       "\n",
       "   number_of_listings_of_host  number_of_previous_incidents_of_host  \\\n",
       "0                           2                                     0   \n",
       "1                           2                                     0   \n",
       "2                         467                                     0   \n",
       "3                         467                                     0   \n",
       "4                         467                                     0   \n",
       "\n",
       "   number_of_previous_payouts_of_host guest_town guest_country guest_postcode  \\\n",
       "0                                   0        NaN           NaN            NaN   \n",
       "1                                   0        NaN           NaN            NaN   \n",
       "2                                   0        NaN           NaN            NaN   \n",
       "3                                   0        NaN           NaN            NaN   \n",
       "4                                   0        NaN           NaN            NaN   \n",
       "\n",
       "   guest_age  number_of_previous_bookings_of_guest  \\\n",
       "0        NaN                                  1032   \n",
       "1        NaN                                  1900   \n",
       "2        NaN                                   610   \n",
       "3        NaN                                   136   \n",
       "4        NaN                                    73   \n",
       "\n",
       "   number_of_previous_incidents_of_guest  \\\n",
       "0                                      0   \n",
       "1                                      0   \n",
       "2                                      0   \n",
       "3                                      0   \n",
       "4                                      0   \n",
       "\n",
       "   has_guest_previously_booked_same_listing  \\\n",
       "0                                      True   \n",
       "1                                      True   \n",
       "2                                      True   \n",
       "3                                      True   \n",
       "4                                     False   \n",
       "\n",
       "                                     listing_address    listing_town  \\\n",
       "0  389 Mountain View Dr, Jeffersonville, VT 05464...       Cambridge   \n",
       "1  389 Mountain View Dr, Jeffersonville, VT 05464...       Cambridge   \n",
       "2                 Tudor Grange Hotel, 31 Gervis Road          Dorset   \n",
       "3                 Tudor Grange Hotel, 31 Gervis Road          Dorset   \n",
       "4     Aird House, 15 Wellesley Ct, Rockingham Street  Greater London   \n",
       "\n",
       "  listing_country listing_postcode  listing_number_of_bedrooms  \\\n",
       "0   United States            05464                         2.0   \n",
       "1   United States            05464                         2.0   \n",
       "2  United Kingdom          BH1 3EE                        12.0   \n",
       "3  United Kingdom          BH1 3EE                        12.0   \n",
       "4  United Kingdom          SE1 6PD                         2.0   \n",
       "\n",
       "   listing_number_of_bathrooms  \\\n",
       "0                          2.0   \n",
       "1                          2.0   \n",
       "2                         12.0   \n",
       "3                         12.0   \n",
       "4                          1.0   \n",
       "\n",
       "                                 listing_description  \\\n",
       "0   Mountain Life Retreat at Smuggler's Notch Resort   \n",
       "1   Mountain Life Retreat at Smuggler's Notch Resort   \n",
       "2  Mansion by the Sea, 12BR/12BA, Perfect for Events   \n",
       "3  Mansion by the Sea, 12BR/12BA, Perfect for Events   \n",
       "4   Your London Home: 2BR Flat with Modern Amenities   \n",
       "\n",
       "   previous_bookings_in_listing_count  \\\n",
       "0                                   3   \n",
       "1                                   5   \n",
       "2                                   5   \n",
       "3                                   2   \n",
       "4                                   0   \n",
       "\n",
       "   number_of_previous_incidents_in_listing  \\\n",
       "0                                        0   \n",
       "1                                        0   \n",
       "2                                        0   \n",
       "3                                        0   \n",
       "4                                        0   \n",
       "\n",
       "   number_of_previous_payouts_in_listing  days_to_start_verification  \\\n",
       "0                                      0                         0.0   \n",
       "1                                      0                         0.0   \n",
       "2                                      0                         0.0   \n",
       "3                                      0                         0.0   \n",
       "4                                      0                         0.0   \n",
       "\n",
       "   days_to_complete_verification  screening_status  government_id_status  \\\n",
       "0                            0.0               NaN                   NaN   \n",
       "1                            0.0               NaN                   NaN   \n",
       "2                            0.0               NaN                   NaN   \n",
       "3                            0.0               NaN                   NaN   \n",
       "4                            0.0               NaN                   NaN   \n",
       "\n",
       "   contract_status  selfie_confidence_score_status  payment_validation_status  \\\n",
       "0              NaN                             NaN                        NaN   \n",
       "1              NaN                             NaN                        NaN   \n",
       "2              NaN                             NaN                        NaN   \n",
       "3              NaN                             NaN                        NaN   \n",
       "4              NaN                             NaN                        NaN   \n",
       "\n",
       "   first_name_status  date_of_birth_status  last_name_status  \\\n",
       "0                NaN                   NaN               NaN   \n",
       "1                NaN                   NaN               NaN   \n",
       "2                NaN                   NaN               NaN   \n",
       "3                NaN                   NaN               NaN   \n",
       "4                NaN                   NaN               NaN   \n",
       "\n",
       "   autohost_partner_status  criminal_record_status  guest_csat_score  \\\n",
       "0                      NaN                     NaN               NaN   \n",
       "1                      NaN                     NaN               NaN   \n",
       "2                      NaN                     NaN               NaN   \n",
       "3                      NaN                     NaN               NaN   \n",
       "4                      NaN                     NaN               NaN   \n",
       "\n",
       "  guest_csat_comments  guest_has_email  guest_has_phone_number  \\\n",
       "0                 NaN            False                   False   \n",
       "1                 NaN            False                   False   \n",
       "2                 NaN            False                   False   \n",
       "3                 NaN            False                   False   \n",
       "4                 NaN            False                   False   \n",
       "\n",
       "  is_guest_from_listing_town is_guest_from_listing_country  \\\n",
       "0                        NaN                           NaN   \n",
       "1                        NaN                           NaN   \n",
       "2                        NaN                           NaN   \n",
       "3                        NaN                           NaN   \n",
       "4                        NaN                           NaN   \n",
       "\n",
       "  is_guest_from_listing_postcode  is_host_from_listing_town  \\\n",
       "0                            NaN                      False   \n",
       "1                            NaN                      False   \n",
       "2                            NaN                      False   \n",
       "3                            NaN                      False   \n",
       "4                            NaN                      False   \n",
       "\n",
       "  is_host_from_listing_country is_host_from_listing_postcode  \\\n",
       "0                         True                         False   \n",
       "1                         True                         False   \n",
       "2                         True                         False   \n",
       "3                         True                         False   \n",
       "4                         True                         False   \n",
       "\n",
       "   has_completed_verification  number_of_applied_services  \\\n",
       "0                       False                           3   \n",
       "1                       False                           3   \n",
       "2                       False                           2   \n",
       "3                       False                           2   \n",
       "4                       False                           2   \n",
       "\n",
       "   number_of_applied_upgraded_services  number_of_applied_billable_services  \\\n",
       "0                                    2                                    2   \n",
       "1                                    2                                    2   \n",
       "2                                    1                                    1   \n",
       "3                                    1                                    1   \n",
       "4                                    1                                    1   \n",
       "\n",
       "   booking_days_to_check_in  booking_number_of_nights  \\\n",
       "0                        87                         4   \n",
       "1                       109                         3   \n",
       "2                        50                         7   \n",
       "3                        15                         3   \n",
       "4                         8                         5   \n",
       "\n",
       "   has_verification_request  has_billable_services  \\\n",
       "0                     False                   True   \n",
       "1                     False                   True   \n",
       "2                     False                   True   \n",
       "3                     False                   True   \n",
       "4                     False                   True   \n",
       "\n",
       "   has_upgraded_screening_service_business_type  \\\n",
       "0                                         False   \n",
       "1                                         False   \n",
       "2                                         False   \n",
       "3                                         False   \n",
       "4                                         False   \n",
       "\n",
       "   has_deposit_management_service_business_type  \\\n",
       "0                                          True   \n",
       "1                                          True   \n",
       "2                                         False   \n",
       "3                                         False   \n",
       "4                                         False   \n",
       "\n",
       "   has_protection_service_business_type  has_resolution_incident  \n",
       "0                                  True                    False  \n",
       "1                                  True                    False  \n",
       "2                                  True                    False  \n",
       "3                                  True                    False  \n",
       "4                                  True                    False  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Remove the display limits so that all columns and rows are shown\n",
    "pd.set_option('display.max_columns', None)\n",
    "pd.set_option('display.max_rows', None)\n",
    "\n",
    "# Preview of the dataset\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "de4c6753",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 21307 entries, 0 to 21306\n",
      "Data columns (total 63 columns):\n",
      " #   Column                                        Non-Null Count  Dtype  \n",
      "---  ------                                        --------------  -----  \n",
      " 0   days_from_booking_creation_to_check_in        21307 non-null  float64\n",
      " 1   number_of_nights                              21307 non-null  float64\n",
      " 2   host_town                                     21281 non-null  object \n",
      " 3   host_country                                  21300 non-null  object \n",
      " 4   host_postcode                                 15800 non-null  object \n",
      " 5   host_age                                      21307 non-null  float64\n",
      " 6   host_months_with_truvi                        21307 non-null  float64\n",
      " 7   host_account_type                             17831 non-null  object \n",
      " 8   host_active_pms_list                          20363 non-null  object \n",
      " 9   number_of_listings_of_host                    21307 non-null  int64  \n",
      " 10  number_of_previous_incidents_of_host          21307 non-null  int64  \n",
      " 11  number_of_previous_payouts_of_host            21307 non-null  int64  \n",
      " 12  guest_town                                    11676 non-null  object \n",
      " 13  guest_country                                 11677 non-null  object \n",
      " 14  guest_postcode                                11676 non-null  object \n",
      " 15  guest_age                                     11677 non-null  float64\n",
      " 16  number_of_previous_bookings_of_guest          21307 non-null  int64  \n",
      " 17  number_of_previous_incidents_of_guest         21307 non-null  int64  \n",
      " 18  has_guest_previously_booked_same_listing      21307 non-null  bool   \n",
      " 19  listing_address                               21307 non-null  object \n",
      " 20  listing_town                                  21307 non-null  object \n",
      " 21  listing_country                               21307 non-null  object \n",
      " 22  listing_postcode                              21307 non-null  object \n",
      " 23  listing_number_of_bedrooms                    21185 non-null  float64\n",
      " 24  listing_number_of_bathrooms                   21185 non-null  float64\n",
      " 25  listing_description                           21294 non-null  object \n",
      " 26  previous_bookings_in_listing_count            21307 non-null  int64  \n",
      " 27  number_of_previous_incidents_in_listing       21307 non-null  int64  \n",
      " 28  number_of_previous_payouts_in_listing         21307 non-null  int64  \n",
      " 29  days_to_start_verification                    20084 non-null  float64\n",
      " 30  days_to_complete_verification                 18500 non-null  float64\n",
      " 31  screening_status                              9332 non-null   float64\n",
      " 32  government_id_status                          8082 non-null   float64\n",
      " 33  contract_status                               5856 non-null   float64\n",
      " 34  selfie_confidence_score_status                6622 non-null   float64\n",
      " 35  payment_validation_status                     8047 non-null   float64\n",
      " 36  first_name_status                             4810 non-null   float64\n",
      " 37  date_of_birth_status                          4810 non-null   float64\n",
      " 38  last_name_status                              4810 non-null   float64\n",
      " 39  autohost_partner_status                       0 non-null      float64\n",
      " 40  criminal_record_status                        2075 non-null   float64\n",
      " 41  guest_csat_score                              3221 non-null   float64\n",
      " 42  guest_csat_comments                           454 non-null    object \n",
      " 43  guest_has_email                               21307 non-null  bool   \n",
      " 44  guest_has_phone_number                        21307 non-null  bool   \n",
      " 45  is_guest_from_listing_town                    11677 non-null  object \n",
      " 46  is_guest_from_listing_country                 11677 non-null  object \n",
      " 47  is_guest_from_listing_postcode                11677 non-null  object \n",
      " 48  is_host_from_listing_town                     21307 non-null  bool   \n",
      " 49  is_host_from_listing_country                  21300 non-null  object \n",
      " 50  is_host_from_listing_postcode                 18102 non-null  object \n",
      " 51  has_completed_verification                    21307 non-null  bool   \n",
      " 52  number_of_applied_services                    21307 non-null  int64  \n",
      " 53  number_of_applied_upgraded_services           21307 non-null  int64  \n",
      " 54  number_of_applied_billable_services           21307 non-null  int64  \n",
      " 55  booking_days_to_check_in                      21307 non-null  int64  \n",
      " 56  booking_number_of_nights                      21307 non-null  int64  \n",
      " 57  has_verification_request                      21307 non-null  bool   \n",
      " 58  has_billable_services                         21307 non-null  bool   \n",
      " 59  has_upgraded_screening_service_business_type  21307 non-null  bool   \n",
      " 60  has_deposit_management_service_business_type  21307 non-null  bool   \n",
      " 61  has_protection_service_business_type          21307 non-null  bool   \n",
      " 62  has_resolution_incident                       21307 non-null  bool   \n",
      "dtypes: bool(11), float64(20), int64(13), object(19)\n",
      "memory usage: 8.7+ MB\n"
     ]
    }
   ],
   "source": [
    "# View summary of dataset\n",
    "df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "9c79c06a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Missing Values (%):\n",
      "autohost_partner_status           100.000000\n",
      "guest_csat_comments                97.869245\n",
      "criminal_record_status             90.261416\n",
      "guest_csat_score                   84.882902\n",
      "date_of_birth_status               77.425259\n",
      "last_name_status                   77.425259\n",
      "first_name_status                  77.425259\n",
      "contract_status                    72.516075\n",
      "selfie_confidence_score_status     68.921012\n",
      "payment_validation_status          62.233069\n",
      "government_id_status               62.068804\n",
      "screening_status                   56.202187\n",
      "guest_postcode                     45.201108\n",
      "guest_town                         45.201108\n",
      "guest_country                      45.196414\n",
      "is_guest_from_listing_country      45.196414\n",
      "is_guest_from_listing_postcode     45.196414\n",
      "guest_age                          45.196414\n",
      "is_guest_from_listing_town         45.196414\n",
      "host_postcode                      25.845966\n",
      "host_account_type                  16.313887\n",
      "is_host_from_listing_postcode      15.042005\n",
      "days_to_complete_verification      13.174074\n",
      "days_to_start_verification          5.739898\n",
      "host_active_pms_list                4.430469\n",
      "listing_number_of_bedrooms          0.572582\n",
      "listing_number_of_bathrooms         0.572582\n",
      "host_town                           0.122026\n",
      "listing_description                 0.061013\n",
      "host_country                        0.032853\n",
      "is_host_from_listing_country        0.032853\n",
      "dtype: float64\n"
     ]
    }
   ],
   "source": [
    "# View percentage of missing values\n",
    "missing_values = df.isnull().mean() * 100\n",
    "missing_values = missing_values[missing_values > 0].sort_values(ascending=False)\n",
    "print(\"Missing Values (%):\")\n",
    "print(missing_values)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1837c541",
   "metadata": {},
   "source": [
    "Although CSAT scores are available for only about 15% of bookings, I want to check whether they show any interesting correlation with incidents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "6e89712c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "guest_csat_score\n",
       "1.0    0.010695\n",
       "2.0    0.013761\n",
       "3.0    0.018293\n",
       "4.0    0.013105\n",
       "5.0    0.022619\n",
       "Name: has_resolution_incident, dtype: float64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Incident rate for each guest CSAT score\n",
    "df.groupby('guest_csat_score')['has_resolution_incident'].mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "ce9ed8a0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Correlation: 0.02\n"
     ]
    }
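   ],
   "source": [
    "# Pearson correlation between CSAT score and the incident flag (rows without a CSAT score are ignored)\n",
    "correlation = df['guest_csat_score'].corr(df['has_resolution_incident'])\n",
    "print(f\"Correlation: {correlation:.2f}\")"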
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "8ac447bb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Dropping columns with more than 50% missing values: ['autohost_partner_status', 'guest_csat_comments', 'criminal_record_status', 'guest_csat_score', 'date_of_birth_status', 'last_name_status', 'first_name_status', 'contract_status', 'selfie_confidence_score_status', 'payment_validation_status', 'government_id_status', 'screening_status']\n"
     ]
    }
   ],
   "source": [
    "# Remove columns with more than 50% missing values\n",
    "threshold = 50\n",
    "columns_to_drop = missing_values[missing_values > threshold].index\n",
    "print(f\"Dropping columns with more than {threshold}% missing values: {columns_to_drop.tolist()}\")\n",
    "df.drop(columns=columns_to_drop, inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "20bd5c86",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "There are 18 categorical variables\n",
      "\n",
      "The categorical variables are: ['host_town', 'host_country', 'host_postcode', 'host_account_type', 'host_active_pms_list', 'guest_town', 'guest_country', 'guest_postcode', 'listing_address', 'listing_town', 'listing_country', 'listing_postcode', 'listing_description', 'is_guest_from_listing_town', 'is_guest_from_listing_country', 'is_guest_from_listing_postcode', 'is_host_from_listing_country', 'is_host_from_listing_postcode']\n"
     ]
    }
   ],
   "source": [
    "# Find categorical variables\n",
    "categorical = df.select_dtypes(include=['object']).columns.tolist()\n",
    "print(f'There are {len(categorical)} categorical variables\\n')\n",
    "print('The categorical variables are:', categorical)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "67ddd437",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>host_town</th>\n",
       "      <th>host_country</th>\n",
       "      <th>host_postcode</th>\n",
       "      <th>host_account_type</th>\n",
       "      <th>host_active_pms_list</th>\n",
       "      <th>guest_town</th>\n",
       "      <th>guest_country</th>\n",
       "      <th>guest_postcode</th>\n",
       "      <th>listing_address</th>\n",
       "      <th>listing_town</th>\n",
       "      <th>listing_country</th>\n",
       "      <th>listing_postcode</th>\n",
       "      <th>listing_description</th>\n",
       "      <th>is_guest_from_listing_town</th>\n",
       "      <th>is_guest_from_listing_country</th>\n",
       "      <th>is_guest_from_listing_postcode</th>\n",
       "      <th>is_host_from_listing_country</th>\n",
       "      <th>is_host_from_listing_postcode</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Madison CT</td>\n",
       "      <td>United States</td>\n",
       "      <td>06443</td>\n",
       "      <td>Host</td>\n",
       "      <td>Hostaway</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>389 Mountain View Dr, Jeffersonville, VT 05464...</td>\n",
       "      <td>Cambridge</td>\n",
       "      <td>United States</td>\n",
       "      <td>05464</td>\n",
       "      <td>Mountain Life Retreat at Smuggler's Notch Resort</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Madison CT</td>\n",
       "      <td>United States</td>\n",
       "      <td>06443</td>\n",
       "      <td>Host</td>\n",
       "      <td>Hostaway</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>389 Mountain View Dr, Jeffersonville, VT 05464...</td>\n",
       "      <td>Cambridge</td>\n",
       "      <td>United States</td>\n",
       "      <td>05464</td>\n",
       "      <td>Mountain Life Retreat at Smuggler's Notch Resort</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>N16 6DD</td>\n",
       "      <td>PMC - Property Management Company</td>\n",
       "      <td>Hostify</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Tudor Grange Hotel, 31 Gervis Road</td>\n",
       "      <td>Dorset</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>BH1 3EE</td>\n",
       "      <td>Mansion by the Sea, 12BR/12BA, Perfect for Events</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>N16 6DD</td>\n",
       "      <td>PMC - Property Management Company</td>\n",
       "      <td>Hostify</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Tudor Grange Hotel, 31 Gervis Road</td>\n",
       "      <td>Dorset</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>BH1 3EE</td>\n",
       "      <td>Mansion by the Sea, 12BR/12BA, Perfect for Events</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>N16 6DD</td>\n",
       "      <td>PMC - Property Management Company</td>\n",
       "      <td>Hostify</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Aird House, 15 Wellesley Ct, Rockingham Street</td>\n",
       "      <td>Greater London</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>SE1 6PD</td>\n",
       "      <td>Your London Home: 2BR Flat with Modern Amenities</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    host_town    host_country host_postcode  \\\n",
       "0  Madison CT   United States         06443   \n",
       "1  Madison CT   United States         06443   \n",
       "2      London  United Kingdom       N16 6DD   \n",
       "3      London  United Kingdom       N16 6DD   \n",
       "4      London  United Kingdom       N16 6DD   \n",
       "\n",
       "                   host_account_type host_active_pms_list guest_town  \\\n",
       "0                               Host             Hostaway        NaN   \n",
       "1                               Host             Hostaway        NaN   \n",
       "2  PMC - Property Management Company              Hostify        NaN   \n",
       "3  PMC - Property Management Company              Hostify        NaN   \n",
       "4  PMC - Property Management Company              Hostify        NaN   \n",
       "\n",
       "  guest_country guest_postcode  \\\n",
       "0           NaN            NaN   \n",
       "1           NaN            NaN   \n",
       "2           NaN            NaN   \n",
       "3           NaN            NaN   \n",
       "4           NaN            NaN   \n",
       "\n",
       "                                     listing_address    listing_town  \\\n",
       "0  389 Mountain View Dr, Jeffersonville, VT 05464...       Cambridge   \n",
       "1  389 Mountain View Dr, Jeffersonville, VT 05464...       Cambridge   \n",
       "2                 Tudor Grange Hotel, 31 Gervis Road          Dorset   \n",
       "3                 Tudor Grange Hotel, 31 Gervis Road          Dorset   \n",
       "4     Aird House, 15 Wellesley Ct, Rockingham Street  Greater London   \n",
       "\n",
       "  listing_country listing_postcode  \\\n",
       "0   United States            05464   \n",
       "1   United States            05464   \n",
       "2  United Kingdom          BH1 3EE   \n",
       "3  United Kingdom          BH1 3EE   \n",
       "4  United Kingdom          SE1 6PD   \n",
       "\n",
       "                                 listing_description  \\\n",
       "0   Mountain Life Retreat at Smuggler's Notch Resort   \n",
       "1   Mountain Life Retreat at Smuggler's Notch Resort   \n",
       "2  Mansion by the Sea, 12BR/12BA, Perfect for Events   \n",
       "3  Mansion by the Sea, 12BR/12BA, Perfect for Events   \n",
       "4   Your London Home: 2BR Flat with Modern Amenities   \n",
       "\n",
       "  is_guest_from_listing_town is_guest_from_listing_country  \\\n",
       "0                        NaN                           NaN   \n",
       "1                        NaN                           NaN   \n",
       "2                        NaN                           NaN   \n",
       "3                        NaN                           NaN   \n",
       "4                        NaN                           NaN   \n",
       "\n",
       "  is_guest_from_listing_postcode is_host_from_listing_country  \\\n",
       "0                            NaN                         True   \n",
       "1                            NaN                         True   \n",
       "2                            NaN                         True   \n",
       "3                            NaN                         True   \n",
       "4                            NaN                         True   \n",
       "\n",
       "  is_host_from_listing_postcode  \n",
       "0                         False  \n",
       "1                         False  \n",
       "2                         False  \n",
       "3                         False  \n",
       "4                         False  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Preview the categorical variables\n",
    "df[categorical].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "841347ea",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "host_town                           26\n",
       "host_country                         7\n",
       "host_postcode                     5507\n",
       "host_account_type                 3476\n",
       "host_active_pms_list               944\n",
       "guest_town                        9631\n",
       "guest_country                     9630\n",
       "guest_postcode                    9631\n",
       "listing_address                      0\n",
       "listing_town                         0\n",
       "listing_country                      0\n",
       "listing_postcode                     0\n",
       "listing_description                 13\n",
       "is_guest_from_listing_town        9630\n",
       "is_guest_from_listing_country     9630\n",
       "is_guest_from_listing_postcode    9630\n",
       "is_host_from_listing_country         7\n",
       "is_host_from_listing_postcode     3205\n",
       "dtype: int64"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Check missing values in categorical variables\n",
    "df[categorical].isnull().sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "a58cd17e",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/tmp/ipykernel_48568/2855830200.py:2: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n",
      "  df['is_guest_from_listing_town'] = df['is_guest_from_listing_town'].fillna(False)\n",
      "/tmp/ipykernel_48568/2855830200.py:3: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n",
      "  df['is_guest_from_listing_country'] = df['is_guest_from_listing_country'].fillna(False)\n",
      "/tmp/ipykernel_48568/2855830200.py:4: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n",
      "  df['is_guest_from_listing_postcode'] = df['is_guest_from_listing_postcode'].fillna(False)\n",
      "/tmp/ipykernel_48568/2855830200.py:6: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n",
      "  df['is_host_from_listing_country'] = df['is_host_from_listing_country'].fillna(False)\n",
      "/tmp/ipykernel_48568/2855830200.py:7: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n",
      "  df['is_host_from_listing_postcode'] = df['is_host_from_listing_postcode'].fillna(False)\n"
     ]
    }
   ],
   "source": [
    "# Fill missing guest/host-vs-listing location flags with False (an unknown location is treated as not matching the listing)\n",
    "df['is_guest_from_listing_town'] = df['is_guest_from_listing_town'].fillna(False)\n",
    "df['is_guest_from_listing_country'] = df['is_guest_from_listing_country'].fillna(False)\n",
    "df['is_guest_from_listing_postcode'] = df['is_guest_from_listing_postcode'].fillna(False)\n",
    "df['is_host_from_listing_town'] = df['is_host_from_listing_town'].fillna(False)\n",
    "df['is_host_from_listing_country'] = df['is_host_from_listing_country'].fillna(False)\n",
    "df['is_host_from_listing_postcode'] = df['is_host_from_listing_postcode'].fillna(False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "e5aefb50",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "host_town                           26\n",
       "host_country                         7\n",
       "host_postcode                     5507\n",
       "host_account_type                 3476\n",
       "host_active_pms_list               944\n",
       "guest_town                        9631\n",
       "guest_country                     9630\n",
       "guest_postcode                    9631\n",
       "listing_address                      0\n",
       "listing_town                         0\n",
       "listing_country                      0\n",
       "listing_postcode                     0\n",
       "listing_description                 13\n",
       "is_guest_from_listing_town           0\n",
       "is_guest_from_listing_country        0\n",
       "is_guest_from_listing_postcode       0\n",
       "is_host_from_listing_country         0\n",
       "is_host_from_listing_postcode        0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Checking again missing values in categorical variables\n",
    "df[categorical].isnull().sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "292eaad2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Unique values in 'host_account_type':\n",
      "host_account_type\n",
      "PMC - Property Management Company    12719\n",
      "Host                                  5112\n",
      "Name: count, dtype: int64 \n",
      "\n",
      "Unique values in 'host_active_pms_list':\n",
      "host_active_pms_list\n",
      "Hostify               6468\n",
      "Hostaway              3675\n",
      "Guesty                3108\n",
      "Hospitable            2739\n",
      "Hostfully             1905\n",
      "Lodgify               1341\n",
      "OwnerRez               649\n",
      "Avantio                248\n",
      "TrackHs                142\n",
      "Uplisting               61\n",
      "Hospitable Connect      15\n",
      "Smoobu                  12\n",
      "Name: count, dtype: int64 \n",
      "\n",
      "Unique values in 'host_country':\n",
      "host_country\n",
      "United States           10962\n",
      "United Kingdom           6707\n",
      "Canada                   2007\n",
      "Australia                 305\n",
      "Mexico                    273\n",
      "New Zealand               154\n",
      "Sweden                    122\n",
      "Norway                    117\n",
      "Bulgaria                  117\n",
      "Portugal                   87\n",
      "South Africa               78\n",
      "Costa Rica                 75\n",
      "Puerto Rico                50\n",
      "Belgium                    50\n",
      "Italy                      35\n",
      "Barbados                   34\n",
      "Spain                      31\n",
      "France                     26\n",
      "Jamaica                    20\n",
      "Egypt                      19\n",
      "Switzerland                10\n",
      "Isle of Man                 8\n",
      "Bahamas                     3\n",
      "Guernsey                    3\n",
      "United Arab Emirates        2\n",
      "Colombia                    2\n",
      "Germany                     1\n",
      "Greece                      1\n",
      "Hungary                     1\n",
      "Name: count, dtype: int64 \n",
      "\n",
      "Unique values in 'guest_country':\n",
      "guest_country\n",
      "United States                           7409\n",
      "Canada                                  1458\n",
      "United Kingdom                          1175\n",
      "Australia                                287\n",
      "Colombia                                 151\n",
      "Mexico                                   134\n",
      "Germany                                  100\n",
      "Ireland                                   77\n",
      "New Zealand                               70\n",
      "France                                    56\n",
      "Spain                                     53\n",
      "Costa Rica                                43\n",
      "Netherlands                               37\n",
      "Brazil                                    36\n",
      "Switzerland                               34\n",
      "Puerto Rico                               31\n",
      "Italy                                     29\n",
      "Argentina                                 23\n",
      "Singapore                                 23\n",
      "China                                     21\n",
      "Belgium                                   20\n",
      "Ecuador                                   20\n",
      "India                                     20\n",
      "United Arab Emirates                      20\n",
      "Panama                                    19\n",
      "Poland                                    17\n",
      "Dominican Republic                        15\n",
      "Israel                                    14\n",
      "Saudi Arabia                              13\n",
      "South Africa                              12\n",
      "Romania                                   11\n",
      "Malaysia                                  11\n",
      "El Salvador                               10\n",
      "Chile                                      9\n",
      "Norway                                     9\n",
      "Japan                                      9\n",
      "Portugal                                   9\n",
      "Sweden                                     8\n",
      "Hong Kong                                  8\n",
      "Austria                                    8\n",
      "South Korea                                8\n",
      "United States Minor Outlying Islands       8\n",
      "Finland                                    8\n",
      "Philippines                                7\n",
      "Czech Republic                             7\n",
      "Guatemala                                  7\n",
      "Hungary                                    6\n",
      "Venezuela                                  6\n",
      "Denmark                                    6\n",
      "Honduras                                   6\n",
      "Jamaica                                    5\n",
      "Thailand                                   5\n",
      "Peru                                       5\n",
      "Taiwan                                     5\n",
      "Russian Federation                         5\n",
      "French Polynesia                           4\n",
      "Turkey                                     4\n",
      "Kazakhstan                                 4\n",
      "Curacao                                    4\n",
      "Martinique                                 3\n",
      "Cayman Islands                             3\n",
      "Saint Pierre and Miquelon                  3\n",
      "Slovenia                                   3\n",
      "Estonia                                    3\n",
      "Iceland                                    3\n",
      "Georgia                                    3\n",
      "Indonesia                                  2\n",
      "Qatar                                      2\n",
      "Greece                                     2\n",
      "Egypt                                      2\n",
      "Latvia                                     2\n",
      "Pakistan                                   2\n",
      "Barbados                                   2\n",
      "Bolivia                                    2\n",
      "Aruba                                      2\n",
      "Malta                                      2\n",
      "Suriname                                   1\n",
      "Lebanon                                    1\n",
      "Nauru                                      1\n",
      "Fiji                                       1\n",
      "Cook Islands                               1\n",
      "Bahamas                                    1\n",
      "Albania                                    1\n",
      "Uruguay                                    1\n",
      "Jersey                                     1\n",
      "Croatia                                    1\n",
      "Bulgaria                                   1\n",
      "Belize                                     1\n",
      "Nicaragua                                  1\n",
      "DR Congo                                   1\n",
      "Kuwait                                     1\n",
      "Niger                                      1\n",
      "Cyprus                                     1\n",
      "Name: count, dtype: int64 \n",
      "\n",
      "Unique values in 'listing_country':\n",
      "listing_country\n",
      "United States           10067\n",
      "United Kingdom           6574\n",
      "Canada                   1870\n",
      "Colombia                  599\n",
      "Australia                 305\n",
      "Mexico                    303\n",
      "Ireland                   168\n",
      "New Zealand               153\n",
      "Virgin Islands, U.s.      130\n",
      "Bahamas                   130\n",
      "Norway                    125\n",
      "Sweden                    122\n",
      "Bulgaria                  117\n",
      "Costa Rica                108\n",
      "Portugal                   87\n",
      "South Africa               83\n",
      "Puerto Rico                50\n",
      "Belgium                    48\n",
      "France                     46\n",
      "Italy                      44\n",
      "Spain                      36\n",
      "Barbados                   34\n",
      "Morocco                    25\n",
      "Jamaica                    20\n",
      "Egypt                      19\n",
      "Saint Lucia                10\n",
      "Germany                    10\n",
      "Sint Maarten                9\n",
      "Isle of Man                 8\n",
      "United Arab Emirates        2\n",
      "Lithuania                   2\n",
      "Antigua and Barbuda         1\n",
      "Greece                      1\n",
      "Hungary                     1\n",
      "Name: count, dtype: int64 \n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Check unique values in host_account_type, host_active_pms_list, host_country and guest_country with their counts\n",
    "print(\"Unique values in 'host_account_type':\")\n",
    "print(df['host_account_type'].value_counts(), \"\\n\")\n",
    "print(\"Unique values in 'host_active_pms_list':\")\n",
    "print(df['host_active_pms_list'].value_counts(), \"\\n\")\n",
    "print(\"Unique values in 'host_country':\")\n",
    "print(df['host_country'].value_counts(), \"\\n\")\n",
    "print(\"Unique values in 'guest_country':\")\n",
    "print(df['guest_country'].value_counts(), \"\\n\")\n",
    "print(\"Unique values in 'listing_country':\")\n",
    "print(df['listing_country'].value_counts(), \"\\n\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "7289f9fd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Due to the many unique values in host_country, guest_country and listing_country, we will only keep the top 10 most frequent values and set the rest to 'Other'\n",
    "top_host_countries = df['host_country'].value_counts().nlargest(10).index\n",
    "top_guest_countries = df['guest_country'].value_counts().nlargest(10).index\n",
    "top_listing_countries = df['listing_country'].value_counts().nlargest(10).index\n",
    "\n",
    "df['host_country'] = df['host_country'].where(df['host_country'].isin(top_host_countries), 'Other')\n",
    "df['guest_country'] = df['guest_country'].where(df['guest_country'].isin(top_guest_countries), 'Other')\n",
    "df['listing_country'] = df['listing_country'].where(df['listing_country'].isin(top_listing_countries), 'Other')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "7348866c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "New columns created from one-hot encoding: ['host_account_type_Host', 'host_account_type_PMC - Property Management Company', 'host_active_pms_list_Avantio', 'host_active_pms_list_Guesty', 'host_active_pms_list_Hospitable', 'host_active_pms_list_Hospitable Connect', 'host_active_pms_list_Hostaway', 'host_active_pms_list_Hostfully', 'host_active_pms_list_Hostify', 'host_active_pms_list_Lodgify', 'host_active_pms_list_OwnerRez', 'host_active_pms_list_Smoobu', 'host_active_pms_list_TrackHs', 'host_active_pms_list_Uplisting', 'host_country_Australia', 'host_country_Bulgaria', 'host_country_Canada', 'host_country_Mexico', 'host_country_New Zealand', 'host_country_Norway', 'host_country_Other', 'host_country_Portugal', 'host_country_Sweden', 'host_country_United Kingdom', 'host_country_United States', 'guest_country_Australia', 'guest_country_Canada', 'guest_country_Colombia', 'guest_country_France', 'guest_country_Germany', 'guest_country_Ireland', 'guest_country_Mexico', 'guest_country_New Zealand', 'guest_country_Other', 'guest_country_United Kingdom', 'guest_country_United States', 'listing_country_Australia', 'listing_country_Bahamas', 'listing_country_Canada', 'listing_country_Colombia', 'listing_country_Ireland', 'listing_country_Mexico', 'listing_country_New Zealand', 'listing_country_Other', 'listing_country_United Kingdom', 'listing_country_United States', 'listing_country_Virgin Islands, U.s.']\n"
     ]
    }
   ],
   "source": [
    "# Lets one hot encode host_account_type, host_active_pms_list, host_country, guest_country and listing_country\n",
    "df = pd.get_dummies(df, columns=['host_account_type', 'host_active_pms_list', 'host_country', 'guest_country', 'listing_country'], drop_first=False)\n",
    "# Check the new columns created\n",
    "new_columns = df.columns[df.columns.str.startswith(('host_account_type_', 'host_active_pms_list_', 'host_country', 'guest_country', 'listing_country'))]\n",
    "print(f\"New columns created from one-hot encoding: {new_columns.tolist()}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "b443ccf4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Drop the original categorical columns and the ones we are not going to use like postcodes and towns\n",
    "df.drop(columns=['host_postcode', 'guest_postcode', 'listing_postcode', 'listing_town', 'host_town', 'guest_town', 'listing_description', 'listing_address'], inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "a31ae1fd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "There are 22 numerical variables\n",
      "\n",
      "The numerical variables are : ['days_from_booking_creation_to_check_in', 'number_of_nights', 'host_age', 'host_months_with_truvi', 'number_of_listings_of_host', 'number_of_previous_incidents_of_host', 'number_of_previous_payouts_of_host', 'guest_age', 'number_of_previous_bookings_of_guest', 'number_of_previous_incidents_of_guest', 'listing_number_of_bedrooms', 'listing_number_of_bathrooms', 'previous_bookings_in_listing_count', 'number_of_previous_incidents_in_listing', 'number_of_previous_payouts_in_listing', 'days_to_start_verification', 'days_to_complete_verification', 'number_of_applied_services', 'number_of_applied_upgraded_services', 'number_of_applied_billable_services', 'booking_days_to_check_in', 'booking_number_of_nights']\n"
     ]
    }
   ],
   "source": [
    "# Find numerical variables\n",
    "numerical = df.select_dtypes(include=[np.number]).columns.tolist()\n",
    "print('There are {} numerical variables\\n'.format(len(numerical)))\n",
    "print('The numerical variables are :', numerical)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "cf795d45",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Summary statistics of numerical variables:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>days_from_booking_creation_to_check_in</th>\n",
       "      <th>number_of_nights</th>\n",
       "      <th>host_age</th>\n",
       "      <th>host_months_with_truvi</th>\n",
       "      <th>number_of_listings_of_host</th>\n",
       "      <th>number_of_previous_incidents_of_host</th>\n",
       "      <th>number_of_previous_payouts_of_host</th>\n",
       "      <th>guest_age</th>\n",
       "      <th>number_of_previous_bookings_of_guest</th>\n",
       "      <th>number_of_previous_incidents_of_guest</th>\n",
       "      <th>listing_number_of_bedrooms</th>\n",
       "      <th>listing_number_of_bathrooms</th>\n",
       "      <th>previous_bookings_in_listing_count</th>\n",
       "      <th>number_of_previous_incidents_in_listing</th>\n",
       "      <th>number_of_previous_payouts_in_listing</th>\n",
       "      <th>days_to_start_verification</th>\n",
       "      <th>days_to_complete_verification</th>\n",
       "      <th>number_of_applied_services</th>\n",
       "      <th>number_of_applied_upgraded_services</th>\n",
       "      <th>number_of_applied_billable_services</th>\n",
       "      <th>booking_days_to_check_in</th>\n",
       "      <th>booking_number_of_nights</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>21307.000000</td>\n",
       "      <td>21307.000000</td>\n",
       "      <td>21307.000000</td>\n",
       "      <td>21307.000000</td>\n",
       "      <td>21307.000000</td>\n",
       "      <td>21307.000000</td>\n",
       "      <td>21307.000000</td>\n",
       "      <td>11677.000000</td>\n",
       "      <td>21307.000000</td>\n",
       "      <td>21307.0</td>\n",
       "      <td>21185.000000</td>\n",
       "      <td>21185.000000</td>\n",
       "      <td>21307.000000</td>\n",
       "      <td>21307.000000</td>\n",
       "      <td>21307.000000</td>\n",
       "      <td>20084.000000</td>\n",
       "      <td>18500.000000</td>\n",
       "      <td>21307.000000</td>\n",
       "      <td>21307.000000</td>\n",
       "      <td>21307.000000</td>\n",
       "      <td>21307.000000</td>\n",
       "      <td>21307.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>8.740038</td>\n",
       "      <td>3.876801</td>\n",
       "      <td>96.533017</td>\n",
       "      <td>5.482142</td>\n",
       "      <td>152.875815</td>\n",
       "      <td>2.718496</td>\n",
       "      <td>0.751302</td>\n",
       "      <td>42.317890</td>\n",
       "      <td>2175.999812</td>\n",
       "      <td>0.0</td>\n",
       "      <td>2.052962</td>\n",
       "      <td>1.601841</td>\n",
       "      <td>6.215094</td>\n",
       "      <td>0.123387</td>\n",
       "      <td>0.043507</td>\n",
       "      <td>0.996764</td>\n",
       "      <td>0.713135</td>\n",
       "      <td>3.721594</td>\n",
       "      <td>2.721688</td>\n",
       "      <td>1.865209</td>\n",
       "      <td>17.592247</td>\n",
       "      <td>4.144507</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>8.389242</td>\n",
       "      <td>3.335615</td>\n",
       "      <td>43.616341</td>\n",
       "      <td>2.714314</td>\n",
       "      <td>179.028829</td>\n",
       "      <td>5.582857</td>\n",
       "      <td>2.957053</td>\n",
       "      <td>13.212509</td>\n",
       "      <td>3038.837496</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.745281</td>\n",
       "      <td>1.297739</td>\n",
       "      <td>6.727896</td>\n",
       "      <td>0.537464</td>\n",
       "      <td>0.270994</td>\n",
       "      <td>3.423303</td>\n",
       "      <td>2.768474</td>\n",
       "      <td>1.553612</td>\n",
       "      <td>1.553629</td>\n",
       "      <td>0.949857</td>\n",
       "      <td>23.572901</td>\n",
       "      <td>4.799364</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>-20.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>19.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>18.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>-48.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>1.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>39.000000</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>9.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>32.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>6.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>125.000000</td>\n",
       "      <td>5.000000</td>\n",
       "      <td>72.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>41.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>8.000000</td>\n",
       "      <td>3.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>15.000000</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>125.000000</td>\n",
       "      <td>8.000000</td>\n",
       "      <td>247.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>51.000000</td>\n",
       "      <td>4302.500000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>9.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>5.000000</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>24.000000</td>\n",
       "      <td>5.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>30.000000</td>\n",
       "      <td>30.000000</td>\n",
       "      <td>125.000000</td>\n",
       "      <td>11.000000</td>\n",
       "      <td>467.000000</td>\n",
       "      <td>85.000000</td>\n",
       "      <td>62.000000</td>\n",
       "      <td>89.000000</td>\n",
       "      <td>9629.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>15.000000</td>\n",
       "      <td>17.000000</td>\n",
       "      <td>41.000000</td>\n",
       "      <td>9.000000</td>\n",
       "      <td>6.000000</td>\n",
       "      <td>30.000000</td>\n",
       "      <td>30.000000</td>\n",
       "      <td>8.000000</td>\n",
       "      <td>7.000000</td>\n",
       "      <td>5.000000</td>\n",
       "      <td>218.000000</td>\n",
       "      <td>116.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       days_from_booking_creation_to_check_in  number_of_nights      host_age  \\\n",
       "count                            21307.000000      21307.000000  21307.000000   \n",
       "mean                                 8.740038          3.876801     96.533017   \n",
       "std                                  8.389242          3.335615     43.616341   \n",
       "min                                -20.000000          0.000000     19.000000   \n",
       "25%                                  1.000000          2.000000     39.000000   \n",
       "50%                                  6.000000          3.000000    125.000000   \n",
       "75%                                 15.000000          4.000000    125.000000   \n",
       "max                                 30.000000         30.000000    125.000000   \n",
       "\n",
       "       host_months_with_truvi  number_of_listings_of_host  \\\n",
       "count            21307.000000                21307.000000   \n",
       "mean                 5.482142                  152.875815   \n",
       "std                  2.714314                  179.028829   \n",
       "min                  0.000000                    0.000000   \n",
       "25%                  4.000000                    9.000000   \n",
       "50%                  5.000000                   72.000000   \n",
       "75%                  8.000000                  247.000000   \n",
       "max                 11.000000                  467.000000   \n",
       "\n",
       "       number_of_previous_incidents_of_host  \\\n",
       "count                          21307.000000   \n",
       "mean                               2.718496   \n",
       "std                                5.582857   \n",
       "min                                0.000000   \n",
       "25%                                0.000000   \n",
       "50%                                1.000000   \n",
       "75%                                3.000000   \n",
       "max                               85.000000   \n",
       "\n",
       "       number_of_previous_payouts_of_host     guest_age  \\\n",
       "count                        21307.000000  11677.000000   \n",
       "mean                             0.751302     42.317890   \n",
       "std                              2.957053     13.212509   \n",
       "min                              0.000000     18.000000   \n",
       "25%                              0.000000     32.000000   \n",
       "50%                              0.000000     41.000000   \n",
       "75%                              1.000000     51.000000   \n",
       "max                             62.000000     89.000000   \n",
       "\n",
       "       number_of_previous_bookings_of_guest  \\\n",
       "count                          21307.000000   \n",
       "mean                            2175.999812   \n",
       "std                             3038.837496   \n",
       "min                                0.000000   \n",
       "25%                                0.000000   \n",
       "50%                                0.000000   \n",
       "75%                             4302.500000   \n",
       "max                             9629.000000   \n",
       "\n",
       "       number_of_previous_incidents_of_guest  listing_number_of_bedrooms  \\\n",
       "count                                21307.0                21185.000000   \n",
       "mean                                     0.0                    2.052962   \n",
       "std                                      0.0                    1.745281   \n",
       "min                                      0.0                    0.000000   \n",
       "25%                                      0.0                    1.000000   \n",
       "50%                                      0.0                    2.000000   \n",
       "75%                                      0.0                    3.000000   \n",
       "max                                      0.0                   15.000000   \n",
       "\n",
       "       listing_number_of_bathrooms  previous_bookings_in_listing_count  \\\n",
       "count                 21185.000000                        21307.000000   \n",
       "mean                      1.601841                            6.215094   \n",
       "std                       1.297739                            6.727896   \n",
       "min                       0.000000                            0.000000   \n",
       "25%                       1.000000                            1.000000   \n",
       "50%                       1.000000                            4.000000   \n",
       "75%                       2.000000                            9.000000   \n",
       "max                      17.000000                           41.000000   \n",
       "\n",
       "       number_of_previous_incidents_in_listing  \\\n",
       "count                             21307.000000   \n",
       "mean                                  0.123387   \n",
       "std                                   0.537464   \n",
       "min                                   0.000000   \n",
       "25%                                   0.000000   \n",
       "50%                                   0.000000   \n",
       "75%                                   0.000000   \n",
       "max                                   9.000000   \n",
       "\n",
       "       number_of_previous_payouts_in_listing  days_to_start_verification  \\\n",
       "count                           21307.000000                20084.000000   \n",
       "mean                                0.043507                    0.996764   \n",
       "std                                 0.270994                    3.423303   \n",
       "min                                 0.000000                    0.000000   \n",
       "25%                                 0.000000                    0.000000   \n",
       "50%                                 0.000000                    0.000000   \n",
       "75%                                 0.000000                    0.000000   \n",
       "max                                 6.000000                   30.000000   \n",
       "\n",
       "       days_to_complete_verification  number_of_applied_services  \\\n",
       "count                   18500.000000                21307.000000   \n",
       "mean                        0.713135                    3.721594   \n",
       "std                         2.768474                    1.553612   \n",
       "min                         0.000000                    2.000000   \n",
       "25%                         0.000000                    2.000000   \n",
       "50%                         0.000000                    4.000000   \n",
       "75%                         0.000000                    5.000000   \n",
       "max                        30.000000                    8.000000   \n",
       "\n",
       "       number_of_applied_upgraded_services  \\\n",
       "count                         21307.000000   \n",
       "mean                              2.721688   \n",
       "std                               1.553629   \n",
       "min                               1.000000   \n",
       "25%                               1.000000   \n",
       "50%                               3.000000   \n",
       "75%                               4.000000   \n",
       "max                               7.000000   \n",
       "\n",
       "       number_of_applied_billable_services  booking_days_to_check_in  \\\n",
       "count                         21307.000000              21307.000000   \n",
       "mean                              1.865209                 17.592247   \n",
       "std                               0.949857                 23.572901   \n",
       "min                               0.000000                -48.000000   \n",
       "25%                               1.000000                  2.000000   \n",
       "50%                               2.000000                  8.000000   \n",
       "75%                               3.000000                 24.000000   \n",
       "max                               5.000000                218.000000   \n",
       "\n",
       "       booking_number_of_nights  \n",
       "count              21307.000000  \n",
       "mean                   4.144507  \n",
       "std                    4.799364  \n",
       "min                    0.000000  \n",
       "25%                    2.000000  \n",
       "50%                    3.000000  \n",
       "75%                    5.000000  \n",
       "max                  116.000000  "
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# View summary statistics of numerical variables\n",
    "print(\"\\nSummary statistics of numerical variables:\")\n",
    "df[numerical].describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "2cf714c9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Figure size 1500x3200 with 22 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Select numeric columns\n",
    "numerical = df.select_dtypes(include='number').columns\n",
    "n_cols = 3\n",
    "n_rows = math.ceil(len(numerical) / n_cols)\n",
    "\n",
    "# Create subplots\n",
    "fig, axes = plt.subplots(n_rows, n_cols, figsize=(5 * n_cols, 4 * n_rows))\n",
    "axes = axes.flatten()\n",
    "\n",
    "# Plot each numeric column\n",
    "for i, col in enumerate(numerical):\n",
    "    axes[i].hist(df[col].dropna(), bins=30, edgecolor='black')\n",
    "    axes[i].set_title(f'Distribution of {col}')\n",
    "    axes[i].set_xlabel(col)\n",
    "    axes[i].set_ylabel('Frequency')\n",
    "\n",
    "# Hide any unused subplots\n",
    "for j in range(i + 1, len(axes)):\n",
    "    fig.delaxes(axes[j])\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "311da64d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# host_age has some outliers above 100; we replace those values with NaN\n",
    "df['host_age'] = df['host_age'].where(df['host_age'] <= 100, np.nan)\n",
    "\n",
    "# We drop number_of_previous_incidents_of_guest as it has only 0 values\n",
    "df.drop(columns=['number_of_previous_incidents_of_guest'], inplace=True)\n",
    "numerical = df.select_dtypes(include='number').columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "692854bb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Missing Values (%):\n",
      "host_age                         69.826817\n",
      "guest_age                        45.196414\n",
      "days_to_complete_verification    13.174074\n",
      "days_to_start_verification        5.739898\n",
      "listing_number_of_bathrooms       0.572582\n",
      "listing_number_of_bedrooms        0.572582\n",
      "dtype: float64\n"
     ]
    }
   ],
   "source": [
    "# Check missing values for the remaining columns\n",
    "missing_values = df.isnull().mean() * 100\n",
    "missing_values = missing_values[missing_values > 0].sort_values(ascending=False)\n",
    "print(\"Missing Values (%):\")\n",
    "print(missing_values)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "9f333fd5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# We will fill the remaining missing values with the median for numerical columns\n",
    "for col in numerical:\n",
    "    df[col] = df[col].fillna(df[col].median())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "ccd46ddc",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Convert all boolean columns to int\n",
    "bool_columns = df.select_dtypes(include='bool').columns\n",
    "for col in bool_columns:\n",
    "    df[col] = df[col].astype(int)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c84ebe5",
   "metadata": {},
   "source": [
    "### Feature Relevance Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "74a582c8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Figure size 1400x1200 with 2 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Check correlation matrix\n",
    "import seaborn as sns\n",
    "\n",
    "# 1. Move 'has_resolution_incident' to the end\n",
    "target_col = 'has_resolution_incident'\n",
    "if target_col in df.columns:\n",
    "    columns = [col for col in df.columns if col != target_col] + [target_col]\n",
    "    df = df[columns]\n",
    "\n",
    "# 2. Create short column names (truncate to, say, 15 chars)\n",
    "short_columns = [col[:15] for col in df.columns]\n",
    "\n",
    "# 3. Compute correlation matrix\n",
    "correlation_matrix = df.corr()\n",
    "\n",
    "# 4. Plot with Seaborn\n",
    "plt.figure(figsize=(14, 12))\n",
    "sns.heatmap(\n",
    "    correlation_matrix,\n",
    "    xticklabels=short_columns,\n",
    "    yticklabels=short_columns,\n",
    "    cmap='coolwarm',\n",
    "    annot=False,\n",
    "    fmt=\".2f\",\n",
    "    square=True,\n",
    "    cbar_kws={'shrink': 0.6}\n",
    ")\n",
    "plt.title('Correlation Matrix (Truncated Labels)', fontsize=16)\n",
    "plt.tight_layout()\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "a6f7988d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "number_of_previous_incidents_in_listing                0.101702\n",
      "number_of_previous_payouts_in_listing                  0.096180\n",
      "host_account_type_Host                                 0.073745\n",
      "number_of_listings_of_host                             0.070200\n",
      "listing_number_of_bedrooms                             0.065542\n",
      "listing_country_United States                          0.062555\n",
      "host_active_pms_list_Hostify                           0.060898\n",
      "host_country_United States                             0.055897\n",
      "has_deposit_management_service_business_type           0.055543\n",
      "host_country_United Kingdom                            0.049846\n",
      "listing_country_United Kingdom                         0.048641\n",
      "guest_country_United States                            0.047742\n",
      "number_of_applied_billable_services                    0.045234\n",
      "host_account_type_PMC - Property Management Company    0.044632\n",
      "has_completed_verification                             0.040583\n",
      "guest_age                                              0.039814\n",
      "is_guest_from_listing_country                          0.038440\n",
      "listing_number_of_bathrooms                            0.038292\n",
      "listing_country_New Zealand                            0.036971\n",
      "is_guest_from_listing_town                             0.036880\n",
      "host_country_New Zealand                               0.036791\n",
      "previous_bookings_in_listing_count                     0.035117\n",
      "guest_has_email                                        0.033928\n",
      "number_of_applied_services                             0.032459\n",
      "number_of_applied_upgraded_services                    0.032452\n",
      "guest_country_New Zealand                              0.031652\n",
      "guest_country_Other                                    0.031166\n",
      "guest_has_phone_number                                 0.030379\n",
      "booking_number_of_nights                               0.026738\n",
      "has_guest_previously_booked_same_listing               0.026621\n",
      "host_active_pms_list_Hospitable                        0.025430\n",
      "host_active_pms_list_Hostfully                         0.025058\n",
      "number_of_previous_bookings_of_guest                   0.024027\n",
      "number_of_nights                                       0.023304\n",
      "guest_country_Canada                                   0.022773\n",
      "host_active_pms_list_Hostaway                          0.021299\n",
      "booking_days_to_check_in                               0.020963\n",
      "host_country_Canada                                    0.020417\n",
      "has_upgraded_screening_service_business_type           0.020254\n",
      "has_verification_request                               0.019356\n",
      "listing_country_Colombia                               0.018607\n",
      "listing_country_Canada                                 0.018591\n",
      "is_host_from_listing_country                           0.018029\n",
      "number_of_previous_incidents_of_host                   0.017803\n",
      "number_of_previous_payouts_of_host                     0.017717\n",
      "days_from_booking_creation_to_check_in                 0.016637\n",
      "host_active_pms_list_OwnerRez                          0.015977\n",
      "is_host_from_listing_town                              0.015359\n",
      "is_host_from_listing_postcode                          0.014238\n",
      "host_active_pms_list_Avantio                           0.011872\n",
      "host_active_pms_list_Lodgify                           0.010976\n",
      "guest_country_Australia                                0.009813\n",
      "listing_country_Ireland                                0.009753\n",
      "listing_country_Mexico                                 0.009473\n",
      "guest_country_Colombia                                 0.009243\n",
      "is_guest_from_listing_postcode                         0.009204\n",
      "host_active_pms_list_TrackHs                           0.008961\n",
      "has_protection_service_business_type                   0.008933\n",
      "guest_country_Mexico                                   0.008703\n",
      "host_country_Mexico                                    0.008603\n",
      "listing_country_Bahamas                                0.008572\n",
      "host_country_Sweden                                    0.008302\n",
      "host_country_Bulgaria                                  0.008129\n",
      "guest_country_Germany                                  0.007512\n",
      "guest_country_United Kingdom                           0.007411\n",
      "host_months_with_truvi                                 0.007277\n",
      "host_active_pms_list_Guesty                            0.007083\n",
      "host_country_Portugal                                  0.007005\n",
      "guest_country_Ireland                                  0.006589\n",
      "host_active_pms_list_Uplisting                         0.005862\n",
      "guest_country_France                                   0.005616\n",
      "host_country_Other                                     0.004820\n",
      "has_billable_services                                  0.004251\n",
      "listing_country_Other                                  0.003930\n",
      "days_to_start_verification                             0.003879\n",
      "listing_country_Virgin Islands, U.s.                   0.002997\n",
      "host_age                                               0.002981\n",
      "host_active_pms_list_Hospitable Connect                0.002904\n",
      "host_active_pms_list_Smoobu                            0.002597\n",
      "host_country_Norway                                    0.002255\n",
      "host_country_Australia                                 0.001435\n",
      "listing_country_Australia                              0.001435\n",
      "days_to_complete_verification                          0.001179\n",
      "dtype: float64\n"
     ]
    }
   ],
   "source": [
    "# Compute correlation with the target variable\n",
    "correlation_with_target = df.corrwith(df['has_resolution_incident'])\n",
    "\n",
    "# Drop the target itself (its correlation with itself is always 1)\n",
    "correlation_with_target = correlation_with_target.drop(labels='has_resolution_incident')\n",
    "\n",
    "# Sort by absolute correlation, descending\n",
    "correlation_sorted = correlation_with_target.abs().sort_values(ascending=False)\n",
    "\n",
    "# Print the sorted correlations (you can keep the original signs too if preferred)\n",
    "print(correlation_sorted)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2caec836",
   "metadata": {},
   "source": [
    "### Weighted classes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "e6d091fb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{0: np.float64(1.0119419188492333), 1: np.float64(84.73863636363637)}"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# We will use class weights due to the imbalance of the target variable\n",
    "X = df.drop(columns=['has_resolution_incident'])\n",
    "y = df['has_resolution_incident']\n",
    "\n",
    "# 1. Split data into training and testing sets\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    X, y, test_size=0.3, random_state=123, stratify=y\n",
    ")\n",
    "\n",
    "# Compute label distribution on the training set\n",
    "label_distribution = y_train.value_counts(normalize=True)\n",
    "\n",
    "# Calculate inverse weights\n",
    "weights = {\n",
    "    0: 1 / label_distribution[0],\n",
    "    1: 1 / label_distribution[1]\n",
    "}\n",
    "weights"
   ]
  },
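  {
   "cell_type": "markdown",
   "id": "d3a1f0c1",
   "metadata": {},
   "source": [
    "The inverse-frequency weights above differ from scikit-learn's `class_weight='balanced'` heuristic only by a constant factor (the number of classes), so the relative weighting of the two classes is the same. The next cell is a minimal sketch on synthetic labels, not part of the original analysis: the `demo_` names and the 1% positive rate are illustrative assumptions. Either dictionary could be passed as `class_weight` to estimators that accept it, such as `LogisticRegression`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d3a1f0c2",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "from sklearn.utils.class_weight import compute_class_weight\n",
    "\n",
    "# Synthetic, heavily imbalanced labels (1% positives), for illustration only\n",
    "demo_y = pd.Series([0] * 990 + [1] * 10)\n",
    "\n",
    "# Inverse-frequency weights, computed the same way as in the cell above\n",
    "demo_dist = demo_y.value_counts(normalize=True)\n",
    "demo_inverse = {label: 1 / demo_dist[label] for label in (0, 1)}\n",
    "\n",
    "# scikit-learn's 'balanced' heuristic: n_samples / (n_classes * count_per_class)\n",
    "demo_balanced = dict(zip(\n",
    "    (0, 1),\n",
    "    compute_class_weight(\n",
    "        class_weight='balanced', classes=np.array([0, 1]), y=demo_y.to_numpy()\n",
    "    ),\n",
    "))\n",
    "\n",
    "# Both schemes weight the minority class by the same ratio relative to the majority\n",
    "print(demo_inverse[1] / demo_inverse[0], demo_balanced[1] / demo_balanced[0])"
   ]
  },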
  {
   "cell_type": "markdown",
   "id": "ab8f7646",
   "metadata": {},
   "source": [
    "### Feature Selection\n",
    "\n",
    "Since we have many columns, we'll apply feature selection techniques such as SelectKBest, RFE (Recursive Feature Elimination), and Lasso (L1 regularization) to reduce the number of features used in our predictive model. This helps:\n",
    "- Avoid overfitting\n",
    "- Potentially improve model performance (simpler models often generalize better)\n",
    "- Reduce training time\n",
    "\n",
    "We'll also experiment with different numbers of features to determine which combination produces the model best suited to our objectives."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "0246eb6c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Selected Features:\n",
      "Index(['number_of_nights', 'number_of_listings_of_host', 'guest_age',\n",
      "       'has_guest_previously_booked_same_listing',\n",
      "       'listing_number_of_bedrooms', 'listing_number_of_bathrooms',\n",
      "       'previous_bookings_in_listing_count',\n",
      "       'number_of_previous_incidents_in_listing',\n",
      "       'number_of_previous_payouts_in_listing', 'guest_has_email',\n",
      "       'is_guest_from_listing_town', 'is_guest_from_listing_country',\n",
      "       'has_completed_verification', 'number_of_applied_services',\n",
      "       'number_of_applied_upgraded_services',\n",
      "       'number_of_applied_billable_services', 'booking_number_of_nights',\n",
      "       'has_deposit_management_service_business_type',\n",
      "       'host_account_type_Host',\n",
      "       'host_account_type_PMC - Property Management Company',\n",
      "       'host_active_pms_list_Hospitable', 'host_active_pms_list_Hostify',\n",
      "       'host_country_New Zealand', 'host_country_United Kingdom',\n",
      "       'host_country_United States', 'guest_country_Canada',\n",
      "       'guest_country_United States', 'listing_country_New Zealand',\n",
      "       'listing_country_United Kingdom', 'listing_country_United States'],\n",
      "      dtype='object')\n"
     ]
    }
   ],
   "source": [
    "selector = SelectKBest(score_func=f_classif, k=30)\n",
    "X_new = selector.fit_transform(X_train, y_train)\n",
    "selected_features_kbest = X_train.columns[selector.get_support()]\n",
    "\n",
    "print(\"Selected Features:\")\n",
    "print(selected_features_kbest)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "736a8d68",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/joaquin/data-jupyter-notebooks/.venv/lib/python3.12/site-packages/sklearn/linear_model/_logistic.py:465: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
      "STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.\n",
      "\n",
      "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
      "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
      "Please also refer to the documentation for alternative solver options:\n",
      "    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
      "  n_iter_i = _check_optimize_result(\n",
      "/home/joaquin/data-jupyter-notebooks/.venv/lib/python3.12/site-packages/sklearn/linear_model/_logistic.py:465: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
      "STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.\n",
      "\n",
      "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
      "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
      "Please also refer to the documentation for alternative solver options:\n",
      "    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
      "  n_iter_i = _check_optimize_result(\n",
      "/home/joaquin/data-jupyter-notebooks/.venv/lib/python3.12/site-packages/sklearn/linear_model/_logistic.py:465: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
      "STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.\n",
      "\n",
      "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
      "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
      "Please also refer to the documentation for alternative solver options:\n",
      "    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
      "  n_iter_i = _check_optimize_result(\n",
      "/home/joaquin/data-jupyter-notebooks/.venv/lib/python3.12/site-packages/sklearn/linear_model/_logistic.py:465: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
      "STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.\n",
      "\n",
      "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
      "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
      "Please also refer to the documentation for alternative solver options:\n",
      "    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
      "  n_iter_i = _check_optimize_result(\n",
      "/home/joaquin/data-jupyter-notebooks/.venv/lib/python3.12/site-packages/sklearn/linear_model/_logistic.py:465: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
      "STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.\n",
      "\n",
      "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
      "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
      "Please also refer to the documentation for alternative solver options:\n",
      "    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
      "  n_iter_i = _check_optimize_result(\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Selected Features using RFE:\n",
      "Index(['has_guest_previously_booked_same_listing',\n",
      "       'number_of_previous_payouts_in_listing', 'guest_has_email',\n",
      "       'is_guest_from_listing_town', 'is_guest_from_listing_country',\n",
      "       'is_host_from_listing_country', 'is_host_from_listing_postcode',\n",
      "       'has_completed_verification', 'has_verification_request',\n",
      "       'has_upgraded_screening_service_business_type',\n",
      "       'has_deposit_management_service_business_type',\n",
      "       'host_account_type_Host',\n",
      "       'host_account_type_PMC - Property Management Company',\n",
      "       'host_active_pms_list_Avantio', 'host_active_pms_list_Hostify',\n",
      "       'host_active_pms_list_TrackHs', 'host_country_Bulgaria',\n",
      "       'host_country_Canada', 'host_country_New Zealand',\n",
      "       'guest_country_Australia', 'guest_country_Canada',\n",
      "       'guest_country_Germany', 'guest_country_Mexico', 'guest_country_Other',\n",
      "       'listing_country_Bahamas', 'listing_country_Canada',\n",
      "       'listing_country_Colombia', 'listing_country_Ireland',\n",
      "       'listing_country_New Zealand', 'listing_country_United States'],\n",
      "      dtype='object')\n"
     ]
    }
   ],
   "source": [
    "# Recursive Feature Elimination (RFE) with Logistic Regression\n",
    "model = LogisticRegression(max_iter=1000)\n",
    "rfe = RFE(model, n_features_to_select=30)\n",
    "rfe.fit(X_train, y_train)\n",
    "selected_features_rfe = X_train.columns[rfe.support_]\n",
    "\n",
    "print(\"Selected Features using RFE:\")\n",
    "print(selected_features_rfe)"
   ]
  },
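  {
   "cell_type": "markdown",
   "id": "d3a1f0c3",
   "metadata": {},
   "source": [
    "The `ConvergenceWarning` messages in the output above suggest either raising `max_iter` or scaling the data; such warnings often appear when features are on very different scales. The next cell is a minimal sketch, on synthetic data, of standardizing features before running RFE. It illustrates one possible remedy under assumed data, not a rerun of the analysis, and all `demo_` names are illustrative."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d3a1f0c4",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.datasets import make_classification\n",
    "from sklearn.feature_selection import RFE\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "# Synthetic stand-in for the training data, with features on very different scales\n",
    "demo_X, demo_y = make_classification(\n",
    "    n_samples=500, n_features=20, n_informative=5, random_state=123\n",
    ")\n",
    "demo_X[:, :10] *= 1000\n",
    "\n",
    "# Standardize so every feature has zero mean and unit variance\n",
    "# (with a real train/test split, fit the scaler on the training split only)\n",
    "demo_X_scaled = StandardScaler().fit_transform(demo_X)\n",
    "\n",
    "# Same RFE setup as above, but fitted on the scaled matrix\n",
    "demo_rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)\n",
    "demo_rfe.fit(demo_X_scaled, demo_y)\n",
    "\n",
    "# Number of features kept by RFE\n",
    "print(demo_rfe.support_.sum())"
   ]
  },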
  {
   "cell_type": "code",
   "execution_count": 43,
   "id": "484786aa",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Selected Features using Lasso Regression:\n",
      "Index(['days_from_booking_creation_to_check_in', 'number_of_nights',\n",
      "       'host_age', 'host_months_with_truvi', 'number_of_listings_of_host',\n",
      "       'number_of_previous_incidents_of_host',\n",
      "       'number_of_previous_payouts_of_host', 'guest_age',\n",
      "       'number_of_previous_bookings_of_guest',\n",
      "       'has_guest_previously_booked_same_listing',\n",
      "       'listing_number_of_bedrooms', 'listing_number_of_bathrooms',\n",
      "       'previous_bookings_in_listing_count',\n",
      "       'number_of_previous_incidents_in_listing',\n",
      "       'number_of_previous_payouts_in_listing', 'days_to_start_verification',\n",
      "       'days_to_complete_verification', 'is_guest_from_listing_town',\n",
      "       'is_guest_from_listing_country', 'is_host_from_listing_town',\n",
      "       'is_host_from_listing_postcode', 'has_completed_verification',\n",
      "       'number_of_applied_services', 'number_of_applied_billable_services',\n",
      "       'booking_days_to_check_in', 'booking_number_of_nights',\n",
      "       'has_verification_request',\n",
      "       'has_upgraded_screening_service_business_type',\n",
      "       'has_deposit_management_service_business_type',\n",
      "       'has_protection_service_business_type', 'host_account_type_Host',\n",
      "       'host_account_type_PMC - Property Management Company',\n",
      "       'host_active_pms_list_Guesty', 'host_active_pms_list_Hospitable',\n",
      "       'host_active_pms_list_Hostaway', 'host_active_pms_list_Hostfully',\n",
      "       'host_active_pms_list_Hostify', 'host_active_pms_list_Lodgify',\n",
      "       'host_active_pms_list_OwnerRez', 'host_country_New Zealand',\n",
      "       'guest_country_Canada', 'guest_country_Other',\n",
      "       'guest_country_United Kingdom', 'guest_country_United States',\n",
      "       'listing_country_Colombia', 'listing_country_New Zealand',\n",
      "       'listing_country_United States'],\n",
      "      dtype='object')\n"
     ]
    }
   ],
   "source": [
    "# Lasso Regression for feature selection\n",
    "model = LogisticRegression(penalty='l1', solver='liblinear')\n",
    "model.fit(X_train, y_train)\n",
    "\n",
    "# Check which features have non-zero coefficients\n",
    "selected_features_lasso = X_train.columns[model.coef_[0] != 0]\n",
    "print(\"Selected Features using Lasso Regression:\")\n",
    "print(selected_features_lasso)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "04010a1e",
   "metadata": {},
   "source": [
    "## Processing\n",
    "Processing in this notebook is quite straight-forward: we just drop id booking, split the features and target and apply a scaling to numeric features.\n",
    "Afterwards, we split the dataset between train and test and display their sizes and target distribution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "id": "f735b111",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training set size: 14914 rows\n",
      "Test set size: 6393 rows\n",
      "\n",
      "Training target distribution:\n",
      "has_resolution_incident\n",
      "0    0.988199\n",
      "1    0.011801\n",
      "Name: proportion, dtype: float64\n",
      "\n",
      "Test target distribution:\n",
      "has_resolution_incident\n",
      "0    0.988112\n",
      "1    0.011888\n",
      "Name: proportion, dtype: float64\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/tmp/ipykernel_48568/2398832410.py:8: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame.\n",
      "Try using .loc[row_indexer,col_indexer] = value instead\n",
      "\n",
      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
      "  X_train_kbest[selected_features_kbest] = X_train_kbest[selected_features_kbest].astype(float)\n"
     ]
    }
   ],
   "source": [
    "# Separate features and target\n",
    "X_train_kbest = X_train[selected_features_kbest]  # Use the features selected by SelectKBest\n",
    "y_train_kbest = y_train\n",
    "X_test_kbest = X_test[selected_features_kbest]\n",
    "y_test_kbest = y_test\n",
    "\n",
    "# Scale numeric features\n",
    "X_train_kbest[selected_features_kbest] = X_train_kbest[selected_features_kbest].astype(float)\n",
    "\n",
    "print(f\"Training set size: {X_train_kbest.shape[0]} rows\")\n",
    "print(f\"Test set size: {X_test_kbest.shape[0]} rows\")\n",
    "\n",
    "print(\"\\nTraining target distribution:\")\n",
    "print(y_train_kbest.value_counts(normalize=True))\n",
    "\n",
    "print(\"\\nTest target distribution:\")\n",
    "print(y_test_kbest.value_counts(normalize=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "897eb678",
   "metadata": {},
   "source": [
    "### Using RFE Features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "id": "301a8fb2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training set size: 14914 rows\n",
      "Test set size: 6393 rows\n",
      "\n",
      "Training target distribution:\n",
      "has_resolution_incident\n",
      "0    0.988199\n",
      "1    0.011801\n",
      "Name: proportion, dtype: float64\n",
      "\n",
      "Test target distribution:\n",
      "has_resolution_incident\n",
      "0    0.988112\n",
      "1    0.011888\n",
      "Name: proportion, dtype: float64\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/tmp/ipykernel_48568/2877144001.py:8: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame.\n",
      "Try using .loc[row_indexer,col_indexer] = value instead\n",
      "\n",
      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
      "  X_train_rfe[selected_features_rfe] = X_train_rfe[selected_features_rfe].astype(float)\n"
     ]
    }
   ],
   "source": [
    "# Separate features and target\n",
    "X_train_rfe = X_train[selected_features_rfe]  # Use the features selected by RFE\n",
    "y_train_rfe = y_train\n",
    "X_test_rfe = X_test[selected_features_rfe]\n",
    "y_test_rfe = y_test\n",
    "\n",
    "# Scale numeric features\n",
    "X_train_rfe[selected_features_rfe] = X_train_rfe[selected_features_rfe].astype(float)\n",
    "\n",
    "print(f\"Training set size: {X_train_rfe.shape[0]} rows\")\n",
    "print(f\"Test set size: {X_test_rfe.shape[0]} rows\")\n",
    "\n",
    "print(\"\\nTraining target distribution:\")\n",
    "print(y_train_rfe.value_counts(normalize=True))\n",
    "\n",
    "print(\"\\nTest target distribution:\")\n",
    "print(y_test_rfe.value_counts(normalize=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2bbc1524",
   "metadata": {},
   "source": [
    "### Using Lasso Features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "id": "f4b9c01a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training set size: 14914 rows\n",
      "Test set size: 6393 rows\n",
      "\n",
      "Training target distribution:\n",
      "has_resolution_incident\n",
      "0    0.988199\n",
      "1    0.011801\n",
      "Name: proportion, dtype: float64\n",
      "\n",
      "Test target distribution:\n",
      "has_resolution_incident\n",
      "0    0.988112\n",
      "1    0.011888\n",
      "Name: proportion, dtype: float64\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/tmp/ipykernel_48568/1333565449.py:8: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame.\n",
      "Try using .loc[row_indexer,col_indexer] = value instead\n",
      "\n",
      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
      "  X_train_lasso[selected_features_lasso] = X_train_lasso[selected_features_lasso].astype(float)\n"
     ]
    }
   ],
   "source": [
    "# Separate features and target\n",
    "X_train_lasso = X_train[selected_features_lasso]  # Use the features selected by lasso\n",
    "y_train_lasso = y_train\n",
    "X_test_lasso = X_test[selected_features_lasso]\n",
    "y_test_lasso = y_test\n",
    "\n",
    "# Scale numeric features\n",
    "X_train_lasso[selected_features_lasso] = X_train_lasso[selected_features_lasso].astype(float)\n",
    "\n",
    "print(f\"Training set size: {X_train_lasso.shape[0]} rows\")\n",
    "print(f\"Test set size: {X_test_lasso.shape[0]} rows\")\n",
    "\n",
    "print(\"\\nTraining target distribution:\")\n",
    "print(y_train_lasso.value_counts(normalize=True))\n",
    "\n",
    "print(\"\\nTest target distribution:\")\n",
    "print(y_test_lasso.value_counts(normalize=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d36c9276",
   "metadata": {},
   "source": [
    "## Classification Model with Random Forest\n",
    "\n",
    "We define a machine learning pipeline that includes:\n",
    "- **Scaling numeric features** with `StandardScaler`\n",
    "- **Training a Random Forest classifier** with balanced class weights to handle the imbalanced dataset\n",
    "\n",
    "We then use `GridSearchCV` to perform a **grid search with cross-validation** over a range of key hyperparameters (e.g., number of trees, max depth, etc.).  \n",
    "The model is evaluated using **Average Precision**, which is better suited for imbalanced classification tasks.\n",
    "\n",
    "The best combination of parameters is selected, and the resulting model is used to make predictions on the test set.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fe3351be",
   "metadata": {},
   "source": [
    "### Model 1 with Kbest Features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "id": "943ef7d6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fitting 5 folds for each of 72 candidates, totalling 360 fits\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.6s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.5s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.7s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.4s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.4s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.8s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.5s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.1s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.5s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   2.8s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.6s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   6.7s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   6.1s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   6.5s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.5s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.6s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.6s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.4s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.9s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.7s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.5s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.9s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.3s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   7.3s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.5s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.7s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.4s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.6s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.3s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.7s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.6s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.6s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.3s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.5s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.0s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.0s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.5s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.1s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   6.1s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.9s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.1s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.5s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   6.1s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.6s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   6.0s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.0s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   6.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.2s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.4s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.6s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   6.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.0s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.5s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.8s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   4.9s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   4.9s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.6s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.7s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.7s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.4s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.9s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.2s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.5s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.2s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.1s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.0s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.9s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.9s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.0s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.7s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.4s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.2s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.5s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.8s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.9s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.7s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.8s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.6s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   0.8s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.2s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.1s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.9s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.6s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   2.7s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   2.3s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.7s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.9s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.2s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   6.1s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.4s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.1s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.0s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.1s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.6s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.7s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.3s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.2s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.5s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.2s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.2s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.7s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.6s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.2s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.8s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.4s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.5s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.8s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.1s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.4s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.4s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.5s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.5s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.4s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.5s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.4s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.7s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.2s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.3s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.6s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.8s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.0s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.6s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.4s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.9s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.4s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.8s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.5s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.7s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.4s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   7.6s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.2s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.9s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.2s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.4s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.4s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.8s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.4s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.6s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   6.9s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   4.7s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.4s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   4.9s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.2s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.4s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.1s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.4s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.8s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.6s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.9s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.1s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.3s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   4.8s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.9s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   4.9s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.5s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.4s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.1s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.5s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.3s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.3s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.2s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   4.7s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.4s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.8s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.6s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.6s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.2s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.3s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.3s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.3s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.1s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.8s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.2s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.3s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   6.1s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.4s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   4.6s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.2s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.8s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.6s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.5s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.4s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.5s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.6s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.2s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.2s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.4s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.5s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.7s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.1s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   6.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   6.2s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.1s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.3s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.1s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.3s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   4.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.4s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.4s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.5s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.7s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.7s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   4.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.5s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.2s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.5s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.1s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.2s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.2s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   3.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   3.5s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   3.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   4.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   3.7s\n",
      "Best hyperparameters: {'model__max_depth': None, 'model__max_features': 'log2', 'model__min_samples_leaf': 2, 'model__min_samples_split': 2, 'model__n_estimators': 300}\n"
     ]
    }
   ],
   "source": [
    "# Define pipeline (scaling numeric features only)\n",
    "pipeline = Pipeline([\n",
    "    ('scaler', StandardScaler()),\n",
    "    ('model', RandomForestClassifier(class_weight=weights, # We have an imbalanced dataset\n",
    "                                     random_state=123))\n",
    "])\n",
    "\n",
    "# Define parameter grid\n",
    "param_grid = {\n",
    "    'model__n_estimators': [100, 200, 300],\n",
    "    'model__max_depth': [None, 10, 20],\n",
    "    'model__min_samples_split': [2, 5],\n",
    "    'model__min_samples_leaf': [1, 2],\n",
    "    'model__max_features': ['sqrt', 'log2']\n",
    "}\n",
    "\n",
    "# GridSearchCV\n",
    "grid_search = GridSearchCV(\n",
    "    estimator=pipeline,\n",
    "    param_grid=param_grid,\n",
    "    scoring='average_precision',  # For imbalanced classification\n",
    "    cv=5, # 5-fold cross-validation\n",
    "    n_jobs=-1, # Use all available cores\n",
    "    verbose=2 # Verbose output for progress tracking\n",
    ")\n",
    "\n",
    "# Fit the grid search on training data\n",
    "grid_search.fit(X_train_kbest, y_train_kbest)\n",
    "\n",
    "# Best model\n",
    "best_pipeline_kbest = grid_search.best_estimator_\n",
    "print(\"Best hyperparameters:\", grid_search.best_params_)\n",
    "\n",
    "# Predict on test set\n",
    "y_pred_proba_kbest = best_pipeline_kbest.predict_proba(X_test_kbest)[:, 1]\n",
    "y_pred_kbest = best_pipeline_kbest.predict(X_test_kbest)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "672444f7",
   "metadata": {},
   "source": [
    "### Model 2 with RFE Features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "id": "49cb625c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fitting 5 folds for each of 72 candidates, totalling 360 fits\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.4s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s\n",
      "\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   0.5s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   2.0s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   2.1s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   2.3s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   0.9s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   2.7s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   3.5s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.4s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.6s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.7s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.0s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.0s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.1s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.7s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.6s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.4s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   4.8s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.5s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.5s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.8s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.4s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.6s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.9s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.5s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.4s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.5s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   4.5s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.7s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   2.7s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.4s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.1s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.0s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.3s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.0s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.4s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.1s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.5s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.3s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   4.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   4.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   4.6s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   4.9s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   4.2s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.4s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.1s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.4s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.4s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.8s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.9s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.1s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.5s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.7s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.2s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.4s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.2s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.0s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.4s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.5s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.0s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   4.6s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   4.5s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.4s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.5s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   2.9s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.5s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.9s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.6s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   4.9s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   2.9s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.0s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.9s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.4s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.5s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.0s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.5s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.4s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.5s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   4.1s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   4.7s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   2.8s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   2.8s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.2s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.2s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.1s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.5s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.3s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   2.9s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.3s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.2s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.4s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   3.9s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.3s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.2s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.4s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.3s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   2.7s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.7s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.1s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.1s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.0s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.3s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.0s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.2s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   0.4s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   0.8s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   2.8s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   2.6s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   3.0s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   3.4s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   1.7s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   3.5s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   1.6s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   1.5s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.2s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.4s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.3s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   1.8s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   2.2s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.5s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.5s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   2.6s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   3.0s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   3.3s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   3.4s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   3.8s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   2.5s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.4s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   2.3s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.2s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   2.6s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   2.8s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.2s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   2.9s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.1s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.1s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   3.6s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   3.4s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.2s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   2.4s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   4.3s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   2.8s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.1s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.0s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.0s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.3s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.5s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.5s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.5s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   3.8s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.3s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.1s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.7s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   2.5s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.6s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.0s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   2.4s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   2.8s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.2s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   2.7s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.1s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.2s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.5s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.4s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   3.5s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   4.1s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   2.5s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   4.4s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   2.6s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   2.3s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   4.3s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.4s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   2.3s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   4.3s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   2.8s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.3s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.3s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.3s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.3s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   3.8s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   3.8s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   4.0s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   2.4s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   4.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.5s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   2.7s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   2.5s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   4.0s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   2.8s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   2.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.5s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.5s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.5s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   4.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   3.7s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   4.0s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   4.6s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   4.4s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.5s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.2s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.4s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.8s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.1s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.2s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.6s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.4s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.2s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.2s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.6s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   2.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   2.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.2s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.8s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   4.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   4.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   4.6s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.4s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.1s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   2.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.4s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.4s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   4.3s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   4.5s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   2.6s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   4.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   4.8s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.4s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.2s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.4s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   3.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   4.3s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   3.7s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   2.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   2.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   4.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.4s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   4.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.2s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.5s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   2.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   0.7s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   3.3s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   3.2s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   3.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   3.2s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   3.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   1.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.2s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   1.7s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   1.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   1.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   1.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.2s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.3s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.2s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.3s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   3.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   3.3s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   4.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.2s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   3.5s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   3.3s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   2.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   2.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   2.7s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   2.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   2.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.4s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.2s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.3s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.5s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   3.7s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   3.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   2.4s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   4.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   2.5s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   4.2s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   4.3s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   2.2s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   2.2s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   2.2s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   2.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   2.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   2.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   2.5s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   2.5s\n",
      "Best hyperparameters: {'model__max_depth': 10, 'model__max_features': 'sqrt', 'model__min_samples_leaf': 2, 'model__min_samples_split': 5, 'model__n_estimators': 100}\n"
     ]
    }
   ],
   "source": [
    "# Define pipeline (scaling numeric features only)\n",
    "pipeline = Pipeline([\n",
    "    ('scaler', StandardScaler()),\n",
    "    ('model', RandomForestClassifier(class_weight=weights, # We have an imbalanced dataset\n",
    "                                     random_state=123))\n",
    "])\n",
    "\n",
    "# Define parameter grid\n",
    "param_grid = {\n",
    "    'model__n_estimators': [100, 200, 300],\n",
    "    'model__max_depth': [None, 10, 20],\n",
    "    'model__min_samples_split': [2, 5],\n",
    "    'model__min_samples_leaf': [1, 2],\n",
    "    'model__max_features': ['sqrt', 'log2']\n",
    "}\n",
    "\n",
    "# GridSearchCV\n",
    "grid_search = GridSearchCV(\n",
    "    estimator=pipeline,\n",
    "    param_grid=param_grid,\n",
    "    scoring='average_precision',  # For imbalanced classification\n",
    "    cv=5, # 5-fold cross-validation\n",
    "    n_jobs=-1, # Use all available cores\n",
    "    verbose=2 # Verbose output for progress tracking\n",
    ")\n",
    "\n",
    "# Fit the grid search on training data\n",
    "grid_search.fit(X_train_rfe, y_train_rfe)\n",
    "\n",
    "# Best model\n",
    "best_pipeline_rfe = grid_search.best_estimator_\n",
    "print(\"Best hyperparameters:\", grid_search.best_params_)\n",
    "\n",
    "# Predict on test set\n",
    "y_pred_proba_rfe = best_pipeline_rfe.predict_proba(X_test_rfe)[:, 1]\n",
    "y_pred_rfe = best_pipeline_rfe.predict(X_test_rfe)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b763f4cd",
   "metadata": {},
   "source": [
    "### Model 3 with Lasso Features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "id": "47c6ab43",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fitting 5 folds for each of 72 candidates, totalling 360 fits\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.3s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.5s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.6s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.6s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.6s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.6s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   6.7s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.7s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   7.0s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   6.9s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   7.1s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   7.5s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   5.4s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   3.4s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.5s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.8s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.9s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.4s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.6s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.5s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.7s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.8s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   7.4s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   7.2s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   7.1s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.6s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   7.3s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.7s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.1s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.5s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   5.0s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.3s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   7.0s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.6s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.6s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.3s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.8s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.5s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.4s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.4s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   8.0s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.9s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.2s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.2s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.3s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   6.4s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   6.6s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   6.0s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   6.5s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.1s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.2s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.1s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.8s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.2s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.4s\n",
      "[CV] END model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   6.8s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.2s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.2s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.1s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.9s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.0s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.1s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   2.9s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.4s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.2s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.2s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.7s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.2s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.1s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.4s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.7s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.3s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.9s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.9s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.0s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.4s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.4s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.2s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.8s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.8s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.7s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.8s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.9s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.7s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.8s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.8s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.3s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.1s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.7s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.9s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   6.0s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.7s\n",
      "[CV] END model__max_depth=None, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.5s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.9s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.8s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.3s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.8s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.3s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.3s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.5s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   6.0s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   6.1s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.8s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   6.0s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.7s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.7s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.0s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.3s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.2s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.2s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.7s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.3s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.8s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.0s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.6s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.7s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.8s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.5s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.0s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.3s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.2s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.6s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.9s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.7s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.9s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.9s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.0s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.2s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.3s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.1s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.0s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.2s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   6.0s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   6.3s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.0s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.7s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.8s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   6.3s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.7s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.1s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.3s\n",
      "[CV] END model__max_depth=10, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.8s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.1s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   0.9s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.2s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.3s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.0s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.5s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.4s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.5s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   2.4s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.0s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.6s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.8s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.7s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.6s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.1s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.2s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.7s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.7s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.8s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.6s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.9s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.4s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.0s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.0s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.7s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.4s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.5s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.4s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.7s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.2s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.1s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.5s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.6s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.7s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.6s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.1s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.4s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.5s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.5s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.7s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.3s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   6.1s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.9s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.5s\n",
      "[CV] END model__max_depth=10, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.3s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.5s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.6s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.3s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   4.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.3s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.3s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   5.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.5s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   6.4s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   6.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   6.6s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.8s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   6.6s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   6.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.2s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.3s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.8s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.2s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.5s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.1s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.2s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.2s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.8s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.8s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.4s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.5s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   7.2s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.8s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.5s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.2s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.4s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.5s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.3s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.6s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   7.3s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   7.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.8s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.3s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.2s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.0s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   5.3s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   4.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=100; total time=   1.6s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.8s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.7s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.1s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   5.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.1s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.7s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.2s\n",
      "[CV] END model__max_depth=20, model__max_features=sqrt, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   6.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=200; total time=   3.7s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   4.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.2s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.2s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=2, model__n_estimators=300; total time=   5.2s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   3.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.4s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.3s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=200; total time=   4.4s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.1s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   5.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=1, model__min_samples_split=5, model__n_estimators=300; total time=   6.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.1s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   3.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.2s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=200; total time=   4.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   1.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=100; total time=   2.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   5.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=2, model__n_estimators=300; total time=   6.0s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.8s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.6s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=200; total time=   3.4s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   4.5s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   3.9s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   4.2s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   4.3s\n",
      "[CV] END model__max_depth=20, model__max_features=log2, model__min_samples_leaf=2, model__min_samples_split=5, model__n_estimators=300; total time=   3.9s\n",
      "Best hyperparameters: {'model__max_depth': None, 'model__max_features': 'log2', 'model__min_samples_leaf': 2, 'model__min_samples_split': 2, 'model__n_estimators': 200}\n"
     ]
    }
   ],
   "source": [
    "# Define pipeline (scaling numeric features only)\n",
    "pipeline = Pipeline([\n",
    "    ('scaler', StandardScaler()),\n",
    "    ('model', RandomForestClassifier(class_weight=weights, # We have an imbalanced dataset\n",
    "                                     random_state=123))\n",
    "])\n",
    "\n",
    "# Define parameter grid\n",
    "param_grid = {\n",
    "    'model__n_estimators': [100, 200, 300],\n",
    "    'model__max_depth': [None, 10, 20],\n",
    "    'model__min_samples_split': [2, 5],\n",
    "    'model__min_samples_leaf': [1, 2],\n",
    "    'model__max_features': ['sqrt', 'log2']\n",
    "}\n",
    "\n",
    "# GridSearchCV\n",
    "grid_search = GridSearchCV(\n",
    "    estimator=pipeline,\n",
    "    param_grid=param_grid,\n",
    "    scoring='average_precision',  # For imbalanced classification\n",
    "    cv=5, # 5-fold cross-validation\n",
    "    n_jobs=-1, # Use all available cores\n",
    "    verbose=2 # Verbose output for progress tracking\n",
    ")\n",
    "\n",
    "# Fit the grid search on training data\n",
    "grid_search.fit(X_train_lasso, y_train_lasso)\n",
    "\n",
    "# Best model\n",
    "best_pipeline_lasso = grid_search.best_estimator_\n",
    "print(\"Best hyperparameters:\", grid_search.best_params_)\n",
    "\n",
    "# Predict on test set\n",
    "y_pred_proba_lasso = best_pipeline_lasso.predict_proba(X_test_lasso)[:, 1]\n",
    "y_pred_lasso = best_pipeline_lasso.predict(X_test_lasso)\n"
   ]
  },
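  {
   "cell_type": "markdown",
   "id": "3fa9c1d7",
   "metadata": {},
   "source": [
    "The verbose log above is long because `GridSearchCV` fits every combination in `param_grid` once per fold. As a rough sketch (standard library only; the names `cv_folds`, `n_candidates` and `n_fits` are illustrative and not part of the notebook's pipeline), the number of fits can be counted like this:\n",
    "\n",
    "```python\n",
    "from itertools import product\n",
    "\n",
    "# Same grid as in the cell above\n",
    "param_grid = {\n",
    "    'model__n_estimators': [100, 200, 300],\n",
    "    'model__max_depth': [None, 10, 20],\n",
    "    'model__min_samples_split': [2, 5],\n",
    "    'model__min_samples_leaf': [1, 2],\n",
    "    'model__max_features': ['sqrt', 'log2']\n",
    "}\n",
    "cv_folds = 5\n",
    "\n",
    "# One candidate per element of the Cartesian product of all value lists\n",
    "n_candidates = len(list(product(*param_grid.values())))\n",
    "n_fits = n_candidates * cv_folds\n",
    "print(n_candidates, n_fits)  # 72 360\n",
    "```\n",
    "\n",
    "The final refit of the best candidate on the full training set is not included in this count, assuming the default `refit=True`."
   ]
  },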
  {
   "cell_type": "markdown",
   "id": "fc2fcc89",
   "metadata": {},
   "source": [
    "## Evaluation\n",
    "This section evaluates how well the new models predict the actual Resolution Incidents.\n",
    "\n",
    "We start by computing the confusion-matrix metrics for each model, and then display the classification report, ROC curve, PR curve and the respective Area Under the Curve (AUC)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "76099daf",
   "metadata": {},
   "source": [
    "### Model 1 evaluation (KBest features)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "id": "78887f46",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Actual and predicted\n",
    "y_true_kbest = y_test_kbest\n",
    "\n",
    "# Compute confusion matrix: [ [TN, FP], [FN, TP] ]\n",
    "tn, fp, fn, tp = confusion_matrix(y_true_kbest, y_pred_kbest).ravel()\n",
    "\n",
    "# Total predictions\n",
    "total = tp + tn + fp + fn\n",
    "\n",
    "# Compute all requested metrics\n",
    "recall_kbest = recall_score(y_true_kbest, y_pred_kbest)\n",
    "precision_kbest = precision_score(y_true_kbest, y_pred_kbest)\n",
    "f1_kbest = fbeta_score(y_true_kbest, y_pred_kbest, beta=1)\n",
    "f2_kbest = fbeta_score(y_true_kbest, y_pred_kbest, beta=2)\n",
    "fpr_kbest = fp / (fp + tn) if (fp + tn) != 0 else 0\n",
    "\n",
    "# Scores relative to total\n",
    "tp_score_kbest = tp / total\n",
    "tn_score_kbest = tn / total\n",
    "fp_score_kbest = fp / total\n",
    "fn_score_kbest = fn / total\n",
    "\n",
    "# Create DataFrame\n",
    "summary_df_kbest = pd.DataFrame([{\n",
    "    \"title\": \"Kbest\",\n",
    "    \"flagging_analysis_type\": \"RISK_VS_CLAIM using KBest Features from all features\",\n",
    "    \"count_total\": total,\n",
    "    \"count_true_positive\": tp,\n",
    "    \"count_true_negative\": tn,\n",
    "    \"count_false_positive\": fp,\n",
    "    \"count_false_negative\": fn,\n",
    "    \"true_positive_score\": tp_score_kbest,\n",
    "    \"true_negative_score\": tn_score_kbest,\n",
    "    \"false_positive_score\": fp_score_kbest,\n",
    "    \"false_negative_score\": fn_score_kbest,\n",
    "    \"recall_score\": recall_kbest,\n",
    "    \"precision_score\": precision_kbest,\n",
    "    \"false_positive_rate_score\": fpr_kbest,\n",
    "    \"f1_score\": f1_kbest,\n",
    "    \"f2_score\": f2_kbest\n",
    "}])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ea079e83",
   "metadata": {},
   "source": [
    "### Model 2 evaluation (RFE features)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "id": "03c83137",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Actual and predicted\n",
    "y_true_rfe = y_test_rfe\n",
    "\n",
    "# Compute confusion matrix: [ [TN, FP], [FN, TP] ]\n",
    "tn, fp, fn, tp = confusion_matrix(y_true_rfe, y_pred_rfe).ravel()\n",
    "\n",
    "# Total predictions\n",
    "total = tp + tn + fp + fn\n",
    "\n",
    "# Compute all requested metrics\n",
    "recall_rfe = recall_score(y_true_rfe, y_pred_rfe)\n",
    "precision_rfe = precision_score(y_true_rfe, y_pred_rfe)\n",
    "f1_rfe = fbeta_score(y_true_rfe, y_pred_rfe, beta=1)\n",
    "f2_rfe = fbeta_score(y_true_rfe, y_pred_rfe, beta=2)\n",
    "fpr_rfe = fp / (fp + tn) if (fp + tn) != 0 else 0\n",
    "\n",
    "# Scores relative to total\n",
    "tp_score_rfe = tp / total\n",
    "tn_score_rfe = tn / total\n",
    "fp_score_rfe = fp / total\n",
    "fn_score_rfe = fn / total\n",
    "\n",
    "# Create DataFrame\n",
    "summary_df_rfe = pd.DataFrame([{\n",
    "    \"title\": \"RFE\",\n",
    "    \"flagging_analysis_type\": \"RISK_VS_CLAIM using RFE Features from all features\",\n",
    "    \"count_total\": total,\n",
    "    \"count_true_positive\": tp,\n",
    "    \"count_true_negative\": tn,\n",
    "    \"count_false_positive\": fp,\n",
    "    \"count_false_negative\": fn,\n",
    "    \"true_positive_score\": tp_score_rfe,\n",
    "    \"true_negative_score\": tn_score_rfe,\n",
    "    \"false_positive_score\": fp_score_rfe,\n",
    "    \"false_negative_score\": fn_score_rfe,\n",
    "    \"recall_score\": recall_rfe,\n",
    "    \"precision_score\": precision_rfe,\n",
    "    \"false_positive_rate_score\": fpr_rfe,\n",
    "    \"f1_score\": f1_rfe,\n",
    "    \"f2_score\": f2_rfe\n",
    "}])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c2f75c9",
   "metadata": {},
   "source": [
    "### Model 3 evaluation (Lasso features)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "id": "7d34f389",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Actual and predicted\n",
    "y_true_lasso = y_test_lasso\n",
    "\n",
    "# Compute confusion matrix: [ [TN, FP], [FN, TP] ]\n",
    "tn, fp, fn, tp = confusion_matrix(y_true_lasso, y_pred_lasso).ravel()\n",
    "\n",
    "# Total predictions\n",
    "total = tp + tn + fp + fn\n",
    "\n",
    "# Compute all requested metrics\n",
    "recall_lasso = recall_score(y_true_lasso, y_pred_lasso)\n",
    "precision_lasso = precision_score(y_true_lasso, y_pred_lasso)\n",
    "f1_lasso = fbeta_score(y_true_lasso, y_pred_lasso, beta=1)\n",
    "f2_lasso = fbeta_score(y_true_lasso, y_pred_lasso, beta=2)\n",
    "fpr_lasso = fp / (fp + tn) if (fp + tn) != 0 else 0\n",
    "\n",
    "# Scores relative to total\n",
    "tp_score_lasso = tp / total\n",
    "tn_score_lasso = tn / total\n",
    "fp_score_lasso = fp / total\n",
    "fn_score_lasso = fn / total\n",
    "\n",
    "# Create DataFrame\n",
    "summary_df_lasso = pd.DataFrame([{\n",
    "    \"title\": \"Lasso\",\n",
    "    \"flagging_analysis_type\": \"RISK_VS_CLAIM using Lasso Features from all features\",\n",
    "    \"count_total\": total,\n",
    "    \"count_true_positive\": tp,\n",
    "    \"count_true_negative\": tn,\n",
    "    \"count_false_positive\": fp,\n",
    "    \"count_false_negative\": fn,\n",
    "    \"true_positive_score\": tp_score_lasso,\n",
    "    \"true_negative_score\": tn_score_lasso,\n",
    "    \"false_positive_score\": fp_score_lasso,\n",
    "    \"false_negative_score\": fn_score_lasso,\n",
    "    \"recall_score\": recall_lasso,\n",
    "    \"precision_score\": precision_lasso,\n",
    "    \"false_positive_rate_score\": fpr_lasso,\n",
    "    \"f1_score\": f1_lasso,\n",
    "    \"f2_score\": f2_lasso\n",
    "}])"
   ]
  },
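  {
   "cell_type": "markdown",
   "id": "b7e41a20",
   "metadata": {},
   "source": [
    "The three evaluation cells above apply the same arithmetic to each model. As a minimal sketch, the hypothetical helper `summarise_confusion` below derives the same summary fields from the four confusion-matrix counts using plain Python. The function name, the zero-denominator fallback to `0.0` and the example counts are assumptions for illustration; the notebook itself relies on scikit-learn's `recall_score`, `precision_score` and `fbeta_score`.\n",
    "\n",
    "```python\n",
    "def summarise_confusion(tp, tn, fp, fn):\n",
    "    # Hypothetical helper: derives the summary fields from the four counts\n",
    "    total = tp + tn + fp + fn\n",
    "\n",
    "    def safe_div(num, den):\n",
    "        # Return 0.0 rather than raising when the denominator is zero\n",
    "        return num / den if den else 0.0\n",
    "\n",
    "    recall = safe_div(tp, tp + fn)\n",
    "    precision = safe_div(tp, tp + fp)\n",
    "\n",
    "    def fbeta(beta):\n",
    "        # F-beta treats recall as beta times as important as precision\n",
    "        b2 = beta ** 2\n",
    "        return safe_div((1 + b2) * precision * recall, b2 * precision + recall)\n",
    "\n",
    "    return {\n",
    "        'count_total': total,\n",
    "        'true_positive_score': safe_div(tp, total),\n",
    "        'true_negative_score': safe_div(tn, total),\n",
    "        'false_positive_score': safe_div(fp, total),\n",
    "        'false_negative_score': safe_div(fn, total),\n",
    "        'recall_score': recall,\n",
    "        'precision_score': precision,\n",
    "        'false_positive_rate_score': safe_div(fp, fp + tn),\n",
    "        'f1_score': fbeta(1),\n",
    "        'f2_score': fbeta(2),\n",
    "    }\n",
    "\n",
    "# Made-up counts, for illustration only\n",
    "metrics = summarise_confusion(tp=8, tn=80, fp=8, fn=4)\n",
    "print(metrics['precision_score'], metrics['f2_score'])\n",
    "```\n",
    "\n",
    "Because F2 weights recall more heavily than precision, it is the more relevant score when missing a real incident costs more than a false alarm."
   ]
  },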
  {
   "cell_type": "code",
   "execution_count": 53,
   "id": "09609773",
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot_confusion_matrix_from_df(df, flagging_analysis_type):\n",
    "\n",
    "    # Subset - just retrieve one row depending on the flagging_analysis_type\n",
    "    row = df[df['flagging_analysis_type'] == flagging_analysis_type].iloc[0]\n",
    "\n",
    "    # Define custom x-axis labels and wording\n",
    "    # (prefix match, since callers append a description to the analysis type)\n",
    "    if flagging_analysis_type.startswith('RISK_VS_CLAIM'):\n",
    "        x_labels = ['With Submitted Claim', 'Without Submitted Claim']\n",
    "        outcome_label = \"submitted claim\"\n",
    "    elif flagging_analysis_type.startswith('RISK_VS_SUBMITTED_PAYOUT'):\n",
    "        x_labels = ['With Submitted Payout', 'Without Submitted Payout']\n",
    "        outcome_label = \"submitted payout\"\n",
    "    else:\n",
    "        x_labels = ['Actual Positive', 'Actual Negative']\n",
    "        outcome_label = \"outcome\"\n",
    "\n",
    "    # Confusion matrix structure\n",
    "    cm = np.array([\n",
    "        [row['count_true_positive'], row['count_false_positive']],\n",
    "        [row['count_false_negative'], row['count_true_negative']]\n",
    "    ])\n",
    "\n",
    "    # Create annotations for the confusion matrix\n",
    "    labels = [['True Positives', 'False Positives'], ['False Negatives', 'True Negatives']]\n",
    "    counts = [[f\"{v:,}\" for v in [row['count_true_positive'], row['count_false_positive']]],\n",
    "              [f\"{v:,}\" for v in [row['count_false_negative'], row['count_true_negative']]]]\n",
    "    percentages = [[f\"{round(100*v,2):,}\" for v in [row['true_positive_score'], row['false_positive_score']]],\n",
    "                   [f\"{round(100*v,2):,}\" for v in [row['false_negative_score'], row['true_negative_score']]]]\n",
    "    annot = [[f\"{labels[i][j]}\\n{counts[i][j]} ({percentages[i][j]}%)\" for j in range(2)] for i in range(2)]\n",
    "\n",
    "    # Scores formatted as percentages\n",
    "    recall = row['recall_score'] * 100\n",
    "    precision = row['precision_score'] * 100\n",
    "    f1 = row['f1_score'] * 100\n",
    "    f2 = row['f2_score'] * 100\n",
    "\n",
    "    # Set up figure and axes manually for precise control\n",
    "    fig = plt.figure(figsize=(9, 8))\n",
    "    grid = fig.add_gridspec(nrows=4, height_ratios=[2, 2, 15, 2])\n",
    "\n",
    "    \n",
    "    ax_main_title = fig.add_subplot(grid[0])\n",
    "    ax_main_title.axis('off')\n",
    "    ax_main_title.set_title(f\"{row['title']} - Flagged as Risk vs. {outcome_label.title()}\", fontsize=14, weight='bold')\n",
    "    \n",
    "    # Business explanation text\n",
    "    ax_text = fig.add_subplot(grid[1])\n",
    "    ax_text.axis('off')\n",
    "    business_text = (\n",
    "        f\"Flagging performance analysis:\\n\\n\"\n",
    "        f\"- Of all the bookings we flagged as at Risk, {precision:.2f}% actually turned into a {outcome_label}.\\n\"\n",
    "        f\"- Of all the bookings that resulted in a {outcome_label}, we correctly flagged {recall:.2f}% of them.\\n\"\n",
    "        f\"- The pure balance between these two is summarized by a score of {f1:.2f}%.\\n\"\n",
    "        f\"- If we prioritise better probability of detection of a {outcome_label}, the balanced score is {f2:.2f}%.\\n\"\n",
    "    )\n",
    "    ax_text.text(0.0, 0.0, business_text, fontsize=10.5, ha='left', va='bottom', wrap=False, linespacing=1.5)\n",
    "\n",
    "    # Heatmap\n",
    "    ax_heatmap = fig.add_subplot(grid[2])\n",
    "    ax_heatmap.set_title(f\"Confusion Matrix – Risk vs. {outcome_label.title()}\", fontsize=12, weight='bold', ha='center', va='center', wrap=False)\n",
    "\n",
    "    cmap = sns.light_palette(\"#315584\", as_cmap=True)\n",
    "\n",
    "    sns.heatmap(cm, annot=annot, fmt='', cmap=cmap, cbar=False,\n",
    "                xticklabels=x_labels,\n",
    "                yticklabels=['Flagged as Risk', 'Flagged as No Risk'],\n",
    "                ax=ax_heatmap,\n",
    "                linewidths=1.0,\n",
    "                annot_kws={'fontsize': 10, 'linespacing': 1.2})\n",
    "    ax_heatmap.set_xlabel(\"Resolution Outcome (Actual)\", fontsize=11, labelpad=10)\n",
    "    ax_heatmap.set_ylabel(\"Flagging (Prediction)\", fontsize=11, labelpad=10)\n",
    "    \n",
    "    # Make borders visible\n",
    "    for _, spine in ax_heatmap.spines.items():\n",
    "        spine.set_visible(True)\n",
    "\n",
    "    # Footer with metrics and date\n",
    "    ax_footer = fig.add_subplot(grid[3])\n",
    "    ax_footer.axis('off')\n",
    "    metrics_text = f\"Total Booking Count: {row['count_total']}   |   Recall: {recall:.2f}%   |   Precision: {precision:.2f}%   |   F1 Score: {f1:.2f}%   |   F2 Score: {f2:.2f}%\"\n",
    "    date_text = f\"Generated on {date.today().strftime('%B %d, %Y')}\"\n",
    "    ax_footer.text(0.5, 0.7, metrics_text, ha='center', fontsize=9)\n",
    "    ax_footer.text(0.5, 0.1, date_text, ha='center', fontsize=8, color='gray')\n",
    "\n",
    "    plt.tight_layout()\n",
    "    plt.show()"
   ]
  },
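  {
   "cell_type": "markdown",
   "id": "c58d02e6",
   "metadata": {},
   "source": [
    "Note that `plot_confusion_matrix_from_df` lays the matrix out with predictions on the rows and the positive class first, i.e. `[[TP, FP], [FN, TN]]`, whereas scikit-learn's `confusion_matrix` returns `[[TN, FP], [FN, TP]]` with the actual labels on the rows. A minimal sketch of the conversion, using a hypothetical helper `to_plot_layout` and made-up counts:\n",
    "\n",
    "```python\n",
    "def to_plot_layout(sk_cm):\n",
    "    # sk_cm follows scikit-learn: rows = actual, columns = predicted, class 0 first\n",
    "    (tn, fp), (fn, tp) = sk_cm\n",
    "    # Plot layout: rows = predicted, columns = actual, positive class first\n",
    "    return [[tp, fp], [fn, tn]]\n",
    "\n",
    "plot_cm = to_plot_layout([[80, 8], [4, 8]])\n",
    "print(plot_cm)  # [[8, 8], [4, 80]]\n",
    "```\n",
    "\n",
    "Keeping the two layouts apart matters when reading the heatmap: the top-left cell is the true positives, not the true negatives."
   ]
  },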
  {
   "cell_type": "code",
   "execution_count": 54,
   "id": "7cc4a1d2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Figure size 900x800 with 4 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "<Figure size 900x800 with 4 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "<Figure size 900x800 with 4 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Plot confusion matrix for claim scenario\n",
    "plot_confusion_matrix_from_df(summary_df_kbest, 'RISK_VS_CLAIM using KBest Features from all features')\n",
    "plot_confusion_matrix_from_df(summary_df_rfe, 'RISK_VS_CLAIM using RFE Features from all features')\n",
    "plot_confusion_matrix_from_df(summary_df_lasso, 'RISK_VS_CLAIM using Lasso Features from all features')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "id": "30786f7c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Figure size 1600x400 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Print a table to summarize the results\n",
    "summary_table = pd.concat([summary_df_kbest, summary_df_rfe, summary_df_lasso], ignore_index=True)\n",
    "summary_table = summary_table[['title', 'count_true_positive', 'count_true_negative',\n",
    "                               'count_false_positive', 'count_false_negative', 'true_positive_score', 'true_negative_score',\n",
    "                               'false_positive_score', 'false_negative_score', 'recall_score', 'precision_score',\n",
    "                               'false_positive_rate_score', 'f1_score', 'f2_score']]\n",
    "\n",
    "# Rename them\n",
    "summary_table.columns = ['Model', 'TP', 'TN', 'FP', 'FN',\n",
    "                         'TP Rate', 'TN Rate', 'FP Rate', 'FN Rate',\n",
    "                         'Recall', 'Precision', 'FPR', 'F1', 'F2']\n",
    "                         \n",
    "# summary_table.to_csv('flagging_analysis_summary.csv', index=False)\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Set up figure and axis\n",
    "fig, ax = plt.subplots(figsize=(16, 4))  # Adjust width/height as needed\n",
    "ax.axis('off')  # Hide axes\n",
    "\n",
    "# Create table from DataFrame\n",
    "table = ax.table(cellText=summary_table.round(3).values,\n",
    "                 colLabels=summary_table.columns,\n",
    "                 loc='center',\n",
    "                 cellLoc='center')\n",
    "\n",
    "table.auto_set_font_size(False)\n",
    "table.set_fontsize(10)\n",
    "table.scale(1.2, 1.5)  # Adjust cell size\n",
    "\n",
    "# Save as image\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d731d0c5",
   "metadata": {},
   "source": [
    "### Interpreting the Classification Report\n",
    "\n",
    "The **Classification Report** provides key metrics to evaluate how well the model performed on each class.\n",
    "\n",
    "It includes the following metrics for each class (0 and 1):\n",
    "* Metric: Meaning\n",
    "* Precision: Out of all predicted positives, how many were actually positive?\n",
    "* Recall: Out of all actual positives, how many did we correctly identify?\n",
    "* F1-score: Harmonic mean of precision and recall (balances both)\n",
    "* Support: Number of true samples of that class in the test data\n",
    "\n",
    "Interpretation:\n",
    "* Class 0 = No incident\n",
    "* Class 1 = Has resolution incident (rare, but important!)\n",
    "\n",
    "A few explanatory cases:\n",
    "* A high recall for class 1 means we're catching most incidents.\n",
    "* A high precision for class 1 means when we predict an incident, we're often correct.\n",
    "* The F1-score gives a single balanced measure (good for imbalanced data).\n",
    "\n",
    "Special note for imbalanced data:\n",
    "Since class 1 (or just True) is rare (1% in our case), metrics for that class are more critical.\n",
    "We want to maximize recall to catch as many real incidents as possible — without letting precision drop too low (to avoid too many false alarms)."
   ]
  },
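  {
   "cell_type": "markdown",
   "id": "f2a9c1e0",
   "metadata": {},
   "source": [
    "As a minimal sketch of how the metrics above combine, the snippet below computes F1 and F2 from raw confusion-matrix counts. The helper `f_beta` and the counts are hypothetical and only for illustration; they are not results from the models in this notebook. F2 uses beta = 2, which weights recall more heavily than precision, so it suits a setting where a missed incident costs more than a false alarm.\n",
    "\n",
    "```python\n",
    "def f_beta(tp, fp, fn, beta=1.0):\n",
    "    # Precision and recall from raw confusion-matrix counts\n",
    "    precision = tp / (tp + fp) if (tp + fp) else 0.0\n",
    "    recall = tp / (tp + fn) if (tp + fn) else 0.0\n",
    "    if precision == 0.0 and recall == 0.0:\n",
    "        return 0.0\n",
    "    b2 = beta ** 2\n",
    "    # Weighted harmonic mean: beta > 1 favours recall, beta < 1 favours precision\n",
    "    return (1 + b2) * precision * recall / (b2 * precision + recall)\n",
    "\n",
    "# Hypothetical counts, not taken from the models in this notebook\n",
    "tp, fp, fn = 8, 40, 2\n",
    "f1 = f_beta(tp, fp, fn, beta=1.0)\n",
    "f2 = f_beta(tp, fp, fn, beta=2.0)\n",
    "print(f'F1 = {f1:.3f}, F2 = {f2:.3f}')\n",
    "```\n",
    "\n",
    "With these made-up counts recall is high and precision is low, so F2 comes out higher than F1."
   ]
  },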
  {
   "cell_type": "markdown",
   "id": "c366cfe7",
   "metadata": {},
   "source": [
    "### Results Summary\n",
    "\n",
    "- Model 1 (Kbest) best in F1 Score (0.227), but has a moderate recall.\n",
    "- Model 2 (RFE) provides the highest recall (0.875) and the best F2 score (0.345), meaning it's most effective at capturing positives while tolerating more false positives.\n",
    "- Model 3 (Lasso) offers the highest precision (0.9) and the lowest FPR, though it misses most real incidents (low recall)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "id": "4b4da914",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAhgAAAHWCAYAAAA1jvBJAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjEsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvc2/+5QAAAAlwSFlzAAAPYQAAD2EBqD+naQAAfLpJREFUeJzt3XdcU1f/B/BPAoQ9RESGKILg3qvuhaLWvUCtom3t0j59apd2qR3aX22tfVpbW611VAFxKy60al1Vq+IWB+ICVOpAZkJyfn+kBCmgBG+4CXzerxcvTk7u+OYQyJd7zj1HIYQQICIiIpKQUu4AiIiIqOJhgkFERESSY4JBREREkmOCQURERJJjgkFERESSY4JBREREkmOCQURERJJjgkFERESSY4JBREREkmOCQURERJJjgkFUCSxevBgKhcLwZW1tDV9fX4wbNw43b94sdh8hBJYtW4bOnTvDzc0NDg4OaNy4MT755BNkZmaWeK61a9eiT58+8PDwgEqlgo+PD0aMGIHff/+9VLHm5OTgm2++Qdu2beHq6go7OzsEBwdj0qRJuHDhQplePxGVPwXXIiGq+BYvXozx48fjk08+Qe3atZGTk4M///wTixcvhr+/P06fPg07OzvD9lqtFqNGjcLKlSvRqVMnDBkyBA4ODti7dy9WrFiBBg0aYMeOHahevbphHyEEnn/+eSxevBjNmzfHsGHD4OXlhZSUFKxduxZHjx7F/v370b59+xLjTEtLQ+/evXH06FH069cPISEhcHJyQkJCAqKiopCamgq1Wm3StiIiiQgiqvB+/fVXAUAcOXKkUP17770nAIjo6OhC9TNnzhQAxNtvv13kWBs2bBBKpVL07t27UP3s2bMFAPHf//5X6HS6IvstXbpUHDp06LFxPvvss0KpVIpVq1YVeS4nJ0e89dZbj92/tDQajcjNzZXkWERUPCYYRJVASQnGpk2bBAAxc+ZMQ11WVpaoUqWKCA4OFhqNptjjjR8/XgAQBw8eNOzj7u4u6tWrJ/Ly8soU459//ikAiAkTJpRq+y5duoguXboUqY+IiBC1atUyPL5y5YoAIGbPni2++eYbERAQIJRKpfjzzz+FlZWVmD59epFjnD9/XgAQ3333naHu3r174o033hA1atQQKpVKBAYGii+++EJotVqjXytRZcAxGESVWFJSEgCgSpUqhrp9+/bh3r17GDVqFKytrYvdb+zYsQCATZs2Gfa5e/cuRo0aBSsrqzLFsmHDBgDAmDFjyrT/k/z666/47rvv8NJLL+Hrr7+Gt7c3unTpgpUrVxbZNjo6GlZWVhg+fDgAICsrC126dMFvv/2GsWPH4n//+x86dOiAqVOnYvLkySaJl8jSFf/Xg4gqpAcPHiAtLQ05OTk4dOgQZsyYAVtbW/Tr18+wzdmzZwEATZs2LfE4+c+dO3eu0PfGjRuXOTYpjvE4N27cwKVLl1CtWjVDXVhYGF5++WWcPn0ajRo1MtRHR0ejS5cuhjEmc+bMweXLl3H8+HEEBQUBAF5++WX4+Phg9uzZeOutt+Dn52eSuIksFa9gEFUiISEhqFatGvz8/DBs2DA4Ojpiw4YNqFGjhmGbhw8fAgCcnZ1LPE7+c+np6YW+P26fJ5HiGI8zdOjQQskFAAwZMgTW1taIjo421J0+fRpnz55FWFiYoS4mJgadOnVClSpVkJaWZvgKCQmBVqvFH3/8YZKYiSwZr2AQVSLz5s1DcHAwHjx4gEWLFuGPP/6Ara1toW3yP+DzE43i/DsJcXFxeeI+T/LoMdzc3Mp8nJLUrl27SJ2Hhwd69OiBlStX4tNPPwWgv3phbW2NIUOGGLa7ePEiTp48WSRByXf79m3J4yWydEwwiCqRNm3aoFWrVgCAQYMGoWPHjhg1ahQSEhLg5OQEAKhfvz4A4OTJkxg0aFCxxzl58iQAoEGDBgCAevXqAQBOnTpV4j5P8ugxOnXq9MTtFQoF
RDF32Wu12mK3t7e3L7Y+PDwc48ePR3x8PJo1a4aVK1eiR48e8PDwMGyj0+nQs2dPvPvuu8UeIzg4+InxElU27CIhqqSsrKwwa9YsJCcn4/vvvzfUd+zYEW5ublixYkWJH9ZLly4FAMPYjY4dO6JKlSqIjIwscZ8n6d+/PwDgt99+K9X2VapUwf3794vUX7161ajzDho0CCqVCtHR0YiPj8eFCxcQHh5eaJvAwEBkZGQgJCSk2K+aNWsadU6iyoAJBlEl1rVrV7Rp0wZz585FTk4OAMDBwQFvv/02EhIS8MEHHxTZJzY2FosXL0ZoaCieeeYZwz7vvfcezp07h/fee6/YKwu//fYbDh8+XGIs7dq1Q+/evbFw4UKsW7euyPNqtRpvv/224XFgYCDOnz+PO3fuGOpOnDiB/fv3l/r1A4CbmxtCQ0OxcuVKREVFQaVSFbkKM2LECBw8eBDbtm0rsv/9+/eRl5dn1DmJKgPO5ElUCeTP5HnkyBFDF0m+VatWYfjw4fjxxx/xyiuvANB3M4SFhWH16tXo3Lkzhg4dCnt7e+zbtw+//fYb6tevj507dxaayVOn02HcuHFYtmwZWrRoYZjJMzU1FevWrcPhw4dx4MABtGvXrsQ479y5g169euHEiRPo378/evToAUdHR1y8eBFRUVFISUlBbm4uAP1dJ40aNULTpk3xwgsv4Pbt25g/fz6qV6+O9PR0wy24SUlJqF27NmbPnl0oQXnU8uXL8dxzz8HZ2Rldu3Y13DKbLysrC506dcLJkycxbtw4tGzZEpmZmTh16hRWrVqFpKSkQl0qRATO5ElUGZQ00ZYQQmi1WhEYGCgCAwMLTZKl1WrFr7/+Kjp06CBcXFyEnZ2daNiwoZgxY4bIyMgo8VyrVq0SvXr1Eu7u7sLa2lp4e3uLsLAwsXv37lLFmpWVJb766ivRunVr4eTkJFQqlQgKChKvv/66uHTpUqFtf/vtNxEQECBUKpVo1qyZ2LZt22Mn2ipJenq6sLe3FwDEb7/9Vuw2Dx8+FFOnThV16tQRKpVKeHh4iPbt24uvvvpKqNXqUr02osqEVzCIiIhIchyDQURERJJjgkFERESSY4JBREREkmOCQURERJJjgkFERESSY4JBREREkqt0a5HodDokJyfD2dkZCoVC7nCIiIgshhACDx8+hI+PD5TKx1+jqHQJRnJyMvz8/OQOg4iIyGJdv34dNWrUeOw2lS7ByF9e+vr164bloZ+WRqPB9u3b0atXL9jY2EhyzMqObSo9tqm02J7SY5tKyxTtmZ6eDj8/P8Nn6eNUugQjv1vExcVF0gTDwcEBLi4u/KWQCNtUemxTabE9pcc2lZYp27M0Qww4yJOIiIgkxwSDiIiIJMcEg4iIiCTHBIOIiIgkxwSDiIiIJMcEg4iIiCTHBIOIiIgkxwSDiIiIJMcEg4iIiCTHBIOIiIgkJ2uC8ccff6B///7w8fGBQqHAunXrnrjP7t270aJFC9ja2qJOnTpYvHixyeMkIiIi48iaYGRmZqJp06aYN29eqba/cuUKnn32WXTr1g3x8fH473//ixdffBHbtm0zcaRERERkDFkXO+vTpw/69OlT6u3nz5+P2rVr4+uvvwYA1K9fH/v27cM333yD0NBQU4VJRERkNpKTgXv3nrydRgNcu+aMW7eAJ6ysbhIWtZrqwYMHERISUqguNDQU//3vf0vcJzc3F7m5uYbH6enpAPSrzGk0Gkniyj+OVMcjtqkpsE2lxfaUHtv08a5dA956ywrr1z+588HKSgt//6u4fLk7kpI0+OILaT/vSsOiEozU1FRUr169UF316tWRnp6O7Oxs2NvbF9ln1qxZmDFjRpH67du3w8HBQdL44uLiJD0esU1NgW0qLban9NimhWk0CqxfXwcrVwZDrX5ycuHgkIURI1aiVq2riIwciaQkBTZvPitJLFlZWaXe1qISjLKYOnUqJk+ebHicnp4OPz8/9OrVCy4uLpKcQ6PR
IC4uDj179oSNjY0kx6zs2KbSY5tKi+0pPbZpUTt3KvDuu1a4cEFhqKteXaBPHwGFouj2Nja34ekZDWvr+9DpbNG06R0MGtQ
      "text/plain": [
       "<Figure size 600x500 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# ROC Curve\n",
    "fpr, tpr, _ = roc_curve(y_test_rfe, y_pred_proba_rfe)\n",
    "roc_auc = auc(fpr, tpr)\n",
    "\n",
    "plt.figure(figsize=(6, 5))\n",
    "plt.plot(fpr, tpr, color='blue', lw=2, label=f'ROC curve (AUC = {roc_auc:.4f})')\n",
    "plt.plot([0, 1], [0, 1], color='gray', linestyle='--')\n",
    "plt.xlabel('False Positive Rate')\n",
    "plt.ylabel('True Positive Rate')\n",
    "plt.title('ROC Curve')\n",
    "plt.legend(loc='lower right')\n",
    "plt.grid(True)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e403edb1",
   "metadata": {},
   "source": [
    "### Interpreting the ROC Curve\n",
    "\n",
    "The **Receiver Operating Characteristic (ROC) curve** shows how well the model distinguishes between the positive and negative classes across all decision thresholds.\n",
    "\n",
    "A quick reminder of the definitions:\n",
    "* True Positive Rate (TPR) = Recall\n",
    "* False Positive Rate (FPR) = Proportion of negatives wrongly classified as positives\n",
    "\n",
    "What we display in this plot is:\n",
    "* The x-axis is False Positive Rate\n",
    "* The y-axis is True Positive Rate\n",
    "\n",
    "The curve shows how TPR and FPR change as the threshold varies\n",
    "\n",
    "It's important to note that:\n",
    "* A model with no skill will produce a diagonal line (AUC = 0.5)\n",
    "* A model with perfect discrimination will hug the top-left corner (AUC = 1.0)\n",
    "\n",
    "The Area Under the Curve (ROC AUC) gives a single performance score:\n",
    "* Closer to 1 means better at ranking positive cases higher than negative ones\n",
    "\n",
    "**Important!**\n",
    "\n",
    "While useful, the ROC curve can sometimes overestimate performance when the dataset is imbalanced, because it includes negatives (which dominate in our case, around 99%!). That’s why we also MUST check the Precision-Recall curve."
   ]
  },
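  {
   "cell_type": "markdown",
   "id": "b7d43a15",
   "metadata": {},
   "source": [
    "To make the warning above concrete, here is a small sketch with made-up counts (they are not results from this notebook). It assumes a test set with 1% positives and a single hypothetical threshold; the function name `rates` is only illustrative. The point is that the same predictions can look fine on the ROC axes and weak on precision.\n",
    "\n",
    "```python\n",
    "def rates(tp, fp, tn, fn):\n",
    "    # TPR and FPR are the two ROC axes; precision is the PR-curve y-axis\n",
    "    tpr = tp / (tp + fn)\n",
    "    fpr = fp / (fp + tn)\n",
    "    precision = tp / (tp + fp)\n",
    "    return tpr, fpr, precision\n",
    "\n",
    "# Hypothetical test set: 10,000 bookings, 1% with an incident\n",
    "positives, negatives = 100, 9900\n",
    "tp, fp = 80, 400  # bookings flagged at some threshold (made up)\n",
    "fn, tn = positives - tp, negatives - fp\n",
    "\n",
    "tpr, fpr, precision = rates(tp, fp, tn, fn)\n",
    "print(f'TPR = {tpr:.3f}, FPR = {fpr:.3f}, precision = {precision:.3f}')\n",
    "```\n",
    "\n",
    "In this example the point sits close to the top-left corner of the ROC plot (high TPR, low FPR), yet only about one in six flagged bookings is a real incident."
   ]
  },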
  {
   "cell_type": "code",
   "execution_count": 46,
   "id": "6790d41d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAhgAAAHWCAYAAAA1jvBJAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjEsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvc2/+5QAAAAlwSFlzAAAPYQAAD2EBqD+naQAAU9hJREFUeJzt3XlYlOX+BvB7ZhgGkE1FQBDFNXPFUAnNHUUpy46puWu5S6lkpaaiWaKmpplKedzOLw3TzEwRJdTcKJfAU7lvaSqIGvsyw8zz+8PD5DiDAj4woPfnurhknnne9/3OF5CbdxuFEEKAiIiISCKltQsgIiKiJw8DBhEREUnHgEFERETSMWAQERGRdAwYREREJB0DBhEREUnHgEFERETSMWAQERGRdAwYREREJB0DBlEFNWzYMPj6+hZrmf3790OhUGD//v2lUlNF17FjR3Ts2NH4+MqVK1AoFFi3bp3VaiKqqBgwiIpo3bp1UCgUxg87Ozs0aNAAoaGhSE5OtnZ55V7BL+uCD6VSiSpVqqBHjx6Ij4+3dnlSJCcnY/LkyWjYsCEcHBxQqVIl+Pv746OPPkJqaqq1yyMqUzbWLoCoovnwww9Ru3Zt5Obm4tChQ1i5ciWio6Px+++/w8HBoczqWLVqFQwGQ7GWad++PXJycmBra1tKVT1a//79ERISAr1ej3PnzmHFihXo1KkTjh07hqZNm1qtrsd17NgxhISEIDMzE4MGDYK/vz8A4Pjx45g3bx4OHDiAPXv2WLlKorLDgEFUTD169EDLli0BACNGjEDVqlWxePFifP/99+jfv7/FZbKyslCpUiWpdajV6mIvo1QqYWdnJ7WO4nruuecwaNAg4+N27dqhR48eWLlyJVasWGHFykouNTUVr776KlQqFRISEtCwYUOT5z/++GOsWrVKyrZK43uJqDTwEAnRY+rcuTMA4PLlywDunRvh6OiIixcvIiQkBE5OThg4cCAAwGAwYMmSJWjcuDHs7Ozg4eGB0aNH4++//zZb765du9ChQwc4OTnB2dkZrVq1wsaNG43PWzoHIyoqCv7+/sZlmjZtiqVLlxqfL+wcjM2bN8Pf3x/29vZwc3PDoEGDcP36dZM5Ba/r+vXr6NWrFxwdHVGtWjVMnjwZer2+xP1r164dAODixYsm46mpqZg4cSJ8fHyg0WhQr149zJ8/32yvjcFgwNKlS9G0aVPY2dmhWrVq6N69O44fP26cs3btWnTu3Bnu7u7QaDRo1KgRVq5cWeKaH/TFF1/g+vXrWLx4sVm4AAAPDw9Mnz7d+FihUGDWrFlm83x9fTFs2DDj44LDcj/99BPGjRsHd3d31KhRA1u2bDGOW6pFoVDg999/N46dOXMGr732GqpUqQI7Ozu0bNkS27dvf7wXTfQI3INB9JgKfjFWrVrVOJafn4/g4GC88MILWLhwofHQyejRo7Fu3ToMHz4cb7/9Ni5fvozPP/8cCQkJOHz4sHGvxLp16/DGG2+gcePGmDp1KlxdXZGQkICYmBgMGDDAYh2xsbHo378/unTpgvnz5wMATp8+jcOHD2PChAmF1l9QT6tWrRAREYHk5GQsXboUhw8fRkJCAlxdXY1z9Xo9goODERAQgIULF+LHH3/EokWLULduXYwdO7ZE/bty5QoAoHLlysax7OxsdOjQAdevX8fo0aNRs2ZNHDlyBFOnTsXNmzexZMkS49w333wT69atQ48ePTBixAjk5+fj4MGD+Pnnn417mlauXInGjRvj5Zdfho2NDX744QeMGzcOBoMB48ePL1Hd99u+fTvs7e3x2muvPfa6LBk3bhyqVauGmTNnIisrCy+++CIcHR3xzTffoEOHDiZzN23ahMaNG6NJkyYAgD/++ANt27aFt7c3pkyZgkqVKuGbb75Br1698O233+LVV18tlZqJIIioSNauXSsAiB9//FGkpKSIa9euiaioKFG1alVhb28v/vrrLyGEEEOH
DhUAxJQpU0yWP3jwoAAgNmzYYDIeExNjMp6amiqcnJxEQECAyMnJMZlrMBiMnw8dOlTUqlXL+HjChAnC2dlZ5OfnF/oa9u3bJwCIffv2CSGE0Gq1wt3dXTRp0sRkWzt27BAAxMyZM022B0B8+OGHJuts0aKF8Pf3L3SbBS5fviwAiNmzZ4uUlBSRlJQkDh48KFq1aiUAiM2bNxvnzpkzR1SqVEmcO3fOZB1TpkwRKpVKXL16VQghxN69ewUA8fbbb5tt7/5eZWdnmz0fHBws6tSpYzLWoUMH0aFDB7Oa165d+9DXVrlyZdG8efOHzrkfABEeHm42XqtWLTF06FDj44LvuRdeeMHs69q/f3/h7u5uMn7z5k2hVCpNvkZdunQRTZs2Fbm5ucYxg8Eg2rRpI+rXr1/kmomKi4dIiIopKCgI1apVg4+PD15//XU4Ojriu+++g7e3t8m8B/+i37x5M1xcXNC1a1fcvn3b+OHv7w9HR0fs27cPwL09ERkZGZgyZYrZ+RIKhaLQulxdXZGVlYXY2Ngiv5bjx4/j1q1bGDdunMm2XnzxRTRs2BA7d+40W2bMmDEmj9u1a4dLly4VeZvh4eGoVq0aPD090a5dO5w+fRqLFi0y+et/8+bNaNeuHSpXrmzSq6CgIOj1ehw4cAAA8O2330KhUCA8PNxsO/f3yt7e3vh5Wloabt++jQ4dOuDSpUtIS0srcu2FSU9Ph5OT02OvpzAjR46ESqUyGevXrx9u3bplcrhry5YtMBgM6NevHwDg7t272Lt3L/r27YuMjAxjH+/cuYPg4GCcP3/e7FAYkSw8REJUTMuXL0eDBg1gY2MDDw8PPPPMM1AqTbO6jY0NatSoYTJ2/vx5pKWlwd3d3eJ6b926BeCfQy4Fu7iLaty4cfjmm2/Qo0cPeHt7o1u3bujbty+6d+9e6DJ//vknAOCZZ54xe65hw4Y4dOiQyVjBOQ73q1y5ssk5JCkpKSbnZDg6OsLR0dH4eNSoUejTpw9yc3Oxd+9efPbZZ2bncJw/fx7//e9/zbZV4P5eeXl5oUqVKoW+RgA4fPgwwsPDER8fj+zsbJPn0tLS4OLi8tDlH8XZ2RkZGRmPtY6HqV27ttlY9+7d4eLigk2bNqFLly4A7h0e8fPzQ4MGDQAAFy5cgBACM2bMwIwZMyyu+9atW2bhmEgGBgyiYmrdurXx2H5hNBqNWegwGAxwd3fHhg0bLC5T2C/TonJ3d0diYiJ2796NXbt2YdeuXVi7di2GDBmC9evXP9a6Czz4V7QlrVq1MgYX4N4ei/tPaKxfvz6CgoIAAC+99BJUKhWmTJmCTp06GftqMBjQtWtXvPfeexa3UfALtCguXryILl26oGHDhli8eDF8fHxga2uL6OhofPrpp8W+1NeShg0bIjExEVqt9rEuAS7sZNn798AU0Gg06NWrF7777jusWLECycnJOHz4MObOnWucU/DaJk+ejODgYIvrrlevXonrJXoYBgyiMlK3bl38+OOPaNu2rcVfGPfPA4Dff/+92P/529raomfPnujZsycMBgPGjRuHL774AjNmzLC4rlq1agEAzp49a7wapsDZs2eNzxfHhg0bkJOTY3xcp06dh87/4IMPsGrVKkyfPh0xMTEA7vUgMzPTGEQKU7duXezevRt3794tdC/GDz/8gLy8PGzfvh01a9Y0jhcckpKhZ8+eiI+Px7ffflvopcr3q1y5stmNt7RaLW7evFms7fbr1w/r169HXFwcTp8+DSGE8fAI8E/v1Wr1I3tJJBvPwSAqI3379oVer8ecOXPMnsvPzzf+wunWrRucnJwQERGB3Nxck3lCiELXf+fOHZPHSqUSzZo1AwDk5eVZXKZly5Zwd3dHZGSkyZxdu3bh9OnTePHFF4v02u7Xtm1bBAUFGT8eFTBcXV0xevRo7N69G4mJiQDu9So+Ph67d+82m5+amor8/HwAQO/evSGEwOzZs83mFfSqYK/L/b1LS0vD2rVr
i/3aCjNmzBhUr14d77zzDs6dO2f2/K1bt/DRRx8ZH9etW9d4HkmBL7/8stiX+wYFBaFKlSrYtGkTNm3ahNatW5scTnF3d0f
      "text/plain": [
       "<Figure size 600x500 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# PR Curve\n",
    "precision, recall, _ = precision_recall_curve(y_test_rfe, y_pred_proba_rfe)\n",
    "pr_auc = average_precision_score(y_test_rfe, y_pred_proba_rfe)\n",
    "\n",
    "plt.figure(figsize=(6, 5))\n",
    "plt.plot(recall, precision, color='green', lw=2, label=f'PR curve (AUC = {pr_auc:.4f})')\n",
    "plt.xlabel('Recall')\n",
    "plt.ylabel('Precision')\n",
    "plt.title('Precision-Recall Curve')\n",
    "plt.legend(loc='lower left')\n",
    "plt.grid(True)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c111a266",
   "metadata": {},
   "source": [
    "### Interpreting the Precision-Recall (PR) Curve\n",
    "\n",
    "The **Precision-Recall (PR) curve** helps evaluate model performance, especially on imbalanced datasets like ours (where positive cases are rare).\n",
    "\n",
    "A quick reminder of the definitions:\n",
    "* Precision = How many of the predicted positives are actually positive\n",
    "* Recall = How many of the actual positives the model correctly identifies\n",
    "\n",
    "What we display in this plot is:\n",
    "* The x-axis is Recall \n",
    "* The y-axis is Precision \n",
    "\n",
    "The curve shows the trade-off between them at different model thresholds\n",
    "\n",
    "In imbalanced datasets, accuracy can be misleading — the PR curve focuses only on the positive class, making it much more meaningful:\n",
    "* A higher curve means better performance\n",
    "* The area under the curve (PR AUC) summarizes this: closer to 1 is better"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1c83ddcd",
   "metadata": {},
   "source": [
    "## Feature Importance\n",
    "Understanding what drives the prediction is useful for future experiments and business knowledge. Here we track both the native feature importances of the trees, as well as a more heavy SHAP values analysis.\n",
    "\n",
    "Important! Be aware that SHAP analysis might take quite a bit of time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "id": "d66ffe2c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAxgAAAHqCAYAAACHuOhfAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjEsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvc2/+5QAAAAlwSFlzAAAPYQAAD2EBqD+naQABAABJREFUeJzs3XdUFdf++P33AaRIB1FQkaKASMCGJkoEC17sXdRwBUSMxhArtm8izRoj9lgSE0CDJcYaW6LYsWEBURFRQcyV2MVgAYR5/uBhfh4pHpSoJPu11lnLM2XPZ8/Mwdmzm0KSJAlBEARBEARBEIRKoPauAxAEQRAEQRAE4Z9DFDAEQRAEQRAEQag0ooAhCIIgCIIgCEKlEQUMQRAEQRAEQRAqjShgCIIgCIIgCIJQaUQBQxAEQRAEQRCESiMKGIIgCIIgCIIgVBpRwBAEQRAEQRAEodKIAoYgCIIgCIIgCJVGFDAEQRAEQRAEQag0ooAhCIIgCIIsOjoahUJR6mfy5Ml/yzGPHj1KWFgYDx8+/FvSfxPF5+PUqVPvOpTXtnTpUqKjo991GMK/iMa7DkAQBEEQhPdPREQENjY2Sss++OCDv+VYR48eJTw8HH9/f4yMjP6WY/ybLV26lBo1auDv7/+uQxH+JUQBQxAEQRCEEjp37oyrq+u7DuONPH78GF1d3Xcdxjvz5MkTqlev/q7DEP6FRBMpQRAEQRAqbNeuXbRp0wZdXV309fXp2rUrFy5cUNrm3Llz+Pv7Y2tri7a2Nubm5gQEBHDv3j15m7CwMCZMmACAjY2N3BwrIyODjIwMFApFqc17FAoFYWFhSukoFAouXrzIJ598grGxMR9//LG8/qeffqJ58+bo6OhgYmLCwIEDuXHjxmvl3d/fHz09PTIzM+nWrRt6enrUqVOHb7/9FoDk5GTat2+Prq4uVlZWrFmzRmn/4mZXhw4dYvjw4ZiammJgYICvry8PHjwocbylS5fi5OSElpYWtWvX5vPPPy/RnKxt27Z88MEHnD59Gnd3d6pXr87//d//YW1tzYULFzh48KB8btu2bQvA/fv3CQ4OxtnZGT09PQwMDOjcuTNJSUlKaR84cACFQsHPP//MjBkzqFu3Ltra2nTo0IErV66UiPfEiRN06dIFY2NjdHV1cXFxYeHChUrbXLp0iX79+mFiYoK2tjaurq5s27ZNaZv8/HzCw8Oxs7NDW1sbU1NTPv74Y/bs2aPSdRLeHVGDIQiCIAhCCdnZ2dy9e1dpWY0aNQBYvXo1fn5+eHl58fXXX/PkyROWLVvGxx9/zNmzZ7G2tgZgz549XLt2jSFDhmBubs6FCxf47rvvuHDhAsePH0ehUNCnTx8uX77M2rVrmT9/vnwMMzMz7ty5U+G4+/fvj52dHTNnzkSSJABmzJjB1KlT8fb2JjAwkDt37rB48WLc3d05e/bsazXLKigooHPnzri7uzNnzhxiY2MJCgpCV1eXL7/8Eh8fH/r06cPy5cvx9fWlVatWJZqcBQUFYWRkRFhYGKmpqSxbtozr16/LD/RQVHAKDw/H09OTzz77TN4uISGB+Ph4qlWrJqd37949OnfuzMCBA/nvf/9LrVq1aNu2LV988QV6enp8+eWXANSqVQuAa9eusWXLFvr374+NjQ23bt1ixYoVeHh4cPHiRWrXrq0U7+zZs1FTUyM4OJjs7GzmzJmDj48PJ06ckLfZs2cP3bp1w8LCgtGjR2Nubk5KSgrbt29n9OjRAFy4cAE3Nzfq1KnD5MmT0dXV5eeff6ZXr15s3LiR3r17y3mfNWsWgYGBtGzZkkePHnHq1CnOnDlDx44dK3zNhLdIEgRBEARB+P9FRUVJQKkfSZKkv/76SzIyMpKGDRumtN+ff/4pGRoaKi1/8uRJifTXrl0rAdKhQ4fkZd98840ESOnp6UrbpqenS4AUFRVVIh1ACg0Nlb+HhoZKgDRo0CCl7TIyMiR1dXVp
xowZSsuTk5MlDQ2NEsvLOh8JCQnyMj8/PwmQZs6cKS978OCBpKOjIykUCmndunXy8kuXLpWItTjN5s2bS3l5efLyOXPmSIC0detWSZIk6fbt25Kmpqb0n//8RyooKJC3W7JkiQRIP/74o7zMw8NDAqTly5eXyIOTk5Pk4eFRYvmzZ8+U0pWkonOupaUlRUREyMv2798vAZKjo6OUm5srL1+4cKEESMnJyZIkSdLz588lGxsbycrKSnrw4IFSuoWFhfK/O3ToIDk7O0vPnj1TWt+6dWvJzs5OXta4cWOpa9euJeIW3n+iiZQgCIIgCCV8++237NmzR+kDRW+oHz58yKBBg7h79678UVdX58MPP2T//v1yGjo6OvK/nz17xt27d/noo48AOHPmzN8S94gRI5S+b9q0icLCQry9vZXiNTc3x87OTineigoMDJT/bWRkhIODA7q6unh7e8vLHRwcMDIy4tq1ayX2//TTT5VqID777DM0NDTYuXMnAHv37iUvL48xY8agpvb/HtmGDRuGgYEBO3bsUEpPS0uLIUOGqBy/lpaWnG5BQQH37t1DT08PBweHUq/PkCFD0NTUlL+3adMGQM7b2bNnSU9PZ8yYMSVqhYprZO7fv8++ffvw9vbmr7/+kq/HvXv38PLyIi0tjf/9739A0Tm9cOECaWlpKudJeD+IJlKCIAiCIJTQsmXLUjt5Fz/stW/fvtT9DAwM5H/fv3+f8PBw1q1bx+3bt5W2y87OrsRo/5+XmyGlpaUhSRJ2dnalbv/iA35FaGtrY2ZmprTM0NCQunXryg/TLy4vrW/FyzHp6elhYWFBRkYGANevXweKCikv0tTUxNbWVl5frE6dOkoFgFcpLCxk4cKFLF26lPT0dAoKCuR1pqamJbavV6+e0ndjY2MAOW9Xr14Fyh9t7MqVK0iSxNSpU5k6dWqp29y+fZs6deoQERFBz549sbe354MPPqBTp04MHjwYFxcXlfMovBuigCEIgiAIgsoKCwuBon4Y5ubmJdZraPy/Rwtvb2+OHj3KhAkTaNKkCXp6ehQWFtKpUyc5nfK8/KBe7MUH4Ze9WGtSHK9CoWDXrl2oq6uX2F5PT++VcZSmtLTKWy79//1B/k4v5/1VZs6cydSpUwkICGDatGmYmJigpqbGmDFjSr0+lZG34nSDg4Px8vIqdZsGDRoA4O7uztWrV9m6dSu///47K1euZP78+Sxfvlyp9kh4/4gChiAIgiAIKqtfvz4ANWvWxNPTs8ztHjx4QFxcHOHh4YSEhMjLS2vuUlZBovgN+csjJr385v5V8UqShI2NDfb29irv9zakpaXRrl07+XtOTg5ZWVl06dIFACsrKwBSU1OxtbWVt8vLyyM9Pb3c8/+iss7vL7/8Qrt27fjhhx+Ulj98+FDubF8RxffG+fPny4ytOB/VqlVTKX4TExOGDBnCkCFDyMnJwd3dnbCwMFHAeM+JPhiCIAiCIKjMy8sLAwMDZs6cSX5+fon1xSM/Fb/tfvnt9oIFC0rsUzxXxcsFCQMDA2rUqMGhQ4eUli9dulTlePv06YO6ujrh4eElYpEkSWnI3Lftu+++UzqHy5Yt4/nz53Tu3BkAT09PNDU1WbRokVLsP/zwA9nZ2XTt2lWl4+jq6pY6S7q6unqJc7Jhwwa5D0RFNWvWDBsbGxYsWFDieMXHqVmzJm3btmXFihVkZWWVSOPFkcNevjZ6eno0aNCA3Nzc14pPeHtEDYYgCIIgCCozMDBg2bJlDB48mGbNmjFw4EDMzMzIzMxkx44duLm5sWTJEgwMDOQhXPPz86lTpw6///476enpJdJs3rw5AF9++SUDBw6kWrVqdO/eHV1dXQIDA5k9ezaBgYG4urpy6NAhLl++rHK89evXZ/r06UyZMoWMjAx69eqFvr4+6enpbN68mU8//ZTg4OBKOz8VkZeXR4cOHfD29iY1NZWlS5fy8ccf06NHD6BoqN4pU6YQHh5Op06d6NGjh7xd
ixYt+O9//6vScZo3b86yZcuYPn06DRo0oGbNmrRv355u3boRERHBkCFDaN26NcnJycTGxirVllSEmpoay5Yto3v37jRp0oQ
      "text/plain": [
       "<Figure size 800x500 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "## BUILT-IN\n",
    "\n",
    "# Get feature importances from the model\n",
    "importances = best_pipeline_rfe.named_steps['model'].feature_importances_\n",
    "\n",
    "# Create a Series and sort\n",
    "feat_series = pd.Series(importances, index=selected_features_rfe).sort_values(ascending=True)  # ascending=True for horizontal plot\n",
    "\n",
    "# Plot Feature Importances\n",
    "plt.figure(figsize=(8, 5))\n",
    "feat_series.plot(kind='barh', color='skyblue')\n",
    "plt.title('Feature Importances')\n",
    "plt.xlabel('Importance')\n",
    "plt.grid(axis='x')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3897f25c",
   "metadata": {},
   "source": [
    "### Interpreting the Feature Importance Plot\n",
    "The **feature importance plot** shows how much each feature contributes to the model’s overall decision-making.\n",
    "\n",
    "For tree-based models like Random Forest, importance is based on how often and how effectively a feature is used to split the data across all trees.\n",
    "A higher score means the feature plays a bigger role in improving prediction accuracy.\n",
    "\n",
    "In the graph you will see that:\n",
    "* Features are ranked from most to least important.\n",
    "* The values are relative and model-specific — not directly interpretable as weights or probabilities.\n",
    "\n",
    "This helps us identify which features the model relies on most when making predictions.\n",
    "\n",
    "**Important!**\n",
    "Unlike SHAP values, native importance doesn't show how a feature affects predictions — only how useful it is to the model overall. For deeper interpretability (e.g., direction and context), SHAP is better (but it takes more time to run)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "id": "e2197cea",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "PermutationExplainer explainer: 6394it [13:25,  7.93it/s]                          \n",
      "/tmp/ipykernel_29610/4064815753.py:21: FutureWarning: The NumPy global RNG was seeded by calling `np.random.seed`. In a future version this function will no longer use the global RNG. Pass `rng` explicitly to opt-in to the new behaviour and silence this warning.\n",
      "  shap.summary_plot(shap_values.values, X_test_shap)\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAzsAAAOsCAYAAABtTKjUAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjEsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvc2/+5QAAAAlwSFlzAAAPYQAAD2EBqD+naQABAABJREFUeJzs3XdYFGfXwOHfLk0EpNkQrNh7wfraEjsC9pLEElTsMYnRxDQ1ifmieY29IbFhSVREKfZYMBq7xmiMJSqKYkME6bDsfn/wsrouZUGaeO7r2kt25pmZM7Oz65x5yig0Go0GIYQQQgghhChmlIUdgBBCCCGEEELkB0l2hBBCCCGEEMWSJDtCCCGEEEKIYkmSHSGEEEIIIUSxJMmOEEIIIYQQoliSZEcIIYQQQghRLEmyI4QQQgghhCiWJNkRQgghhBBCFEuS7AghhBBCCCGKJUl2hBBCCCGEeAPMnDkTS0vLbOeFhoaiUCjw8/PL0fpzu1x+Mi7sAIQQQgghhBBFh4ODA8ePH6dmzZqFHcork2RHCCGEEEIIoWVmZkarVq0KO4w8Ic3YhBBCCCGEEFoZNUdLTk5m0qRJ2NnZYWNjw5gxY9i0aRMKhYLQ0FCd5RMTE5k4cSK2trY4ODgwZcoUVCpVAe9FGkl2hBBCCCGEeIOoVCq9l1qtznKZadOm4e3tzWeffcbmzZtRq9VMmzYtw7JffvklSqWSLVu2MHbsWH766Sd+/vnn/NiVbEkzNiGEEEIIId4QcXFxmJiYZDjPwsIiw+mRkZEsX76cr776is8++wyAbt260blzZ8LCwvTKt2zZkkWLFgHQpUsXDh06hJ+fH2PHjs2jvTCcJDtCCCGEECJPpaSksGbNGgA8PT0zvbgWBlL0Nbysxj/L2ebm5hw5ckRv+sqVK9m0aVOGy1y8eJHExEQ8PDx0pvfq1YsDBw7ole/atavO+7p163Lw4MHsIs8XkuwIIYQQQgjxhlAqlbi4uOhNDw4OznSZ+/fvA1CmTBmd6WXLls2wvI2Njc57U1NTEhMTcxhp3pA+O0IIIYQQQohMOTg4APD48WOd6Y8ePSqMcHJEkh0hhBBCCCGKNEUOXnmvfv36lChRgoCAAJ3pO3bsyJft5SVpxiaEEEIIIYTIlL29PePGjeP777+nRIkSNG7cmK1bt3Lt2jUgrWlcUVV0IxNCCCGEEEIUCbNnz2b06NH88MMPDBgwgJSUFO3Q09bW1oUcXeYUGo1GU9hBCCGEEEKI4kNGY8tjin6Gl9Vsy784XjJ06FCOHj3KrVu3CmybOSXN2IQQQgghhCjS8qcvTk6EhIRw7NgxmjVrhlqtJjg4mI0bNzJv3rzCDi1LkuwIIYQQQgghsmRpaUlwcDBz5swhISGBqlWrMm/ePD766KPCDi1LkuwIIYQQQgghstSsWTP++OOPwg4jxyTZEUIIIYQQokgr/GZsrysZjU0IIYQQQghRLEmyI4QQQgghhCiWJNkRQgghhBBCFEvSZ0cIIYQQQogiTfrs5JbU7AghhBBCCCGKJUl2hBBCCCGEEMWSJDtCCCGEEEKIYkmSHSGEEEIIIUSxJMmOEEIIIYQQoliS0diEEEIIIYQo0mQ0ttySmh0hhBBCCCFEsSTJjhBCCCGEEKJYkmRHCCGEEEIUSRv+VjFuv4q7z1SFHYp4TUmfHSGEEEIIUaTEJquxW6wmRZP2fsUFMFOqiP/YCKXiTey/8ibuc96Qmh0hhBBCCFGkdNr8PNFJl6QGh6WphROQeG1JsiOEEEIIIYqU0w8znv4osWDjEK8/SXaEEEIIIV4Xl0Kh0wwY8CM8iirsaPKNJvsibxhFDl7iRdJnRwghhBCiqLseDjUn6k7zOwHLvGBcj8KJqZDcjEql9moNKeq096bA5ZFKnG3lHr7QJ2eFEEIIIURRFpugn+ikG+9TsLHkg48OqFDOVaGYq6LVBhW3o7Pul+P88/NEByAZqL5KnWl5
8WaTZEcIIYQQoiibvDbr+UkpBRJGfui8WcXC88+brZ18AFV8cteI7cz94jx4gTRjyy1JdoQQQgghirItx7KeX3p4wcSRDw6E5d263P2lp4/QJ312hBBCCCGKsrhshiCLlSHKAB4kgO1CFVEpYKKAwN7Q3Vkudd90UrMjhBBCCFGUqQzoj6LOpExicvbJUjES9b8WfSka6LEd/n6sKtyARKGTZEcIka+CgoJwcXHhzJkzhR1KkSfHKm+5u7szevRonWmjR4/G3d29kCIq2s6cOYOLiwtBQUE606Oiopg+fTrdu3fHxcVF75iKfHbsH8PK/X1b971aDdXHgflgsHwXFH1h9NK8j+8VRMbnfyLSeB1YL0wb/EAxV4X1QhUxSa9j3x7ps5NbkuwIIYqtw4cP4+3tXdhh6Dhz5gze3t7ExMQUdigiB65evYq3tzfh4eG5Wj48PBwXFxfmzJmTaRl3d3cGDhyY2xANkpv9mD9/Pvv376dfv358++23jBgxIh8jFHo8/s+wcs2m6r7vOxtuvPRkTp8DsHx33sSVB6aE5P82VMCzF8ZveJYC9kukb8+bRBoyCiGKrcOHDxMcHMyYMWMKOxSts2fP4uPjg7u7O1ZWVjrzXF1d6dq1KyYmJoUUXfG3dOlSNJqcX+hcu3YNHx8fmjVrRoUKFfIhsoKR1X40bdqUY8eOYWyse2lw8uRJWrVqhZeXV0GGKtJFxhlWLkUNf9+BepXS3gdkUkM83gdKW8GAtnkT3ytY83fhbDdFcp03iiQ7QghRRBgZGWFkZFTYYRRrkkhmTqlUYmZmpjf9yZMnWFtbF0JEgtuPclY+4n81xl9tyLrcwHnAPOjSCKqUhlnvQVmb3ESYa6nqws04hu5UcewuqDSwpDN4VC/ql8TSPC23ivonK4QoJjQaDevXr8fPz49Hjx7h4ODAiBEjcHNz0ym3Y8cOtm7dSmhoKMbGxtSvXx8vLy8aN26sU+7o0aP4+vpy48YNEhMTsbGxoW7dukycOJHKlSszevRozp07B4CLi4t2uRkzZhjcZ+Px48ds2LCB06dPc//+fZKSknB0dKRnz54MHTpULzFJSUlh06ZN7N27l9u3b2NsbEylSpVwc3Nj0KBBzJw5k+DgYAA8PDy0y3l5eTFmzBiCgoL45ptvWLFiBS4uLhw7dowPP/yQKVOmMHjwYL34PD09CQsLY8+ePdq78Xfu3MHHx4dTp04RHR1NmTJl6Ny5M6NHj8bc3Nyg/U7n7e2Nj48Pmzdvxt/fn99++43Y2FiqV6/OhAkTaNGihU55FxcX3Nzc6NmzJ8uWLePatWtYW1szcOBA3n//fZ49e8aCBQv4/fffiY+Pp3nz5nz55ZeUKVNGu47o6Gh+/vlnjhw5wuPHjzE3N8fBwYGuXbsybNiwHMWfkdGjR3P//n2dfik3btxg5cqV/PXXX0RFRVGqVCmqVKnC0KFDadu2rfY4AIwdO1a7nJubGzNnznzlmLJz7tw5fv75Z/7++29UKhVVqlRhwIAB9O7dW6fcq+7HmTNnGDt2rPY78mL54OBg7bn78ccfM3/+fL788kv69OmjF+/AgQNJTk5m+/btKBRv+AVaair4HYfNx+DsTXjyDOKS8m97Hb/OWfn9F9L+9TnwfJq5CTSpCnUqQt2KMKwjlC6VZyGmm3a4cPvNbHihK1SvHZDW4C3NFy3h+3ZyiVxcyCcphCgQS5cuJSkpib59+2Jqaoqfnx8zZ87EyclJm8gsWrQIX19f6tWrx/jx44mPj2f79u2MGTOGn376ibZt05pdnD17lsmTJ+Ps7IynpyeWlpZERERw6tQpwsLCqFy5MiNGjECj0XD+/Hm+/fZbbRwNGzY0OObr169z6NAhOnbsiJOTEyqViuPHj7NkyRLu3bvHl19+qS2bkpLCxIkTOXv2LK1ataJHjx6Ympry77//cujQIQYNGkTfvn2Ji4vj0KFDTJ48GRsbGwBq1KiR4fZbtWqFvb09O3fu1Et2
7ty5w8WLFxk8eLA20fnnn38YO3YsVlZW9O3bl7Jly3Lt2jV+/fVXLly4wMqVK/WaKBlixowZKJVKhg0bRnx8PP7+/nzwwQc
      "text/plain": [
       "<Figure size 800x950 with 2 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "## SHAP VALUES\n",
    "\n",
    "# SHAP requires that all features passed to Explainer be numeric (floats/ints)\n",
    "X_test_shap = X_test_rfe.copy()\n",
    "X_test_shap = X_test_shap.astype(float)\n",
    "\n",
    "# Function that returns the probability of the positive class\n",
    "def model_predict(data):\n",
    "    return best_pipeline_rfe.predict_proba(data)[:, 1]\n",
    "\n",
    "# Ensure input to SHAP is numeric\n",
    "X_test_shap = X_test_rfe.astype(float)\n",
    "\n",
    "# Create SHAP explainer\n",
    "explainer = shap.Explainer(model_predict, X_test_shap)\n",
    "\n",
    "# Compute SHAP values\n",
    "shap_values = explainer(X_test_shap)\n",
    "\n",
    "# Plot summary\n",
    "shap.summary_plot(shap_values.values, X_test_shap)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e9ae2701",
   "metadata": {},
   "source": [
    "### Interpreting the SHAP Summary Plot\n",
    "\n",
    "Each point on a row represents a SHAP value for a single prediction (row = feature).\n",
    "The x-axis shows how much the feature contributed to increasing or decreasing the prediction.\n",
    "* Right (positive SHAP value): pushes prediction toward the positive class (i.e., higher chance of incident).\n",
    "* Left (negative SHAP value): pushes prediction toward the negative class (i.e., lower chance of incident).\n",
    "\n",
    "Color shows the actual feature value for that point:\n",
    "* Red = high value\n",
    "* Blue = low value\n",
    "\n",
    "In other words:\n",
    "* The position tells you impact.\n",
    "* The color tells you feature value.\n",
    "* The density (thickness) of dots shows how often a value occurs."
   ]
  },
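  {
   "cell_type": "markdown",
   "id": "9c0e5f21",
   "metadata": {},
   "source": [
    "For intuition about what a single SHAP value measures, the sketch below computes exact Shapley values for a tiny two-feature toy model. Everything in it is an assumption for illustration: the function names, the toy risk formula, the feature names `nights` and `lead_days`, and the use of a single baseline row. The `shap` explainer used above works differently in practice (it approximates over background data), so this is not the library's implementation.\n",
    "\n",
    "```python\n",
    "from itertools import permutations\n",
    "\n",
    "def shapley_values(predict, x, baseline):\n",
    "    # Average each feature's marginal contribution over every order in\n",
    "    # which the features can be switched from the baseline value to x\n",
    "    n = len(x)\n",
    "    phi = [0.0] * n\n",
    "    orders = list(permutations(range(n)))\n",
    "    for order in orders:\n",
    "        current = list(baseline)\n",
    "        prev = predict(current)\n",
    "        for i in order:\n",
    "            current[i] = x[i]\n",
    "            new = predict(current)\n",
    "            phi[i] += new - prev\n",
    "            prev = new\n",
    "    return [p / len(orders) for p in phi]\n",
    "\n",
    "# Toy risk score (purely illustrative, not the notebook's model)\n",
    "def toy_model(features):\n",
    "    nights, lead_days = features\n",
    "    return 0.01 + 0.002 * nights + 0.001 * lead_days + 0.0005 * nights * lead_days\n",
    "\n",
    "x = [7, 30]         # booking to explain (hypothetical)\n",
    "baseline = [2, 10]  # reference booking (hypothetical)\n",
    "phi = shapley_values(toy_model, x, baseline)\n",
    "\n",
    "# The contributions add up to the gap between the prediction and the baseline\n",
    "print(phi, sum(phi), toy_model(x) - toy_model(baseline))\n",
    "```\n",
    "\n",
    "The sum of the per-feature contributions equals the difference between the prediction for `x` and the prediction for the baseline, which is the property that lets the summary plot be read as pushes to the left or right."
   ]
  },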
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "345467a8",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}