A lot of stuff.

pablo 2022-05-31 08:04:58 +02:00
parent c3e6424148
commit 113150e96a
21 changed files with 22045 additions and 0 deletions


BIN
cases/case_1/case_1.zip Normal file


BIN
cases/case_1/grading.xlsx Normal file


{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "case_2_student_notebook.ipynb",
"provenance": [],
"toc_visible": true,
"collapsed_sections": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# Introduction to PuLP\n",
"\n",
"For case 2, you will need to define and solve optimization problems. In this notebook, I'll help you understand how to use `pulp`, a Python package for modeling optimization problems. You might want to check the following links:\n",
"\n",
"- Documentation: https://coin-or.github.io/pulp/\n",
"- Homepage: https://github.com/coin-or/pulp\n",
"\n"
],
"metadata": {
"id": "eLvjUuJdzS7z"
}
},
{
"cell_type": "markdown",
"source": [
"# Installing and checking all is in place"
],
"metadata": {
"id": "HFavOEVS0dbY"
}
},
{
"cell_type": "markdown",
"source": [
"The first thing you need to do is install `pulp`. It is not among the packages available by default in Colab, so you need to run the following cell once."
],
"metadata": {
"id": "HgZwpjUG0PsK"
}
},
{
"cell_type": "code",
"source": [
"!pip install pulp"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ni6Q_YiO0nIm",
"outputId": "405d3f57-4502-4ed5-dcb9-d3204b585bb8"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Collecting pulp\n",
" Downloading PuLP-2.6.0-py3-none-any.whl (14.2 MB)\n",
"\u001b[K |████████████████████████████████| 14.2 MB 8.9 MB/s \n",
"\u001b[?25hInstalling collected packages: pulp\n",
"Successfully installed pulp-2.6.0\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"After doing that, you can import the library."
],
"metadata": {
"id": "k9YI0Kzw0qLT"
}
},
{
"cell_type": "code",
"source": [
"import pulp"
],
"metadata": {
"id": "hw6keX7x0tZ1"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"If all is good, running the following command will print a large log testing `pulp`. The last line should read \"OK\"."
],
"metadata": {
"id": "vD_rXehL1KXX"
}
},
{
"cell_type": "code",
"source": [
"pulp.pulpTestAll()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Ney2a8mu1JqQ",
"outputId": "a6b32d96-b163-4fc5-b4fc-3d5673e5a40a"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss.........."
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"\t Test that logic put in place for deprecation handling of indexs works\n",
"\t Testing 'indexs' param continues to work for LpVariable.dicts\n",
"\t Testing 'indexs' param continues to work for LpVariable.matrix\n",
"\t Testing 'indices' argument works in LpVariable.dicts\n",
"\t Testing 'indices' param continues to work for LpVariable.matrix\n",
"\t Testing invalid status\n",
"\t Testing continuous LP solution - export dict\n",
"\t Testing export dict for LP\n",
"\t Testing export dict MIP\n",
"\t Testing maximize continuous LP solution\n",
"\t Testing continuous LP solution - export JSON\n",
"\t Testing continuous LP solution - export solver dict\n",
"\t Testing continuous LP solution - export solver JSON\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
".........."
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"\t Testing reading MPS files - binary variable, no constraint names\n",
"\t Testing reading MPS files - integer variable\n",
"\t Testing reading MPS files - maximize\n",
"\t Testing invalid var names\n",
"\t Testing logPath argument\n",
"\t Testing makeDict general behavior\n",
"\t Testing makeDict default value behavior\n",
"\t Testing measuring optimization time\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"............."
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"\t Testing the availability of the function pulpTestAll\n",
"\t Testing zero subtraction\n",
"\t Testing inconsistent lp solution\n",
"\t Testing continuous LP solution\n",
"\t Testing maximize continuous LP solution\n",
"\t Testing unbounded continuous LP solution\n",
"\t Testing Long Names\n",
"\t Testing repeated Names\n",
"\t Testing zero constraint\n",
"\t Testing zero objective\n",
"\t Testing LpVariable (not LpAffineExpression) objective\n",
"\t Testing Long lines in LP\n",
"\t Testing LpAffineExpression divide\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"............."
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"\t Testing MIP solution\n",
"\t Testing MIP solution with floats in objective\n",
"\t Testing Initial value in MIP solution\n",
"\t Testing fixing value in MIP solution\n",
"\t Testing MIP relaxation\n",
"\t Testing feasibility problem (no objective)\n",
"\t Testing an infeasible problem\n",
"\t Testing an integer infeasible problem\n",
"\t Testing another integer infeasible problem\n",
"\t Testing column based modelling\n",
"\t Testing dual variables and slacks reporting\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"..........ssssssssssssssssssssssssssssssssssssssssssssssssssssss"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"\t Testing fractional constraints\n",
"\t Testing elastic constraints (no change)\n",
"\t Testing elastic constraints (freebound)\n",
"\t Testing elastic constraints (penalty unchanged)\n",
"\t Testing elastic constraints (penalty unbounded)\n",
"\t Testing timeLimit argument\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss\n",
"----------------------------------------------------------------------\n",
"Ran 840 tests in 15.681s\n",
"\n",
"OK (skipped=784)\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"# Defining and solving problems\n",
"\n",
"The following cells show you the absolute minimum to model and solve a problem with `pulp`. The steps are:\n",
"\n",
"1. Define decision variables\n",
"2. Define the target function\n",
"3. Define the constraints\n",
"4. Assemble the problem\n",
"5. Solve it\n",
"6. Examine results\n",
"\n",
"For more flexibility, options and other interesting features, please check the PuLP documentation."
],
"metadata": {
"id": "oiXz40NR1whf"
}
},
{
"cell_type": "markdown",
"source": [
"## Define decision variables"
],
"metadata": {
"id": "nq5bcQs03g0j"
}
},
{
"cell_type": "code",
"source": [
"x = pulp.LpVariable(\n",
" name=\"x\",\n",
" cat=pulp.LpContinuous # This will make the variable continuous (the default)\n",
" )\n",
"\n",
"y = pulp.LpVariable(\n",
" name=\"y\",\n",
" cat=pulp.LpInteger # This will make the variable integer only\n",
" )\n",
"\n",
"z = pulp.LpVariable(\n",
" name=\"z\",\n",
" cat=pulp.LpBinary # This will make the variable binary (only 0 or 1)\n",
")"
],
"metadata": {
"id": "0SPhww4L3buh"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Define the target function"
],
"metadata": {
"id": "uhlbq2oO35kp"
}
},
{
"cell_type": "code",
"source": [
"target_function = 10 * x - 5 * y + z"
],
"metadata": {
"id": "pu3Im9DH39CN"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Define constraints"
],
"metadata": {
"id": "lqD0dD474Izw"
}
},
{
"cell_type": "code",
"source": [
"constraint_1 = x >= 0\n",
"constraint_2 = y >= 0\n",
"constraint_3 = x >= 10\n",
"constraint_4 = y <= 50"
],
"metadata": {
"id": "5Cu51lYj4OUC"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Assemble the problem\n",
"\n",
"To put all the parts together, you need to declare a problem and specify whether you want to minimize or maximize the target function.\n",
"\n",
"Once you have that:\n",
"- First, you \"add\" the target function.\n",
"- Then, you \"add\" all the constraints you want to include."
],
"metadata": {
"id": "d5nq94IM4kSU"
}
},
{
"cell_type": "code",
"source": [
"problem = pulp.LpProblem(\"my_silly_problem\", pulp.LpMinimize)\n",
"\n",
"problem += target_function\n",
"\n",
"for constraint in (\n",
" constraint_1,\n",
" constraint_2,\n",
" constraint_3,\n",
" constraint_4\n",
" ):\n",
" problem += constraint"
],
"metadata": {
"id": "yI-Oiwh64mRc"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Solve it\n",
"\n",
"The problem object is now unsolved. You can call the `solve` method on it to find a solution."
],
"metadata": {
"id": "RJTWfR8-5fBd"
}
},
{
"cell_type": "code",
"source": [
"problem.solve()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "4Fbltpbp5mRi",
"outputId": "07f9c959-e9b0-4fe7-e7ea-c698703111ff"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"1"
]
},
"metadata": {},
"execution_count": 9
}
]
},
{
"cell_type": "markdown",
"source": [
"## Examine results\n",
"\n",
"After calling `solve` on a problem, you can access:\n",
"- The status of the problem. It may be optimal, but it may also turn out to be infeasible.\n",
"- The values assigned to each decision variable.\n",
"- The final value for the target function.\n",
"\n"
],
"metadata": {
"id": "0pc9RmrO7FKo"
}
},
{
"cell_type": "code",
"source": [
"print(f\"Status: {pulp.LpStatus[problem.status]}\")\n",
"for v in problem.variables():\n",
" print(v.name, \"=\", v.varValue)\n",
" \n",
"print(pulp.value(problem.objective))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "8U4xVvUg9W07",
"outputId": "32a330f1-65ab-4903-f29b-368f2bacaf94"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Status: Optimal\n",
"x = 10.0\n",
"y = 50.0\n",
"z = 0.0\n",
"-150.0\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"# Peanut Butter Example\n",
"\n",
"As an additional example, you can find below the model and solver for the Peanut Butter Sandwich example we discussed in lecture 6."
],
"metadata": {
"id": "I2lNaFm2XVK1"
}
},
{
"cell_type": "code",
"source": [
"pb = pulp.LpVariable(\n",
" name=\"Peanut Butter grams\",\n",
" cat=pulp.LpContinuous \n",
" )\n",
"\n",
"b = pulp.LpVariable(\n",
" name=\"Bread grams\",\n",
" cat=pulp.LpContinuous \n",
" )"
],
"metadata": {
"id": "HI4E2dNoXVK4"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"target_function = 5.88 * pb + 2.87 * b"
],
"metadata": {
"id": "PfTxq8R0XVLB"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"no_negative_pb = pb >= 0\n",
"no_negative_b = b >= 0\n",
"max_pb_we_have = pb <= 200\n",
"max_b_we_have = b <= 300\n",
"doctors_dietary_restriction = pb <= 0.13 * b"
],
"metadata": {
"id": "2X1AzQM8XVLD"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"problem = pulp.LpProblem(\"sandwich_problem\", pulp.LpMaximize)\n",
"\n",
"problem += target_function\n",
"\n",
"for constraint in (\n",
" no_negative_pb,\n",
" no_negative_b,\n",
" max_pb_we_have,\n",
" max_b_we_have,\n",
" doctors_dietary_restriction\n",
" ):\n",
" problem += constraint"
],
"metadata": {
"id": "3oEoQXebXVLE"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"problem.solve()\n",
"print(f\"Status: {pulp.LpStatus[problem.status]}\")\n",
"for v in problem.variables():\n",
" print(v.name, \"=\", v.varValue)\n",
" \n",
"print(f\"Final calories: {pulp.value(problem.objective)}\")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "d873d58d-6d9e-459e-d66f-127f436b1aab",
"id": "u1vI73kiXVLF"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Status: Optimal\n",
"Bread_grams = 300.0\n",
"Peanut_Butter_grams = 39.0\n",
"1090.32\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"# Case 2\n",
"\n",
"You can use the rest of the notebook to work on the different parts of case 2."
],
"metadata": {
"id": "6kWgbTjU-LaN"
}
},
{
"cell_type": "code",
"source": [
"# Good luck!"
],
"metadata": {
"id": "aYzseTWh-Sal"
},
"execution_count": null,
"outputs": []
}
]
}

BIN
cases/case_2/grading.xlsx Normal file


# Case 3: Improving last mile logistics with Machine Learning
After your last engagement with Charles, you have pretty much become the go-to
service provider for advanced quantitative methods. Congratulations!

You have been called again to help a different manager within Beanie Limited:
Estefania Pelaez. Estefania is the city manager for Barcelona. She is in charge
of all commercial and logistic operations that happen in the city.

One of the operations that Beanie Limited runs in Barcelona is its own
last-mile coffee delivery service. The company runs a small fleet of vans and
trucks that delivers small quantities of roasted coffee beans (typically
around 10-100 kg of coffee per delivery) to restaurants, cafes, hotels and
other businesses in the city.

The efficiency of the deliveries is important to keep margins profitable for
Beanie Limited. Sloppy management can make the company lose money. Hence,
Estefania is always working on ways to make the operations run as smoothly as
possible.

Currently, Beanie Limited has rented space in two warehouses: one located in
Zona Franca and another in Baro de Viver. Complementing that, the company
has a small fleet of combi vans, regular-sized vans and one truck, which
Beanie Limited's own drivers use to deliver the coffee beans from the
warehouses to the customers' facilities.

Orders placed by the customers are predictable and placed well in advance,
which allows Estefania and her team to plan the deliveries to minimize wasted
effort by the fleet. Since they know which locations they will need to deliver
to, they use routing software that drafts the routes each vehicle will cover
each day.

Recently, Estefania realized something: deliveries are almost always taking
place too early or too late. After digging into the data, Estefania found out
that there was nothing wrong with the routing software's time estimates: the
driving time between locations predicted by the software is accurate. The real
issue is related to what Estefania's team calls the "engine-off" time.

The engine-off time is the time a driver spends actually dropping off goods at
a client location. It's called engine-off because the clock starts ticking when
the driver takes the keys out of the van and stops when the driver starts
driving again.

Currently, Estefania and her team assume an engine-off time of 3 minutes for
all deliveries when building the delivery routes and schedules. But it seems
that this is not realistic at all and is causing a lot of trouble with the
schedules. Clients are not happy with delivery times not being respected, some
driver routes end too early (which means that the same driver could have
covered more clients) and some others run for too long (which means the drivers
have to go back to the warehouse without delivering all the goods requested by
the clients).

If Estefania knew beforehand what the engine-off time of different deliveries
would be, she could improve the route planning to fix all of these issues. She
has been told that Machine Learning could help with this and is expecting you
to find out if and how it can be applied to this problem.
## Detailed Task Definition

- Below you will find four levels of questions. Levels 1 to 3 are compulsory.
Level 4 is optional.
- You need to write a report document where you answer the questions of the
different levels. This report should be addressed to Estefania, should give
her clear recommendations and should justify these recommendations. It's
important that you present the methodology that backs your proposals.
- Each level is worth 2 points out of a total of 10. The 2 remaining points
will grade the clarity and structure of your report and code.
- You need to use a Python notebook to solve all levels. Please attach a
notebook that shows your solution/proposal/analysis. Your notebook should be
runnable "as-is". That means that anyone should be able to run it from
beginning to end without any additional instructions or action required
(except for uploading data from a CSV in the Google Colab environment; that
requires someone to upload the file with a few clicks and it's fine).
- Include your team number, names and student IDs in all your deliverables.
## Data

By joining the customer database together with past delivery details,
Estefania has built a dataset of executed deliveries. The table contains 9,000
examples of past deliveries and their engine-off times. The exact field
meanings are explained below:
- client_name: the name of the client.
- truck_size: what type of truck was being used. Can be one of Combi, Van or
Truck.
- truck_origin_warehouse: the Beanie Limited warehouse from which the route
started.
- delivery_timestamp: at what date and time was the delivery done (defined as
the moment the engine-off time starts).
- total_weight: total weight of the goods delivered.
- brand_1_coffee_proportion: what percentage of the delivery was of Beanie's
brand #1.
- brand_2_coffee_proportion: what percentage of the delivery was of Beanie's
brand #2.
- brand_3_coffee_proportion: what percentage of the delivery was of Beanie's
brand #3.
- driver_id: the ID of the driver that was driving the route.
- is_fresh_client: whether the client was fresh at the date of the delivery.
Fresh clients are clients that have been doing business with Beanie for less
than 30 days.
- postcode: the postcode of the client location.
- business_category: indicates whether the client is a hotel, a cafe or
restaurant, or a coffee retailer.
- floor: the floor (physical position) of the client location.
- partnership_level: indicates the partnership level with Beanie. Key Account
are important clients for Beanie Limited. Diamond clients are the top
priority clients for the company.
- box_count: how many distinct boxes were delivered to the client. The coffee
beans bags are grouped into boxes for delivery.
- final_time: the engine-off time, measured in seconds.
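
As a sketch of how you might first handle this table, note that
`delivery_timestamp` arrives as text and can be parsed into proper datetimes
to derive features such as the hour of day. The rows below are a tiny made-up
stand-in for the real data; only the column names come from the list above:

```python
import pandas as pd

# Tiny synthetic stand-in for the real table (values are invented).
df = pd.DataFrame({
    "truck_size": ["Combi", "Van", "Truck"],
    "delivery_timestamp": ["2022-03-01 08:30:00",
                           "2022-03-01 09:10:00",
                           "2022-03-02 14:05:00"],
    "total_weight": [25.0, 80.0, 60.0],
    "final_time": [180, 420, 300],
})

# Parse the timestamp column so you can derive time-based features.
df["delivery_timestamp"] = pd.to_datetime(df["delivery_timestamp"])
df["hour"] = df["delivery_timestamp"].dt.hour
print(df[["truck_size", "hour", "final_time"]])
```
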
## Notebook

Case 3 comes with no helper notebook: this time, you will have to code things
from scratch yourselves. Remember that you are still supposed to write and
deliver a notebook (see the "Detailed Task Definition" section).

A few comments on your notebook:

- I'm going to constrain you to use
[scikit-learn](https://scikit-learn.org/stable/) as the ML library. You can
of course use other useful Python libraries such as pandas, numpy, etc. But
for ML modeling, please go with scikit-learn.
- Below you can find some useful materials which relate to what you need to do
as part of the case:
  - [A simple, guided EDA on the Titanic Dataset](https://www.datacamp.com/tutorial/kaggle-machine-learning-eda)
  - [A guide on regression performance metrics](https://machinelearningmastery.com/regression-metrics-for-machine-learning/)
    and some [material from scikit-learn on the same topic](https://scikit-learn.org/stable/modules/classes.html#regression-metrics)
  - [An introduction to cross-validation](https://machinelearningmastery.com/k-fold-cross-validation/)
  - A thorough [review on why we need to use baselines](https://blog.ml.cmu.edu/2020/08/31/3-baselines/) in ML
  - A simple [introduction to linear regression with scikit-learn](https://stackabuse.com/linear-regression-in-python-with-scikit-learn/)
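
To give a feel for the scikit-learn workflow the materials above cover, here is
a minimal regression sketch on synthetic data (the feature and target values
are made up; in the case you would use the real dataframe instead):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: one feature driving a noisy target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 1))           # e.g. a weight-like feature
y = 2.0 * X[:, 0] + rng.normal(0, 5, size=200)   # invented target values

model = LinearRegression().fit(X, y)
print(model.coef_[0])  # recovered slope, close to 2.0
```
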
## Levels

### Level 1

- Assess for Estefania whether ML is a good choice for her problem and explain
why.
- Perform Exploratory Data Analysis on the given data. Is it clean? Which
variables could be useful to explain the engine-off time? Are there any other
interesting things you can draw from the dataset?
### Level 2

- Present how you are going to measure performance for this problem and how
you will use the available data for testing.
- Develop a baseline algorithm and evaluate its performance.
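
A baseline here could be as simple as always predicting the mean engine-off
time. One sketch, using `DummyRegressor` with cross-validated MAE on made-up
data (your notebook would plug in the real features and target):

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import cross_val_score

# Invented data: 3 uninformative features, targets in a plausible range.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = rng.uniform(60, 600, size=300)  # made-up engine-off times in seconds

# Mean-predicting baseline: any real model should beat this to be useful.
baseline = DummyRegressor(strategy="mean")
scores = cross_val_score(baseline, X, y, cv=5,
                         scoring="neg_mean_absolute_error")
print(-scores.mean())  # average MAE across folds, in seconds
```
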
### Level 3

- Develop the best model you can to predict engine-off time.
- Explain your methodology and report on performance.
- Compare your performance to the baseline algorithm. Reflect on the causes of
any differences you observe between the two.
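
One possible direction (a sketch, not the required solution) is a scikit-learn
`Pipeline` that one-hot encodes the categorical columns and feeds everything to
a tree ensemble. The data below is synthetic; only the column names are
borrowed from the case:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in: one categorical and one numeric feature.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "truck_size": rng.choice(["Combi", "Van", "Truck"], size=n),
    "total_weight": rng.uniform(10, 100, size=n),
})
y = df["total_weight"] * 3 + rng.normal(0, 10, size=n)  # invented target

pre = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["truck_size"])],
    remainder="passthrough",
)
model = Pipeline([
    ("pre", pre),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
])

X_tr, X_te, y_tr, y_te = train_test_split(df, y, test_size=0.2, random_state=0)
model.fit(X_tr, y_tr)
print(mean_absolute_error(y_te, model.predict(X_te)))  # held-out MAE
```
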
### Level 4

After presenting your model and results, Estefania has two different questions:

- Estefania would like to learn from the ML algorithm. What are the most
relevant features that define the engine-off time? Can you somehow quantify
how important each one is, or which are most useful?
- Estefania is interested in learning about next steps. What can be done to
further improve the model's performance and achieve better results?
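
For the first question, scikit-learn's `permutation_importance` is one generic
way to quantify feature relevance. A sketch on synthetic data where, by
construction, only the first feature matters:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Invented data: only feature 0 actually drives the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = 5.0 * X[:, 0] + rng.normal(0, 0.5, size=400)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Shuffle each feature and measure how much the score degrades.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
print(result.importances_mean)  # feature 0 should dominate
```
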
### SPECIAL

For this case, we are going to run a little competition. There will be a
surprise gift at the last lecture for the winning team.

The competition consists of building the best-performing model of the course. I
have a hidden part of Estefania's dataset. If you present a notebook with a
working model before the date XXXX (note this is earlier than the case
delivery deadline), I will use your model to predict engine-off times on the
hidden data. The team with the lowest error will win.

To enter the competition, write a function at the end of your notebook
called `predict_to_compete`. The function should take as its only input a
dataframe with the same format as the shared dataset. The function should
return a numpy array with the predicted engine-off times.
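
A sketch of the expected entry-point shape, with a `DummyRegressor` standing in
for whatever fitted model your notebook actually produces. The dummy model, the
sample frame and its column are all made up; only the function name and the
input/output types come from the rules above:

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor

# Stand-in for your real fitted pipeline; replace with your best model.
trained_model = DummyRegressor(strategy="constant", constant=180.0)
trained_model.fit(pd.DataFrame({"total_weight": [1.0]}), [180.0])

def predict_to_compete(df: pd.DataFrame) -> np.ndarray:
    # Drop the target column if present: the hidden set may not include it.
    features = df.drop(columns=["final_time"], errors="ignore")
    return np.asarray(trained_model.predict(features))

sample = pd.DataFrame({"total_weight": [25.0, 80.0]})
print(predict_to_compete(sample))  # one prediction per input row
```
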

9001
cases/case_3/dropoffs_df.csv Normal file


BIN
contrato_firmado.pdf Normal file
