{
"cells": [
{
"cell_type": "markdown",
"id": "9c8fe017",
"metadata": {},
"source": [
"# Central Motivation\n",
"\n",
"How different are fast food places from each other? Many people swear by their favorite restaurant, but can this by justified by emprirical factors, such as nutrition facts? Is there a big difference between the fast food chains that claim specializations in certain food products, such as burger places vs. chicken places vs. \"healthy\" options? Our program is exploratory by nature, seeking find the differentiating nutritional factors between varying fast food chains. The applications range from helping businesses gain insight on how their menus vary from their competitors to recommending similar food items."
]
},
{
"cell_type": "markdown",
"id": "7905a458",
"metadata": {},
"source": [
"# Summary of Data Processing Pipeline "
]
},
{
"cell_type": "markdown",
"id": "40e68a9d",
"metadata": {},
"source": [
"Data Sources from multiple sources:\n",
"\n",
"## Get Data Sources\n",
"1. From PDF converted to Xcel (BGOOD) (converted using ADOBE) --> merged\n",
"\n",
"\n",
"2. Webscraping \n",
"\n",
"\n",
"3. CSV file of fast food nutrition facts downloaded from [OpenIntro](https://www.openintro.org/data/index.php?data=fastfood)\n",
"\n",
"\n",
"4. From PDF to CSV (In and Out Burger)\n",
"\n",
"\n",
"## Merge Data Sources\n",
"1. Merge all our data sources into one dataframe\n",
"\n",
"\n",
"## Clean Data\n",
"1. In the pandas DF, we need to further clean and account for missing values"
]
},
{
"cell_type": "markdown",
"id": "cda11570",
"metadata": {},
"source": [
"# Obtain, Clean, and Merge Data Sources "
]
},
{
"cell_type": "code",
"execution_count": 98,
"id": "e766376b",
"metadata": {},
"outputs": [],
"source": [
"from bs4 import BeautifulSoup\n",
"import requests\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import plotly\n",
"import requests"
]
},
{
"cell_type": "code",
"execution_count": 99,
"id": "ccf3797c",
"metadata": {},
"outputs": [],
"source": [
"def get_url(url):\n",
" \"\"\" gets the html string of the url\n",
"\n",
" Args:\n",
" url (str): website url\n",
"\n",
" Returns:\n",
" html_str (str): html of website\n",
" \"\"\"\n",
" html_str = requests.get(url).text\n",
" \n",
" return html_str"
]
},
{
"cell_type": "code",
"execution_count": 100,
"id": "2895fbc0",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" restaurant \n",
" item \n",
" calories \n",
" cal_fat \n",
" total_fat \n",
" sat_fat \n",
" trans_fat \n",
" cholesterol \n",
" sodium \n",
" total_carb \n",
" fiber \n",
" sugar \n",
" protein \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" Mcdonalds \n",
" Artisan Grilled Chicken Sandwich \n",
" 380 \n",
" 60 \n",
" 7 \n",
" 2.0 \n",
" 0.0 \n",
" 95 \n",
" 1110 \n",
" 44 \n",
" 3.0 \n",
" 11 \n",
" 37.0 \n",
" \n",
" \n",
" 1 \n",
" Mcdonalds \n",
" Single Bacon Smokehouse Burger \n",
" 840 \n",
" 410 \n",
" 45 \n",
" 17.0 \n",
" 1.5 \n",
" 130 \n",
" 1580 \n",
" 62 \n",
" 2.0 \n",
" 18 \n",
" 46.0 \n",
" \n",
" \n",
" 2 \n",
" Mcdonalds \n",
" Double Bacon Smokehouse Burger \n",
" 1130 \n",
" 600 \n",
" 67 \n",
" 27.0 \n",
" 3.0 \n",
" 220 \n",
" 1920 \n",
" 63 \n",
" 3.0 \n",
" 18 \n",
" 70.0 \n",
" \n",
" \n",
" 3 \n",
" Mcdonalds \n",
" Grilled Bacon Smokehouse Chicken Sandwich \n",
" 750 \n",
" 280 \n",
" 31 \n",
" 10.0 \n",
" 0.5 \n",
" 155 \n",
" 1940 \n",
" 62 \n",
" 2.0 \n",
" 18 \n",
" 55.0 \n",
" \n",
" \n",
" 4 \n",
" Mcdonalds \n",
" Crispy Bacon Smokehouse Chicken Sandwich \n",
" 920 \n",
" 410 \n",
" 45 \n",
" 12.0 \n",
" 0.5 \n",
" 120 \n",
" 1980 \n",
" 81 \n",
" 4.0 \n",
" 18 \n",
" 46.0 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" restaurant item calories cal_fat \\\n",
"0 Mcdonalds Artisan Grilled Chicken Sandwich 380 60 \n",
"1 Mcdonalds Single Bacon Smokehouse Burger 840 410 \n",
"2 Mcdonalds Double Bacon Smokehouse Burger 1130 600 \n",
"3 Mcdonalds Grilled Bacon Smokehouse Chicken Sandwich 750 280 \n",
"4 Mcdonalds Crispy Bacon Smokehouse Chicken Sandwich 920 410 \n",
"\n",
" total_fat sat_fat trans_fat cholesterol sodium total_carb fiber \\\n",
"0 7 2.0 0.0 95 1110 44 3.0 \n",
"1 45 17.0 1.5 130 1580 62 2.0 \n",
"2 67 27.0 3.0 220 1920 63 3.0 \n",
"3 31 10.0 0.5 155 1940 62 2.0 \n",
"4 45 12.0 0.5 120 1980 81 4.0 \n",
"\n",
" sugar protein \n",
"0 11 37.0 \n",
"1 18 46.0 \n",
"2 18 70.0 \n",
"3 18 55.0 \n",
"4 18 46.0 "
]
},
"execution_count": 100,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Open Intro Menu Data\n",
"df_ff = pd.read_csv('fastfood.csv')\n",
"df_ff.drop(columns=['salad', 'calcium', 'vit_a', 'vit_c'], inplace=True)\n",
"df_ff.head()"
]
},
{
"cell_type": "code",
"execution_count": 101,
"id": "39942153",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" item \n",
" calories \n",
" total_fat \n",
" sat_fat \n",
" trans_fat \n",
" cholesterol \n",
" sodium \n",
" total_carb \n",
" fiber \n",
" sugar \n",
" protein \n",
" restaurant \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" The Classic \n",
" 590 \n",
" 40.2 \n",
" 9.1 \n",
" 1 \n",
" 87 \n",
" 977 \n",
" 33 \n",
" 3 \n",
" 4 \n",
" 23 \n",
" BGood \n",
" \n",
" \n",
" 1 \n",
" West Side \n",
" 550 \n",
" 44.6 \n",
" 9.9 \n",
" 1 \n",
" 58 \n",
" 1294 \n",
" 45 \n",
" 9 \n",
" 9 \n",
" 25 \n",
" BGood \n",
" \n",
" \n",
" 2 \n",
" The Farmhouse \n",
" 720 \n",
" 56.1 \n",
" 16.9 \n",
" 1 \n",
" 130 \n",
" 1174 \n",
" 34 \n",
" 2 \n",
" 4 \n",
" 37 \n",
" BGood \n",
" \n",
" \n",
" 3 \n",
" The Freebird \n",
" 785 \n",
" 52.8 \n",
" 15.7 \n",
" 0 \n",
" 166 \n",
" 991 \n",
" 36 \n",
" 3 \n",
" 5 \n",
" 43 \n",
" BGood \n",
" \n",
" \n",
" 4 \n",
" Power Play \n",
" 640 \n",
" 47.1 \n",
" 16.5 \n",
" 1 \n",
" 350 \n",
" 1281 \n",
" 38 \n",
" 7 \n",
" 4 \n",
" 41 \n",
" BGood \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" item calories total_fat sat_fat trans_fat cholesterol sodium \\\n",
"0 The Classic 590 40.2 9.1 1 87 977 \n",
"1 West Side 550 44.6 9.9 1 58 1294 \n",
"2 The Farmhouse 720 56.1 16.9 1 130 1174 \n",
"3 The Freebird 785 52.8 15.7 0 166 991 \n",
"4 Power Play 640 47.1 16.5 1 350 1281 \n",
"\n",
" total_carb fiber sugar protein restaurant \n",
"0 33 3 4 23 BGood \n",
"1 45 9 9 25 BGood \n",
"2 34 2 4 37 BGood \n",
"3 36 3 5 43 BGood \n",
"4 38 7 4 41 BGood "
]
},
"execution_count": 101,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# B GOOD Data\n",
"df_bgood = pd.read_csv('bgood_menu_only.csv')\n",
"df_bgood['restaurant'] = 'BGood'\n",
"\n",
"\n",
"df_bgood.rename(columns = {'Craft Burgers':'item', \n",
" 'Calories':'calories',\n",
" 'Total Fat (g)':'total_fat',\n",
" 'Saturated Fat (g)':'sat_fat',\n",
" 'Trans Fat (g)':'trans_fat',\n",
" 'Cholesterol (mg)':'cholesterol',\n",
" 'Sodium (mg)': 'sodium',\n",
" 'Total Carbohydrate (g)':'total_carb',\n",
" 'Fiber (g)': 'fiber',\n",
" 'Sugars (g)': 'sugar',\n",
" 'Protein (g)': 'protein'}, inplace = True)\n",
"\n",
"df_bgood.drop(['Unnamed: 3', 'Unnamed: 5', 'Unnamed: 7', \n",
" 'Unnamed: 9', 'Unnamed: 11','Unnamed: 13', \n",
" 'Unnamed: 14','Unnamed: 16'], axis = 1, inplace = True)\n",
"\n",
"df_bgood.head()"
]
},
{
"cell_type": "code",
"execution_count": 102,
"id": "0175117c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" item \n",
" calories \n",
" total_fat \n",
" sat_fat \n",
" trans_fat \n",
" cholesterol \n",
" sodium \n",
" total_carb \n",
" fiber \n",
" sugar \n",
" protein \n",
" restaurant \n",
" cal_fat \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" The Classic \n",
" 590 \n",
" 40.2 \n",
" 9.1 \n",
" 1 \n",
" 87 \n",
" 977 \n",
" 33 \n",
" 3 \n",
" 4 \n",
" 23 \n",
" BGood \n",
" NaN \n",
" \n",
" \n",
" 1 \n",
" West Side \n",
" 550 \n",
" 44.6 \n",
" 9.9 \n",
" 1 \n",
" 58 \n",
" 1294 \n",
" 45 \n",
" 9 \n",
" 9 \n",
" 25 \n",
" BGood \n",
" NaN \n",
" \n",
" \n",
" 2 \n",
" The Farmhouse \n",
" 720 \n",
" 56.1 \n",
" 16.9 \n",
" 1 \n",
" 130 \n",
" 1174 \n",
" 34 \n",
" 2 \n",
" 4 \n",
" 37 \n",
" BGood \n",
" NaN \n",
" \n",
" \n",
" 3 \n",
" The Freebird \n",
" 785 \n",
" 52.8 \n",
" 15.7 \n",
" 0 \n",
" 166 \n",
" 991 \n",
" 36 \n",
" 3 \n",
" 5 \n",
" 43 \n",
" BGood \n",
" NaN \n",
" \n",
" \n",
" 4 \n",
" Power Play \n",
" 640 \n",
" 47.1 \n",
" 16.5 \n",
" 1 \n",
" 350 \n",
" 1281 \n",
" 38 \n",
" 7 \n",
" 4 \n",
" 41 \n",
" BGood \n",
" NaN \n",
" \n",
" \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" \n",
" \n",
" 510 \n",
" Spicy Triple Double Crunchwrap \n",
" 780 \n",
" 38 \n",
" 10 \n",
" 0.5 \n",
" 50 \n",
" 1850 \n",
" 87 \n",
" 9 \n",
" 8 \n",
" 23 \n",
" Taco Bell \n",
" 340.0 \n",
" \n",
" \n",
" 511 \n",
" Express Taco Salad w/ Chips \n",
" 580 \n",
" 29 \n",
" 9 \n",
" 1 \n",
" 60 \n",
" 1270 \n",
" 59 \n",
" 8 \n",
" 7 \n",
" 23 \n",
" Taco Bell \n",
" 260.0 \n",
" \n",
" \n",
" 512 \n",
" Fiesta Taco Salad-Beef \n",
" 780 \n",
" 42 \n",
" 10 \n",
" 1 \n",
" 60 \n",
" 1340 \n",
" 74 \n",
" 11 \n",
" 7 \n",
" 26 \n",
" Taco Bell \n",
" 380.0 \n",
" \n",
" \n",
" 513 \n",
" Fiesta Taco Salad-Chicken \n",
" 720 \n",
" 35 \n",
" 7 \n",
" 0 \n",
" 70 \n",
" 1260 \n",
" 70 \n",
" 8 \n",
" 8 \n",
" 32 \n",
" Taco Bell \n",
" 320.0 \n",
" \n",
" \n",
" 514 \n",
" Fiesta Taco Salad-Steak \n",
" 720 \n",
" 36 \n",
" 8 \n",
" 1 \n",
" 55 \n",
" 1340 \n",
" 70 \n",
" 8 \n",
" 8 \n",
" 28 \n",
" Taco Bell \n",
" 320.0 \n",
" \n",
" \n",
"
\n",
"
571 rows × 13 columns
\n",
"
"
],
"text/plain": [
" item calories total_fat sat_fat trans_fat \\\n",
"0 The Classic 590 40.2 9.1 1 \n",
"1 West Side 550 44.6 9.9 1 \n",
"2 The Farmhouse 720 56.1 16.9 1 \n",
"3 The Freebird 785 52.8 15.7 0 \n",
"4 Power Play 640 47.1 16.5 1 \n",
".. ... ... ... ... ... \n",
"510 Spicy Triple Double Crunchwrap 780 38 10 0.5 \n",
"511 Express Taco Salad w/ Chips 580 29 9 1 \n",
"512 Fiesta Taco Salad-Beef 780 42 10 1 \n",
"513 Fiesta Taco Salad-Chicken 720 35 7 0 \n",
"514 Fiesta Taco Salad-Steak 720 36 8 1 \n",
"\n",
" cholesterol sodium total_carb fiber sugar protein restaurant cal_fat \n",
"0 87 977 33 3 4 23 BGood NaN \n",
"1 58 1294 45 9 9 25 BGood NaN \n",
"2 130 1174 34 2 4 37 BGood NaN \n",
"3 166 991 36 3 5 43 BGood NaN \n",
"4 350 1281 38 7 4 41 BGood NaN \n",
".. ... ... ... ... ... ... ... ... \n",
"510 50 1850 87 9 8 23 Taco Bell 340.0 \n",
"511 60 1270 59 8 7 23 Taco Bell 260.0 \n",
"512 60 1340 74 11 7 26 Taco Bell 380.0 \n",
"513 70 1260 70 8 8 32 Taco Bell 320.0 \n",
"514 55 1340 70 8 8 28 Taco Bell 320.0 \n",
"\n",
"[571 rows x 13 columns]"
]
},
"execution_count": 102,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_bgood_ff = pd.concat([df_bgood, df_ff])\n",
"df_bgood_ff"
]
},
{
"cell_type": "code",
"execution_count": 103,
"id": "cc95306c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" item \n",
" calories \n",
" total_fat \n",
" saturated_fat \n",
" trans_fat \n",
" cholesterol \n",
" sodium \n",
" total_carb \n",
" fiber \n",
" sugar \n",
" protein \n",
" restaurant \n",
" \n",
" \n",
" \n",
" \n",
" 4 \n",
" Hamburger Patty \n",
" 220 \n",
" 17 \n",
" 8 \n",
" 1 \n",
" 60 \n",
" 50 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 16 \n",
" Five Guys \n",
" \n",
" \n",
" 5 \n",
" Hot Dog \n",
" 240 \n",
" 20 \n",
" 9 \n",
" 1 \n",
" 45 \n",
" 1040 \n",
" 2 \n",
" 0 \n",
" 1 \n",
" 11 \n",
" Five Guys \n",
" \n",
" \n",
" 7 \n",
" Five Guys Bun \n",
" 260 \n",
" 9 \n",
" 3.5 \n",
" 0 \n",
" 5 \n",
" 330 \n",
" 39 \n",
" 2 \n",
" 8 \n",
" 7 \n",
" Five Guys \n",
" \n",
" \n",
" 9 \n",
" Little \n",
" 526 \n",
" 23 \n",
" 4 \n",
" 0.5 \n",
" 0 \n",
" 531 \n",
" 72 \n",
" 8 \n",
" 2 \n",
" 8 \n",
" Five Guys \n",
" \n",
" \n",
" 10 \n",
" Regular \n",
" 953 \n",
" 41 \n",
" 7 \n",
" 1 \n",
" 0 \n",
" 962 \n",
" 131 \n",
" 15 \n",
" 4 \n",
" 15 \n",
" Five Guys \n",
" \n",
" \n",
" 11 \n",
" Large \n",
" 1314 \n",
" 57 \n",
" 10 \n",
" 1 \n",
" 0 \n",
" 1327 \n",
" 181 \n",
" 21 \n",
" 6 \n",
" 20 \n",
" Five Guys \n",
" \n",
" \n",
" 13 \n",
" A.1® Sauce \n",
" 15 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 280 \n",
" 3 \n",
" 0 \n",
" 2 \n",
" 0 \n",
" Five Guys \n",
" \n",
" \n",
" 15 \n",
" Cheese*# (1 slice) \n",
" 70 \n",
" 6 \n",
" 4 \n",
" 0.2 \n",
" 15 \n",
" 360 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 4 \n",
" Five Guys \n",
" \n",
" \n",
" 17 \n",
" Grilled Mushrooms \n",
" 5 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 55 \n",
" 1 \n",
" 0 \n",
" 1 \n",
" 0 \n",
" Five Guys \n",
" \n",
" \n",
" 18 \n",
" Hot Sauce \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 200 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" Five Guys \n",
" \n",
" \n",
" 20 \n",
" Ketchup \n",
" 20 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 160 \n",
" 5 \n",
" 0 \n",
" 4 \n",
" 0 \n",
" Five Guys \n",
" \n",
" \n",
" 22 \n",
" Mayonnaise \n",
" 100 \n",
" 11 \n",
" 2 \n",
" 0 \n",
" 10 \n",
" 75 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" Five Guys \n",
" \n",
" \n",
" 23 \n",
" Mustard \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 55 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" Five Guys \n",
" \n",
" \n",
" 25 \n",
" Pickles \n",
" 3 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 258 \n",
" 1 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" Five Guys \n",
" \n",
" \n",
" 26 \n",
" Relish \n",
" 10 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 105 \n",
" 3 \n",
" 0 \n",
" 3 \n",
" 0 \n",
" Five Guys \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" item calories total_fat saturated_fat trans_fat cholesterol \\\n",
"4 Hamburger Patty 220 17 8 1 60 \n",
"5 Hot Dog 240 20 9 1 45 \n",
"7 Five Guys Bun 260 9 3.5 0 5 \n",
"9 Little 526 23 4 0.5 0 \n",
"10 Regular 953 41 7 1 0 \n",
"11 Large 1314 57 10 1 0 \n",
"13 A.1® Sauce 15 0 0 0 0 \n",
"15 Cheese*# (1 slice) 70 6 4 0.2 15 \n",
"17 Grilled Mushrooms 5 0 0 0 0 \n",
"18 Hot Sauce 0 0 0 0 0 \n",
"20 Ketchup 20 0 0 0 0 \n",
"22 Mayonnaise 100 11 2 0 10 \n",
"23 Mustard 0 0 0 0 0 \n",
"25 Pickles 3 0 0 0 0 \n",
"26 Relish 10 0 0 0 0 \n",
"\n",
" sodium total_carb fiber sugar protein restaurant \n",
"4 50 0 0 0 16 Five Guys \n",
"5 1040 2 0 1 11 Five Guys \n",
"7 330 39 2 8 7 Five Guys \n",
"9 531 72 8 2 8 Five Guys \n",
"10 962 131 15 4 15 Five Guys \n",
"11 1327 181 21 6 20 Five Guys \n",
"13 280 3 0 2 0 Five Guys \n",
"15 360 0 0 0 4 Five Guys \n",
"17 55 1 0 1 0 Five Guys \n",
"18 200 0 0 0 0 Five Guys \n",
"20 160 5 0 4 0 Five Guys \n",
"22 75 0 0 0 0 Five Guys \n",
"23 55 0 0 0 0 Five Guys \n",
"25 258 1 0 0 0 Five Guys \n",
"26 105 3 0 3 0 Five Guys "
]
},
"execution_count": 103,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# B GOOD Data\n",
"five_guys_df = pd.read_csv('five_guys.csv', \n",
" names=['item','serving_size','calories', 'total_fat', \n",
" 'saturated_fat', 'trans_fat', 'cholesterol', 'sodium',\n",
" 'total_carb', 'fiber', 'sugar', 'protein'])\n",
"five_guys_df.dropna(inplace=True)\n",
"five_guys_df = five_guys_df[1:] \n",
"five_guys_df['restaurant'] = 'Five Guys'\n",
"five_guys_df.drop(columns='serving_size', inplace=True)\n",
"five_guys_df\n"
]
},
{
"cell_type": "code",
"execution_count": 104,
"id": "50312a17",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" item \n",
" calories \n",
" total_fat \n",
" saturated_fat \n",
" trans_fat \n",
" cholesterol \n",
" sodium \n",
" total_carb \n",
" fiber \n",
" sugar \n",
" protein \n",
" restaurant \n",
" sat_fat \n",
" cal_fat \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" Hamburger Patty \n",
" 220 \n",
" 17 \n",
" 8 \n",
" 1 \n",
" 60 \n",
" 50 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 16 \n",
" Five Guys \n",
" NaN \n",
" NaN \n",
" \n",
" \n",
" 1 \n",
" Hot Dog \n",
" 240 \n",
" 20 \n",
" 9 \n",
" 1 \n",
" 45 \n",
" 1040 \n",
" 2 \n",
" 0 \n",
" 1 \n",
" 11 \n",
" Five Guys \n",
" NaN \n",
" NaN \n",
" \n",
" \n",
" 2 \n",
" Five Guys Bun \n",
" 260 \n",
" 9 \n",
" 3.5 \n",
" 0 \n",
" 5 \n",
" 330 \n",
" 39 \n",
" 2 \n",
" 8 \n",
" 7 \n",
" Five Guys \n",
" NaN \n",
" NaN \n",
" \n",
" \n",
" 3 \n",
" Little \n",
" 526 \n",
" 23 \n",
" 4 \n",
" 0.5 \n",
" 0 \n",
" 531 \n",
" 72 \n",
" 8 \n",
" 2 \n",
" 8 \n",
" Five Guys \n",
" NaN \n",
" NaN \n",
" \n",
" \n",
" 4 \n",
" Regular \n",
" 953 \n",
" 41 \n",
" 7 \n",
" 1 \n",
" 0 \n",
" 962 \n",
" 131 \n",
" 15 \n",
" 4 \n",
" 15 \n",
" Five Guys \n",
" NaN \n",
" NaN \n",
" \n",
" \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" \n",
" \n",
" 581 \n",
" Spicy Triple Double Crunchwrap \n",
" 780 \n",
" 38 \n",
" NaN \n",
" 0.5 \n",
" 50 \n",
" 1850 \n",
" 87 \n",
" 9 \n",
" 8 \n",
" 23 \n",
" Taco Bell \n",
" 10 \n",
" 340.0 \n",
" \n",
" \n",
" 582 \n",
" Express Taco Salad w/ Chips \n",
" 580 \n",
" 29 \n",
" NaN \n",
" 1 \n",
" 60 \n",
" 1270 \n",
" 59 \n",
" 8 \n",
" 7 \n",
" 23 \n",
" Taco Bell \n",
" 9 \n",
" 260.0 \n",
" \n",
" \n",
" 583 \n",
" Fiesta Taco Salad-Beef \n",
" 780 \n",
" 42 \n",
" NaN \n",
" 1 \n",
" 60 \n",
" 1340 \n",
" 74 \n",
" 11 \n",
" 7 \n",
" 26 \n",
" Taco Bell \n",
" 10 \n",
" 380.0 \n",
" \n",
" \n",
" 584 \n",
" Fiesta Taco Salad-Chicken \n",
" 720 \n",
" 35 \n",
" NaN \n",
" 0 \n",
" 70 \n",
" 1260 \n",
" 70 \n",
" 8 \n",
" 8 \n",
" 32 \n",
" Taco Bell \n",
" 7 \n",
" 320.0 \n",
" \n",
" \n",
" 585 \n",
" Fiesta Taco Salad-Steak \n",
" 720 \n",
" 36 \n",
" NaN \n",
" 1 \n",
" 55 \n",
" 1340 \n",
" 70 \n",
" 8 \n",
" 8 \n",
" 28 \n",
" Taco Bell \n",
" 8 \n",
" 320.0 \n",
" \n",
" \n",
"
\n",
"
586 rows × 14 columns
\n",
"
"
],
"text/plain": [
" item calories total_fat saturated_fat \\\n",
"0 Hamburger Patty 220 17 8 \n",
"1 Hot Dog 240 20 9 \n",
"2 Five Guys Bun 260 9 3.5 \n",
"3 Little 526 23 4 \n",
"4 Regular 953 41 7 \n",
".. ... ... ... ... \n",
"581 Spicy Triple Double Crunchwrap 780 38 NaN \n",
"582 Express Taco Salad w/ Chips 580 29 NaN \n",
"583 Fiesta Taco Salad-Beef 780 42 NaN \n",
"584 Fiesta Taco Salad-Chicken 720 35 NaN \n",
"585 Fiesta Taco Salad-Steak 720 36 NaN \n",
"\n",
" trans_fat cholesterol sodium total_carb fiber sugar protein restaurant \\\n",
"0 1 60 50 0 0 0 16 Five Guys \n",
"1 1 45 1040 2 0 1 11 Five Guys \n",
"2 0 5 330 39 2 8 7 Five Guys \n",
"3 0.5 0 531 72 8 2 8 Five Guys \n",
"4 1 0 962 131 15 4 15 Five Guys \n",
".. ... ... ... ... ... ... ... ... \n",
"581 0.5 50 1850 87 9 8 23 Taco Bell \n",
"582 1 60 1270 59 8 7 23 Taco Bell \n",
"583 1 60 1340 74 11 7 26 Taco Bell \n",
"584 0 70 1260 70 8 8 32 Taco Bell \n",
"585 1 55 1340 70 8 8 28 Taco Bell \n",
"\n",
" sat_fat cal_fat \n",
"0 NaN NaN \n",
"1 NaN NaN \n",
"2 NaN NaN \n",
"3 NaN NaN \n",
"4 NaN NaN \n",
".. ... ... \n",
"581 10 340.0 \n",
"582 9 260.0 \n",
"583 10 380.0 \n",
"584 7 320.0 \n",
"585 8 320.0 \n",
"\n",
"[586 rows x 14 columns]"
]
},
"execution_count": 104,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_all_ff = pd.concat([five_guys_df, df_bgood_ff], axis=0, ignore_index=True)\n",
"\n",
"df_all_ff"
]
},
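{
"cell_type": "markdown",
"id": "5b1d7e90",
"metadata": {},
"source": [
"The merged frame above has 14 columns rather than 13 because the Five Guys table names its saturated-fat column `saturated_fat`, while the other sources use `sat_fat`; each of the two columns is `NaN` wherever the other name was used. Below is a minimal sketch of one way to harmonize the names before concatenating. The frames `left` and `right` and their values are made up for illustration and are not rows from our data.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# toy stand-ins for two sources that name the same nutrient differently\n",
"left = pd.DataFrame({'item': ['Hot Dog'], 'saturated_fat': [9.0]})\n",
"right = pd.DataFrame({'item': ['The Classic'], 'sat_fat': [9.1]})\n",
"\n",
"# without renaming, concat keeps both columns and fills the gaps with NaN\n",
"naive = pd.concat([left, right], ignore_index=True)\n",
"\n",
"# renaming to one shared name first keeps a single, fully populated column\n",
"merged = pd.concat([left.rename(columns={'saturated_fat': 'sat_fat'}), right],\n",
"                   ignore_index=True)\n",
"\n",
"print(naive.columns.tolist())\n",
"print(merged.columns.tolist())\n",
"```\n",
"\n",
"Applying the same rename to `five_guys_df` before the merge would likely be the smallest change, though we have not re-run the pipeline with it."
]
},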
{
"cell_type": "code",
"execution_count": 105,
"id": "00745809",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['Five Guys', 'BGood', 'Mcdonalds', 'Chick Fil-A', 'Sonic', 'Arbys',\n",
" 'Burger King', 'Dairy Queen', 'Subway', 'Taco Bell'], dtype=object)"
]
},
"execution_count": 105,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_all_ff.restaurant.unique()"
]
},
{
"cell_type": "markdown",
"id": "eddb15fd",
"metadata": {},
"source": [
"# Visualization "
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "84bacee6",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
":4: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.\n",
" dfg = df_ff.groupby(['restaurant'])['calories','sodium','protein'].mean()\n"
]
},
{
"data": {
"text/plain": [
""
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Segmented Bar Chart \n",
"\n",
"df_ff = df_ff.replace(np.nan, 0)\n",
"dfg = df_ff.groupby(['restaurant'])['calories','sodium','protein'].mean()\n",
"\n",
"dfg.plot(kind='bar', title='Restaurant Mean Nutritional Values', ylabel='Mean Values(g)',\n",
" xlabel='Restaurants', figsize=(6, 5))\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "dd3b84a7",
"metadata": {},
"source": [
"* The segmented barchart is a good tool to show stark differences between specific nutritional categproes on a restaurant to restaurant basis. For example, the visualization above shows that Arby's is the leader in average sodium in menu items, where as Chick-Fil-A is the leader in calorie deficit menu items. "
]
},
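{
"cell_type": "markdown",
"id": "c3a94f12",
"metadata": {},
"source": [
"The means in the chart are computed after missing values are replaced with 0. As a hedged illustration of how that choice can shift a group mean, here is a toy comparison; the `toy` frame and its numbers are made up and do not come from our data.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"toy = pd.DataFrame({'restaurant': ['A', 'A', 'B'],\n",
"                    'protein': [20.0, np.nan, 30.0]})\n",
"\n",
"# pandas skips NaN by default, so A's mean uses only the known value\n",
"mean_skip = toy.groupby('restaurant')['protein'].mean()\n",
"\n",
"# filling with 0 treats the missing value as a real zero and halves A's mean\n",
"mean_zero = toy.fillna({'protein': 0}).groupby('restaurant')['protein'].mean()\n",
"\n",
"print(mean_skip['A'], mean_zero['A'])\n",
"```\n",
"\n",
"Whether skipping or zero-filling is more appropriate depends on whether a blank in the source table means *unknown* or *none*."
]
},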
{
"cell_type": "code",
"execution_count": 29,
"id": "3166bce5",
"metadata": {},
"outputs": [],
"source": [
"# Regression Functions \n",
"\n",
"def get_mse(y_true, y_pred):\n",
" # calculate the mean squared distance between the predicted and actual y\n",
" return np.mean((y_pred - y_true) ** 2)\n",
"\n",
"def show_fit(x, y, slope, intercept):\n",
" plt.figure()\n",
" \n",
" # transform the input data into numpy arrays and flatten them for easier processing\n",
" x = np.array(x).ravel()\n",
" y = np.array(y).ravel()\n",
" \n",
" # plot the actual data\n",
" plt.scatter(x, y, label='data')\n",
" \n",
" # compute linear predictions \n",
" # x is a numpy array so each element gets mulitplied by slope and intercept is added\n",
" y_pred = slope * x + intercept\n",
" \n",
" # plot the linear fit\n",
" plt.plot(x, y_pred, color='black',\n",
" ls=':',\n",
" label='linear fit')\n",
" \n",
" # for each data point plot the error\n",
" for idx, (x_i, y_i) in enumerate(zip(x, y)):\n",
" plt.plot([x_i, x_i], [y_i, slope * x_i + intercept], \n",
" ls='--', lw=3, color='tab:red',\n",
" label='error' if idx == 0 else \"\")\n",
" \n",
" plt.legend()\n",
" \n",
" plt.xlabel('x')\n",
" plt.ylabel('y')\n",
" \n",
" # print the mean squared error\n",
" y_pred = slope * x + intercept\n",
" mse = get_mse(y_true=y, y_pred=y_pred)\n",
" plt.suptitle(f'y_hat = {slope:.2f} * x + {intercept:.2f}, MSE = {mse:.3f}')\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "e29128bd",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Regressing between two given nutritional categories \n",
"\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.metrics import r2_score\n",
"\n",
"x = df_ff['calories'].to_numpy()\n",
"y = df_ff['cholesterol']\n",
"\n",
"x = x.reshape((-1, 1))\n",
"\n",
"reg = LinearRegression()\n",
"reg.fit(x, y) \n",
"slope = reg.coef_[0]\n",
"intercept = reg.intercept_\n",
"\n",
"show_fit(x, y, slope, intercept)"
]
},
{
"cell_type": "markdown",
"id": "b7dcf6bd",
"metadata": {},
"source": [
"* The scatterplot above is a foundation for running a machine learning regression based model. This chart can also be used to demonstrate levels of correlation between select nutritional categories. For instance, the chart above shows that there is a relatively high correlation between calories and cholesterol (with exception to a few outliers). "
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "ae827eb3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"R2 = 0.5813448793142899\n"
]
}
],
"source": [
"y_pred = reg.predict(x) \n",
"\n",
"# computing R2 from sklearn\n",
"r2 = r2_score(y_true=y, y_pred=y_pred)\n",
"print('R2 =',r2)"
]
},
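{
"cell_type": "markdown",
"id": "e08b6d3a",
"metadata": {},
"source": [
"For a one-feature least-squares fit with an intercept, R2 equals the square of the Pearson correlation between x and y, so the R2 of about 0.58 printed above corresponds to a correlation magnitude of roughly 0.76. The sketch below checks that identity on synthetic data; the arrays are randomly generated for illustration and are not our menu data, and all `_demo` names are hypothetical.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"x_demo = rng.uniform(100, 1500, size=200)             # synthetic 'calories'\n",
"y_demo = 0.1 * x_demo + rng.normal(0, 30, size=200)   # synthetic 'cholesterol'\n",
"\n",
"# least-squares line y = slope * x + intercept\n",
"slope_demo, intercept_demo = np.polyfit(x_demo, y_demo, deg=1)\n",
"y_hat = slope_demo * x_demo + intercept_demo\n",
"\n",
"# R2 = 1 - SS_res / SS_tot\n",
"ss_res = np.sum((y_demo - y_hat) ** 2)\n",
"ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)\n",
"r2_demo = 1 - ss_res / ss_tot\n",
"\n",
"# Pearson correlation between x and y\n",
"r_demo = np.corrcoef(x_demo, y_demo)[0, 1]\n",
"\n",
"print(r2_demo, r_demo ** 2)  # the two values agree up to rounding\n",
"```"
]
},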
{
"cell_type": "markdown",
"id": "4eaeb940",
"metadata": {},
"source": [
"# Discussion of Machine Learning Tools"
]
},
{
"cell_type": "markdown",
"id": "f60c4dfa",
"metadata": {},
"source": [
"* Machine Learning Tools: SciKit (K-nearest neighbors, K-means) and SKLearn. We believe k-nearest neighbors make the most sense because groups are already made through restaurant identification. K-means, on the other hand, is used to identify groups that are yet to exist. Another option is potentially using linear regression on certain nutritional value categories to identify correlations among groups. \n",
"\n",
"\n",
"* Relevant assumptions: For k-nearest neighbors, an assumption is made that individual menu items are connected and represent a specific restaurant and can be used as an identifier for further analysis.\n",
"\n",
"\n",
"* Use cases: We could build a new food item recommendor based on favorite menu items and/or specifications on ranges of preferred nutritional values. Businesses could also use this tool to compare the nutritional value of their items with their competitors and modify their menus according to their strategic goals. "
]
},
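{
"cell_type": "markdown",
"id": "9f27ab64",
"metadata": {},
"source": [
"As a rough sketch of the k-nearest-neighbors idea above (not our final model), the snippet below scales a few nutrition features and predicts a restaurant label for a new item from its nearest labeled neighbors. The feature values, the labels `A` and `B`, and `n_neighbors=3` are illustrative assumptions; `StandardScaler`, `KNeighborsClassifier`, and `make_pipeline` are existing scikit-learn tools.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"from sklearn.pipeline import make_pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# made-up rows: [calories, sodium, protein]\n",
"X_demo = np.array([[300, 700, 20], [320, 750, 22], [310, 690, 19],\n",
"                   [900, 1900, 45], [950, 2000, 50], [880, 1850, 43]])\n",
"labels = np.array(['A', 'A', 'A', 'B', 'B', 'B'])\n",
"\n",
"# scale first so sodium (large mg values) does not dominate the distances\n",
"knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))\n",
"knn.fit(X_demo, labels)\n",
"\n",
"# predict the label of a new, unlabeled item\n",
"pred = knn.predict(np.array([[330, 720, 21]]))\n",
"print(pred[0])\n",
"```\n",
"\n",
"Scaling matters here because k-nearest neighbors relies on distances, and the nutrition columns are measured in different units."
]
},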
{
"cell_type": "code",
"execution_count": null,
"id": "3f9c78db",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}