{ "cells": [ { "cell_type": "markdown", "id": "9c8fe017", "metadata": {}, "source": [ "# Central Motivation\n", "\n", "How different are fast food places from each other? Many people swear by their favorite restaurant, but can this be justified by empirical factors, such as nutrition facts? Is there a big difference between fast food chains that claim to specialize in certain food products, such as burger places vs. chicken places vs. \"healthy\" options? Our program is exploratory by nature, seeking to find the nutritional factors that differentiate fast food chains. The applications range from helping businesses gain insight into how their menus differ from their competitors' to recommending similar food items." ] }, { "cell_type": "markdown", "id": "7905a458", "metadata": {}, "source": [ "# Summary of Data Processing Pipeline " ] }, { "cell_type": "markdown", "id": "40e68a9d", "metadata": {}, "source": [ "Data comes from multiple sources:\n", "\n", "## Get Data Sources\n", "1. PDF converted to Excel (BGOOD; converted using Adobe) --> merged\n", "\n", "\n", "2. Web scraping\n", "\n", "\n", "3. CSV file of fast food nutrition facts downloaded from [OpenIntro](https://www.openintro.org/data/index.php?data=fastfood)\n", "\n", "\n", "4. PDF converted to CSV (In-N-Out Burger)\n", "\n", "\n", "## Merge Data Sources\n", "1. Merge all our data sources into one DataFrame\n", "\n", "\n", "## Clean Data\n", "1. 
In the pandas DataFrame, further clean the data and account for missing values" ] }, { "cell_type": "markdown", "id": "cda11570", "metadata": {}, "source": [ "# Obtain, Clean, and Merge Data Sources " ] }, { "cell_type": "code", "execution_count": 98, "id": "e766376b", "metadata": {}, "outputs": [], "source": [ "from bs4 import BeautifulSoup\n", "import requests\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import plotly" ] }, { "cell_type": "code", "execution_count": 99, "id": "ccf3797c", "metadata": {}, "outputs": [], "source": [ "def get_url(url):\n", "    \"\"\"Get the HTML string of a URL.\n", "\n", "    Args:\n", "        url (str): website url\n", "\n", "    Returns:\n", "        html_str (str): html of website\n", "    \"\"\"\n", "    html_str = requests.get(url).text\n", "\n", "    return html_str" ] }, { "cell_type": "code", "execution_count": 100, "id": "2895fbc0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>restaurant</th><th>item</th><th>calories</th><th>cal_fat</th><th>total_fat</th><th>sat_fat</th><th>trans_fat</th><th>cholesterol</th><th>sodium</th><th>total_carb</th><th>fiber</th><th>sugar</th><th>protein</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>Mcdonalds</td><td>Artisan Grilled Chicken Sandwich</td><td>380</td><td>60</td><td>7</td><td>2.0</td><td>0.0</td><td>95</td><td>1110</td><td>44</td><td>3.0</td><td>11</td><td>37.0</td></tr>\n",
"    <tr><th>1</th><td>Mcdonalds</td><td>Single Bacon Smokehouse Burger</td><td>840</td><td>410</td><td>45</td><td>17.0</td><td>1.5</td><td>130</td><td>1580</td><td>62</td><td>2.0</td><td>18</td><td>46.0</td></tr>\n",
"    <tr><th>2</th><td>Mcdonalds</td><td>Double Bacon Smokehouse Burger</td><td>1130</td><td>600</td><td>67</td><td>27.0</td><td>3.0</td><td>220</td><td>1920</td><td>63</td><td>3.0</td><td>18</td><td>70.0</td></tr>\n",
"    <tr><th>3</th><td>Mcdonalds</td><td>Grilled Bacon Smokehouse Chicken Sandwich</td><td>750</td><td>280</td><td>31</td><td>10.0</td><td>0.5</td><td>155</td><td>1940</td><td>62</td><td>2.0</td><td>18</td><td>55.0</td></tr>\n",
"    <tr><th>4</th><td>Mcdonalds</td><td>Crispy Bacon Smokehouse Chicken Sandwich</td><td>920</td><td>410</td><td>45</td><td>12.0</td><td>0.5</td><td>120</td><td>1980</td><td>81</td><td>4.0</td><td>18</td><td>46.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " restaurant item calories cal_fat \\\n", "0 Mcdonalds Artisan Grilled Chicken Sandwich 380 60 \n", "1 Mcdonalds Single Bacon Smokehouse Burger 840 410 \n", "2 Mcdonalds Double Bacon Smokehouse Burger 1130 600 \n", "3 Mcdonalds Grilled Bacon Smokehouse Chicken Sandwich 750 280 \n", "4 Mcdonalds Crispy Bacon Smokehouse Chicken Sandwich 920 410 \n", "\n", " total_fat sat_fat trans_fat cholesterol sodium total_carb fiber \\\n", "0 7 2.0 0.0 95 1110 44 3.0 \n", "1 45 17.0 1.5 130 1580 62 2.0 \n", "2 67 27.0 3.0 220 1920 63 3.0 \n", "3 31 10.0 0.5 155 1940 62 2.0 \n", "4 45 12.0 0.5 120 1980 81 4.0 \n", "\n", " sugar protein \n", "0 11 37.0 \n", "1 18 46.0 \n", "2 18 70.0 \n", "3 18 55.0 \n", "4 18 46.0 " ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Open Intro Menu Data\n", "df_ff = pd.read_csv('fastfood.csv')\n", "df_ff.drop(columns=['salad', 'calcium', 'vit_a', 'vit_c'], inplace=True)\n", "df_ff.head()" ] }, { "cell_type": "code", "execution_count": 101, "id": "39942153", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>item</th><th>calories</th><th>total_fat</th><th>sat_fat</th><th>trans_fat</th><th>cholesterol</th><th>sodium</th><th>total_carb</th><th>fiber</th><th>sugar</th><th>protein</th><th>restaurant</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>The Classic</td><td>590</td><td>40.2</td><td>9.1</td><td>1</td><td>87</td><td>977</td><td>33</td><td>3</td><td>4</td><td>23</td><td>BGood</td></tr>\n",
"    <tr><th>1</th><td>West Side</td><td>550</td><td>44.6</td><td>9.9</td><td>1</td><td>58</td><td>1294</td><td>45</td><td>9</td><td>9</td><td>25</td><td>BGood</td></tr>\n",
"    <tr><th>2</th><td>The Farmhouse</td><td>720</td><td>56.1</td><td>16.9</td><td>1</td><td>130</td><td>1174</td><td>34</td><td>2</td><td>4</td><td>37</td><td>BGood</td></tr>\n",
"    <tr><th>3</th><td>The Freebird</td><td>785</td><td>52.8</td><td>15.7</td><td>0</td><td>166</td><td>991</td><td>36</td><td>3</td><td>5</td><td>43</td><td>BGood</td></tr>\n",
"    <tr><th>4</th><td>Power Play</td><td>640</td><td>47.1</td><td>16.5</td><td>1</td><td>350</td><td>1281</td><td>38</td><td>7</td><td>4</td><td>41</td><td>BGood</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " item calories total_fat sat_fat trans_fat cholesterol sodium \\\n", "0 The Classic 590 40.2 9.1 1 87 977 \n", "1 West Side 550 44.6 9.9 1 58 1294 \n", "2 The Farmhouse 720 56.1 16.9 1 130 1174 \n", "3 The Freebird 785 52.8 15.7 0 166 991 \n", "4 Power Play 640 47.1 16.5 1 350 1281 \n", "\n", " total_carb fiber sugar protein restaurant \n", "0 33 3 4 23 BGood \n", "1 45 9 9 25 BGood \n", "2 34 2 4 37 BGood \n", "3 36 3 5 43 BGood \n", "4 38 7 4 41 BGood " ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# B GOOD Data\n", "df_bgood = pd.read_csv('bgood_menu_only.csv')\n", "df_bgood['restaurant'] = 'BGood'\n", "\n", "\n", "df_bgood.rename(columns = {'Craft Burgers':'item', \n", " 'Calories':'calories',\n", " 'Total Fat (g)':'total_fat',\n", " 'Saturated Fat (g)':'sat_fat',\n", " 'Trans Fat (g)':'trans_fat',\n", " 'Cholesterol (mg)':'cholesterol',\n", " 'Sodium (mg)': 'sodium',\n", " 'Total Carbohydrate (g)':'total_carb',\n", " 'Fiber (g)': 'fiber',\n", " 'Sugars (g)': 'sugar',\n", " 'Protein (g)': 'protein'}, inplace = True)\n", "\n", "df_bgood.drop(['Unnamed: 3', 'Unnamed: 5', 'Unnamed: 7', \n", " 'Unnamed: 9', 'Unnamed: 11','Unnamed: 13', \n", " 'Unnamed: 14','Unnamed: 16'], axis = 1, inplace = True)\n", "\n", "df_bgood.head()" ] }, { "cell_type": "code", "execution_count": 102, "id": "0175117c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>item</th><th>calories</th><th>total_fat</th><th>sat_fat</th><th>trans_fat</th><th>cholesterol</th><th>sodium</th><th>total_carb</th><th>fiber</th><th>sugar</th><th>protein</th><th>restaurant</th><th>cal_fat</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>The Classic</td><td>590</td><td>40.2</td><td>9.1</td><td>1</td><td>87</td><td>977</td><td>33</td><td>3</td><td>4</td><td>23</td><td>BGood</td><td>NaN</td></tr>\n",
"    <tr><th>1</th><td>West Side</td><td>550</td><td>44.6</td><td>9.9</td><td>1</td><td>58</td><td>1294</td><td>45</td><td>9</td><td>9</td><td>25</td><td>BGood</td><td>NaN</td></tr>\n",
"    <tr><th>2</th><td>The Farmhouse</td><td>720</td><td>56.1</td><td>16.9</td><td>1</td><td>130</td><td>1174</td><td>34</td><td>2</td><td>4</td><td>37</td><td>BGood</td><td>NaN</td></tr>\n",
"    <tr><th>3</th><td>The Freebird</td><td>785</td><td>52.8</td><td>15.7</td><td>0</td><td>166</td><td>991</td><td>36</td><td>3</td><td>5</td><td>43</td><td>BGood</td><td>NaN</td></tr>\n",
"    <tr><th>4</th><td>Power Play</td><td>640</td><td>47.1</td><td>16.5</td><td>1</td><td>350</td><td>1281</td><td>38</td><td>7</td><td>4</td><td>41</td><td>BGood</td><td>NaN</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>510</th><td>Spicy Triple Double Crunchwrap</td><td>780</td><td>38</td><td>10</td><td>0.5</td><td>50</td><td>1850</td><td>87</td><td>9</td><td>8</td><td>23</td><td>Taco Bell</td><td>340.0</td></tr>\n",
"    <tr><th>511</th><td>Express Taco Salad w/ Chips</td><td>580</td><td>29</td><td>9</td><td>1</td><td>60</td><td>1270</td><td>59</td><td>8</td><td>7</td><td>23</td><td>Taco Bell</td><td>260.0</td></tr>\n",
"    <tr><th>512</th><td>Fiesta Taco Salad-Beef</td><td>780</td><td>42</td><td>10</td><td>1</td><td>60</td><td>1340</td><td>74</td><td>11</td><td>7</td><td>26</td><td>Taco Bell</td><td>380.0</td></tr>\n",
"    <tr><th>513</th><td>Fiesta Taco Salad-Chicken</td><td>720</td><td>35</td><td>7</td><td>0</td><td>70</td><td>1260</td><td>70</td><td>8</td><td>8</td><td>32</td><td>Taco Bell</td><td>320.0</td></tr>\n",
"    <tr><th>514</th><td>Fiesta Taco Salad-Steak</td><td>720</td><td>36</td><td>8</td><td>1</td><td>55</td><td>1340</td><td>70</td><td>8</td><td>8</td><td>28</td><td>Taco Bell</td><td>320.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>571 rows × 13 columns</p>\n",
"</div>
" ], "text/plain": [ " item calories total_fat sat_fat trans_fat \\\n", "0 The Classic 590 40.2 9.1 1 \n", "1 West Side 550 44.6 9.9 1 \n", "2 The Farmhouse 720 56.1 16.9 1 \n", "3 The Freebird 785 52.8 15.7 0 \n", "4 Power Play 640 47.1 16.5 1 \n", ".. ... ... ... ... ... \n", "510 Spicy Triple Double Crunchwrap 780 38 10 0.5 \n", "511 Express Taco Salad w/ Chips 580 29 9 1 \n", "512 Fiesta Taco Salad-Beef 780 42 10 1 \n", "513 Fiesta Taco Salad-Chicken 720 35 7 0 \n", "514 Fiesta Taco Salad-Steak 720 36 8 1 \n", "\n", " cholesterol sodium total_carb fiber sugar protein restaurant cal_fat \n", "0 87 977 33 3 4 23 BGood NaN \n", "1 58 1294 45 9 9 25 BGood NaN \n", "2 130 1174 34 2 4 37 BGood NaN \n", "3 166 991 36 3 5 43 BGood NaN \n", "4 350 1281 38 7 4 41 BGood NaN \n", ".. ... ... ... ... ... ... ... ... \n", "510 50 1850 87 9 8 23 Taco Bell 340.0 \n", "511 60 1270 59 8 7 23 Taco Bell 260.0 \n", "512 60 1340 74 11 7 26 Taco Bell 380.0 \n", "513 70 1260 70 8 8 32 Taco Bell 320.0 \n", "514 55 1340 70 8 8 28 Taco Bell 320.0 \n", "\n", "[571 rows x 13 columns]" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_bgood_ff = pd.concat([df_bgood, df_ff])\n", "df_bgood_ff" ] }, { "cell_type": "code", "execution_count": 103, "id": "cc95306c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>item</th><th>calories</th><th>total_fat</th><th>saturated_fat</th><th>trans_fat</th><th>cholesterol</th><th>sodium</th><th>total_carb</th><th>fiber</th><th>sugar</th><th>protein</th><th>restaurant</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>4</th><td>Hamburger Patty</td><td>220</td><td>17</td><td>8</td><td>1</td><td>60</td><td>50</td><td>0</td><td>0</td><td>0</td><td>16</td><td>Five Guys</td></tr>\n",
"    <tr><th>5</th><td>Hot Dog</td><td>240</td><td>20</td><td>9</td><td>1</td><td>45</td><td>1040</td><td>2</td><td>0</td><td>1</td><td>11</td><td>Five Guys</td></tr>\n",
"    <tr><th>7</th><td>Five Guys Bun</td><td>260</td><td>9</td><td>3.5</td><td>0</td><td>5</td><td>330</td><td>39</td><td>2</td><td>8</td><td>7</td><td>Five Guys</td></tr>\n",
"    <tr><th>9</th><td>Little</td><td>526</td><td>23</td><td>4</td><td>0.5</td><td>0</td><td>531</td><td>72</td><td>8</td><td>2</td><td>8</td><td>Five Guys</td></tr>\n",
"    <tr><th>10</th><td>Regular</td><td>953</td><td>41</td><td>7</td><td>1</td><td>0</td><td>962</td><td>131</td><td>15</td><td>4</td><td>15</td><td>Five Guys</td></tr>\n",
"    <tr><th>11</th><td>Large</td><td>1314</td><td>57</td><td>10</td><td>1</td><td>0</td><td>1327</td><td>181</td><td>21</td><td>6</td><td>20</td><td>Five Guys</td></tr>\n",
"    <tr><th>13</th><td>A.1® Sauce</td><td>15</td><td>0</td><td>0</td><td>0</td><td>0</td><td>280</td><td>3</td><td>0</td><td>2</td><td>0</td><td>Five Guys</td></tr>\n",
"    <tr><th>15</th><td>Cheese*# (1 slice)</td><td>70</td><td>6</td><td>4</td><td>0.2</td><td>15</td><td>360</td><td>0</td><td>0</td><td>0</td><td>4</td><td>Five Guys</td></tr>\n",
"    <tr><th>17</th><td>Grilled Mushrooms</td><td>5</td><td>0</td><td>0</td><td>0</td><td>0</td><td>55</td><td>1</td><td>0</td><td>1</td><td>0</td><td>Five Guys</td></tr>\n",
"    <tr><th>18</th><td>Hot Sauce</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>200</td><td>0</td><td>0</td><td>0</td><td>0</td><td>Five Guys</td></tr>\n",
"    <tr><th>20</th><td>Ketchup</td><td>20</td><td>0</td><td>0</td><td>0</td><td>0</td><td>160</td><td>5</td><td>0</td><td>4</td><td>0</td><td>Five Guys</td></tr>\n",
"    <tr><th>22</th><td>Mayonnaise</td><td>100</td><td>11</td><td>2</td><td>0</td><td>10</td><td>75</td><td>0</td><td>0</td><td>0</td><td>0</td><td>Five Guys</td></tr>\n",
"    <tr><th>23</th><td>Mustard</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>55</td><td>0</td><td>0</td><td>0</td><td>0</td><td>Five Guys</td></tr>\n",
"    <tr><th>25</th><td>Pickles</td><td>3</td><td>0</td><td>0</td><td>0</td><td>0</td><td>258</td><td>1</td><td>0</td><td>0</td><td>0</td><td>Five Guys</td></tr>\n",
"    <tr><th>26</th><td>Relish</td><td>10</td><td>0</td><td>0</td><td>0</td><td>0</td><td>105</td><td>3</td><td>0</td><td>3</td><td>0</td><td>Five Guys</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " item calories total_fat saturated_fat trans_fat cholesterol \\\n", "4 Hamburger Patty 220 17 8 1 60 \n", "5 Hot Dog 240 20 9 1 45 \n", "7 Five Guys Bun 260 9 3.5 0 5 \n", "9 Little 526 23 4 0.5 0 \n", "10 Regular 953 41 7 1 0 \n", "11 Large 1314 57 10 1 0 \n", "13 A.1® Sauce 15 0 0 0 0 \n", "15 Cheese*# (1 slice) 70 6 4 0.2 15 \n", "17 Grilled Mushrooms 5 0 0 0 0 \n", "18 Hot Sauce 0 0 0 0 0 \n", "20 Ketchup 20 0 0 0 0 \n", "22 Mayonnaise 100 11 2 0 10 \n", "23 Mustard 0 0 0 0 0 \n", "25 Pickles 3 0 0 0 0 \n", "26 Relish 10 0 0 0 0 \n", "\n", " sodium total_carb fiber sugar protein restaurant \n", "4 50 0 0 0 16 Five Guys \n", "5 1040 2 0 1 11 Five Guys \n", "7 330 39 2 8 7 Five Guys \n", "9 531 72 8 2 8 Five Guys \n", "10 962 131 15 4 15 Five Guys \n", "11 1327 181 21 6 20 Five Guys \n", "13 280 3 0 2 0 Five Guys \n", "15 360 0 0 0 4 Five Guys \n", "17 55 1 0 1 0 Five Guys \n", "18 200 0 0 0 0 Five Guys \n", "20 160 5 0 4 0 Five Guys \n", "22 75 0 0 0 0 Five Guys \n", "23 55 0 0 0 0 Five Guys \n", "25 258 1 0 0 0 Five Guys \n", "26 105 3 0 3 0 Five Guys " ] }, "execution_count": 103, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# B GOOD Data\n", "five_guys_df = pd.read_csv('five_guys.csv', \n", " names=['item','serving_size','calories', 'total_fat', \n", " 'saturated_fat', 'trans_fat', 'cholesterol', 'sodium',\n", " 'total_carb', 'fiber', 'sugar', 'protein'])\n", "five_guys_df.dropna(inplace=True)\n", "five_guys_df = five_guys_df[1:] \n", "five_guys_df['restaurant'] = 'Five Guys'\n", "five_guys_df.drop(columns='serving_size', inplace=True)\n", "five_guys_df\n" ] }, { "cell_type": "code", "execution_count": 104, "id": "50312a17", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>item</th><th>calories</th><th>total_fat</th><th>saturated_fat</th><th>trans_fat</th><th>cholesterol</th><th>sodium</th><th>total_carb</th><th>fiber</th><th>sugar</th><th>protein</th><th>restaurant</th><th>sat_fat</th><th>cal_fat</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>Hamburger Patty</td><td>220</td><td>17</td><td>8</td><td>1</td><td>60</td><td>50</td><td>0</td><td>0</td><td>0</td><td>16</td><td>Five Guys</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>1</th><td>Hot Dog</td><td>240</td><td>20</td><td>9</td><td>1</td><td>45</td><td>1040</td><td>2</td><td>0</td><td>1</td><td>11</td><td>Five Guys</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>2</th><td>Five Guys Bun</td><td>260</td><td>9</td><td>3.5</td><td>0</td><td>5</td><td>330</td><td>39</td><td>2</td><td>8</td><td>7</td><td>Five Guys</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>3</th><td>Little</td><td>526</td><td>23</td><td>4</td><td>0.5</td><td>0</td><td>531</td><td>72</td><td>8</td><td>2</td><td>8</td><td>Five Guys</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>4</th><td>Regular</td><td>953</td><td>41</td><td>7</td><td>1</td><td>0</td><td>962</td><td>131</td><td>15</td><td>4</td><td>15</td><td>Five Guys</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>581</th><td>Spicy Triple Double Crunchwrap</td><td>780</td><td>38</td><td>NaN</td><td>0.5</td><td>50</td><td>1850</td><td>87</td><td>9</td><td>8</td><td>23</td><td>Taco Bell</td><td>10</td><td>340.0</td></tr>\n",
"    <tr><th>582</th><td>Express Taco Salad w/ Chips</td><td>580</td><td>29</td><td>NaN</td><td>1</td><td>60</td><td>1270</td><td>59</td><td>8</td><td>7</td><td>23</td><td>Taco Bell</td><td>9</td><td>260.0</td></tr>\n",
"    <tr><th>583</th><td>Fiesta Taco Salad-Beef</td><td>780</td><td>42</td><td>NaN</td><td>1</td><td>60</td><td>1340</td><td>74</td><td>11</td><td>7</td><td>26</td><td>Taco Bell</td><td>10</td><td>380.0</td></tr>\n",
"    <tr><th>584</th><td>Fiesta Taco Salad-Chicken</td><td>720</td><td>35</td><td>NaN</td><td>0</td><td>70</td><td>1260</td><td>70</td><td>8</td><td>8</td><td>32</td><td>Taco Bell</td><td>7</td><td>320.0</td></tr>\n",
"    <tr><th>585</th><td>Fiesta Taco Salad-Steak</td><td>720</td><td>36</td><td>NaN</td><td>1</td><td>55</td><td>1340</td><td>70</td><td>8</td><td>8</td><td>28</td><td>Taco Bell</td><td>8</td><td>320.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>586 rows × 14 columns</p>\n",
"</div>
" ], "text/plain": [ " item calories total_fat saturated_fat \\\n", "0 Hamburger Patty 220 17 8 \n", "1 Hot Dog 240 20 9 \n", "2 Five Guys Bun 260 9 3.5 \n", "3 Little 526 23 4 \n", "4 Regular 953 41 7 \n", ".. ... ... ... ... \n", "581 Spicy Triple Double Crunchwrap 780 38 NaN \n", "582 Express Taco Salad w/ Chips 580 29 NaN \n", "583 Fiesta Taco Salad-Beef 780 42 NaN \n", "584 Fiesta Taco Salad-Chicken 720 35 NaN \n", "585 Fiesta Taco Salad-Steak 720 36 NaN \n", "\n", " trans_fat cholesterol sodium total_carb fiber sugar protein restaurant \\\n", "0 1 60 50 0 0 0 16 Five Guys \n", "1 1 45 1040 2 0 1 11 Five Guys \n", "2 0 5 330 39 2 8 7 Five Guys \n", "3 0.5 0 531 72 8 2 8 Five Guys \n", "4 1 0 962 131 15 4 15 Five Guys \n", ".. ... ... ... ... ... ... ... ... \n", "581 0.5 50 1850 87 9 8 23 Taco Bell \n", "582 1 60 1270 59 8 7 23 Taco Bell \n", "583 1 60 1340 74 11 7 26 Taco Bell \n", "584 0 70 1260 70 8 8 32 Taco Bell \n", "585 1 55 1340 70 8 8 28 Taco Bell \n", "\n", " sat_fat cal_fat \n", "0 NaN NaN \n", "1 NaN NaN \n", "2 NaN NaN \n", "3 NaN NaN \n", "4 NaN NaN \n", ".. ... ... 
\n", "581 10 340.0 \n", "582 9 260.0 \n", "583 10 380.0 \n", "584 7 320.0 \n", "585 8 320.0 \n", "\n", "[586 rows x 14 columns]" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_all_ff = pd.concat([five_guys_df, df_bgood_ff], axis=0, ignore_index=True)\n", "\n", "df_all_ff" ] }, { "cell_type": "code", "execution_count": 105, "id": "00745809", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['Five Guys', 'BGood', 'Mcdonalds', 'Chick Fil-A', 'Sonic', 'Arbys',\n", " 'Burger King', 'Dairy Queen', 'Subway', 'Taco Bell'], dtype=object)" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_all_ff.restaurant.unique()" ] }, { "cell_type": "code", "execution_count": null, "id": "33573791", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "eddb15fd", "metadata": {}, "source": [ "# Visualization " ] }, { "cell_type": "code", "execution_count": 8, "id": "84bacee6", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ ":4: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.\n", " dfg = df_ff.groupby(['restaurant'])['calories','sodium','protein'].mean()\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Segmented Bar Chart \n", "\n", "df_ff = df_ff.replace(np.nan, 0)\n", "dfg = df_ff.groupby(['restaurant'])['calories','sodium','protein'].mean()\n", "\n", "dfg.plot(kind='bar', title='Restaurant Mean Nutritional Values', ylabel='Mean Values(g)',\n", " xlabel='Restaurants', figsize=(6, 5))\n", "\n" ] }, { "cell_type": "markdown", "id": "dd3b84a7", "metadata": {}, "source": [ "* The segmented barchart is a good tool to show stark differences between specific nutritional categproes on a restaurant to restaurant basis. For example, the visualization above shows that Arby's is the leader in average sodium in menu items, where as Chick-Fil-A is the leader in calorie deficit menu items. " ] }, { "cell_type": "code", "execution_count": 29, "id": "3166bce5", "metadata": {}, "outputs": [], "source": [ "# Regression Functions \n", "\n", "def get_mse(y_true, y_pred):\n", " # calculate the mean squared distance between the predicted and actual y\n", " return np.mean((y_pred - y_true) ** 2)\n", "\n", "def show_fit(x, y, slope, intercept):\n", " plt.figure()\n", " \n", " # transform the input data into numpy arrays and flatten them for easier processing\n", " x = np.array(x).ravel()\n", " y = np.array(y).ravel()\n", " \n", " # plot the actual data\n", " plt.scatter(x, y, label='data')\n", " \n", " # compute linear predictions \n", " # x is a numpy array so each element gets mulitplied by slope and intercept is added\n", " y_pred = slope * x + intercept\n", " \n", " # plot the linear fit\n", " plt.plot(x, y_pred, color='black',\n", " ls=':',\n", " label='linear fit')\n", " \n", " # for each data point plot the error\n", " for idx, (x_i, y_i) in enumerate(zip(x, y)):\n", " plt.plot([x_i, x_i], [y_i, slope * x_i + intercept], \n", " ls='--', lw=3, color='tab:red',\n", " label='error' if idx == 0 else \"\")\n", " \n", " plt.legend()\n", " \n", " plt.xlabel('x')\n", " 
plt.ylabel('y')\n", " \n", " # print the mean squared error\n", " y_pred = slope * x + intercept\n", " mse = get_mse(y_true=y, y_pred=y_pred)\n", " plt.suptitle(f'y_hat = {slope:.2f} * x + {intercept:.2f}, MSE = {mse:.3f}')\n", " " ] }, { "cell_type": "code", "execution_count": 30, "id": "e29128bd", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Regressing between two given nutritional categories \n", "\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.metrics import r2_score\n", "\n", "x = df_ff['calories'].to_numpy()\n", "y = df_ff['cholesterol']\n", "\n", "x = x.reshape((-1, 1))\n", "\n", "reg = LinearRegression()\n", "reg.fit(x, y) \n", "slope = reg.coef_[0]\n", "intercept = reg.intercept_\n", "\n", "show_fit(x, y, slope, intercept)" ] }, { "cell_type": "markdown", "id": "b7dcf6bd", "metadata": {}, "source": [ "* The scatterplot above is a foundation for running a machine learning regression based model. This chart can also be used to demonstrate levels of correlation between select nutritional categories. For instance, the chart above shows that there is a relatively high correlation between calories and cholesterol (with exception to a few outliers). " ] }, { "cell_type": "code", "execution_count": 24, "id": "ae827eb3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "R2 = 0.5813448793142899\n" ] } ], "source": [ "y_pred = reg.predict(x) \n", "\n", "# computing R2 from sklearn\n", "r2 = r2_score(y_true=y, y_pred=y_pred)\n", "print('R2 =',r2)" ] }, { "cell_type": "markdown", "id": "4eaeb940", "metadata": {}, "source": [ "# Discussion of Machine Learning Tools" ] }, { "cell_type": "markdown", "id": "f60c4dfa", "metadata": {}, "source": [ "* Machine Learning Tools: SciKit (K-nearest neighbors, K-means) and SKLearn. We believe k-nearest neighbors make the most sense because groups are already made through restaurant identification. K-means, on the other hand, is used to identify groups that are yet to exist. Another option is potentially using linear regression on certain nutritional value categories to identify correlations among groups. 
\n", "\n", "\n", "* Relevant assumptions: For k-nearest neighbors, we assume that individual menu items are representative of their restaurant, so that an item's nutritional profile can be used to identify the restaurant in further analysis.\n", "\n", "\n", "* Use cases: We could build a food item recommender based on a user's favorite menu items and/or preferred ranges of nutritional values. Businesses could also use this tool to compare the nutritional value of their items with their competitors' and modify their menus according to their strategic goals." ] }, { "cell_type": "code", "execution_count": null, "id": "3f9c78db", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.4" } }, "nbformat": 4, "nbformat_minor": 5 }