Food waste is a massive problem in our country, with nearly 1/3 of all food by weight being thrown out: that’s nearly 20.3 tons of food waste per year! This discarded food ends up in landfills and incineration facilities, where it generates greenhouse gas emissions as it decomposes and burns. Beyond this, spoiled food is also a financial loss for many buyers, who may not have been able to use the items they bought within their shelf life. As an environmentalist as well as a college student on a tight budget, I propose that we find a way to reduce this waste, for the benefit of not only the planet but our bank accounts as well.
Food waste happens at many different levels, with much occurring prior to reaching the store shelves; problems during harvesting, manufacturing, processing, and transportation all contribute to the issue at large. However, as consumers, we can minimize our contribution to this problem by ensuring proper usage of purchased products once they reach our hands.
This can be accomplished by giving buyers a tool to make the most of their fridge inventory, with the goal of preparing food in ways that generate less waste and cost less! This proposal will discuss methods for achieving this goal, including suggesting creative meals that are cheap and nutritious, and that just so happen to use up those last few ingredients from the user’s most recent grocery store trip.
The following dataset summarizes the contents of food.com, including 180,000 recipes published on the site up until 2019, along with all reviews for each recipe. For the purposes of this project, the data we’re interested in belongs to three separate files, as outlined below:
Attributes that would be used for this project:
Id
: the identification code of the recipe, as it appears on food.com
Ingredient_ids
: list of identification codes which correspond to unique ingredients used in the recipe

import pandas as pd
recipe_data = pd.read_csv('PP_recipes.csv')
recipe_data.head()
| | id | i | name_tokens | ingredient_tokens | steps_tokens | techniques | calorie_level | ingredient_ids |
---|---|---|---|---|---|---|---|---|
0 | 424415 | 23 | [40480, 37229, 2911, 1019, 249, 6878, 6878, 28... | [[2911, 1019, 249, 6878], [1353], [6953], [153... | [40480, 40482, 21662, 481, 6878, 500, 246, 161... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, ... | 0 | [389, 7655, 6270, 1527, 3406] |
1 | 146223 | 96900 | [40480, 18376, 7056, 246, 1531, 2032, 40481] | [[17918], [25916], [2507, 6444], [8467, 1179],... | [40480, 40482, 729, 2525, 10906, 485, 43, 8393... | [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, ... | 0 | [2683, 4969, 800, 5298, 840, 2499, 6632, 7022,... |
2 | 312329 | 120056 | [40480, 21044, 16954, 8294, 556, 10837, 40481] | [[5867, 24176], [1353], [6953], [1301, 11332],... | [40480, 40482, 8240, 481, 24176, 296, 1353, 66... | [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, ... | 1 | [1257, 7655, 6270, 590, 5024, 1119, 4883, 6696... |
3 | 74301 | 168258 | [40480, 10025, 31156, 40481] | [[1270, 1645, 28447], [21601], [27952, 29471, ... | [40480, 40482, 5539, 21601, 1073, 903, 2324, 4... | [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | 0 | [7940, 3609, 7060, 6265, 1170, 6654, 5003, 3561] |
4 | 76272 | 109030 | [40480, 17841, 252, 782, 2373, 1641, 2373, 252... | [[1430, 11434], [1430, 17027], [1615, 23, 695,... | [40480, 40482, 14046, 1430, 11434, 488, 17027,... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, ... | 0 | [3484, 6324, 7594, 243] |
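
Because `pd.read_csv` loads list-like cells as plain strings, the `ingredient_ids` column likely needs to be parsed before it can be matched against ingredient codes. Here is a minimal sketch of one way to do that, using a toy frame built from two of the rows previewed above rather than the full file. The names `toy_recipes`, `recipe_ingredients`, and `ingr_id` are my own placeholders.

```python
import ast
import pandas as pd

# Toy stand-in for PP_recipes.csv: list columns arrive as plain strings.
toy_recipes = pd.DataFrame({
    "id": [424415, 76272],
    "ingredient_ids": ["[389, 7655, 6270, 1527, 3406]", "[3484, 6324, 7594, 243]"],
})

# Convert each string into a real Python list of integers.
toy_recipes["ingredient_ids"] = toy_recipes["ingredient_ids"].apply(ast.literal_eval)

# One row per (recipe, ingredient) pair makes later joins and filters easier.
recipe_ingredients = toy_recipes.explode("ingredient_ids").rename(
    columns={"ingredient_ids": "ingr_id"}
)
print(recipe_ingredients)
```

The long format is a design choice on my part: it lets the ingredient codes be joined directly against the ingredient table without looping over lists.
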
The ingredient map (ingr_map.pkl) can be cleaned up to generate a list of unique ingredient IDs and the ingredient names they correspond to.
Attributes that would be used for this project:
Ingredient
: common name of kitchen ingredient
Ingr_id
: unique identification code that corresponds to the ingredient listed; these codes are used to look up ingredients in the recipe dataset above
import pickle
import numpy as np
# Read in data
with open('ingr_map.pkl', 'rb') as f:
    ingr_data = pickle.load(f)
ingr_data.head()
| | raw_ingr | raw_words | processed | len_proc | replaced | count | id |
---|---|---|---|---|---|---|---|
0 | medium heads bibb or red leaf lettuce, washed,... | 13 | medium heads bibb or red leaf lettuce, washed,... | 73 | lettuce | 4507 | 4308 |
1 | mixed baby lettuces and spring greens | 6 | mixed baby lettuces and spring green | 36 | lettuce | 4507 | 4308 |
2 | romaine lettuce leaf | 3 | romaine lettuce leaf | 20 | lettuce | 4507 | 4308 |
3 | iceberg lettuce leaf | 3 | iceberg lettuce leaf | 20 | lettuce | 4507 | 4308 |
4 | red romaine lettuce | 3 | red romaine lettuce | 19 | lettuce | 4507 | 4308 |
# Create a series of unique ingredient names
ingredients_series = pd.Series(ingr_data['replaced'])
all_ingredients = ingredients_series.unique()
# Take a look at the data:
all_ingredients[100:105]
array(['kosher salt & ground black pepper', 'cream of broccoli soup', 'lemon frosting', 'roasted red peppers packed in oil', 'ranch dips mix'], dtype=object)
# Combine with original IDs:
id_series = pd.Series(ingr_data['id'])
all_ids = id_series.unique()
ingredient_id_dict = {'Ingr_ID': all_ids,'Ingredient': all_ingredients }
# Convert to dataframe
ingredient_ids = pd.DataFrame(ingredient_id_dict)
# Take a look at the data!
ingredient_ids[905:910]
| | Ingr_ID | Ingredient |
---|---|---|
905 | 5694 | powdered soy protein concentrate |
906 | 299 | bacon bit |
907 | 5412 | pineapple chunks in juice |
908 | 2272 | dried great northern bean |
909 | 1982 | crushed pineapple |
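
One caveat with the approach above: calling `unique()` separately on the names and on the IDs only lines up if every ID and its name first appear in the same order. A slightly more defensive sketch is to de-duplicate the (ID, name) pairs together. The toy frame below reuses a few name/ID pairs from the tables above, and `toy_ingr` and `ingredient_lookup` are placeholder names of my own.

```python
import pandas as pd

# Toy stand-in for ingr_data, using (name, id) pairs shown in the tables above.
toy_ingr = pd.DataFrame({
    "replaced": ["lettuce", "lettuce", "bacon bit", "crushed pineapple", "bacon bit"],
    "id": [4308, 4308, 299, 1982, 299],
})

# Drop repeated (id, name) pairs in one step so each ID stays attached to its name.
ingredient_lookup = (
    toy_ingr[["id", "replaced"]]
    .drop_duplicates()
    .rename(columns={"id": "Ingr_ID", "replaced": "Ingredient"})
    .reset_index(drop=True)
)

# Sanity check: each ID should map to exactly one ingredient name.
assert ingredient_lookup["Ingr_ID"].is_unique
print(ingredient_lookup)
```
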
Despite much searching, I wasn’t able to find a dataset with a comprehensive list of unit prices for common ingredients, such as produce, grains, or condiments. If we can track one down, it would open the door to analyzing the relationship between a meal’s cost and its nutritional value!
RAW_recipes.csv provides data on the nutritional value of each recipe.
Here’s a look at the data we’d be interested in: the name of the recipe, its ID code on food.com, and its nutritional value. The nutritional values are listed as a percentage of recommended daily values.
Attributes that would be used for this project:
Name
: name of recipe as it appears on food.com
Id
: the identification code of the recipe on the site
Nutrition
: a list of the nutritional values for a serving of that recipe

# Read in and preview data
recipe_data = pd.read_csv('RAW_recipes.csv')
recipe_data.head(2)
| | name | id | minutes | contributor_id | submitted | tags | nutrition | n_steps | steps | description | ingredients | n_ingredients |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | arriba baked winter squash mexican style | 137739 | 55 | 47892 | 2005-09-16 | ['60-minutes-or-less', 'time-to-make', 'course... | [51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0] | 11 | ['make a choice and proceed with recipe', 'dep... | autumn is my favorite time of year to cook! th... | ['winter squash', 'mexican seasoning', 'mixed ... | 7 |
1 | a bit different breakfast pizza | 31490 | 30 | 26278 | 2002-06-17 | ['30-minutes-or-less', 'time-to-make', 'course... | [173.4, 18.0, 0.0, 17.0, 22.0, 35.0, 1.0] | 9 | ['preheat oven to 425 degrees f', 'press dough... | this recipe calls for the crust to be prebaked... | ['prepared pizza crust', 'sausage patty', 'egg... | 6 |
# Omit extraneous information
nutrition_data = recipe_data[['name', 'id', 'nutrition']]
nutrition_data.head()
| | name | id | nutrition |
---|---|---|---|
0 | arriba baked winter squash mexican style | 137739 | [51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0] |
1 | a bit different breakfast pizza | 31490 | [173.4, 18.0, 0.0, 17.0, 22.0, 35.0, 1.0] |
2 | all in the kitchen chili | 112140 | [269.8, 22.0, 32.0, 48.0, 39.0, 27.0, 5.0] |
3 | alouette potatoes | 59389 | [368.1, 17.0, 10.0, 2.0, 14.0, 8.0, 20.0] |
4 | amish tomato ketchup for canning | 44061 | [352.9, 1.0, 337.0, 23.0, 3.0, 0.0, 28.0] |
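
To compare recipes nutrient by nutrient, it would help to split the `nutrition` list into separate columns. The sketch below does this on two rows from the preview above. The column names are my assumption about the order of the seven values (calories first, followed by percent-daily-value figures) and should be checked against the dataset's documentation before use.

```python
import ast
import pandas as pd

# ASSUMED column order; verify against the dataset description before relying on it.
NUTRITION_COLS = ["calories", "total_fat_pdv", "sugar_pdv", "sodium_pdv",
                  "protein_pdv", "saturated_fat_pdv", "carbs_pdv"]

# Toy stand-in for nutrition_data, using two rows from the preview above.
toy_nutrition = pd.DataFrame({
    "name": ["arriba baked winter squash mexican style", "a bit different breakfast pizza"],
    "id": [137739, 31490],
    "nutrition": ["[51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0]",
                  "[173.4, 18.0, 0.0, 17.0, 22.0, 35.0, 1.0]"],
})

# read_csv yields strings, so parse each one into a list of floats first.
parsed = toy_nutrition["nutrition"].apply(ast.literal_eval)

# Spread the lists into named columns and reattach the recipe name and ID.
nutrition_wide = pd.DataFrame(parsed.tolist(), columns=NUTRITION_COLS,
                              index=toy_nutrition.index)
nutrition_table = pd.concat([toy_nutrition[["name", "id"]], nutrition_wide], axis=1)
print(nutrition_table)
```
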
We can create an interface where users enter the names of the ingredients they are trying to use up and receive a list of recipes that use those ingredients together.
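As a rough illustration of the lookup behind such an interface, here is a minimal sketch that finds recipes containing every requested ingredient. The tables, IDs, and the `find_recipes` helper are all hypothetical stand-ins. In the real project they would be built from PP_recipes.csv and ingr_map.pkl.

```python
import pandas as pd

# Hypothetical toy tables; the IDs here are made up for illustration.
recipe_ingredients = pd.DataFrame({
    "recipe_id": [1, 1, 1, 2, 2, 3],
    "ingr_id": [10, 11, 12, 10, 13, 12],
})
ingredient_names = pd.DataFrame({
    "ingr_id": [10, 11, 12, 13],
    "ingredient": ["lettuce", "bacon bit", "crushed pineapple", "egg"],
})

def find_recipes(wanted_names):
    """Return IDs of recipes that contain every ingredient in wanted_names."""
    wanted_ids = set(
        ingredient_names.loc[ingredient_names["ingredient"].isin(wanted_names), "ingr_id"]
    )
    if len(wanted_ids) < len(set(wanted_names)):
        return []  # at least one requested name is not in the ingredient table
    # Collect each recipe's ingredient IDs into a set, then keep the supersets.
    per_recipe = recipe_ingredients.groupby("recipe_id")["ingr_id"].apply(set)
    matches = per_recipe[per_recipe.apply(wanted_ids.issubset)]
    return sorted(matches.index.tolist())

print(find_recipes(["lettuce", "bacon bit"]))
```

Requiring an exact name match is the simplest choice. A friendlier version might fall back to partial matches, or rank recipes by how many of the requested ingredients they use.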
Furthering this idea, if unit cost data becomes available for ingredients, it can be paired with nutrition information to recommend the most cost-effective sources of each major macronutrient (carbohydrates, protein, and fats).
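To make the cost idea concrete, here is a sketch of how grams of each macronutrient per dollar could be ranked once price data exists. Every number below is made up purely for illustration, and the column names (`cost_per_serving`, `protein_g`, and so on) are hypothetical.

```python
import pandas as pd

# Entirely made-up numbers for illustration; no real price data is used here.
toy_costs = pd.DataFrame({
    "ingredient": ["dried bean", "egg", "rice"],
    "cost_per_serving": [0.25, 0.40, 0.10],   # hypothetical dollars
    "protein_g": [10.0, 8.0, 2.0],            # hypothetical grams per serving
    "carbs_g": [20.0, 1.0, 25.0],
    "fat_g": [1.0, 5.0, 0.5],
})

nutrients = ["protein_g", "carbs_g", "fat_g"]

# Grams of each nutrient obtained per dollar spent.
per_dollar = toy_costs[nutrients].div(toy_costs["cost_per_serving"], axis=0)
per_dollar.index = toy_costs["ingredient"]

# For each nutrient, the ingredient that delivers the most grams per dollar.
best_source = per_dollar.idxmax()
print(best_source)
```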