Reducing Food Waste with Creative Recipes

Problem: Food Waste

Food waste is a massive problem in our country, with nearly 1/3 of all food by weight being thrown out: that’s nearly 20.3 tons of food waste per year! This discarded food matter ends up in landfills and incineration facilities, where it generates greenhouse gas emissions as it decomposes and burns. Beyond this, spoiled food is also a cause of financial loss for many buyers, who may not have been able to use the items they bought within their shelf life. As an environmentalist as well as a college student on a tight budget, I propose that we find a way to reduce this issue for the benefit of not only the planet, but our bank accounts as well.

Food waste happens at many different levels, with much of it occurring before food ever reaches store shelves; problems during harvesting, manufacturing, processing, and transportation all contribute to the issue at large. However, as consumers, we can minimize our contribution to this problem by making full use of the products we purchase once they reach our hands.

This can be accomplished by providing buyers with a tool to make the most of their fridge inventory, with the goal of preparing food so that it generates less waste and is more cost-effective! This proposal will discuss methods for achieving this goal, including providing users with suggestions for creative meals that are cheap and nutritious, and that just so happen to use up those last few ingredients from their most recent grocery store trip.

Data Usage: Food.com Recipes and Interactions

The following dataset summarizes the contents of food.com, including 180,000 recipes published on the site up until 2019, along with all reviews for each recipe. For the purposes of this project, the data we’re interested in belongs to three separate files, as outlined below:

PP_recipes.csv provides data on each individual recipe, including a list of ingredients.

Attributes that would be used for this project:

Id: the identification code of the recipe, as it appears on food.com
Ingredient_ids: list of identification codes which correspond to unique ingredients used in the recipe

In [1]:

import pandas as pd

recipe_data = pd.read_csv('PP_recipes.csv')
recipe_data.head()

Out[1]:

	id	i	name_tokens	ingredient_tokens	steps_tokens	techniques	calorie_level	ingredient_ids
0	424415	23	[40480, 37229, 2911, 1019, 249, 6878, 6878, 28...	[[2911, 1019, 249, 6878], [1353], [6953], [153...	[40480, 40482, 21662, 481, 6878, 500, 246, 161...	[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, ...	0	[389, 7655, 6270, 1527, 3406]
1	146223	96900	[40480, 18376, 7056, 246, 1531, 2032, 40481]	[[17918], [25916], [2507, 6444], [8467, 1179],...	[40480, 40482, 729, 2525, 10906, 485, 43, 8393...	[1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, ...	0	[2683, 4969, 800, 5298, 840, 2499, 6632, 7022,...
2	312329	120056	[40480, 21044, 16954, 8294, 556, 10837, 40481]	[[5867, 24176], [1353], [6953], [1301, 11332],...	[40480, 40482, 8240, 481, 24176, 296, 1353, 66...	[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, ...	1	[1257, 7655, 6270, 590, 5024, 1119, 4883, 6696...
3	74301	168258	[40480, 10025, 31156, 40481]	[[1270, 1645, 28447], [21601], [27952, 29471, ...	[40480, 40482, 5539, 21601, 1073, 903, 2324, 4...	[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...	0	[7940, 3609, 7060, 6265, 1170, 6654, 5003, 3561]
4	76272	109030	[40480, 17841, 252, 782, 2373, 1641, 2373, 252...	[[1430, 11434], [1430, 17027], [1615, 23, 695,...	[40480, 40482, 14046, 1430, 11434, 488, 17027,...	[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, ...	0	[3484, 6324, 7594, 243]
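One practical note on the ingredient_ids column: when a CSV is read with pandas, list-like cells usually arrive as plain strings rather than Python lists. The sketch below assumes that is the case here and converts them with ast.literal_eval. The two rows are copied from the preview above; the names sample and recipe_ingredients are illustrative only and are not part of the dataset.

```python
import ast
import pandas as pd

# Toy stand-in for two rows of PP_recipes.csv (values copied from the preview above)
sample = pd.DataFrame({
    "id": [424415, 76272],
    "ingredient_ids": ["[389, 7655, 6270, 1527, 3406]", "[3484, 6324, 7594, 243]"],
})

# Assumption: the column is stored as text, so each cell must be parsed back into a list
sample["ingredient_ids"] = sample["ingredient_ids"].apply(ast.literal_eval)

# Map each recipe ID to a set of ingredient IDs for fast membership tests
recipe_ingredients = {
    recipe_id: set(ids)
    for recipe_id, ids in zip(sample["id"], sample["ingredient_ids"])
}
```

Storing the IDs as sets makes it cheap to ask later whether a recipe contains every ingredient a user wants to use up.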

ingr_map.pkl provides a guide to the IDs for each ingredient.

Data can be cleaned up to generate a list of unique IDs and the ingredients they correspond to.

Attributes that would be used for this project:

Ingredient: common name of kitchen ingredient
Ingr_id: unique identification code that corresponds to the ingredient listed; these are the codes that appear in the Ingredient_ids column of the dataset above

In [2]:

import pickle
import numpy as np

# Read in data
with open('ingr_map.pkl', 'rb') as f:
    ingr_data = pickle.load(f)

ingr_data.head()

Out[2]:

	raw_ingr	raw_words	processed	len_proc	replaced	count	id
0	medium heads bibb or red leaf lettuce, washed,...	13	medium heads bibb or red leaf lettuce, washed,...	73	lettuce	4507	4308
1	mixed baby lettuces and spring greens	6	mixed baby lettuces and spring green	36	lettuce	4507	4308
2	romaine lettuce leaf	3	romaine lettuce leaf	20	lettuce	4507	4308
3	iceberg lettuce leaf	3	iceberg lettuce leaf	20	lettuce	4507	4308
4	red romaine lettuce	3	red romaine lettuce	19	lettuce	4507	4308

In [3]:

# Create a series of unique ingredient names
ingredients_series = pd.Series(ingr_data['replaced'])
all_ingredients = ingredients_series.unique()

# Take a look at the data:
all_ingredients[100:105]

Out[3]:

array(['kosher salt & ground black pepper', 'cream of broccoli soup',
       'lemon frosting', 'roasted red peppers packed in oil',
       'ranch dips mix'], dtype=object)

In [7]:

# Combine with original IDs:
id_series = pd.Series(ingr_data['id'])
all_ids = id_series.unique()

ingredient_id_dict = {'Ingr_ID': all_ids,'Ingredient': all_ingredients }

# Convert to dataframe
ingredient_ids = pd.DataFrame(ingredient_id_dict)

# Take a look at the data!
ingredient_ids[905:910]

Out[7]:

	Ingr_ID	Ingredient
905	5694	powdered soy protein concentrate
906	299	bacon bit
907	5412	pineapple chunks in juice
908	2272	dried great northern bean
909	1982	crushed pineapple
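The cell above pairs two arrays that were de-duplicated separately, which lines up correctly only if every ID corresponds to exactly one ingredient name and both appear in the same order. A slightly more defensive sketch is to de-duplicate the (id, name) pairs together, so each ID stays attached to its own name. The toy frame below stands in for ingr_map.pkl, using column names from the preview above; toy_map and pairs are illustrative names.

```python
import pandas as pd

# Toy stand-in for ingr_map.pkl (column names follow the preview above)
toy_map = pd.DataFrame({
    "replaced": ["lettuce", "lettuce", "bacon bit", "crushed pineapple"],
    "id": [4308, 4308, 299, 1982],
})

# Keep each (id, name) pair once, so every ID stays attached to its own name
pairs = (
    toy_map[["id", "replaced"]]
    .drop_duplicates()
    .rename(columns={"id": "Ingr_ID", "replaced": "Ingredient"})
    .reset_index(drop=True)
)

# Sanity check: every ID should map to exactly one ingredient name
assert pairs["Ingr_ID"].is_unique
```

If the sanity check ever fails on the full data, that would signal an ID shared by several names, which is worth resolving before building the lookup.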

Current gaps in available data:

Despite much searching, I wasn’t able to find a dataset with a comprehensive list of unit prices for common ingredients, such as produce, grains, or condiments. If we can track this down, it would open the door to analyzing the relationship between the cost of a meal and its nutritional value!

RAW_recipes.csv provides data on the nutritional value of each recipe.

Here’s a look at the data we’d be interested in: the name of the recipe, its ID code on food.com, and its nutritional value. Calories are given as a count; the remaining values are listed as a percentage of the recommended daily value (PDV).

Attributes that would be used for this project:

Name: name of recipe as it appears on food.com
Id: the identification code of the recipe on the site
Nutrition: a list of the nutritional values for a serving of that recipe
- Structure: [calories (#), total fat (PDV), sugar (PDV), sodium (PDV), protein (PDV), saturated fat (PDV), carbohydrates (PDV)]

In [5]:

# Read in and preview data
recipe_data = pd.read_csv('RAW_recipes.csv')
recipe_data.head(2)

Out[5]:

	name	id	minutes	contributor_id	submitted	tags	nutrition	n_steps	steps	description	ingredients	n_ingredients
0	arriba baked winter squash mexican style	137739	55	47892	2005-09-16	['60-minutes-or-less', 'time-to-make', 'course...	[51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0]	11	['make a choice and proceed with recipe', 'dep...	autumn is my favorite time of year to cook! th...	['winter squash', 'mexican seasoning', 'mixed ...	7
1	a bit different breakfast pizza	31490	30	26278	2002-06-17	['30-minutes-or-less', 'time-to-make', 'course...	[173.4, 18.0, 0.0, 17.0, 22.0, 35.0, 1.0]	9	['preheat oven to 425 degrees f', 'press dough...	this recipe calls for the crust to be prebaked...	['prepared pizza crust', 'sausage patty', 'egg...	6

In [6]:

# Omit extraneous information
nutrition_data = recipe_data[['name', 'id', 'nutrition']]
nutrition_data.head()

Out[6]:

	name	id	nutrition
0	arriba baked winter squash mexican style	137739	[51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0]
1	a bit different breakfast pizza	31490	[173.4, 18.0, 0.0, 17.0, 22.0, 35.0, 1.0]
2	all in the kitchen chili	112140	[269.8, 22.0, 32.0, 48.0, 39.0, 27.0, 5.0]
3	alouette potatoes	59389	[368.1, 17.0, 10.0, 2.0, 14.0, 8.0, 20.0]
4	amish tomato ketchup for canning	44061	[352.9, 1.0, 337.0, 23.0, 3.0, 0.0, 28.0]
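To compare recipes on individual nutrients, the nutrition list would need to be split into separate columns. The sketch below assumes the column is stored as text and that its seven values follow the order calories, total fat, sugar, sodium, protein, saturated fat, carbohydrates; that order is my reading of the dataset description and should be verified. The two rows are copied from the preview above, and NUTRITION_COLUMNS and nutrition_table are illustrative names.

```python
import ast
import pandas as pd

# Assumed column order; verify against the dataset documentation before relying on it
NUTRITION_COLUMNS = [
    "calories", "total_fat_pdv", "sugar_pdv", "sodium_pdv",
    "protein_pdv", "saturated_fat_pdv", "carbohydrates_pdv",
]

# Toy stand-in for two rows of RAW_recipes.csv (values copied from the preview above)
toy = pd.DataFrame({
    "name": ["arriba baked winter squash mexican style", "a bit different breakfast pizza"],
    "id": [137739, 31490],
    "nutrition": ["[51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0]",
                  "[173.4, 18.0, 0.0, 17.0, 22.0, 35.0, 1.0]"],
})

# Parse each string into a list of floats, then spread the lists across named columns
parsed = toy["nutrition"].apply(ast.literal_eval)
nutrition_wide = pd.DataFrame(parsed.tolist(), columns=NUTRITION_COLUMNS, index=toy.index)

# Re-attach the recipe name and ID so each row is still identifiable
nutrition_table = pd.concat([toy[["name", "id"]], nutrition_wide], axis=1)
```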

How can data be used to solve this problem?

We can create an interface in which users enter the names of the ingredients they are trying to use up and receive a list of recipes that use these ingredients together.
- Some machine learning methods may be needed to help with Natural Language Processing.
  - For example, if a user enters “shredded cheddar cheese”, how will this correspond with the “cheddar cheese” Ingredient ID, which is used to return the recipes?
- Finding creative uses for leftover ingredients will minimize food waste, and will reduce the need to grocery shop in order to cook a cohesive meal.
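As a starting point before any machine learning is involved, the lookup described in the bullets above could be sketched with simple string matching from the standard library. Everything below is hypothetical: the IDs, the recipes, and the function names match_ingredient and recipes_using are made up for illustration, and real IDs would come from ingr_map.pkl and PP_recipes.csv.

```python
import difflib

# Hypothetical lookup table; real IDs would come from ingr_map.pkl
INGREDIENT_TO_ID = {"cheddar cheese": 1, "lettuce": 2, "bacon bit": 3, "egg": 4}

# Hypothetical recipes: recipe ID -> set of ingredient IDs
RECIPES = {101: {1, 2, 3}, 102: {1, 4}, 103: {2, 4}}


def match_ingredient(user_text, known=INGREDIENT_TO_ID):
    """Return the ID of the known ingredient that best matches the text, or None."""
    text = user_text.lower().strip()
    words = set(text.split())
    # 1) Prefer a known name whose words all appear in the input (longest name wins)
    contained = [name for name in known if set(name.split()) <= words]
    if contained:
        return known[max(contained, key=len)]
    # 2) Otherwise fall back to fuzzy string similarity
    close = difflib.get_close_matches(text, list(known), n=1, cutoff=0.6)
    return known[close[0]] if close else None


def recipes_using(ingredient_ids, recipes=RECIPES):
    """Return the IDs of recipes that contain every requested ingredient."""
    wanted = set(ingredient_ids)
    return sorted(rid for rid, ingr in recipes.items() if wanted <= ingr)


# "Shredded cheddar cheese" contains every word of "cheddar cheese";
# "lettuces" has no exact word match, so it goes through the fuzzy fallback
ids = [match_ingredient("Shredded cheddar cheese"), match_ingredient("lettuces")]
matching_recipes = recipes_using(ids)
```

This word-containment rule is only a baseline. It would miss synonyms and misspellings that differ by more than a few characters, which is where a learned text model could help.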
Furthering this idea, if unit cost data becomes available for ingredients, this can be paired with nutrition information to recommend the most cost-effective sources of each major macronutrient (carbohydrates, protein, and fats).
- Using machine learning methods, these top ingredients can be clustered and entered into the interface described above. This will return recipes that use these nutritionally dense foods together.
  - We could potentially use machine learning methods to cluster these recipes together by similar ingredients (aside from just the ones entered) to generate a list of groceries needed to cook these recipes.
    - Choosing recipes that share nearly the same ingredients reduces the need to buy additional ingredients, which may not be used up before they go bad. This will save money and reduce waste!
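One simple way to measure how much two recipes overlap is the Jaccard similarity of their ingredient sets: the number of shared ingredients divided by the number of distinct ingredients across both. The sketch below uses it as a stand-in for the clustering step. It only ranks pairs of recipes and is not a full clustering method, and the recipes and IDs are hypothetical.

```python
from itertools import combinations

# Hypothetical recipes: recipe ID -> set of ingredient IDs
recipes = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3, 5},
    "C": {6, 7, 8},
}


def jaccard(a, b):
    """Shared ingredients divided by all distinct ingredients in either recipe."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Score every pair of recipes and sort so the most similar pair comes first
pair_scores = sorted(
    ((jaccard(recipes[x], recipes[y]), x, y) for x, y in combinations(recipes, 2)),
    reverse=True,
)

# The combined grocery list for the most similar pair is the union of their ingredients
best_score, first, second = pair_scores[0]
shopping_list = sorted(recipes[first] | recipes[second])
```

Here recipes A and B share three of five distinct ingredients, so cooking both requires only one ingredient beyond what either needs alone. The same pairwise scores could later feed a proper clustering algorithm.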
