Reducing Food Waste with Creative Recipes

Problem: Food Waste

Food waste is a massive problem in our country: nearly 1/3 of all food by weight is thrown out, amounting to nearly 20.3 tons of food waste per year! This discarded food ends up in landfills and incineration facilities, where it generates greenhouse gas emissions as it decomposes or burns. Beyond this, spoiled food is also a financial loss for many buyers, who may not be able to use the items they bought within their shelf life. As an environmentalist and a college student on a tight budget, I propose that we find a way to address this issue, not only for the benefit of the planet but for our bank accounts as well.

Food waste happens at many different levels, and much of it occurs before food reaches store shelves: problems during harvesting, manufacturing, processing, and transportation all contribute to the issue at large. However, as consumers, we can minimize our own contribution by making full use of the products we buy once they reach our hands.

This can be accomplished by giving buyers a tool to make the most of their fridge inventory, with the goal of preparing food in a way that generates less waste and is more cost-effective! This proposal discusses methods for achieving this goal, including suggesting creative meals that are cheap and nutritious, and that just so happen to use up those last few ingredients from the user’s most recent grocery store trip.

Data Usage: Food.com Recipes and Interactions

The following dataset summarizes the contents of food.com, including 180,000 recipes published on the site up until 2019, along with all reviews for each recipe. For the purposes of this project, the data we’re interested in comes from three separate files, as outlined below:

PP_recipes.csv provides tokenized data on each individual recipe, including a list of ingredient IDs.

Attributes that would be used for this project:

  • id: the identification code of the recipe, as it appears on food.com
  • ingredient_ids: list of identification codes, each corresponding to a unique ingredient used in the recipe
In [1]:
import pandas as pd

recipe_data = pd.read_csv('PP_recipes.csv')
recipe_data.head()
Out[1]:
id i name_tokens ingredient_tokens steps_tokens techniques calorie_level ingredient_ids
0 424415 23 [40480, 37229, 2911, 1019, 249, 6878, 6878, 28... [[2911, 1019, 249, 6878], [1353], [6953], [153... [40480, 40482, 21662, 481, 6878, 500, 246, 161... [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, ... 0 [389, 7655, 6270, 1527, 3406]
1 146223 96900 [40480, 18376, 7056, 246, 1531, 2032, 40481] [[17918], [25916], [2507, 6444], [8467, 1179],... [40480, 40482, 729, 2525, 10906, 485, 43, 8393... [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, ... 0 [2683, 4969, 800, 5298, 840, 2499, 6632, 7022,...
2 312329 120056 [40480, 21044, 16954, 8294, 556, 10837, 40481] [[5867, 24176], [1353], [6953], [1301, 11332],... [40480, 40482, 8240, 481, 24176, 296, 1353, 66... [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, ... 1 [1257, 7655, 6270, 590, 5024, 1119, 4883, 6696...
3 74301 168258 [40480, 10025, 31156, 40481] [[1270, 1645, 28447], [21601], [27952, 29471, ... [40480, 40482, 5539, 21601, 1073, 903, 2324, 4... [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 0 [7940, 3609, 7060, 6265, 1170, 6654, 5003, 3561]
4 76272 109030 [40480, 17841, 252, 782, 2373, 1641, 2373, 252... [[1430, 11434], [1430, 17027], [1615, 23, 695,... [40480, 40482, 14046, 1430, 11434, 488, 17027,... [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, ... 0 [3484, 6324, 7594, 243]
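Because CSV stores every cell as text, list-valued columns such as ingredient_ids come back from pd.read_csv as strings like "[389, 7655, 6270, 1527, 3406]" rather than Python lists. A minimal sketch of converting them, assuming each cell is a Python-style list literal as in the preview above, uses the standard library's ast.literal_eval; the helper parse_list_column and the two-row sample are ours, for illustration only:

```python
import ast

import pandas as pd


def parse_list_column(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of df in which `column` holds Python lists instead of strings.

    Assumes every cell is a list literal such as "[389, 7655]", which is how
    list columns look after being written to and read back from a CSV file.
    """
    parsed = df.copy()
    parsed[column] = parsed[column].apply(ast.literal_eval)
    return parsed


# Two-row stand-in for recipe_data, with values copied from the preview above
sample = pd.DataFrame({
    'id': [424415, 74301],
    'ingredient_ids': ['[389, 7655, 6270, 1527, 3406]',
                       '[7940, 3609, 7060, 6265, 1170, 6654, 5003, 3561]'],
})

sample = parse_list_column(sample, 'ingredient_ids')
first_ids = sample.loc[0, 'ingredient_ids']
print(type(first_ids).__name__, first_ids[:2])  # list [389, 7655]
```

On the full file, the same call would be recipe_data = parse_list_column(recipe_data, 'ingredient_ids'). literal_eval is used instead of eval because it only accepts literals, so a malformed cell raises an error rather than executing arbitrary code.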
ingr_map.pkl maps each raw ingredient name to a standardized ingredient name and its ID.

The data can be cleaned up to generate a list of unique IDs and the ingredients they correspond to.

Attributes that would be used for this project:

  • Ingredient: common name of a kitchen ingredient

  • Ingr_ID: unique identification code for the ingredient; these are the codes used in the ingredient_ids column of PP_recipes.csv

In [2]:
import pickle

# Read in the ingredient map (stored as a pickled pandas DataFrame)
with open('ingr_map.pkl', 'rb') as f:
    ingr_data = pickle.load(f)

ingr_data.head()
Out[2]:
raw_ingr raw_words processed len_proc replaced count id
0 medium heads bibb or red leaf lettuce, washed,... 13 medium heads bibb or red leaf lettuce, washed,... 73 lettuce 4507 4308
1 mixed baby lettuces and spring greens 6 mixed baby lettuces and spring green 36 lettuce 4507 4308
2 romaine lettuce leaf 3 romaine lettuce leaf 20 lettuce 4507 4308
3 iceberg lettuce leaf 3 iceberg lettuce leaf 20 lettuce 4507 4308
4 red romaine lettuce 3 red romaine lettuce 19 lettuce 4507 4308
In [3]:
# Create a series of unique ingredient names
ingredients_series = pd.Series(ingr_data['replaced'])
all_ingredients = ingredients_series.unique()

# Take a look at the data:
all_ingredients[100:105]
Out[3]:
array(['kosher salt & ground black pepper', 'cream of broccoli soup',
       'lemon frosting', 'roasted red peppers packed in oil',
       'ranch dips mix'], dtype=object)
In [7]:
# Pair each ingredient ID with its name, taking both from the same rows so the
# pairing does not depend on two separately de-duplicated arrays lining up
ingredient_ids = (
    ingr_data[['id', 'replaced']]
    .drop_duplicates()
    .rename(columns={'id': 'Ingr_ID', 'replaced': 'Ingredient'})
    .reset_index(drop=True)
)

# Take a look at the data!
ingredient_ids[905:910]
Out[7]:
Ingr_ID Ingredient
905 5694 powdered soy protein concentrate
906 299 bacon bit
907 5412 pineapple chunks in juice
908 2272 dried great northern bean
909 1982 crushed pineapple
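Once the ID table exists, translating a recipe's ingredient_ids list into readable names is a dictionary lookup. A minimal sketch, assuming a table with the Ingr_ID and Ingredient columns built above; the helper ids_to_names and the five sample rows (copied from the preview) are illustrative stand-ins, and 9999 is a made-up ID:

```python
import pandas as pd


def ids_to_names(id_list, id_to_name):
    """Translate ingredient IDs to names, skipping any ID not in the mapping."""
    return [id_to_name[i] for i in id_list if i in id_to_name]


# Five-row stand-in for the ingredient_ids table, copied from the preview above
sample_table = pd.DataFrame({
    'Ingr_ID': [5694, 299, 5412, 2272, 1982],
    'Ingredient': ['powdered soy protein concentrate', 'bacon bit',
                   'pineapple chunks in juice', 'dried great northern bean',
                   'crushed pineapple'],
})

# Build the ID -> name dictionary once, then reuse it for every recipe
id_to_name = dict(zip(sample_table['Ingr_ID'], sample_table['Ingredient']))

# 9999 is a made-up ID that is not in the table, so it is skipped
names = ids_to_names([299, 1982, 9999], id_to_name)
print(names)
```

In the notebook, id_to_name would be built from the full ingredient_ids table instead of the sample. Missing IDs are skipped silently here; raising an error instead would be a reasonable alternative while debugging.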

Current gaps in available data:

Despite much searching, I wasn’t able to find a dataset with a comprehensive list of unit prices for common ingredients, such as produce, grains, or condiments. If we can track one down, it would open up a whole new avenue of analysis: comparing the cost per meal with its nutritional value!

RAW_recipes.csv provides data on the nutritional value of each recipe.

Here’s a look at the data we’d be interested in: the name of the recipe, its ID code on food.com, and its nutritional value. Apart from calories, these values are listed as a percentage of the recommended daily value (PDV).

Attributes that would be used for this project:

  • name: name of the recipe as it appears on food.com
  • id: the identification code of the recipe on the site
  • nutrition: a list of the nutritional values for a serving of that recipe
    • Structure: [calories (#), total fat (PDV), sugar (PDV), sodium (PDV), protein (PDV), saturated fat (PDV), carbohydrates (PDV)]
In [5]:
# Read in and preview data (under a new name, so the PP_recipes.csv
# data already stored in recipe_data is not overwritten)
raw_recipes = pd.read_csv('RAW_recipes.csv')
raw_recipes.head(2)
Out[5]:
name id minutes contributor_id submitted tags nutrition n_steps steps description ingredients n_ingredients
0 arriba baked winter squash mexican style 137739 55 47892 2005-09-16 ['60-minutes-or-less', 'time-to-make', 'course... [51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0] 11 ['make a choice and proceed with recipe', 'dep... autumn is my favorite time of year to cook! th... ['winter squash', 'mexican seasoning', 'mixed ... 7
1 a bit different breakfast pizza 31490 30 26278 2002-06-17 ['30-minutes-or-less', 'time-to-make', 'course... [173.4, 18.0, 0.0, 17.0, 22.0, 35.0, 1.0] 9 ['preheat oven to 425 degrees f', 'press dough... this recipe calls for the crust to be prebaked... ['prepared pizza crust', 'sausage patty', 'egg... 6
In [6]:
# Keep only the columns needed for this project
nutrition_data = raw_recipes[['name', 'id', 'nutrition']]
nutrition_data.head()
Out[6]:
name id nutrition
0 arriba baked winter squash mexican style 137739 [51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0]
1 a bit different breakfast pizza 31490 [173.4, 18.0, 0.0, 17.0, 22.0, 35.0, 1.0]
2 all in the kitchen chili 112140 [269.8, 22.0, 32.0, 48.0, 39.0, 27.0, 5.0]
3 alouette potatoes 59389 [368.1, 17.0, 10.0, 2.0, 14.0, 8.0, 20.0]
4 amish tomato ketchup for canning 44061 [352.9, 1.0, 337.0, 23.0, 3.0, 0.0, 28.0]
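Because it comes from a CSV file, the nutrition column is read in as a string such as "[51.5, 0.0, ...]", and its values are only identified by position. A minimal sketch of splitting it into named numeric columns, assuming the seven-value order given in the dataset's description (calories, then total fat, sugar, sodium, protein, saturated fat, and carbohydrates, all in PDV); the column names and the helper expand_nutrition are our own:

```python
import ast

import pandas as pd

# Assumed position of each value inside a nutrition list (dataset description);
# every field except calories is a percent of daily value (PDV)
NUTRITION_FIELDS = ['calories', 'total_fat_pdv', 'sugar_pdv', 'sodium_pdv',
                    'protein_pdv', 'saturated_fat_pdv', 'carbohydrates_pdv']


def expand_nutrition(df: pd.DataFrame) -> pd.DataFrame:
    """Replace the string-encoded nutrition column with one numeric column per field."""
    values = df['nutrition'].apply(ast.literal_eval).tolist()
    expanded = pd.DataFrame(values, columns=NUTRITION_FIELDS, index=df.index)
    return pd.concat([df.drop(columns='nutrition'), expanded], axis=1)


# Two-row stand-in for nutrition_data, with values copied from the preview above
sample = pd.DataFrame({
    'name': ['arriba baked winter squash mexican style',
             'a bit different breakfast pizza'],
    'id': [137739, 31490],
    'nutrition': ['[51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0]',
                  '[173.4, 18.0, 0.0, 17.0, 22.0, 35.0, 1.0]'],
})

expanded = expand_nutrition(sample)
print(expanded[['name', 'calories', 'protein_pdv']])
```

Applied to the full table, expand_nutrition(nutrition_data) would give one numeric column per nutrient, ready for sorting or filtering (for example, to find the most protein-dense recipes).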

How can data be used to solve this problem?

  • We can create an interface in which users enter the names of ingredients they are trying to use up and receive a list of recipes that use those ingredients together.

    • Some machine learning methods may be needed for natural language processing (NLP), to match what users type against the ingredient names in the dataset.
      • For example, if a user enters “shredded cheddar cheese”, how will this be matched to the “cheddar cheese” ingredient ID that is used to look up recipes?
    • Finding creative uses for leftover ingredients will minimize food waste and reduce the need for extra grocery trips just to cook a cohesive meal.
  • Building on this idea, if unit cost data becomes available for ingredients, it can be paired with nutrition information to recommend the most cost-effective sources of each major macronutrient (carbohydrates, protein, and fats).

    • Using machine learning methods, these top ingredients could be clustered and entered into the interface described above, which would return recipes that use these nutritionally dense foods together.
      • We could also use machine learning methods to cluster recipes by shared ingredients (beyond just the ones entered) to generate a list of groceries needed to cook them.
        • Choosing recipes that share nearly the same ingredients reduces the need to buy additional items that may not be used up before they go bad. This saves money and reduces waste!
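The lookup described in the first bullet can be sketched with an inverted index that maps each ingredient name to the recipes using it. This is a minimal sketch under simplifying assumptions: the three toy recipes and their IDs are hypothetical, and the similarity ratio from Python's standard-library difflib module stands in for the NLP step. That is enough to map “shredded cheddar cheese” to “cheddar cheese” in this tiny vocabulary, but it is string similarity, not language understanding.

```python
import difflib
from collections import defaultdict


def build_index(recipes):
    """Map each ingredient name to the set of recipe IDs that use it.

    `recipes` is a dict of recipe ID -> list of ingredient names.
    """
    index = defaultdict(set)
    for recipe_id, ingredients in recipes.items():
        for name in ingredients:
            index[name].add(recipe_id)
    return dict(index)


def match_ingredient(user_text, vocabulary, cutoff=0.6):
    """Return the closest known ingredient name, or None if none is close enough.

    difflib's similarity ratio is a simple stand-in for a real NLP model.
    """
    matches = difflib.get_close_matches(user_text.lower().strip(), vocabulary,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None


def suggest_recipes(user_items, recipes, index):
    """Rank recipe IDs by how many of the user's leftover ingredients they use."""
    vocabulary = list(index)
    matched = {match_ingredient(text, vocabulary) for text in user_items}
    matched.discard(None)  # drop inputs that matched nothing

    counts = defaultdict(int)
    for name in matched:
        for recipe_id in index[name]:
            counts[recipe_id] += 1

    # Most leftovers used first; ties go to recipes with fewer total
    # ingredients, since they need fewer extra purchases
    return sorted(counts, key=lambda r: (-counts[r], len(recipes[r]), r))


# Hypothetical toy recipes with made-up IDs, for illustration only
recipes = {
    1: ['egg', 'spinach', 'cheddar cheese'],
    2: ['pasta', 'tomato', 'spinach'],
    3: ['egg', 'milk'],
}
index = build_index(recipes)

print(match_ingredient('shredded cheddar cheese', list(index)))
print(suggest_recipes(['Eggs', 'spinach', 'shredded cheddar cheese'], recipes, index))
```

Ranking by the number of leftovers used, then by fewest total ingredients, is just one simple heuristic. With the real data, the index could be built from the parsed ingredient_ids column and keyed by ingredient ID rather than by name.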