food¶

Each individual student will submit a project proposal (3% of final grade) in .ipynb format which:

(1%) Describes and motivates a real-world problem where data science may provide helpful insights. Your description should be easily understood by a casual reader and include citations to motivating sources or relevant information (e.g. news articles, further reading links … Wikipedia makes for a poor reference but the links it cites are usually promising).

(1%) Explicitly load and show your dataset. Provide a data dictionary which explains the meaning of each feature present. Demonstrate that this data is sufficient to make progress on your real-world problem described above.

(1%) Write one or two sentences about how the data will be used to solve the problem. Earlier in the semester, we won’t have studied the Machine Learning methods just yet but you should have a general idea of what the ML will set out to do. For example: “We’ll cluster the movies into sets of movies which are often watched by the same users. Doing so allows us to discover if there is a more natural grouping of movies rather than the traditional genres: horror, comedy, romantic-comedy, etc”.

Problem¶

As the world evolves, we pay less attention to how we eat and how we keep our bodies healthy. According to Frontiers in Nutrition, the share of our diet made up of processed foods has increased from 5% in 1800 to 60% in 2019. The world is shifting toward a mindset where efficiency is all that matters; whatever is quickest and easiest is the "way to go." With this mindset, however, we lose sight of keeping our bodies healthy, well-nourished, and balanced. Diets have shifted towards fast food, whether that means food that is fast to make, fast to eat, or cheap to produce.

Solution¶

To address the problem above, we can use food composition data to build a well-balanced diet plan that covers all necessary vitamin requirements. With this data, we can not only choose the best foods but also generate many alternative plans that meet those requirements for a wide variety of people, including those who have allergies.
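One way to picture "covering the requirements" is a simple greedy selection: keep adding whichever food closes the largest share of the nutrient gaps that are still open. The sketch below is only an illustration of that idea, not the method this project commits to. The three food rows are copied from the dataset shown in the Dataset section, each row is treated as one serving (an assumption, since units are not stated here), and the `targets` values are hypothetical placeholders rather than real dietary recommendations. The helper names `gain` and `greedy_plan` are made up for this example.

```python
# Column names as they appear in the dataset.
B12 = "Data.Vitamins.Vitamin B12"
VIT_C = "Data.Vitamins.Vitamin C"
VIT_K = "Data.Vitamins.Vitamin K"

# Rows copied from the dataset; one row is treated as one serving (assumption).
foods = [
    {"Description": "Milk, whole", B12: 0.54, VIT_C: 0.0, VIT_K: 0.3},
    {"Description": "Tomatoes as ingredient in omelet", B12: 0.00, VIT_C: 18.2, VIT_K: 8.8},
    {"Description": "Vegetables as ingredient in curry", B12: 0.00, VIT_C: 16.2, VIT_K: 8.9},
]

# HYPOTHETICAL targets for illustration only; not real dietary recommendations.
targets = {B12: 1.0, VIT_C: 30.0, VIT_K: 15.0}


def gain(food, remaining, targets):
    """Share of the still-open targets that one serving of `food` would close."""
    return sum(
        min(food[nutrient], gap) / targets[nutrient]
        for nutrient, gap in remaining.items()
        if gap > 0
    )


def greedy_plan(foods, targets, max_servings=20):
    """Add servings one at a time until every target is met or nothing helps."""
    remaining = dict(targets)  # amount of each nutrient still needed
    plan = []
    while len(plan) < max_servings and any(gap > 0 for gap in remaining.values()):
        best = max(foods, key=lambda food: gain(food, remaining, targets))
        if gain(best, remaining, targets) <= 0:
            break  # no available food reduces the remaining gaps
        plan.append(best["Description"])
        for nutrient in remaining:
            remaining[nutrient] -= best[nutrient]
    return plan, remaining


plan, remaining = greedy_plan(foods, targets)
print(plan)
print(remaining)
```

To account for an allergy, the `foods` list can be filtered before calling `greedy_plan`. With the full dataset, `df_food.to_dict("records")` would produce rows in the same list-of-dicts shape used here.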

Dataset¶

Loaded below is a dataset from the US Department of Agriculture's Food Composition Database, with metrics on vitamins and minerals, as well as macronutrient percentages. Below it is a list of all the columns of the dataset, naming the different metrics measured for each food. Each food is assigned a generic 'Category', while the 'Description' details the differences within that category. For example, milk is a general category, and two entries within it are whole milk and low sodium whole milk. This dataset will prove useful because it is very specific about the foods within each category.

https://www.ars.usda.gov/northeast-area/beltsville-md-bhnrc/beltsville-human-nutrition-research-center/food-surveys-research-group/docs/fndds-download-databases/

In [8]:
import pandas as pd

df_food = pd.read_csv('food.csv')
df_food
Out[8]:
| | Category | Description | Nutrient Data Bank Number | Data.Alpha Carotene | Data.Beta Carotene | Data.Beta Cryptoxanthin | Data.Carbohydrate | Data.Cholesterol | Data.Choline | Data.Fiber | ... | Data.Major Minerals.Phosphorus | Data.Major Minerals.Potassium | Data.Major Minerals.Sodium | Data.Major Minerals.Zinc | Data.Vitamins.Vitamin A - RAE | Data.Vitamins.Vitamin B12 | Data.Vitamins.Vitamin B6 | Data.Vitamins.Vitamin C | Data.Vitamins.Vitamin E | Data.Vitamins.Vitamin K |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Milk | Milk, human | 11000000 | 0 | 7 | 0 | 6.89 | 14 | 16.0 | 0.0 | ... | 14 | 51 | 17 | 0.17 | 61 | 0.05 | 0.011 | 5.0 | 0.08 | 0.3 |
| 1 | Milk | Milk, NFS | 11100000 | 0 | 4 | 0 | 4.87 | 8 | 17.9 | 0.0 | ... | 103 | 157 | 39 | 0.42 | 59 | 0.56 | 0.060 | 0.1 | 0.03 | 0.2 |
| 2 | Milk | Milk, whole | 11111000 | 0 | 7 | 0 | 4.67 | 12 | 17.8 | 0.0 | ... | 101 | 150 | 38 | 0.41 | 32 | 0.54 | 0.061 | 0.0 | 0.05 | 0.3 |
| 3 | Milk | Milk, low sodium, whole | 11111100 | 0 | 7 | 0 | 4.46 | 14 | 16.0 | 0.0 | ... | 86 | 253 | 3 | 0.38 | 29 | 0.36 | 0.034 | 0.9 | 0.08 | 0.3 |
| 4 | Milk | Milk, calcium fortified, whole | 11111150 | 0 | 7 | 0 | 4.67 | 12 | 17.8 | 0.0 | ... | 101 | 150 | 38 | 0.41 | 32 | 0.54 | 0.061 | 0.0 | 0.05 | 0.3 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7078 | Tomatoes as ingredient in omelet | Tomatoes as ingredient in omelet | 99997802 | 103 | 464 | 0 | 5.48 | 0 | 7.4 | 1.6 | ... | 30 | 278 | 6 | 0.21 | 43 | 0.00 | 0.104 | 18.2 | 0.60 | 8.8 |
| 7079 | Other vegetables as ingredient in omelet | Other vegetables as ingredient in omelet | 99997804 | 1 | 11 | 0 | 4.81 | 0 | 19.4 | 1.4 | ... | 96 | 364 | 6 | 0.58 | 1 | 0.04 | 0.123 | 6.3 | 0.03 | 0.4 |
| 7080 | Vegetables as ingredient in curry | Vegetables as ingredient in curry | 99997810 | 368 | 994 | 0 | 11.60 | 0 | 14.6 | 2.2 | ... | 46 | 312 | 19 | 0.28 | 98 | 0.00 | 0.177 | 16.2 | 0.24 | 8.9 |
| 7081 | Sauce as ingredient in hamburgers | Sauce as ingredient in hamburgers | 99998130 | 0 | 194 | 4 | 17.14 | 13 | 20.0 | 0.6 | ... | 33 | 190 | 845 | 0.21 | 21 | 0.04 | 0.104 | 2.5 | 1.90 | 50.8 |
| 7082 | Industrial oil as ingredient in food | Industrial oil as ingredient in food | 99998210 | 0 | 0 | 0 | 0.00 | 0 | 0.2 | 0.0 | ... | 0 | 0 | 0 | 0.01 | 0 | 0.00 | 0.000 | 0.0 | 10.50 | 155.8 |

7083 rows × 38 columns
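The relationship between 'Category' and 'Description' can be checked directly with a group-by. The sketch below is a minimal illustration on a hand-built stand-in frame (the name `sample` is hypothetical); its four rows and sodium values are copied from the dataset output, and the same calls should work on `df_food` itself.

```python
import pandas as pd

# Stand-in for df_food: a few rows copied from the dataset output.
sample = pd.DataFrame(
    {
        "Category": ["Milk", "Milk", "Milk", "Tomatoes as ingredient in omelet"],
        "Description": [
            "Milk, human",
            "Milk, whole",
            "Milk, low sodium, whole",
            "Tomatoes as ingredient in omelet",
        ],
        "Data.Major Minerals.Sodium": [17, 38, 3, 6],
    }
)

# How many distinct descriptions fall under each category.
per_category = sample.groupby("Category")["Description"].nunique()

# Every variant listed under the generic "Milk" category, with its sodium value.
milk = sample.loc[
    sample["Category"] == "Milk",
    ["Description", "Data.Major Minerals.Sodium"],
]

print(per_category)
print(milk)
```

Comparing rows inside one category this way shows how the variants differ, for example how the sodium value of low sodium whole milk compares with that of regular whole milk.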

In [9]:
df_food.columns
Out[9]:
Index(['Category', 'Description', 'Nutrient Data Bank Number',
       'Data.Alpha Carotene', 'Data.Beta Carotene', 'Data.Beta Cryptoxanthin',
       'Data.Carbohydrate', 'Data.Cholesterol', 'Data.Choline', 'Data.Fiber',
       'Data.Lutein and Zeaxanthin', 'Data.Lycopene', 'Data.Niacin',
       'Data.Protein', 'Data.Retinol', 'Data.Riboflavin', 'Data.Selenium',
       'Data.Sugar Total', 'Data.Thiamin', 'Data.Water',
       'Data.Fat.Monosaturated Fat', 'Data.Fat.Polysaturated Fat',
       'Data.Fat.Saturated Fat', 'Data.Fat.Total Lipid',
       'Data.Major Minerals.Calcium', 'Data.Major Minerals.Copper',
       'Data.Major Minerals.Iron', 'Data.Major Minerals.Magnesium',
       'Data.Major Minerals.Phosphorus', 'Data.Major Minerals.Potassium',
       'Data.Major Minerals.Sodium', 'Data.Major Minerals.Zinc',
       'Data.Vitamins.Vitamin A - RAE', 'Data.Vitamins.Vitamin B12',
       'Data.Vitamins.Vitamin B6', 'Data.Vitamins.Vitamin C',
       'Data.Vitamins.Vitamin E', 'Data.Vitamins.Vitamin K'],
      dtype='object')
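The dotted column names already carry a grouping ('Data.Fat.…', 'Data.Major Minerals.…', 'Data.Vitamins.…'), which could serve as a starting point for a data dictionary. The sketch below groups the names by that prefix. The helper `group_columns` and the labels "Identifiers" and "General" are made up for this example; with the notebook's frame loaded, `group_columns(df_food.columns)` should give the same result as the hard-coded list.

```python
from collections import defaultdict

# Column names copied from the df_food.columns output.
columns = [
    "Category", "Description", "Nutrient Data Bank Number",
    "Data.Alpha Carotene", "Data.Beta Carotene", "Data.Beta Cryptoxanthin",
    "Data.Carbohydrate", "Data.Cholesterol", "Data.Choline", "Data.Fiber",
    "Data.Lutein and Zeaxanthin", "Data.Lycopene", "Data.Niacin",
    "Data.Protein", "Data.Retinol", "Data.Riboflavin", "Data.Selenium",
    "Data.Sugar Total", "Data.Thiamin", "Data.Water",
    "Data.Fat.Monosaturated Fat", "Data.Fat.Polysaturated Fat",
    "Data.Fat.Saturated Fat", "Data.Fat.Total Lipid",
    "Data.Major Minerals.Calcium", "Data.Major Minerals.Copper",
    "Data.Major Minerals.Iron", "Data.Major Minerals.Magnesium",
    "Data.Major Minerals.Phosphorus", "Data.Major Minerals.Potassium",
    "Data.Major Minerals.Sodium", "Data.Major Minerals.Zinc",
    "Data.Vitamins.Vitamin A - RAE", "Data.Vitamins.Vitamin B12",
    "Data.Vitamins.Vitamin B6", "Data.Vitamins.Vitamin C",
    "Data.Vitamins.Vitamin E", "Data.Vitamins.Vitamin K",
]


def group_columns(names):
    """Group column names by the middle part of their dotted prefix."""
    groups = defaultdict(list)
    for name in names:
        parts = name.split(".")
        if parts[0] != "Data":
            # Columns without the "Data." prefix identify the food itself.
            groups["Identifiers"].append(name)
        elif len(parts) == 2:
            # "Data.<metric>" has no sub-group; "General" is an assumed label.
            groups["General"].append(parts[1])
        else:
            # "Data.<group>.<metric>"
            groups[parts[1]].append(".".join(parts[2:]))
    return dict(groups)


groups = group_columns(columns)
for group, members in groups.items():
    print(f"{group} ({len(members)}): {', '.join(members)}")
```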