Food

Each individual student will submit a project proposal (3% of final grade) in .ipynb format which:

(1%) Describes and motivates a real-world problem where data science may provide helpful insights. Your description should be easily understood by a casual reader and include citations to motivating sources or relevant information (e.g. news articles, further reading links … Wikipedia makes for a poor reference but the links it cites are usually promising).

(1%) Explicitly load and show your dataset. Provide a data dictionary which explains the meaning of each feature present. Demonstrate that this data is sufficient to make progress on your real-world problem described above.

(1%) Write one or two sentences about how the data will be used to solve the problem. Early in the semester, we won’t have studied machine learning methods yet, but you should have a general idea of what the ML will set out to do. For example: “We’ll cluster the movies into sets of movies which are often watched by the same users. Doing so allows us to discover if there is a more natural grouping of movies rather than the traditional genres: horror, comedy, romantic-comedy, etc”.

Problem

As the world evolves, less attention is paid to how we eat and how we keep our bodies healthy. According to Frontiers in Nutrition, the share of our diet made up of processed foods has risen from 5% in 1800 to 60% in 2019. The world is shifting toward valuing efficiency above all else: whatever is quickest and easiest is the "way to go." With this mindset, however, we stop paying attention to keeping our bodies healthy, well-nourished, and balanced. Everything has shifted toward fast food, whether that means fast to make, fast to eat, or cheap to produce.

Solution

To address this problem, we can use food composition data to build well-balanced diet plans that cover all necessary vitamin requirements. With this data, we can not only choose the best foods but also generate many alternative plans that meet the requirements of a wide variety of people, including those with food allergies.
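The Solution paragraph does not yet name a method. One standard way to frame "a diet that meets nutrient requirements" is the classic diet problem, written as a linear program: choose non-negative amounts of each food that minimize total food mass while meeting every nutrient target. The sketch below is an assumption on our part, not the proposal's committed approach. It uses `scipy.optimize.linprog`, copies a few values from the `df_food` output in the Dataset section (units assumed to be per 100 g), and uses illustrative targets. The `plan_diet` function and its `exclude_categories` parameter are hypothetical names showing how an allergy could be handled by removing foods before optimizing.

```python
import numpy as np
import pandas as pd
from scipy.optimize import linprog

# Values copied from four rows of the df_food output (assumed per 100 g:
# vitamin C in mg, vitamin A in mcg RAE, vitamin B12 in mcg).
foods = pd.DataFrame(
    {
        "Category": ["Milk", "Tomatoes as ingredient in omelet",
                     "Vegetables as ingredient in curry",
                     "Other vegetables as ingredient in omelet"],
        "Data.Vitamins.Vitamin C": [0.0, 18.2, 16.2, 6.3],
        "Data.Vitamins.Vitamin A - RAE": [32.0, 43.0, 98.0, 1.0],
        "Data.Vitamins.Vitamin B12": [0.54, 0.00, 0.00, 0.04],
    },
    index=["Milk, whole", "Tomatoes as ingredient in omelet",
           "Vegetables as ingredient in curry",
           "Other vegetables as ingredient in omelet"],
)

# Illustrative daily targets (hypothetical; a real plan would be per person).
targets = pd.Series({
    "Data.Vitamins.Vitamin C": 90.0,
    "Data.Vitamins.Vitamin A - RAE": 900.0,
    "Data.Vitamins.Vitamin B12": 2.4,
})


def plan_diet(foods, targets, exclude_categories=()):
    """Smallest total food mass (grams) that meets every nutrient target."""
    # Remove foods a person cannot eat, e.g. a dairy allergy -> exclude "Milk".
    allowed = foods[~foods["Category"].isin(exclude_categories)]
    A = allowed[targets.index].to_numpy(dtype=float).T  # rows: nutrients, cols: foods
    c = np.ones(A.shape[1])                              # minimize total 100 g servings
    # linprog expects A_ub @ x <= b_ub, so "A @ x >= targets" is negated.
    res = linprog(c, A_ub=-A, b_ub=-targets.to_numpy(), bounds=(0, None))
    if not res.success:
        return None
    return pd.Series(res.x * 100, index=allowed.index)  # servings -> grams


print(plan_diet(foods, targets).round(1))
print(plan_diet(foods, targets, exclude_categories=["Milk"]).round(1))
```

Minimizing total mass is only one possible objective; cost, calories, or sodium could be minimized instead by changing `c`, and upper limits (for example on sodium) would become extra rows of `A_ub` with positive coefficients.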

Dataset

Loaded below is a dataset from the US Department of Agriculture's Food Composition Database, with metrics on vitamins and minerals as well as macronutrient content. Below is also a list of all the columns in the dataset, naming the metrics measured for each food. Each food is placed in a general category, while the 'Description' column details the differences between foods in that category. For example, milk is a general category, while two entries within it are whole milk and low-sodium whole milk. This dataset will prove useful because it is very specific about the foods within each category.

https://www.ars.usda.gov/northeast-area/beltsville-md-bhnrc/beltsville-human-nutrition-research-center/food-surveys-research-group/docs/fndds-download-databases/
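To make the Category/Description relationship concrete, here is a small sketch using pandas `groupby`. It is self-contained, so instead of reading `food.csv` it copies six rows (and only three columns) from the `df_food` output below; `sample`, `foods_per_category`, and `milk` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Six rows copied from the df_food output below (only three columns kept).
sample = pd.DataFrame(
    {
        "Category": ["Milk", "Milk", "Milk", "Milk", "Milk",
                     "Vegetables as ingredient in curry"],
        "Description": ["Milk, human", "Milk, NFS", "Milk, whole",
                        "Milk, low sodium, whole", "Milk, calcium fortified, whole",
                        "Vegetables as ingredient in curry"],
        "Data.Major Minerals.Sodium": [17, 39, 38, 3, 38, 19],
    }
)

# How many specific foods (Descriptions) fall under each general Category.
foods_per_category = sample.groupby("Category")["Description"].count()
print(foods_per_category)

# Compare one nutrient across the variants inside a single category.
milk = sample[sample["Category"] == "Milk"]
lowest_sodium = milk.loc[milk["Data.Major Minerals.Sodium"].idxmin(), "Description"]
print(lowest_sodium)  # -> Milk, low sodium, whole
```

On the full data, `df_food.groupby('Category')['Description'].count()` would show how finely each category is subdivided.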

In [8]:

import pandas as pd

# Load the food composition table (one row per food) and display it
df_food = pd.read_csv('food.csv')
df_food

Out[8]:

	Category	Description	Nutrient Data Bank Number	Data.Alpha Carotene	Data.Beta Carotene	Data.Beta Cryptoxanthin	Data.Carbohydrate	Data.Cholesterol	Data.Choline	Data.Fiber	...	Data.Major Minerals.Phosphorus	Data.Major Minerals.Potassium	Data.Major Minerals.Sodium	Data.Major Minerals.Zinc	Data.Vitamins.Vitamin A - RAE	Data.Vitamins.Vitamin B12	Data.Vitamins.Vitamin B6	Data.Vitamins.Vitamin C	Data.Vitamins.Vitamin E	Data.Vitamins.Vitamin K
0	Milk	Milk, human	11000000	0	7	0	6.89	14	16.0	0.0	...	14	51	17	0.17	61	0.05	0.011	5.0	0.08	0.3
1	Milk	Milk, NFS	11100000	0	4	0	4.87	8	17.9	0.0	...	103	157	39	0.42	59	0.56	0.060	0.1	0.03	0.2
2	Milk	Milk, whole	11111000	0	7	0	4.67	12	17.8	0.0	...	101	150	38	0.41	32	0.54	0.061	0.0	0.05	0.3
3	Milk	Milk, low sodium, whole	11111100	0	7	0	4.46	14	16.0	0.0	...	86	253	3	0.38	29	0.36	0.034	0.9	0.08	0.3
4	Milk	Milk, calcium fortified, whole	11111150	0	7	0	4.67	12	17.8	0.0	...	101	150	38	0.41	32	0.54	0.061	0.0	0.05	0.3
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
7078	Tomatoes as ingredient in omelet	Tomatoes as ingredient in omelet	99997802	103	464	0	5.48	0	7.4	1.6	...	30	278	6	0.21	43	0.00	0.104	18.2	0.60	8.8
7079	Other vegetables as ingredient in omelet	Other vegetables as ingredient in omelet	99997804	1	11	0	4.81	0	19.4	1.4	...	96	364	6	0.58	1	0.04	0.123	6.3	0.03	0.4
7080	Vegetables as ingredient in curry	Vegetables as ingredient in curry	99997810	368	994	0	11.60	0	14.6	2.2	...	46	312	19	0.28	98	0.00	0.177	16.2	0.24	8.9
7081	Sauce as ingredient in hamburgers	Sauce as ingredient in hamburgers	99998130	0	194	4	17.14	13	20.0	0.6	...	33	190	845	0.21	21	0.04	0.104	2.5	1.90	50.8
7082	Industrial oil as ingredient in food	Industrial oil as ingredient in food	99998210	0	0	0	0.00	0	0.2	0.0	...	0	0	0	0.01	0	0.00	0.000	0.0	10.50	155.8

7083 rows × 38 columns
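The assignment also asks us to show that the data is sufficient for the problem. One hedged first check is to count, for every nutrient column, how many values are missing and how many foods contain a nonzero amount; a nutrient that almost no food supplies would be hard to plan around. `coverage_report` is a hypothetical helper, and it assumes every measured column name contains `"Data."`, as in the column list shown in this notebook. The demo uses four rows copied from the output above so it runs without `food.csv`.

```python
import pandas as pd


def coverage_report(df: pd.DataFrame) -> pd.DataFrame:
    """For each nutrient column: missing values and foods with a nonzero amount."""
    # Assumption: nutrient columns are exactly those whose names contain "Data."
    nutrients = df.filter(like="Data.")
    return pd.DataFrame(
        {
            "missing": nutrients.isna().sum(),
            "foods_with_nonzero": (nutrients > 0).sum(),
        }
    )


# Tiny stand-in for df_food: four rows and two nutrient columns from the output above.
sample = pd.DataFrame(
    {
        "Description": ["Milk, whole", "Tomatoes as ingredient in omelet",
                        "Vegetables as ingredient in curry",
                        "Sauce as ingredient in hamburgers"],
        "Data.Vitamins.Vitamin C": [0.0, 18.2, 16.2, 2.5],
        "Data.Vitamins.Vitamin B12": [0.54, 0.00, 0.00, 0.04],
    }
)
report = coverage_report(sample)
print(report)
```

In the notebook itself, `coverage_report(df_food)` would run the same check over all 35 nutrient columns.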

In [9]:

df_food.columns

Out[9]:

Index(['Category', 'Description', 'Nutrient Data Bank Number',
       'Data.Alpha Carotene', 'Data.Beta Carotene', 'Data.Beta Cryptoxanthin',
       'Data.Carbohydrate', 'Data.Cholesterol', 'Data.Choline', 'Data.Fiber',
       'Data.Lutein and Zeaxanthin', 'Data.Lycopene', 'Data.Niacin',
       'Data.Protein', 'Data.Retinol', 'Data.Riboflavin', 'Data.Selenium',
       'Data.Sugar Total', 'Data.Thiamin', 'Data.Water',
       'Data.Fat.Monosaturated Fat', 'Data.Fat.Polysaturated Fat',
       'Data.Fat.Saturated Fat', 'Data.Fat.Total Lipid',
       'Data.Major Minerals.Calcium', 'Data.Major Minerals.Copper',
       'Data.Major Minerals.Iron', 'Data.Major Minerals.Magnesium',
       'Data.Major Minerals.Phosphorus', 'Data.Major Minerals.Potassium',
       'Data.Major Minerals.Sodium', 'Data.Major Minerals.Zinc',
       'Data.Vitamins.Vitamin A - RAE', 'Data.Vitamins.Vitamin B12',
       'Data.Vitamins.Vitamin B6', 'Data.Vitamins.Vitamin C',
       'Data.Vitamins.Vitamin E', 'Data.Vitamins.Vitamin K'],
      dtype='object')
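The column list above names each feature but does not explain it, while the assignment asks for a data dictionary. Below is a minimal sketch of one as a pandas table. The meanings are our reading of the column names, and the units are an assumption based on the usual USDA convention of amounts per 100 g of food (g = grams, mg = milligrams, mcg = micrograms); they should be checked against the source documentation before use. `DATA_DICTIONARY` and `data_dictionary` are hypothetical names.

```python
import pandas as pd

# Hypothetical data dictionary for food.csv: column -> (assumed unit per 100 g, meaning).
DATA_DICTIONARY = {
    "Category": ("-", "General food group, e.g. 'Milk'"),
    "Description": ("-", "Specific food within the category, e.g. 'Milk, whole'"),
    "Nutrient Data Bank Number": ("-", "Numeric USDA identifier for the food"),
    "Data.Alpha Carotene": ("mcg", "Alpha-carotene"),
    "Data.Beta Carotene": ("mcg", "Beta-carotene"),
    "Data.Beta Cryptoxanthin": ("mcg", "Beta-cryptoxanthin"),
    "Data.Carbohydrate": ("g", "Total carbohydrate"),
    "Data.Cholesterol": ("mg", "Cholesterol"),
    "Data.Choline": ("mg", "Choline"),
    "Data.Fiber": ("g", "Dietary fiber"),
    "Data.Lutein and Zeaxanthin": ("mcg", "Lutein plus zeaxanthin"),
    "Data.Lycopene": ("mcg", "Lycopene"),
    "Data.Niacin": ("mg", "Niacin (vitamin B3)"),
    "Data.Protein": ("g", "Protein"),
    "Data.Retinol": ("mcg", "Retinol (preformed vitamin A)"),
    "Data.Riboflavin": ("mg", "Riboflavin (vitamin B2)"),
    "Data.Selenium": ("mcg", "Selenium"),
    "Data.Sugar Total": ("g", "Total sugars"),
    "Data.Thiamin": ("mg", "Thiamin (vitamin B1)"),
    "Data.Water": ("g", "Water"),
    "Data.Fat.Monosaturated Fat": ("g", "Presumably monounsaturated fat (name as spelled in the CSV)"),
    "Data.Fat.Polysaturated Fat": ("g", "Presumably polyunsaturated fat (name as spelled in the CSV)"),
    "Data.Fat.Saturated Fat": ("g", "Saturated fat"),
    "Data.Fat.Total Lipid": ("g", "Total fat"),
    "Data.Major Minerals.Calcium": ("mg", "Calcium"),
    "Data.Major Minerals.Copper": ("mg", "Copper"),
    "Data.Major Minerals.Iron": ("mg", "Iron"),
    "Data.Major Minerals.Magnesium": ("mg", "Magnesium"),
    "Data.Major Minerals.Phosphorus": ("mg", "Phosphorus"),
    "Data.Major Minerals.Potassium": ("mg", "Potassium"),
    "Data.Major Minerals.Sodium": ("mg", "Sodium"),
    "Data.Major Minerals.Zinc": ("mg", "Zinc"),
    "Data.Vitamins.Vitamin A - RAE": ("mcg", "Vitamin A, retinol activity equivalents"),
    "Data.Vitamins.Vitamin B12": ("mcg", "Vitamin B12"),
    "Data.Vitamins.Vitamin B6": ("mg", "Vitamin B6"),
    "Data.Vitamins.Vitamin C": ("mg", "Vitamin C"),
    "Data.Vitamins.Vitamin E": ("mg", "Vitamin E"),
    "Data.Vitamins.Vitamin K": ("mcg", "Vitamin K"),
}

# One row per column of food.csv, with its unit and meaning.
data_dictionary = pd.DataFrame.from_dict(
    DATA_DICTIONARY, orient="index", columns=["unit", "meaning"]
)
data_dictionary.index.name = "column"
print(data_dictionary.to_string())
```

In the notebook, `set(data_dictionary.index) == set(df_food.columns)` is a quick way to confirm that no column was missed.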
