Each individual student will submit a project proposal (3% of final grade) in .ipynb format which:
(1%) Describes and motivates a real-world problem where data science may provide helpful insights. Your description should be easily understood by a casual reader and include citations to motivating sources or relevant information (e.g. news articles, further reading links … Wikipedia makes for a poor reference but the links it cites are usually promising).
(1%) Explicitly load and show your dataset. Provide a data dictionary which explains the meaning of each feature present. Demonstrate that this data is sufficient to make progress on your real-world problem described above.
(1%) Write one or two sentences about how the data will be used to solve the problem. Early in the semester, we won’t have studied the Machine Learning methods yet, but you should have a general idea of what the ML will set out to do. For example: “We’ll cluster the movies into sets of movies which are often watched by the same users. Doing so allows us to discover if there is a more natural grouping of movies than the traditional genres: horror, comedy, romantic-comedy, etc”.
As the world evolves, we pay less attention to how we eat and how we keep our bodies healthy. According to Frontiers in Nutrition, the share of processed foods in our diet has increased from 5% in 1800 to 60% in 2019. The world is shifting toward a mindset in which efficiency is all that matters: whatever is quickest and easiest is the "way to go." With this mindset, however, we lose sight of keeping our bodies healthy, well-nourished, and balanced. Diets have shifted toward fast food, whether that means food that is fast to make, fast to eat, or cheap to produce.
To address this problem, we can use food composition data to create a well-balanced diet plan that covers all necessary vitamin requirements. With the data, we can not only choose the best foods but also generate many alternative plans that meet those requirements for a wide variety of people, including those who have allergies.
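As a rough illustration of what "covering vitamin requirements" could mean in code, the sketch below totals two vitamins over a candidate plan and reports any shortfall. The nutrient values are copied from rows of the dataset shown later in this notebook; the `portions`, the `targets`, and the helper `nutrient_totals` are hypothetical placeholders, not real dietary guidance. It also assumes one portion equals whatever amount the dataset reports values for, which the source does not state.

```python
import pandas as pd

# Nutrient values copied from rows of the dataset shown later in this notebook.
foods = pd.DataFrame(
    {
        "Data.Vitamins.Vitamin C": [0.0, 18.2, 16.2],
        "Data.Vitamins.Vitamin K": [0.3, 8.8, 8.9],
    },
    index=[
        "Milk, whole",
        "Tomatoes as ingredient in omelet",
        "Vegetables as ingredient in curry",
    ],
)

# HYPOTHETICAL: number of portions of each food in one candidate plan.
portions = pd.Series(
    {
        "Milk, whole": 2,
        "Tomatoes as ingredient in omelet": 1,
        "Vegetables as ingredient in curry": 1,
    }
)

# HYPOTHETICAL placeholder targets, chosen only to illustrate the check.
targets = pd.Series(
    {"Data.Vitamins.Vitamin C": 30.0, "Data.Vitamins.Vitamin K": 20.0}
)


def nutrient_totals(foods, portions):
    """Scale each food's nutrients by its portion count and sum per nutrient."""
    return foods.mul(portions, axis=0).sum()


totals = nutrient_totals(foods, portions)

# Positive entries are nutrients the plan does not yet cover.
shortfall = (targets - totals).clip(lower=0)
print(shortfall)
```

A real plan would need actual requirement values and a search over many foods; this only shows the coverage check that such a search would repeat.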
Loaded below is a dataset from the US Department of Agriculture's Food Composition Database, with metrics on vitamins and minerals, as well as macronutrient percentages. Below is also a list of all the columns of the dataset, showing the different metrics measured for each food. Each food is assigned a generic 'Category', while the 'Description' details the differences. For example, milk is a general category, while two specific entries within it are whole milk and low sodium whole milk. This dataset will prove useful because it is very specific about the foods within each category.
import pandas as pd
df_food = pd.read_csv('food.csv')
df_food
| | Category | Description | Nutrient Data Bank Number | Data.Alpha Carotene | Data.Beta Carotene | Data.Beta Cryptoxanthin | Data.Carbohydrate | Data.Cholesterol | Data.Choline | Data.Fiber | ... | Data.Major Minerals.Phosphorus | Data.Major Minerals.Potassium | Data.Major Minerals.Sodium | Data.Major Minerals.Zinc | Data.Vitamins.Vitamin A - RAE | Data.Vitamins.Vitamin B12 | Data.Vitamins.Vitamin B6 | Data.Vitamins.Vitamin C | Data.Vitamins.Vitamin E | Data.Vitamins.Vitamin K |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Milk | Milk, human | 11000000 | 0 | 7 | 0 | 6.89 | 14 | 16.0 | 0.0 | ... | 14 | 51 | 17 | 0.17 | 61 | 0.05 | 0.011 | 5.0 | 0.08 | 0.3 |
1 | Milk | Milk, NFS | 11100000 | 0 | 4 | 0 | 4.87 | 8 | 17.9 | 0.0 | ... | 103 | 157 | 39 | 0.42 | 59 | 0.56 | 0.060 | 0.1 | 0.03 | 0.2 |
2 | Milk | Milk, whole | 11111000 | 0 | 7 | 0 | 4.67 | 12 | 17.8 | 0.0 | ... | 101 | 150 | 38 | 0.41 | 32 | 0.54 | 0.061 | 0.0 | 0.05 | 0.3 |
3 | Milk | Milk, low sodium, whole | 11111100 | 0 | 7 | 0 | 4.46 | 14 | 16.0 | 0.0 | ... | 86 | 253 | 3 | 0.38 | 29 | 0.36 | 0.034 | 0.9 | 0.08 | 0.3 |
4 | Milk | Milk, calcium fortified, whole | 11111150 | 0 | 7 | 0 | 4.67 | 12 | 17.8 | 0.0 | ... | 101 | 150 | 38 | 0.41 | 32 | 0.54 | 0.061 | 0.0 | 0.05 | 0.3 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7078 | Tomatoes as ingredient in omelet | Tomatoes as ingredient in omelet | 99997802 | 103 | 464 | 0 | 5.48 | 0 | 7.4 | 1.6 | ... | 30 | 278 | 6 | 0.21 | 43 | 0.00 | 0.104 | 18.2 | 0.60 | 8.8 |
7079 | Other vegetables as ingredient in omelet | Other vegetables as ingredient in omelet | 99997804 | 1 | 11 | 0 | 4.81 | 0 | 19.4 | 1.4 | ... | 96 | 364 | 6 | 0.58 | 1 | 0.04 | 0.123 | 6.3 | 0.03 | 0.4 |
7080 | Vegetables as ingredient in curry | Vegetables as ingredient in curry | 99997810 | 368 | 994 | 0 | 11.60 | 0 | 14.6 | 2.2 | ... | 46 | 312 | 19 | 0.28 | 98 | 0.00 | 0.177 | 16.2 | 0.24 | 8.9 |
7081 | Sauce as ingredient in hamburgers | Sauce as ingredient in hamburgers | 99998130 | 0 | 194 | 4 | 17.14 | 13 | 20.0 | 0.6 | ... | 33 | 190 | 845 | 0.21 | 21 | 0.04 | 0.104 | 2.5 | 1.90 | 50.8 |
7082 | Industrial oil as ingredient in food | Industrial oil as ingredient in food | 99998210 | 0 | 0 | 0 | 0.00 | 0 | 0.2 | 0.0 | ... | 0 | 0 | 0 | 0.01 | 0 | 0.00 | 0.000 | 0.0 | 10.50 | 155.8 |
7083 rows × 38 columns
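The relationship between 'Category' and 'Description' can be sketched on a few rows copied from the table above. This is a minimal illustration restricted to one nutrient column; the names `sample`, `milk`, and `lowest_sodium` are hypothetical, and the same filter should apply to the full `df_food`.

```python
import pandas as pd

# A few rows copied from the table above, restricted to three columns.
sample = pd.DataFrame(
    {
        "Category": [
            "Milk",
            "Milk",
            "Milk",
            "Tomatoes as ingredient in omelet",
        ],
        "Description": [
            "Milk, human",
            "Milk, whole",
            "Milk, low sodium, whole",
            "Tomatoes as ingredient in omelet",
        ],
        "Data.Major Minerals.Sodium": [17, 38, 3, 6],
    }
)

# Keep only the rows in the generic "Milk" category.
milk = sample[sample["Category"] == "Milk"]

# Within the category, Description distinguishes the specific foods.
lowest_sodium = milk.sort_values("Data.Major Minerals.Sodium").iloc[0]
print(lowest_sodium["Description"])
```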
df_food.columns
Index(['Category', 'Description', 'Nutrient Data Bank Number', 'Data.Alpha Carotene', 'Data.Beta Carotene', 'Data.Beta Cryptoxanthin', 'Data.Carbohydrate', 'Data.Cholesterol', 'Data.Choline', 'Data.Fiber', 'Data.Lutein and Zeaxanthin', 'Data.Lycopene', 'Data.Niacin', 'Data.Protein', 'Data.Retinol', 'Data.Riboflavin', 'Data.Selenium', 'Data.Sugar Total', 'Data.Thiamin', 'Data.Water', 'Data.Fat.Monosaturated Fat', 'Data.Fat.Polysaturated Fat', 'Data.Fat.Saturated Fat', 'Data.Fat.Total Lipid', 'Data.Major Minerals.Calcium', 'Data.Major Minerals.Copper', 'Data.Major Minerals.Iron', 'Data.Major Minerals.Magnesium', 'Data.Major Minerals.Phosphorus', 'Data.Major Minerals.Potassium', 'Data.Major Minerals.Sodium', 'Data.Major Minerals.Zinc', 'Data.Vitamins.Vitamin A - RAE', 'Data.Vitamins.Vitamin B12', 'Data.Vitamins.Vitamin B6', 'Data.Vitamins.Vitamin C', 'Data.Vitamins.Vitamin E', 'Data.Vitamins.Vitamin K'], dtype='object')
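The dotted column names already suggest a structure for a data dictionary. The sketch below groups a subset of the names above by their prefix. The helper `group_columns` and the group labels "Identifiers" and "General nutrients" are assumptions made for illustration, not part of the dataset.

```python
from collections import defaultdict

# Column names copied from the df_food.columns output above (a subset, for brevity).
columns = [
    "Category",
    "Description",
    "Nutrient Data Bank Number",
    "Data.Carbohydrate",
    "Data.Protein",
    "Data.Fat.Saturated Fat",
    "Data.Fat.Total Lipid",
    "Data.Major Minerals.Calcium",
    "Data.Major Minerals.Iron",
    "Data.Vitamins.Vitamin C",
    "Data.Vitamins.Vitamin K",
]


def group_columns(names):
    """Group column names by the part between 'Data.' and the final part."""
    groups = defaultdict(list)
    for name in names:
        parts = name.split(".")
        if parts[0] != "Data":
            # Columns without the 'Data.' prefix identify the food itself.
            groups["Identifiers"].append(name)
        elif len(parts) == 2:
            # 'Data.X' columns have no subgroup.
            groups["General nutrients"].append(parts[1])
        else:
            # 'Data.Group.X' columns are filed under their subgroup.
            groups[parts[1]].append(".".join(parts[2:]))
    return dict(groups)


for group, members in group_columns(columns).items():
    print(f"{group}: {members}")
```

Calling `group_columns(df_food.columns)` on the full dataset should give one entry per subgroup, each of which can then be annotated with a meaning and a unit.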