PART 1: Describes and motivates a real-world problem where data science may provide helpful insights. Your description should be easily understood by a casual reader and include citations to motivating sources or relevant information (e.g. news articles, further reading links … Wikipedia makes for a poor reference but the links it cites are usually promising).
Food production is responsible for over 25% of global greenhouse gas emissions (https://ourworldindata.org/food-ghg-emissions). Moreover, not all foods or food production methods emit the same amount of CO2; some are far more impactful than others. If nothing is done to reduce them, emissions from food production alone would push global warming past 1.5°C or 2°C this century (https://ourworldindata.org/environmental-impacts-of-food?insight=food-emissions-climate-targets#key-insights-on-the-environmental-impacts-of-food), using up all of our carbon budget, if not more.
With data science, we can determine which foods have the greatest environmental impact, which part of the environment each one affects most (e.g., air, water, land), and which step of the food production process produces the most emissions. This lets us pinpoint the main causes of that 25% of emissions and target reductions where they will be most effective.
PART 2: Explicitly load and show your dataset. Provide a data dictionary which explains the meaning of each feature present. Demonstrate that this data is sufficient to make progress on your real-world problem described above.
import pandas as pd
food_production = pd.read_csv('/Users/miliachamas/Library/Mobile Documents/com~apple~CloudDocs/Downloads/coding/ds2500/Food_Production.csv')
food_production.head()
 | Food product | Land use change | Animal Feed | Farm | Processing | Transport | Packging | Retail | Total_emissions | Eutrophying emissions per 1000kcal (gPO₄eq per 1000kcal) | ... | Freshwater withdrawals per 100g protein (liters per 100g protein) | Freshwater withdrawals per kilogram (liters per kilogram) | Greenhouse gas emissions per 1000kcal (kgCO₂eq per 1000kcal) | Greenhouse gas emissions per 100g protein (kgCO₂eq per 100g protein) | Land use per 1000kcal (m² per 1000kcal) | Land use per kilogram (m² per kilogram) | Land use per 100g protein (m² per 100g protein) | Scarcity-weighted water use per kilogram (liters per kilogram) | Scarcity-weighted water use per 100g protein (liters per 100g protein) | Scarcity-weighted water use per 1000kcal (liters per 1000 kilocalories) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Wheat & Rye (Bread) | 0.1 | 0.0 | 0.8 | 0.2 | 0.1 | 0.1 | 0.1 | 1.4 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | Maize (Meal) | 0.3 | 0.0 | 0.5 | 0.1 | 0.1 | 0.1 | 0.0 | 1.1 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | Barley (Beer) | 0.0 | 0.0 | 0.2 | 0.1 | 0.0 | 0.5 | 0.3 | 1.1 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | Oatmeal | 0.0 | 0.0 | 1.4 | 0.0 | 0.1 | 0.1 | 0.0 | 1.6 | 4.281357 | ... | 371.076923 | 482.4 | 0.945482 | 1.907692 | 2.897446 | 7.6 | 5.846154 | 18786.2 | 14450.92308 | 7162.104461 |
4 | Rice | 0.0 | 0.0 | 3.6 | 0.1 | 0.1 | 0.1 | 0.1 | 4.0 | 9.514379 | ... | 3166.760563 | 2248.4 | 1.207271 | 6.267606 | 0.759631 | 2.8 | 3.943662 | 49576.3 | 69825.77465 | 13449.891480 |
5 rows × 23 columns
Dataset source: https://www.kaggle.com/datasets/selfvivek/environment-impact-of-food-production
This dataset covers each stage of the food production process for a wide variety of foods. Its information is sufficient for the real-world problem I'm focusing on because it analyzes each food in depth, spans a broad range of food types (grains, meat, oils, fruits, etc.), and breaks down emissions by production stage and by type of environmental impact, making it easier to pick out foods with higher emissions than others and find specific issues to address.
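As a rough illustration of how the dataset supports this, the sketch below ranks foods by total emissions and finds the stage that contributes most for each food. It uses only the five rows shown by food_production.head() above so that it runs without the CSV file; the helper names (rank_foods, dominant_stage, stage_totals) are hypothetical and not part of the dataset or of pandas. The same functions should work on the full food_production DataFrame, assuming its column names match the ones displayed.

```python
import pandas as pd

# Illustrative subset: the five rows displayed by food_production.head().
sample = pd.DataFrame(
    {
        "Food product": [
            "Wheat & Rye (Bread)", "Maize (Meal)", "Barley (Beer)", "Oatmeal", "Rice",
        ],
        "Land use change": [0.1, 0.3, 0.0, 0.0, 0.0],
        "Animal Feed": [0.0, 0.0, 0.0, 0.0, 0.0],
        "Farm": [0.8, 0.5, 0.2, 1.4, 3.6],
        "Processing": [0.2, 0.1, 0.1, 0.0, 0.1],
        "Transport": [0.1, 0.1, 0.0, 0.1, 0.1],
        "Packging": [0.1, 0.1, 0.5, 0.1, 0.1],  # spelled as in the dataset
        "Retail": [0.1, 0.0, 0.3, 0.0, 0.1],
        "Total_emissions": [1.4, 1.1, 1.1, 1.6, 4.0],
    }
)

# Production-stage columns, all in kg CO2-equivalents per kg of product.
STAGES = [
    "Land use change", "Animal Feed", "Farm", "Processing",
    "Transport", "Packging", "Retail",
]


def rank_foods(df):
    """Foods ordered from highest to lowest total emissions."""
    ranked = df.sort_values("Total_emissions", ascending=False)
    return ranked[["Food product", "Total_emissions"]]


def dominant_stage(df):
    """For each food, the stage column with the largest emissions."""
    return df.set_index("Food product")[STAGES].idxmax(axis=1)


def stage_totals(df):
    """Emissions summed over all foods, per stage, largest first."""
    return df[STAGES].sum().sort_values(ascending=False)


print(rank_foods(sample))
print(dominant_stage(sample))
print(stage_totals(sample))
```

On this small subset, the farm stage dominates for most foods, while packaging is the largest contributor for Barley (Beer). Five rows are far too few to draw conclusions from; the point is only that the stage-level columns make this kind of comparison straightforward.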
Data Dictionary
Note: I do not plan on using data from all columns of the dataset, so I've excluded some of them here.
All data are measured in kg CO2-equivalents per kg of product.
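One way to sanity-check the units and spot gaps in the data is sketched below: if the stage columns are all in kg CO2-equivalents per kg of product, they should add up to Total_emissions for each food, and the share of missing values per column shows which features are incomplete. This is a minimal sketch on the rows shown by food_production.head(), with one of the partially missing columns included; the function names and the 0.05 tolerance (chosen to allow for values rounded to one decimal place) are assumptions, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Illustrative subset: values copied from the food_production.head() output.
sample = pd.DataFrame(
    {
        "Food product": [
            "Wheat & Rye (Bread)", "Maize (Meal)", "Barley (Beer)", "Oatmeal", "Rice",
        ],
        "Land use change": [0.1, 0.3, 0.0, 0.0, 0.0],
        "Animal Feed": [0.0, 0.0, 0.0, 0.0, 0.0],
        "Farm": [0.8, 0.5, 0.2, 1.4, 3.6],
        "Processing": [0.2, 0.1, 0.1, 0.0, 0.1],
        "Transport": [0.1, 0.1, 0.0, 0.1, 0.1],
        "Packging": [0.1, 0.1, 0.5, 0.1, 0.1],  # spelled as in the dataset
        "Retail": [0.1, 0.0, 0.3, 0.0, 0.1],
        "Total_emissions": [1.4, 1.1, 1.1, 1.6, 4.0],
        "Land use per kilogram (m² per kilogram)": [np.nan, np.nan, np.nan, 7.6, 2.8],
    }
)

STAGES = [
    "Land use change", "Animal Feed", "Farm", "Processing",
    "Transport", "Packging", "Retail",
]


def stage_sum_mismatches(df, tol=0.05):
    """Rows where the stage columns do not add up to Total_emissions.

    tol is an assumed allowance for rounding in the published values.
    """
    out = df[["Food product", "Total_emissions"]].copy()
    out["gap"] = (df[STAGES].sum(axis=1) - df["Total_emissions"]).abs()
    return out[out["gap"] > tol]


def missing_share(df):
    """Fraction of missing values in each column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


print(stage_sum_mismatches(sample))  # an empty frame means every row is consistent
print(missing_share(sample))
```

Running the same checks on the full dataset would show how many foods lack the per-kilogram, per-protein, and per-calorie measures, which matters when deciding which columns to rely on.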
Identify weaknesses
PART 3: Write one or two sentences about how the data will be used to solve the problem. Earlier in the semester, we won’t have studied the Machine Learning methods just yet but you should have a general idea of what the ML will set out to do.
3-4 sentences