Environmental Impact of Food Production

Milia Chamas

PART 1: Describes and motivates a real-world problem where data science may provide helpful insights. Your description should be easily understood by a casual reader and include citations to motivating sources or relevant information (e.g. news articles, further reading links … Wikipedia makes for a poor reference but the links it cites are usually promising).

  • Food production is responsible for over 25% of greenhouse gas emissions released around the world (https://ourworldindata.org/food-ghg-emissions). Moreover, not all foods and production methods emit equally: some have far more impact than others. If nothing is done to reduce them, emissions from food production alone would push global temperatures past 1.5°C or 2°C this century (https://ourworldindata.org/environmental-impacts-of-food?insight=food-emissions-climate-targets#key-insights-on-the-environmental-impacts-of-food), using up all of our carbon budget, if not more.

  • With data science, we can determine which foods have the greatest environmental impact, which part of the environment each affects most (e.g. air, water, land), and which step in the food production process produces the most emissions. This lets us pinpoint the main causes of that 25% of emissions and reduce them as effectively as possible.


PART 2: Explicitly load and show your dataset. Provide a data dictionary which explains the meaning of each feature present. Demonstrate that this data is sufficient to make progress on your real-world problem described above.

In [9]:
import pandas as pd

food_production = pd.read_csv('/Users/miliachamas/Library/Mobile Documents/com~apple~CloudDocs/Downloads/coding/ds2500/Food_Production.csv')
food_production.head()
Out[9]:
| | Food product | Land use change | Animal Feed | Farm | Processing | Transport | Packging | Retail | Total_emissions | Eutrophying emissions per 1000kcal (gPO₄eq per 1000kcal) | ... | Freshwater withdrawals per 100g protein (liters per 100g protein) | Freshwater withdrawals per kilogram (liters per kilogram) | Greenhouse gas emissions per 1000kcal (kgCO₂eq per 1000kcal) | Greenhouse gas emissions per 100g protein (kgCO₂eq per 100g protein) | Land use per 1000kcal (m² per 1000kcal) | Land use per kilogram (m² per kilogram) | Land use per 100g protein (m² per 100g protein) | Scarcity-weighted water use per kilogram (liters per kilogram) | Scarcity-weighted water use per 100g protein (liters per 100g protein) | Scarcity-weighted water use per 1000kcal (liters per 1000 kilocalories) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Wheat & Rye (Bread) | 0.1 | 0.0 | 0.8 | 0.2 | 0.1 | 0.1 | 0.1 | 1.4 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | Maize (Meal) | 0.3 | 0.0 | 0.5 | 0.1 | 0.1 | 0.1 | 0.0 | 1.1 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | Barley (Beer) | 0.0 | 0.0 | 0.2 | 0.1 | 0.0 | 0.5 | 0.3 | 1.1 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | Oatmeal | 0.0 | 0.0 | 1.4 | 0.0 | 0.1 | 0.1 | 0.0 | 1.6 | 4.281357 | ... | 371.076923 | 482.4 | 0.945482 | 1.907692 | 2.897446 | 7.6 | 5.846154 | 18786.2 | 14450.92308 | 7162.104461 |
| 4 | Rice | 0.0 | 0.0 | 3.6 | 0.1 | 0.1 | 0.1 | 0.1 | 4.0 | 9.514379 | ... | 3166.760563 | 2248.4 | 1.207271 | 6.267606 | 0.759631 | 2.8 | 3.943662 | 49576.3 | 69825.77465 | 13449.891480 |

5 rows × 23 columns

Dataset source: https://www.kaggle.com/datasets/selfvivek/environment-impact-of-food-production

This dataset breaks down the environmental impact of each stage of production for a variety of foods. Its information is sufficient for the real-world problem I'm focusing on because it covers a broad range of food types (grains, meat, oils, fruits, etc.) and separates each production stage and environmental impact, making it easier to identify the foods with the highest emissions and find specific issues to address.

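Data Dictionary

Note: I do not plan to use every column in the dataset, so some are excluded here. The production-stage columns and Total_emissions are measured in kg CO2-equivalents per kg of product. As the column headers indicate, eutrophying emissions are reported in gPO₄-equivalents and freshwater withdrawals in liters.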

  • Land use change: GHG emissions caused by humans altering the natural landscape, such as deforestation and clearing land for grazing animals and farming
  • Animal Feed: Emissions from producing the animal feed required for certain foods
  • Farm: On-farm emissions from producing each food
  • Processing: Emissions from processing each food
  • Transport: Emissions from transporting food to where it will be packaged and sold
  • Packaging (column name spelled "Packging" in the dataset): Emissions from the packaging used for each food
  • Retail: Emissions from selling the food at retail
  • Total_emissions: Sum of the emissions from all stages above
  • Eutrophying emissions per kilogram: Runoff of excess nutrients into the surrounding environment
  • Freshwater withdrawals per kilogram: Freshwater extracted from groundwater or surface water for use
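
As a rough check that the stage columns are enough to rank stages and foods, the sketch below rebuilds only the five rows printed by food_production.head() above and compares them. The names sample, stage_cols, stage_means, ranked_foods, and dominant_stage are illustrative choices, not part of the dataset, and five rows are far too few to draw conclusions from; the same steps would need to be run on the full file.

```python
import pandas as pd

# Stage columns as they appear in the dataset ("Packging" is the dataset's spelling).
stage_cols = ["Land use change", "Animal Feed", "Farm", "Processing",
              "Transport", "Packging", "Retail"]

# Assumption: only the five rows shown by head() above, copied by hand.
sample = pd.DataFrame(
    [
        [0.1, 0.0, 0.8, 0.2, 0.1, 0.1, 0.1, 1.4],
        [0.3, 0.0, 0.5, 0.1, 0.1, 0.1, 0.0, 1.1],
        [0.0, 0.0, 0.2, 0.1, 0.0, 0.5, 0.3, 1.1],
        [0.0, 0.0, 1.4, 0.0, 0.1, 0.1, 0.0, 1.6],
        [0.0, 0.0, 3.6, 0.1, 0.1, 0.1, 0.1, 4.0],
    ],
    columns=stage_cols + ["Total_emissions"],
    index=pd.Index(
        ["Wheat & Rye (Bread)", "Maize (Meal)", "Barley (Beer)", "Oatmeal", "Rice"],
        name="Food product",
    ),
)

# Mean emissions per production stage across the sampled foods, largest first.
stage_means = sample[stage_cols].mean().sort_values(ascending=False)
top_stage = stage_means.index[0]

# Foods ranked by total emissions, largest first.
ranked_foods = sample["Total_emissions"].sort_values(ascending=False)
top_food = ranked_foods.index[0]

# For each food, the single stage that contributes the most emissions.
dominant_stage = sample[stage_cols].idxmax(axis=1)

print(stage_means)
print(ranked_foods)
print(dominant_stage)
```

On these five rows, Farm is the largest stage on average and Rice has the highest total, while Barley (Beer) is the one food whose largest stage is packaging rather than farming.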

Identify weaknesses

  • A few weaknesses are worth noting. First, the dataset was compiled 3 years ago, so some figures may be outdated. Second, it covers 43 food types; although this is extensive, it probably does not account for every food in the world, but rather the majority. Lastly, the data are worldwide aggregates, so we cannot determine who is responsible for the production inefficiencies we see. We might assume that they come from developed countries, but the dataset has no country-level breakdown to confirm which ones.

PART 3: Write one or two sentences about how the data will be used to solve the problem. Earlier in the semester, we won’t have studied the Machine Learning methods just yet but you should have a general idea of what the ML will set out to do.

  • Evaluating which stage of the food production process emits the most will help us determine where the most time and effort should go toward reducing emissions, as well as which production strategies are more or less sustainable. Possible machine learning methods include regression, to predict future impacts, and clustering, to group foods or production methods with similar attributes and find common themes among them.
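
To make the clustering idea concrete, here is a minimal sketch that groups foods by their stage-by-stage emission profile. It uses a small hand-written k-means in NumPy rather than any particular library's implementation, starts the centroids at the first k rows (an arbitrary, deterministic choice), does no feature scaling, and runs on only the five rows printed by food_production.head(). The function kmeans and the names sample, stage_cols, and labels are hypothetical helpers for illustration.

```python
import numpy as np
import pandas as pd

# Stage columns as they appear in the dataset ("Packging" is the dataset's spelling).
stage_cols = ["Land use change", "Animal Feed", "Farm", "Processing",
              "Transport", "Packging", "Retail"]

# Assumption: only the five rows shown by head(), copied by hand.
sample = pd.DataFrame(
    [
        [0.1, 0.0, 0.8, 0.2, 0.1, 0.1, 0.1],
        [0.3, 0.0, 0.5, 0.1, 0.1, 0.1, 0.0],
        [0.0, 0.0, 0.2, 0.1, 0.0, 0.5, 0.3],
        [0.0, 0.0, 1.4, 0.0, 0.1, 0.1, 0.0],
        [0.0, 0.0, 3.6, 0.1, 0.1, 0.1, 0.1],
    ],
    columns=stage_cols,
    index=pd.Index(
        ["Wheat & Rye (Bread)", "Maize (Meal)", "Barley (Beer)", "Oatmeal", "Rice"],
        name="Food product",
    ),
)


def kmeans(X, k, n_iter=20):
    """Plain k-means on the rows of X; centroids start at the first k rows."""
    centroids = X[:k].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Euclidean distance from every row to every centroid: shape (n_rows, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its rows; keep it in place if it has none.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments have stopped changing
        centroids = new_centroids
    return labels


X = sample[stage_cols].to_numpy(dtype=float)
labels = pd.Series(kmeans(X, k=2), index=sample.index, name="cluster")
print(labels)
```

With these five rows and two clusters, Rice ends up in a cluster of its own because its farm-stage emissions are much higher than the others'. On the full dataset, the number of clusters and whether to scale the columns first would both need to be chosen deliberately.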