Exploring California's Climate Change Using California GHG Emission Inventory Data¶

Motivation¶

The Real-World Problem¶

Climate change is a significant and prevalent issue in the United States. The nation has already experienced an abundance of climate effects such as wildfires, heatwaves, droughts, floods, and hurricanes. In addition to these more visible consequences, climate change poses a long-term threat to the United States and the rest of the world. This threat stems from global warming, which is driven in large part by the greenhouse gas (GHG) emissions of human activities such as burning fossil fuels and deforestation. According to the Intergovernmental Panel on Climate Change, these activities are causing rapid and unprecedented changes to the climate system.

Among the states in this country, California has been at the forefront of climate action for several years. California has established a variety of policies to incentivize renewable energy, energy efficiency, and electric vehicles in an effort to reduce greenhouse gas emissions. Beyond its role as a leading state in the fight against climate change, California also experiences many of the climate impacts (wildfires, heatwaves, droughts, and floods) that the rest of the globe is beginning to face, making it all the more important and urgent to reduce greenhouse gas emissions.

Solution¶

Using the current California GHG Emission Inventory dataset, data science and machine learning can provide insight into the state of climate change in California by describing how greenhouse gas emissions have evolved across different sectors of California's economy. The goal of this project is to group GHG emissions into collections by analyzing the level of emissions, the type of GHG emitted, and the sectors in which these emissions are most prevalent.

Impact¶

If successful, this may help identify the areas in which California should focus its GHG emission reduction policy. Furthermore, machine learning can serve as a tool to predict the expected levels of GHG emissions in the years to come, and therefore hint at the overall timeline that California is facing before the effects of climate change are irreversible. Of course, California is only a single state within a nation within an entire globe; nonetheless, the data can yield essential insight.

Dataset Sources and Relevant Information¶

Dataset Sources:

  • Current California GHG Emission Inventory Data - California Air Resources Board
  • Learn more about Official State Greenhouse Gas Inventories - Environmental Protection Agency

Further Reading:

  • California's Fourth Climate Change Assessment - CA Gov
  • Intense Heat Wave in California - NPR
  • Climate Change Impacts in California - State of California Department of Justice
  • U.S. Greenhouse Gas Inventory Report: 1990 - 2014 - Environmental Protection Agency
  • What Climate Change Means for California - Environmental Protection Agency

Potential for Data Science in Fighting Climate Change:

  • New Generation of Data Scientists Could Be Best Weapon Against Climate Change - Fortune
  • UC Berkeley Center Will Apply Data Science to Solving Environmental Challenges - Berkeley News
  • Data Science Could Help Californians Battle Future Wildfires - The Conversation

Dataset¶

Detail¶

Type of Emission: Emission category describing its status within California's GHG Inventory (included emissions, excluded emissions, or other emissions). All sectors in this particular dataset are included emissions.

IPCC Code: Designated code for this particular economic sector given by the Intergovernmental Panel on Climate Change.

Sector Level 1: Overall sector in which the emission took place.

Sector Level 2: Sub-sector in which the emission took place.

Sector Level 3: Further description of the sector in which the emission took place (if needed).

Sector Level 4: Further description of the sector in which the emission took place (if needed).

Activity Level 1: The activity that resulted in greenhouse gas emissions.

Activity Level 2: Further description of the activity that resulted in greenhouse gas emissions.

GHG: The Greenhouse Gas that was emitted.

GWP (100-yr AR4): The Global Warming Potential (GWP), a measure of how much a given amount of a greenhouse gas contributes to global warming over 100 years compared to the same amount of carbon dioxide (CO2). CO2 is therefore the reference gas, with a GWP of 1.
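As a small worked example of how a GWP converts a quantity of gas into CO2 equivalent, the sketch below multiplies a mass by its GWP. The GWP values are the ones shown in the `GWP (100-yr AR4)` column of the preview further down; the helper name `to_co2e` and the sample mass are hypothetical and only serve the illustration.

```python
# AR4 100-year GWP values as they appear in the dataset preview
GWP_AR4 = {"CO2": 1, "CH4": 25, "N2O": 298}


def to_co2e(mass, gas):
    """Convert a mass of `gas` into the equivalent mass of CO2 (same units)."""
    return mass * GWP_AR4[gas]


# 2 tonnes of methane warm as much over 100 years as 50 tonnes of CO2
print(to_co2e(2.0, "CH4"))  # 50.0
```

Because the yearly columns in the dataset are already reported in CO2 equivalent, a conversion like this would only be needed when starting from raw gas masses.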

2000 - 2020: The greenhouse gas emissions for the specified year measured in million tonnes (Tg) of CO2 equivalent - based on IPCC 4th Assessment 100-yr GWPs.

Sector Activity Code: Designated code for the activity within this particular economic sector that resulted in greenhouse gas emissions.

In [3]:
# import pandas library
import pandas as pd

# import csv file
df_ghg_emissions = pd.read_csv('California_GHG_Inventory_By_Sector.csv')

# preview first 15 rows of data
df_ghg_emissions.head(15)
Out[3]:
Type of emission IPCC Code Sector Level 1 Sector Level 2 Sector Level 3 Sector Level 4 Activity Level 1 Activity Level 2 GHG GWP (100-yr AR4) ... 2012 2013 2014 2015 2016 2017 2018 2019 2020 SectorActivity_code
0 Included Emissions 1A4c Agriculture & Forestry Ag Energy Use Crop Production None Fuel combustion Natural gas CH4 25 ... 0.000213 0.000214 0.000216 0.000220 0.000229 0.000209 0.000225 0.000216 0.000224 60-01-10-99-01-020
1 Included Emissions 1A4c Agriculture & Forestry Ag Energy Use Crop Production None Fuel combustion Natural gas CO2 1 ... 0.452000 0.455000 0.458000 0.466000 0.485000 0.444000 0.477000 0.457000 0.475000 60-01-10-99-01-020
2 Included Emissions 1A4c Agriculture & Forestry Ag Energy Use Crop Production None Fuel combustion Natural gas N2O 298 ... 0.000254 0.000256 0.000258 0.000262 0.000273 0.000250 0.000268 0.000257 0.000267 60-01-10-99-01-020
3 Included Emissions 1A4c Agriculture & Forestry Ag Energy Use Livestock None Fuel combustion Natural gas CH4 25 ... 0.000034 0.000034 0.000035 0.000037 0.000036 0.000037 0.000037 0.000035 0.000042 60-01-27-99-01-020
4 Included Emissions 1A4c Agriculture & Forestry Ag Energy Use Livestock None Fuel combustion Natural gas CO2 1 ... 0.072100 0.071900 0.073400 0.078100 0.075600 0.077600 0.078900 0.075100 0.089200 60-01-27-99-01-020
5 Included Emissions 1A4c Agriculture & Forestry Ag Energy Use Livestock None Fuel combustion Natural gas N2O 298 ... 0.000041 0.000040 0.000041 0.000044 0.000043 0.000044 0.000044 0.000042 0.000050 60-01-27-99-01-020
6 Included Emissions 1A4c Agriculture & Forestry Ag Energy Use Not Specified None Fuel combustion Biodiesel CH4 25 ... 0.000013 0.000040 0.000057 0.000114 0.000127 0.000099 0.000111 0.000098 0.000183 60-01-99-99-01-080
7 Included Emissions 1A4c Agriculture & Forestry Ag Energy Use Not Specified None Fuel combustion Biodiesel N2O 298 ... 0.000032 0.000095 0.000137 0.000273 0.000304 0.000237 0.000266 0.000233 0.000436 60-01-99-99-01-080
8 Included Emissions 1A4c Agriculture & Forestry Ag Energy Use Not Specified None Fuel combustion Distillate CH4 25 ... 0.000276 0.000277 0.000285 0.000304 0.000259 0.000195 0.000243 0.000184 0.000193 60-01-99-99-01-033
9 Included Emissions 1A4c Agriculture & Forestry Ag Energy Use Not Specified None Fuel combustion Distillate CO2 1 ... 2.370000 2.380000 2.450000 2.610000 2.230000 1.670000 2.090000 1.580000 1.660000 60-01-99-99-01-033
10 Included Emissions 1A4c Agriculture & Forestry Ag Energy Use Not Specified None Fuel combustion Distillate N2O 298 ... 0.001650 0.001650 0.001700 0.001810 0.001550 0.001160 0.001450 0.001090 0.001150 60-01-99-99-01-033
11 Included Emissions 1A4c Agriculture & Forestry Ag Energy Use Not Specified None Fuel combustion Ethanol CH4 25 ... 0.000196 0.000149 0.000166 0.000021 0.000012 0.000014 0.000002 0.000003 0.000004 60-01-99-99-01-090
12 Included Emissions 1A4c Agriculture & Forestry Ag Energy Use Not Specified None Fuel combustion Ethanol N2O 298 ... 0.001700 0.001290 0.001440 0.000185 0.000105 0.000123 0.000017 0.000025 0.000038 60-01-99-99-01-090
13 Included Emissions 1A4c Agriculture & Forestry Ag Energy Use Not Specified None Fuel combustion Gasoline CH4 25 ... 0.000965 0.000699 0.000723 0.000100 0.000057 0.000067 0.000009 0.000014 0.000021 60-01-99-99-01-034
14 Included Emissions 1A4c Agriculture & Forestry Ag Energy Use Not Specified None Fuel combustion Gasoline CO2 1 ... 0.710000 0.515000 0.532000 0.073200 0.042200 0.049600 0.006740 0.010300 0.015600 60-01-99-99-01-034

15 rows × 32 columns

Potential Problems¶

Sector Level: The GHG emission data has varying levels of specificity when describing sector and activity. When sorting the data into separate collections, it will be important to pick a universal level of specificity for both sectors and activities.
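One possible way to settle on a single level of specificity is to aggregate every row up to one chosen sector column. The sketch below assumes `Sector Level 1` as that level and uses a tiny made-up frame in place of `df_ghg_emissions`; the emission values and the `Transportation` / `On Road` labels are hypothetical.

```python
import pandas as pd

# Toy stand-in for df_ghg_emissions; all numbers are made up for illustration
toy = pd.DataFrame({
    "Sector Level 1": ["Agriculture & Forestry", "Agriculture & Forestry", "Transportation"],
    "Sector Level 2": ["Ag Energy Use", "Ag Energy Use", "On Road"],
    "GHG": ["CH4", "CO2", "CO2"],
    "2019": [0.1, 0.4, 2.0],
    "2020": [0.2, 0.5, 1.5],
})

# Year columns are the ones whose names consist only of digits
year_cols = [c for c in toy.columns if c.isdigit()]

# Sum emissions per top-level sector so every row has the same specificity
by_sector = toy.groupby("Sector Level 1")[year_cols].sum()
print(by_sector)
```

The same pattern would apply to activities by grouping on `Activity Level 1` instead, or on both columns at once by passing a list to `groupby`.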

Lack of Recent Data: Although the dataset used is the current California GHG Emission Inventory, it does not have any data beyond 2020. As a result, it excludes all of the developments (positive or negative) that took place over the past two years and might have an impact on the current status of climate change in California (e.g., the Covid-19 pandemic).

Context: The dataset can only describe the evolution of GHG emissions in the state of California. Especially because California is one of the most progressive states in fighting climate change, any estimates made from the data may offer a skewed view of reality if applied beyond California.

Method¶

I pose this problem as a clustering problem. Grouping different GHG emissions in California over the past two decades could reveal which specific sectors and activities need addressing in order to lower the annual human contributions to climate change.
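A minimal sketch of what such a clustering could look like is given below. It assumes k-means from scikit-learn as the algorithm and two clusters, neither of which is fixed by the text above, and it runs on a made-up table of yearly emissions per sector rather than the real inventory; the sector names and values are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy yearly emissions per (hypothetical) sector; numbers are illustrative only
toy = pd.DataFrame(
    {
        "2018": [0.10, 0.12, 5.0, 5.2],
        "2019": [0.11, 0.13, 4.8, 5.1],
        "2020": [0.12, 0.11, 4.5, 4.9],
    },
    index=["sector_a", "sector_b", "sector_c", "sector_d"],
)

# Standardize each year column so no single year dominates the distances
X = StandardScaler().fit_transform(toy)

# Assumed choice: k-means with 2 clusters and a fixed seed for repeatability
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Attach the cluster assignment to each sector for inspection
clustered = toy.assign(cluster=labels)
print(clustered)
```

On this toy data the two low-emission sectors end up in one cluster and the two high-emission sectors in the other. With the real inventory, the number of clusters and the choice of features (raw yearly values, trends, or gas type) would need to be explored rather than assumed.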