Climate change is a significant and pressing issue in the United States. The nation has already experienced a wide range of climate effects such as wildfires, heatwaves, droughts, floods, and hurricanes. In addition to these more apparent consequences, climate change poses a long-term threat to the United States and the rest of the world. This threat stems from global warming, which is driven in large part by the greenhouse gas (GHG) emissions of human activities such as the burning of fossil fuels and deforestation. According to the Intergovernmental Panel on Climate Change, these activities are causing rapid and unprecedented changes to the climate system.
Among the states in this country, California has been at the forefront of climate action for several years. California has established a variety of policies to incentivize renewable energy, energy efficiency, and electric vehicles in an effort to reduce greenhouse gas emissions. Beyond its role as a leading state in the fight against climate change, California also experiences many of the climate impacts (wildfires, heatwaves, droughts, and floods) that the globe is beginning to face, making it all the more important and urgent to reduce greenhouse gas emissions.
Using the current California GHG Emission Inventory dataset, data science and machine learning can provide helpful insight into the current state of climate change in California by describing the evolution of greenhouse gas emissions across different sectors of California's economy. The goal of this project is to group GHG emissions into collections based on the level of emissions, the type of GHG emitted, and the sectors in which the emission of these gases is most prevalent.
If successful, this may help us understand where California must focus its efforts in terms of GHG emission reduction policy. Furthermore, machine learning can serve as a tool to predict the expected levels of GHG emissions in the years to come, and therefore hint at the overall timeline that California is facing before the effects of climate change are irreversible. Of course, it is important to consider that California is only a single state within a nation within an entire globe. Nonetheless, the data can yield essential insight.
Dataset Sources:
Further Reading:
Potential for Data Science in Fighting Climate Change:
Type of Emission: Emission category describing its status within California's GHG Inventory (included emissions, excluded emissions, other emissions). All sectors in this particular dataset are included emissions.
IPCC Code: Designated code for this particular economic sector given by the Intergovernmental Panel on Climate Change.
Sector Level 1: Overall sector in which the emission took place.
Sector Level 2: Sub-sector in which the emission took place.
Sector Level 3: Further description of the sector in which the emission took place (if needed).
Sector Level 4: Further description of the sector in which the emission took place (if needed).
Activity Level 1: The activity that resulted in greenhouse gas emissions.
Activity Level 2: Further description of the activity that resulted in greenhouse gas emissions.
GHG: The Greenhouse Gas that was emitted.
GWP (100-yr AR4): The Global Warming Potential (GWP), a measure of how much a given amount of a greenhouse gas contributes to global warming over 100 years compared to the same amount of carbon dioxide (CO2). CO2 is therefore the reference gas, with a GWP of 1.
2000 - 2020: The greenhouse gas emissions for the specified year measured in million tonnes (Tg) of CO2 equivalent - based on IPCC 4th Assessment 100-yr GWPs.
Sector Activity Code: Designated code for the activity within this particular economic sector that resulted in greenhouse gas emissions.
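As a small illustration of the GWP definition above, the sketch below converts a mass of each gas into CO2 equivalent by multiplying by its 100-year AR4 GWP. The GWP values (1, 25, 298) are the ones that appear in the dataset's GWP column; the function name `to_co2e` and the sample masses are hypothetical and only serve the example.

```python
# 100-year GWPs (IPCC AR4) as they appear in the dataset's GWP column
GWP_AR4 = {"CO2": 1, "CH4": 25, "N2O": 298}


def to_co2e(gas: str, mass_tonnes: float) -> float:
    """Convert a mass of gas (tonnes) into tonnes of CO2 equivalent."""
    return mass_tonnes * GWP_AR4[gas]


# hypothetical masses, chosen only to show the scaling between gases
for gas in ("CO2", "CH4", "N2O"):
    print(gas, to_co2e(gas, 100.0))
```

Under this convention, the same mass of CH4 counts 25 times as much as CO2, which is why the year columns can be compared across gases.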
# import pandas library
import pandas as pd
# import csv file
df_ghg_emissions = pd.read_csv('California_GHG_Inventory_By_Sector.csv')
# preview first 15 rows of data
df_ghg_emissions.head(15)
 | Type of emission | IPCC Code | Sector Level 1 | Sector Level 2 | Sector Level 3 | Sector Level 4 | Activity Level 1 | Activity Level 2 | GHG | GWP (100-yr AR4) | ... | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | SectorActivity_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Included Emissions | 1A4c | Agriculture & Forestry | Ag Energy Use | Crop Production | None | Fuel combustion | Natural gas | CH4 | 25 | ... | 0.000213 | 0.000214 | 0.000216 | 0.000220 | 0.000229 | 0.000209 | 0.000225 | 0.000216 | 0.000224 | 60-01-10-99-01-020 |
1 | Included Emissions | 1A4c | Agriculture & Forestry | Ag Energy Use | Crop Production | None | Fuel combustion | Natural gas | CO2 | 1 | ... | 0.452000 | 0.455000 | 0.458000 | 0.466000 | 0.485000 | 0.444000 | 0.477000 | 0.457000 | 0.475000 | 60-01-10-99-01-020 |
2 | Included Emissions | 1A4c | Agriculture & Forestry | Ag Energy Use | Crop Production | None | Fuel combustion | Natural gas | N2O | 298 | ... | 0.000254 | 0.000256 | 0.000258 | 0.000262 | 0.000273 | 0.000250 | 0.000268 | 0.000257 | 0.000267 | 60-01-10-99-01-020 |
3 | Included Emissions | 1A4c | Agriculture & Forestry | Ag Energy Use | Livestock | None | Fuel combustion | Natural gas | CH4 | 25 | ... | 0.000034 | 0.000034 | 0.000035 | 0.000037 | 0.000036 | 0.000037 | 0.000037 | 0.000035 | 0.000042 | 60-01-27-99-01-020 |
4 | Included Emissions | 1A4c | Agriculture & Forestry | Ag Energy Use | Livestock | None | Fuel combustion | Natural gas | CO2 | 1 | ... | 0.072100 | 0.071900 | 0.073400 | 0.078100 | 0.075600 | 0.077600 | 0.078900 | 0.075100 | 0.089200 | 60-01-27-99-01-020 |
5 | Included Emissions | 1A4c | Agriculture & Forestry | Ag Energy Use | Livestock | None | Fuel combustion | Natural gas | N2O | 298 | ... | 0.000041 | 0.000040 | 0.000041 | 0.000044 | 0.000043 | 0.000044 | 0.000044 | 0.000042 | 0.000050 | 60-01-27-99-01-020 |
6 | Included Emissions | 1A4c | Agriculture & Forestry | Ag Energy Use | Not Specified | None | Fuel combustion | Biodiesel | CH4 | 25 | ... | 0.000013 | 0.000040 | 0.000057 | 0.000114 | 0.000127 | 0.000099 | 0.000111 | 0.000098 | 0.000183 | 60-01-99-99-01-080 |
7 | Included Emissions | 1A4c | Agriculture & Forestry | Ag Energy Use | Not Specified | None | Fuel combustion | Biodiesel | N2O | 298 | ... | 0.000032 | 0.000095 | 0.000137 | 0.000273 | 0.000304 | 0.000237 | 0.000266 | 0.000233 | 0.000436 | 60-01-99-99-01-080 |
8 | Included Emissions | 1A4c | Agriculture & Forestry | Ag Energy Use | Not Specified | None | Fuel combustion | Distillate | CH4 | 25 | ... | 0.000276 | 0.000277 | 0.000285 | 0.000304 | 0.000259 | 0.000195 | 0.000243 | 0.000184 | 0.000193 | 60-01-99-99-01-033 |
9 | Included Emissions | 1A4c | Agriculture & Forestry | Ag Energy Use | Not Specified | None | Fuel combustion | Distillate | CO2 | 1 | ... | 2.370000 | 2.380000 | 2.450000 | 2.610000 | 2.230000 | 1.670000 | 2.090000 | 1.580000 | 1.660000 | 60-01-99-99-01-033 |
10 | Included Emissions | 1A4c | Agriculture & Forestry | Ag Energy Use | Not Specified | None | Fuel combustion | Distillate | N2O | 298 | ... | 0.001650 | 0.001650 | 0.001700 | 0.001810 | 0.001550 | 0.001160 | 0.001450 | 0.001090 | 0.001150 | 60-01-99-99-01-033 |
11 | Included Emissions | 1A4c | Agriculture & Forestry | Ag Energy Use | Not Specified | None | Fuel combustion | Ethanol | CH4 | 25 | ... | 0.000196 | 0.000149 | 0.000166 | 0.000021 | 0.000012 | 0.000014 | 0.000002 | 0.000003 | 0.000004 | 60-01-99-99-01-090 |
12 | Included Emissions | 1A4c | Agriculture & Forestry | Ag Energy Use | Not Specified | None | Fuel combustion | Ethanol | N2O | 298 | ... | 0.001700 | 0.001290 | 0.001440 | 0.000185 | 0.000105 | 0.000123 | 0.000017 | 0.000025 | 0.000038 | 60-01-99-99-01-090 |
13 | Included Emissions | 1A4c | Agriculture & Forestry | Ag Energy Use | Not Specified | None | Fuel combustion | Gasoline | CH4 | 25 | ... | 0.000965 | 0.000699 | 0.000723 | 0.000100 | 0.000057 | 0.000067 | 0.000009 | 0.000014 | 0.000021 | 60-01-99-99-01-034 |
14 | Included Emissions | 1A4c | Agriculture & Forestry | Ag Energy Use | Not Specified | None | Fuel combustion | Gasoline | CO2 | 1 | ... | 0.710000 | 0.515000 | 0.532000 | 0.073200 | 0.042200 | 0.049600 | 0.006740 | 0.010300 | 0.015600 | 60-01-99-99-01-034 |
15 rows × 32 columns
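Because each year is stored as its own column, it may help to reshape the table into a long format (one row per record and year) before plotting or modeling. The following is a minimal sketch on a small stand-in frame built from three rows of the preview above; the names `sample`, `long_df`, `Year`, and `Emissions` are assumptions made for this example, and it assumes the year column labels are all-digit strings, as `read_csv` would produce from this header.

```python
import pandas as pd

# small stand-in for df_ghg_emissions, using values from the preview above
sample = pd.DataFrame({
    "Sector Level 1": ["Agriculture & Forestry"] * 3,
    "GHG": ["CH4", "CO2", "N2O"],
    "2019": [0.000216, 0.457, 0.000257],
    "2020": [0.000224, 0.475, 0.000267],
})

# treat every all-digit column label as a year column
year_cols = [c for c in sample.columns if c.isdigit()]
id_cols = [c for c in sample.columns if c not in year_cols]

# one row per (record, year) instead of one column per year
long_df = sample.melt(id_vars=id_cols, value_vars=year_cols,
                      var_name="Year", value_name="Emissions")
long_df["Year"] = long_df["Year"].astype(int)
print(long_df)
```

The same two steps should apply to the full `df_ghg_emissions` frame, provided its year columns follow the same naming.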
Sector Level: The GHG emission data has varying levels of specificity when describing sector and activity. When sorting the data into separate collections, it will be important to pick a universal level of specificity for both sectors and activities.
Lack of Recent Data: The dataset used is the current California GHG Emission Inventory, but it does not contain any data beyond 2020. As a result, it excludes all of the developments (positive or negative) that took place over the past two years and might have an impact on the current status of climate change in California (e.g., the Covid-19 pandemic).
Context: This dataset can only describe the evolution of GHG emissions, and the climate change they contribute to, within the state of California. Because California is one of the most progressive states in terms of fighting climate change, any estimates made from the data may offer a skewed view of reality if applied beyond California.
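To make the point about sector specificity concrete, the sketch below totals emissions at two different sector levels. It runs on four CO2 rows taken from the preview above rather than on the full dataset, and the helper name `totals_by` is hypothetical. Which level to standardize on is left open here; deeper levels can be empty (Sector Level 4 is `None` in every previewed row), which is one reason a coarser level may be easier to apply universally.

```python
import pandas as pd

# four CO2 rows from the preview above (2019 and 2020 only)
rows = pd.DataFrame({
    "Sector Level 1": ["Agriculture & Forestry"] * 4,
    "Sector Level 3": ["Crop Production", "Livestock",
                       "Not Specified", "Not Specified"],
    "2019": [0.457, 0.0751, 1.58, 0.0103],
    "2020": [0.475, 0.0892, 1.66, 0.0156],
})


def totals_by(df: pd.DataFrame, level: str) -> pd.DataFrame:
    """Sum every year column within each value of the chosen sector level."""
    year_cols = [c for c in df.columns if c.isdigit()]
    return df.groupby(level)[year_cols].sum()


by_level_1 = totals_by(rows, "Sector Level 1")  # one coarse group
by_level_3 = totals_by(rows, "Sector Level 3")  # three finer groups
print(by_level_1)
print(by_level_3)
```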
I pose this problem as a clustering problem. Grouping different GHG emissions in California over the past two decades could reveal which specific sectors and activities need addressing in order to lower the annual human contributions to climate change.
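As a rough sketch of what such a clustering could look like, the example below runs k-means from scikit-learn on six rows taken from the preview above. The choice of k-means, the number of clusters (2), and the log10 transform are all assumptions made for illustration; the proposal does not commit to a particular algorithm or feature set.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# six rows from the preview above (2019 and 2020 emissions only)
sample = pd.DataFrame({
    "GHG": ["CH4", "CO2", "N2O", "CO2", "CO2", "N2O"],
    "2019": [0.000216, 0.457, 0.000257, 0.0751, 1.58, 0.00109],
    "2020": [0.000224, 0.475, 0.000267, 0.0892, 1.66, 0.00115],
})

# emissions span several orders of magnitude, so cluster on a log scale
# (assumption: all values are strictly positive, as they are in this sample)
features = np.log10(sample[["2019", "2020"]].to_numpy())

# k = 2 is an arbitrary choice for this small illustration
model = KMeans(n_clusters=2, n_init=10, random_state=0)
sample["cluster"] = model.fit_predict(features)
print(sample)
```

On this sample the two clusters separate the CO2 rows from the much smaller CH4 and N2O rows. On the full dataset, zero-valued entries would need handling before taking logarithms, and the number of clusters would need to be chosen more carefully.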