Which States are Most Crucial to Winning the Presidency?

Motivation:

Problem:

Every four years, the two major parties compete for the highest office in the country: the presidency. Their candidates hit the campaign trail to sway undecided voters. However, not every state is created equal. California and Texas have 54 and 40 electoral votes respectively, while North Dakota and South Dakota have a measly 3 each. You might expect candidates to spend most of their time in the states with the most electoral votes, but this is not the case. Instead, they focus on states that are typically decided by razor-thin margins and could plausibly go to either candidate. These are called swing states.

To see which states the campaigns treated as critical, here is a visualization of the stops Joe Biden and Donald Trump made during the 2020 presidential campaign.

However, this raises the question: is there one specific state that is a necessity for winning a presidential election? Are there several? If no single state is required, has winning a certain pair of states always resulted in victory?

Solution:

By analyzing the results of the presidential elections from 1976 through 2020, we can determine which states are most pivotal on the path to the White House. To be clear, we measure a state's importance by how often it was carried by the eventual winner, not by its electoral vote count. Even though California has roughly three times as many electoral votes as Ohio, Ohio is widely regarded as one of the most important swing states in every election, while California is barely considered because it votes so consistently blue.
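Written as a formula, using notation introduced here only for illustration: let E be the set of elections in the dataset (the twelve from 1976 to 2020), p_s(e) the party that carried state s in election e, and W(e) the party of the national winner of election e. A state's importance is then the fraction of elections in which the two agree:

```latex
I(s) = \frac{1}{|E|} \sum_{e \in E} \mathbf{1}\left[\, p_s(e) = W(e) \,\right]
```

For example, a state carried by the winner in 9 of the 12 elections would score 9/12 = 0.75, and a state carried by the winner every time would score 1.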

Initially, we will focus on classifying states by their individual impact on the results. A state that was carried by the winner in every election will be classified as highly important, while a state that was carried by the winner and the loser equally often will be considered less important.
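The two endpoints described above could be turned into labels with a small helper. This is only a sketch: the function name, the "intermediate" label for everything between the endpoints, and the 0.1 tolerance around an even split are assumptions, not thresholds chosen in this project.

```python
def classify_importance(share: float, tolerance: float = 0.1) -> str:
    """Label a state by the fraction of elections (0.0 to 1.0) its winner went on to win.

    tolerance is a hypothetical half-width of the band around 0.5 that counts
    as "carried by the winner and the loser equally often".
    """
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    if share == 1.0:
        return "highly important"  # carried by the winner in every election
    if abs(share - 0.5) <= tolerance:
        return "not as important"  # split roughly evenly between winners and losers
    return "intermediate"  # anything else; the project has not named these buckets yet
```

With pandas, the same helper can be applied to a Series of per-state shares with `Series.map`.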

Depending on how easy that first step turns out to be, we will then analyze combinations of states. Has a specific combination consistently led to victory? Which pairings should a candidate focus on to guarantee victory?
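One way to check pairs is to enumerate every two-state combination with `itertools.combinations` and record, for each election in which a single party carried both states, whether that party also won the presidency. The function name `pair_records` and the dictionary input format are hypothetical choices for this sketch; the real inputs would come from the per-state winners in the dataset.

```python
from itertools import combinations


def pair_records(results, winners):
    """Count, for every pair of states, how often sweeping the pair coincided with winning.

    results: {year: {state: party that carried the state}}
    winners: {year: party of the national winner}
    Returns {(state_a, state_b): (wins, sweeps)}, where sweeps counts the elections in
    which one party carried both states and wins counts the sweeps made by the winner.
    """
    records = {}
    for year, carried in results.items():
        # sorted() keeps each pair in a consistent (alphabetical) order across years
        for a, b in combinations(sorted(carried), 2):
            wins, sweeps = records.get((a, b), (0, 0))
            if carried[a] == carried[b]:  # one party carried both states
                sweeps += 1
                if carried[a] == winners[year]:
                    wins += 1
            records[(a, b)] = (wins, sweeps)
    return records
```

A pair whose wins equal its sweeps has never been swept by a losing candidate; if its sweeps also equal the number of elections, the same party carried both states every time and always won.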

Finally, as a stretch goal that may not be attainable, we will examine the vote counts in each state. Is there a margin of victory in each state that all but guarantees winning the election? How to determine this is currently outside the scope of the project, though it may be added depending on how quickly the first two steps are completed.
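If the stretch goal is pursued, a natural first quantity is each state's margin between the top two finishers, as a fraction of all votes cast. The sketch below assumes the column names from the Kaggle CSV (year, state, candidatevotes, totalvotes); the function name `state_margins` is hypothetical. Non-candidate rows such as OVERVOTES and UNDERVOTES should be filtered out first so they are not mistaken for a runner-up.

```python
import pandas as pd


def state_margins(df: pd.DataFrame) -> pd.DataFrame:
    """Margin between first and second place in each (year, state), as a share of totalvotes."""
    # Order candidates within each state-year from most to fewest votes
    ranked = df.sort_values(
        ["year", "state", "candidatevotes"], ascending=[True, True, False]
    )
    ranked["place"] = ranked.groupby(["year", "state"]).cumcount()  # 0 = first place
    first = ranked[ranked["place"] == 0].set_index(["year", "state"])
    second = ranked[ranked["place"] == 1].set_index(["year", "state"])["candidatevotes"]
    # A state-year with a single candidate gets a runner-up total of 0
    runner_up = second.reindex(first.index, fill_value=0)
    margin = (first["candidatevotes"] - runner_up) / first["totalvotes"]
    return margin.rename("margin").reset_index()
```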

Impact:

As citizens in this democracy, it is vital that we understand why certain states matter more than others in how our elections function. This analysis should give greater insight into the political impact certain controversies can have depending on a state's demographics, and into why politicians make the decisions they do.

One drawback of this approach is that it does not account for the constantly changing political climate. Florida used to be a consistent swing state, yet it has become more reliably red in recent elections. It is therefore possible that the results will not reflect the state that is actually most important for a future winner to carry. However, they will show which states have been most important historically.

Dataset

Detail

We will be using a Kaggle dataset of U.S. presidential election results from 1976 to 2020 for our analysis. Here is the data imported:

In [1]:

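import pandas as pd

# Load the state-level results: one row per candidate (plus write-in, overvote and undervote tallies) per state per election year
pd.read_csv('1976-2020-president.csv')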

Out[1]:

	year	state	state_po	state_fips	state_cen	state_ic	office	candidate	party_detailed	writein	candidatevotes	totalvotes	version	notes	party_simplified
0	1976	ALABAMA	AL	1	63	41	US PRESIDENT	CARTER, JIMMY	DEMOCRAT	False	659170	1182850	20210113	NaN	DEMOCRAT
1	1976	ALABAMA	AL	1	63	41	US PRESIDENT	FORD, GERALD	REPUBLICAN	False	504070	1182850	20210113	NaN	REPUBLICAN
2	1976	ALABAMA	AL	1	63	41	US PRESIDENT	MADDOX, LESTER	AMERICAN INDEPENDENT PARTY	False	9198	1182850	20210113	NaN	OTHER
3	1976	ALABAMA	AL	1	63	41	US PRESIDENT	BUBAR, BENJAMIN ""BEN""	PROHIBITION	False	6669	1182850	20210113	NaN	OTHER
4	1976	ALABAMA	AL	1	63	41	US PRESIDENT	HALL, GUS	COMMUNIST PARTY USE	False	1954	1182850	20210113	NaN	OTHER
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
4282	2020	WYOMING	WY	56	83	68	US PRESIDENT	JORGENSEN, JO	LIBERTARIAN	False	5768	278503	20210113	NaN	LIBERTARIAN
4283	2020	WYOMING	WY	56	83	68	US PRESIDENT	PIERCE, BROCK	INDEPENDENT	False	2208	278503	20210113	NaN	OTHER
4284	2020	WYOMING	WY	56	83	68	US PRESIDENT	NaN	NaN	True	1739	278503	20210113	NaN	OTHER
4285	2020	WYOMING	WY	56	83	68	US PRESIDENT	OVERVOTES	NaN	False	279	278503	20210113	NaN	OTHER
4286	2020	WYOMING	WY	56	83	68	US PRESIDENT	UNDERVOTES	NaN	False	1459	278503	20210113	NaN	OTHER

4287 rows × 15 columns

The columns in this CSV are:

- year: the year of the presidential election.

- state: the state the results are for.

- state_po: the state's postal abbreviation (New York -> NY).

- state_fips: the Federal Information Processing Standard (FIPS) state code. Redundant attribute for identifying the state.

- state_cen: the U.S. Census Bureau state code. Another redundant state identifier.

- state_ic: the ICPSR state code. Another redundant state identifier.

- office: the office the candidate ran for. This value will always be "US PRESIDENT".

- candidate: the name of the candidate. This will likely be unnecessary for the scope of this analysis.

- party_detailed: the party the candidate belonged to. Vital for our analysis; since the scope is limited to Democrats and Republicans, other parties will very likely be excluded.

- writein: whether the candidate was a write-in candidate.

- candidatevotes: the number of votes the candidate received.

- totalvotes: the total votes cast in the state that year.

- version: when the data was last updated (e.g. 20210113).

- notes: additional info about the candidate. None is provided, so this column will be excluded.

- party_simplified: a condensed version of party_detailed with fewer parties (e.g. PROHIBITION in party_detailed becomes OTHER here).
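As a minimal sketch of the exclusion mentioned under party_detailed, the rows could be filtered on party_simplified, whose major-party values appear as DEMOCRAT and REPUBLICAN in the output above. The helper name `major_party_rows` is hypothetical. Because the OVERVOTES and UNDERVOTES tallies are labeled OTHER, this filter drops them as well.

```python
import pandas as pd

# Major-party labels as they appear in the party_simplified column
MAJOR_PARTIES = {"DEMOCRAT", "REPUBLICAN"}


def major_party_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return only Democratic and Republican rows (hypothetical helper).

    Rows labeled OTHER or LIBERTARIAN, including the OVERVOTES/UNDERVOTES tallies, are dropped.
    """
    mask = df["party_simplified"].isin(MAJOR_PARTIES)
    return df.loc[mask].reset_index(drop=True)
```

Usage: `major = major_party_rows(pd.read_csv('1976-2020-president.csv'))`.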

The primary attributes we will focus on are year, state, candidatevotes, and party_simplified. We will use them to determine which candidate won each state, and then compare that with the winner of each election (information that, surprisingly, is not in the dataset). Finally, we will see which states were carried by each winner and divide the states into varying degrees of importance.
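Since the national winners are not in the dataset, they have to be supplied separately. The sketch below takes the party with the most votes in each state and year and compares it with a hand-entered table of winning parties (Carter in 1976 through Biden in 2020). The function and column names (`state_winners`, `state_party`, `winner_party`, `carried_by_winner`) are hypothetical. It treats each state as winner-take-all by statewide plurality, so the congressional-district electoral votes that Maine and Nebraska can split are ignored.

```python
import pandas as pd

# Party of each national winner, entered by hand because the dataset lacks it
NATIONAL_WINNERS = {
    1976: "DEMOCRAT",    # Carter
    1980: "REPUBLICAN",  # Reagan
    1984: "REPUBLICAN",  # Reagan
    1988: "REPUBLICAN",  # G. H. W. Bush
    1992: "DEMOCRAT",    # Clinton
    1996: "DEMOCRAT",    # Clinton
    2000: "REPUBLICAN",  # G. W. Bush
    2004: "REPUBLICAN",  # G. W. Bush
    2008: "DEMOCRAT",    # Obama
    2012: "DEMOCRAT",    # Obama
    2016: "REPUBLICAN",  # Trump
    2020: "DEMOCRAT",    # Biden
}


def state_winners(df: pd.DataFrame) -> pd.DataFrame:
    """One row per (year, state): the statewide plurality party and whether it won nationally."""
    # Row label of the top vote-getter in each state-year
    idx = df.groupby(["year", "state"])["candidatevotes"].idxmax()
    winners = df.loc[idx.to_numpy(), ["year", "state", "party_simplified"]].rename(
        columns={"party_simplified": "state_party"}
    )
    winners["winner_party"] = winners["year"].map(NATIONAL_WINNERS)
    winners["carried_by_winner"] = winners["state_party"] == winners["winner_party"]
    return winners.reset_index(drop=True)
```

Usage: `state_winners(pd.read_csv('1976-2020-president.csv'))`.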

Method:

We will classify states by how important they are in leading a candidate to victory. Doing so will reveal how much more certain states matter than others in determining the winner.
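One way to carry out this classification is to average, for each state, an indicator of whether the national winner carried it, which gives the importance share described in the Solution section, and then sort. The sketch assumes a table with one row per (year, state) and a boolean column named carried_by_winner; that column name and the function name `importance_table` are hypothetical.

```python
import pandas as pd


def importance_table(winners: pd.DataFrame) -> pd.DataFrame:
    """Share of elections in which each state was carried by the national winner, highest first."""
    table = (
        winners.groupby("state")["carried_by_winner"]
        # mean of a boolean column = fraction of True values
        .agg(share="mean", elections="size")
        .reset_index()
        # break ties alphabetically so the ranking is deterministic
        .sort_values(["share", "state"], ascending=[False, True])
        .reset_index(drop=True)
    )
    return table
```

Sorting by share puts states the winner always carried at the top and evenly split states toward the middle of the ranking.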