Which States are Most Crucial to Winning the Presidency?

Motivation:

Problem:

Every four years, the two major parties compete for the highest office in the country: the presidency. Their candidates hit the campaign trail and try to sway undecided voters. However, not every state is created equal. California and Texas have 54 and 40 electoral votes respectively, while North and South Dakota have just 3 each. You might expect candidates to concentrate on the states with the most electoral votes, but this is not the case. Instead, they focus on states that are typically decided by razor-thin margins and could go to either party. These are called swing states.

To see which states campaigns treat as critical, here is a visualization of the stops Joe Biden and Donald Trump made during the 2020 presidential campaign.

However, this all raises the question: is there one specific state that is a necessity for winning a presidential election? Are there several? If no single state qualifies, has a certain pair always resulted in a winner?

Solution:

By analyzing the results of presidential elections from 1976 to 2020, we can determine which states are most pivotal on the path to the White House. To be clear, we measure a state's importance by how often the state was carried by the eventual winner of the election. Even though California has around 3 times as many electoral votes as Ohio, Ohio is widely regarded as one of the most important swing states in every election, while California is barely contested since it is so consistently blue.
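As a rough sketch of this measure, the share of elections in which each state sided with the national winner could be computed as below. The excerpt covers only three states and four recent elections, and the names `state_winners`, `national_winner`, and `match_rate` are placeholders chosen for illustration, not part of the dataset.

```python
import pandas as pd

# Party that carried each state in four recent elections (a small excerpt only).
state_winners = pd.DataFrame({
    "year":  [2008, 2012, 2016, 2020] * 3,
    "state": ["OHIO"] * 4 + ["CALIFORNIA"] * 4 + ["PENNSYLVANIA"] * 4,
    "party": ["DEMOCRAT", "DEMOCRAT", "REPUBLICAN", "REPUBLICAN",
              "DEMOCRAT", "DEMOCRAT", "DEMOCRAT", "DEMOCRAT",
              "DEMOCRAT", "DEMOCRAT", "REPUBLICAN", "DEMOCRAT"],
})

# Party of the national winner in the same years.
national_winner = {2008: "DEMOCRAT", 2012: "DEMOCRAT",
                   2016: "REPUBLICAN", 2020: "DEMOCRAT"}

# True where the state sided with the eventual president.
state_winners["matched"] = (
    state_winners["party"] == state_winners["year"].map(national_winner)
)

# Importance as defined above: fraction of elections in which
# the state was carried by the national winner.
match_rate = (
    state_winners.groupby("state")["matched"].mean().sort_values(ascending=False)
)
print(match_rate)
```

In this four-election excerpt, California scores the same as Ohio (0.75) simply because Democrats won three of the four elections, which is one reason a longer window of elections is needed to separate safe states from swing states.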

Initially, we will focus on dividing states based on their individual impact on the results. If a state has been carried by the winner in every election, it will be classified as highly important, while a state that has been carried by the winner and the loser equally often will be considered less important.
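One possible way to turn this into labels is sketched below. Only the two endpoints come from the description above (always backing the winner, and backing winner and loser equally often); the intermediate cutoffs and the function name `classify_importance` are assumptions made for illustration.

```python
def classify_importance(match_rate: float) -> str:
    """Map a state's match rate (0.0 to 1.0) to an importance label.

    A rate of 1.0 means the state backed the winner in every election;
    0.5 means it backed winners and losers equally often.
    The cutoffs in between are hypothetical.
    """
    if not 0.0 <= match_rate <= 1.0:
        raise ValueError("match_rate must be between 0 and 1")
    if match_rate == 1.0:
        return "highly important"
    if match_rate >= 0.75:
        return "important"
    if match_rate > 0.5:
        return "somewhat important"
    return "not as important"


for rate in (1.0, 0.75, 0.5):
    print(rate, classify_importance(rate))
```

Note that this sketch also labels states that backed the loser more often than the winner as "not as important"; whether such states deserve their own category is left open.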

Depending on how easily that first step goes, we will then analyze combinations of states. Has a specific combination consistently led to victory? Which pairings should a candidate focus on to guarantee victory?
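A minimal sketch of the pair analysis follows, under one possible reading of the question: for every pair of states, count the elections in which one party carried both, and how often that party also won nationally. The data is a four-state, four-election excerpt, and `carried_by`, `national_winner`, and `pair_record` are hypothetical names.

```python
from itertools import combinations

# Party that carried each state, by year (small excerpt only).
carried_by = {
    "OHIO":         {2008: "D", 2012: "D", 2016: "R", 2020: "R"},
    "PENNSYLVANIA": {2008: "D", 2012: "D", 2016: "R", 2020: "D"},
    "FLORIDA":      {2008: "D", 2012: "D", 2016: "R", 2020: "R"},
    "CALIFORNIA":   {2008: "D", 2012: "D", 2016: "D", 2020: "D"},
}
national_winner = {2008: "D", 2012: "D", 2016: "R", 2020: "D"}


def pair_record(state_a, state_b):
    """Return (wins, swept): the number of elections in which one party
    carried both states, and how many of those that party won nationally."""
    swept = wins = 0
    for year, winner in national_winner.items():
        party_a = carried_by[state_a][year]
        party_b = carried_by[state_b][year]
        if party_a == party_b:
            swept += 1
            wins += party_a == winner
    return wins, swept


# Evaluate every unordered pair of states.
pair_records = {
    pair: pair_record(*pair) for pair in combinations(sorted(carried_by), 2)
}
for (a, b), (wins, swept) in pair_records.items():
    print(f"{a} + {b}: sweeping candidate won {wins} of {swept}")
```

In this excerpt, Florida and Ohio voted together in all four elections, yet the candidate who swept them lost in 2020, so even a frequently aligned pair does not guarantee victory.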

Finally, as a stretch goal that may not be attainable, we will delve into the vote counts of each state. Is there a specific margin of victory that a candidate can reach in each state that all but guarantees a win? How this will be determined is currently outside the scope of the project, though it may be added depending on how quickly the previous two steps are completed.
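Although the method for this step is still undecided, the margin of victory itself is straightforward to compute. The sketch below assumes the margin is defined as the gap between the top two candidates divided by the total votes cast, and uses the two major-party rows for Alabama in 1976 from the dataset preview in the Dataset section. The helper `victory_margin` is a hypothetical name.

```python
import pandas as pd

# The two major-party rows for Alabama in 1976, taken from the dataset preview.
rows = pd.DataFrame({
    "year": [1976, 1976],
    "state": ["ALABAMA", "ALABAMA"],
    "party_simplified": ["DEMOCRAT", "REPUBLICAN"],
    "candidatevotes": [659170, 504070],
    "totalvotes": [1182850, 1182850],
})


def victory_margin(group: pd.DataFrame) -> float:
    """Gap between the top two candidates as a share of all votes cast.

    Assumes the group holds at least two candidates for one state and year.
    """
    top_two = group["candidatevotes"].nlargest(2)
    return (top_two.iloc[0] - top_two.iloc[1]) / group["totalvotes"].iloc[0]


# One margin per (year, state) group.
margins = {
    key: victory_margin(group) for key, group in rows.groupby(["year", "state"])
}
for (year, state), margin in margins.items():
    print(year, state, f"{margin:.1%}")
```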

Impact:

As citizens in this democracy, it is vital that we understand the importance of certain states in how our elections function. This analysis should give a greater insight into the political impact certain controversies can have based on demographics, and why politicians make the decisions they do.

One drawback of this analysis is that it does not account for the fact that the political climate is constantly changing. Florida used to be a consistent swing state, yet it has become more red in recent elections. It is very possible that the results will not reflect the state that is actually most important for a candidate to win today. However, they will show which state has been most important historically.

Dataset

Detail

We will be using a Kaggle dataset of presidential election results from 1976 to 2020 for our analysis. Here is the data imported:

In [1]:
import pandas as pd

# Load the 1976-2020 presidential results and keep the frame for later steps
df = pd.read_csv('1976-2020-president.csv')
df
Out[1]:
| | year | state | state_po | state_fips | state_cen | state_ic | office | candidate | party_detailed | writein | candidatevotes | totalvotes | version | notes | party_simplified |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1976 | ALABAMA | AL | 1 | 63 | 41 | US PRESIDENT | CARTER, JIMMY | DEMOCRAT | False | 659170 | 1182850 | 20210113 | NaN | DEMOCRAT |
| 1 | 1976 | ALABAMA | AL | 1 | 63 | 41 | US PRESIDENT | FORD, GERALD | REPUBLICAN | False | 504070 | 1182850 | 20210113 | NaN | REPUBLICAN |
| 2 | 1976 | ALABAMA | AL | 1 | 63 | 41 | US PRESIDENT | MADDOX, LESTER | AMERICAN INDEPENDENT PARTY | False | 9198 | 1182850 | 20210113 | NaN | OTHER |
| 3 | 1976 | ALABAMA | AL | 1 | 63 | 41 | US PRESIDENT | BUBAR, BENJAMIN ""BEN"" | PROHIBITION | False | 6669 | 1182850 | 20210113 | NaN | OTHER |
| 4 | 1976 | ALABAMA | AL | 1 | 63 | 41 | US PRESIDENT | HALL, GUS | COMMUNIST PARTY USE | False | 1954 | 1182850 | 20210113 | NaN | OTHER |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4282 | 2020 | WYOMING | WY | 56 | 83 | 68 | US PRESIDENT | JORGENSEN, JO | LIBERTARIAN | False | 5768 | 278503 | 20210113 | NaN | LIBERTARIAN |
| 4283 | 2020 | WYOMING | WY | 56 | 83 | 68 | US PRESIDENT | PIERCE, BROCK | INDEPENDENT | False | 2208 | 278503 | 20210113 | NaN | OTHER |
| 4284 | 2020 | WYOMING | WY | 56 | 83 | 68 | US PRESIDENT | NaN | NaN | True | 1739 | 278503 | 20210113 | NaN | OTHER |
| 4285 | 2020 | WYOMING | WY | 56 | 83 | 68 | US PRESIDENT | OVERVOTES | NaN | False | 279 | 278503 | 20210113 | NaN | OTHER |
| 4286 | 2020 | WYOMING | WY | 56 | 83 | 68 | US PRESIDENT | UNDERVOTES | NaN | False | 1459 | 278503 | 20210113 | NaN | OTHER |

4287 rows × 15 columns

The different features in this csv are:

-year: the year of the presidential election.

-state: which state the results are for.

-state_po: the two-letter postal abbreviation for the state (New York -> NY).

-state_fips: the Federal Information Processing Standard (FIPS) state code. Redundant attribute for identifying the state.

-state_cen: the U.S. Census Bureau state code. Another redundant state identifier.

-state_ic: the ICPSR state code. Another redundant state identifier.

-office: which office the candidate ran for. This value will always be "US PRESIDENT".

-candidate: name of the candidate who ran. This will likely be unnecessary for the scope of this analysis.

-party_detailed: which party the candidate belonged to. Vital for our analysis. The scope of this analysis is limited to "DEMOCRAT" and "REPUBLICAN", so other parties will very likely be excluded.

-writein: whether the candidate was a write-in candidate.

-candidatevotes: how many votes the candidate received.

-totalvotes: total votes cast in the state that year.

-version: when the info was last updated.

-notes: additional info for the candidate - none is provided, so this will be excluded.

-party_simplified: narrows the number of parties listed (PROHIBITION in "party_detailed" is OTHER here).

The primary attributes we will be focusing on are year, state, candidatevotes, and party_simplified. We will use this info to analyze which candidates won each state, and we will also compare this information with the winners of each election year (information that is surprisingly not in the dataset). We will then see which states were won by each winner, and divide the states into varying degrees of importance.
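The first of these steps might look like the sketch below, which picks the party with the most votes in each state and year and then compares it against a hand-entered table of national winners (since that information is not in the dataset). To stay self-contained, it uses only three Alabama rows from the preview above instead of the full file. The names `state_winners`, `national_winner`, and `backed_winner` are illustrative assumptions.

```python
import pandas as pd

# Three Alabama rows from the 1976 preview; the real frame would come from pd.read_csv.
df = pd.DataFrame({
    "year": [1976, 1976, 1976],
    "state": ["ALABAMA"] * 3,
    "candidatevotes": [659170, 504070, 9198],
    "party_simplified": ["DEMOCRAT", "REPUBLICAN", "OTHER"],
})

# Keep the row with the most votes in each (year, state) group:
# that row's party carried the state.
top_rows = df.groupby(["year", "state"])["candidatevotes"].idxmax()
state_winners = df.loc[top_rows, ["year", "state", "party_simplified"]]
state_winners = state_winners.reset_index(drop=True)

# Party of the national winner in each election year, entered by hand.
national_winner = {
    1976: "DEMOCRAT", 1980: "REPUBLICAN", 1984: "REPUBLICAN", 1988: "REPUBLICAN",
    1992: "DEMOCRAT", 1996: "DEMOCRAT", 2000: "REPUBLICAN", 2004: "REPUBLICAN",
    2008: "DEMOCRAT", 2012: "DEMOCRAT", 2016: "REPUBLICAN", 2020: "DEMOCRAT",
}

# True where the state was carried by the eventual president.
state_winners["backed_winner"] = (
    state_winners["party_simplified"] == state_winners["year"].map(national_winner)
)
print(state_winners)
```

One caveat worth checking on the full file: if a candidate's votes in a state are split across several rows (for example, under more than one party line), those rows would likely need to be summed per candidate before taking the maximum.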

Method:

We will classify states by their importance in leading their candidate to victory. Doing so will reveal how much more important certain states are to determining the winner than others.