Every four years, the two major parties compete for the highest office in the country - the presidency. Their candidates hit the campaign trail, where they attempt to sway undecided voters. However, not every state is created equal. California and Texas have 54 and 40 electoral votes respectively, while North and South Dakota have a measly 3 each. You would think the candidates would visit the states with the most electoral votes; this is not the case. Rather than focus on states with the highest electoral vote count, candidates focus on states that are typically decided by a razor-thin margin and could go to either party. These are called swing states.
To see more about what states are viewed as critical by campaigns, here is a visualization of the different stops Joe Biden and Donald Trump made during the 2020 presidential campaign.
However, this all raises the question: is there one specific state that is a necessity for winning a presidential election? Are there multiple? If there isn't any single state, has a certain pair always resulted in a winner?
By analyzing the results of presidential elections from the past five decades, we can determine which states are most pivotal on the path to the White House. To be clear, the importance of a state is determined by how often the state was won by the candidate who went on to win the election. Electoral vote count alone does not decide importance: even though California has around 3 times the electoral votes of Ohio, Ohio is widely regarded as one of the most important swing states, while California is barely considered since it is so consistently blue.
Initially, we will focus on dividing states based on their individual impact on the results. If a state has been won by the winner in every election, it will be classified as highly important, while a state that has been won by the winner and the loser equally often will be considered not as important.
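The per-state measure described above can be sketched as the share of elections in which a state was carried by the eventual winner. This is a minimal illustration, not the final method: the table layout and the names `state_results`, `national_winner`, and `importance` are assumptions made for the example, and only three elections and two states are included.

```python
import pandas as pd

# Toy table (assumed layout): one row per (year, state) with the party
# that carried the state in that election.
state_results = pd.DataFrame({
    "year": [2000, 2000, 2004, 2004, 2008, 2008],
    "state": ["OHIO", "CALIFORNIA"] * 3,
    "state_winner": ["REPUBLICAN", "DEMOCRAT",
                     "REPUBLICAN", "DEMOCRAT",
                     "DEMOCRAT", "DEMOCRAT"],
})

# Party of the national winner in each of these elections.
national_winner = {2000: "REPUBLICAN", 2004: "REPUBLICAN", 2008: "DEMOCRAT"}

# True when the state was carried by the party that won the presidency.
state_results["backed_winner"] = (
    state_results["state_winner"] == state_results["year"].map(national_winner)
)

# Share of elections in which each state sided with the winner.
importance = (
    state_results.groupby("state")["backed_winner"]
    .mean()
    .sort_values(ascending=False)
)
print(importance)
```

A share near 1 would put a state in the "highly important" group, while a share near 0.5 would mark it as not as important; where exactly to draw the cutoffs is left open here.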
Depending on how easy that result is to obtain, we will then analyze combinations of states. Has a specific combination led to victory consistently? What pairings should a candidate focus on to guarantee victory?
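One possible way to approach the pair question is to count, for every pair of states, the number of elections in which the winner carried both. The sketch below assumes a table with one row per (year, state) and a `state_winner` column; those names, and the choice of counting co-occurrence, are illustrative assumptions rather than a settled design.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Toy table (assumed layout): party that carried each state, by year.
state_results = pd.DataFrame({
    "year": [2000] * 3 + [2004] * 3 + [2008] * 3,
    "state": ["OHIO", "FLORIDA", "CALIFORNIA"] * 3,
    "state_winner": ["REPUBLICAN", "REPUBLICAN", "DEMOCRAT",
                     "REPUBLICAN", "REPUBLICAN", "DEMOCRAT",
                     "DEMOCRAT", "DEMOCRAT", "DEMOCRAT"],
})
national_winner = {2000: "REPUBLICAN", 2004: "REPUBLICAN", 2008: "DEMOCRAT"}

# Keep only the rows where the state sided with the national winner.
won = state_results[
    state_results["state_winner"] == state_results["year"].map(national_winner)
]

# For each election, count every pair of states the winner carried.
pair_counts = Counter()
for _, group in won.groupby("year"):
    for pair in combinations(sorted(group["state"]), 2):
        pair_counts[pair] += 1

# Convert counts to a share of all elections in the table.
n_years = state_results["year"].nunique()
pair_share = {pair: count / n_years for pair, count in pair_counts.items()}
print(pair_share)
```

With all 50 states plus the District of Columbia there are 1,275 pairs per election, so this brute-force count stays cheap; larger combinations grow quickly and may need pruning.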
Finally, as an absolute stretch goal that may not be attainable, we will delve into the vote counts of each state. Is there a specific margin of victory that a winner can hit in each state that can all but guarantee victory? How this will be determined is currently outside the scope of the project, though it may be added depending on how quickly the previous two steps are determined.
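As a starting point for this stretch goal, a state's margin of victory could be expressed as the gap between the two major parties divided by the total votes cast. The sketch below is only one possible definition; the names `votes`, `wide`, and `margin` are assumptions for the example, and the rows reuse the Alabama 1976 figures from the dataset.

```python
import pandas as pd

# Alabama 1976 rows from the dataset, reduced to the columns needed here.
votes = pd.DataFrame({
    "year": [1976, 1976, 1976],
    "state": ["ALABAMA", "ALABAMA", "ALABAMA"],
    "party_simplified": ["DEMOCRAT", "REPUBLICAN", "OTHER"],
    "candidatevotes": [659170, 504070, 9198],
    "totalvotes": [1182850, 1182850, 1182850],
})

# Restrict to the two major parties, then put one party per column.
major = votes[votes["party_simplified"].isin(["DEMOCRAT", "REPUBLICAN"])]
wide = major.pivot_table(
    index=["year", "state"],
    columns="party_simplified",
    values="candidatevotes",
    aggfunc="sum",
)

# totalvotes repeats on every row of a (year, state) group, so take the first.
total = major.groupby(["year", "state"])["totalvotes"].first()

# Margin of victory as a fraction of all votes cast in the state.
margin = (wide["DEMOCRAT"] - wide["REPUBLICAN"]).abs() / total
print(margin)
```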
As citizens in this democracy, it is vital that we understand the importance of certain states in how our elections function. This analysis should give a greater insight into the political impact certain controversies can have based on demographics, and why politicians make the decisions they do.
One drawback to how we are carrying out this experiment is that it does not account for the fact that the political climate is constantly changing. Florida used to be a consistent swing state, yet it has become more red in recent elections. It is very possible that the results may not reflect the actual most important state for the winner to win. However, it will demonstrate the most important one historically.
We will be using a Kaggle Dataset on Elections from 1976-2020 for our analysis. Here is the data imported:
import pandas as pd
pd.read_csv('1976-2020-president.csv')
| | year | state | state_po | state_fips | state_cen | state_ic | office | candidate | party_detailed | writein | candidatevotes | totalvotes | version | notes | party_simplified |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1976 | ALABAMA | AL | 1 | 63 | 41 | US PRESIDENT | CARTER, JIMMY | DEMOCRAT | False | 659170 | 1182850 | 20210113 | NaN | DEMOCRAT |
1 | 1976 | ALABAMA | AL | 1 | 63 | 41 | US PRESIDENT | FORD, GERALD | REPUBLICAN | False | 504070 | 1182850 | 20210113 | NaN | REPUBLICAN |
2 | 1976 | ALABAMA | AL | 1 | 63 | 41 | US PRESIDENT | MADDOX, LESTER | AMERICAN INDEPENDENT PARTY | False | 9198 | 1182850 | 20210113 | NaN | OTHER |
3 | 1976 | ALABAMA | AL | 1 | 63 | 41 | US PRESIDENT | BUBAR, BENJAMIN ""BEN"" | PROHIBITION | False | 6669 | 1182850 | 20210113 | NaN | OTHER |
4 | 1976 | ALABAMA | AL | 1 | 63 | 41 | US PRESIDENT | HALL, GUS | COMMUNIST PARTY USE | False | 1954 | 1182850 | 20210113 | NaN | OTHER |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4282 | 2020 | WYOMING | WY | 56 | 83 | 68 | US PRESIDENT | JORGENSEN, JO | LIBERTARIAN | False | 5768 | 278503 | 20210113 | NaN | LIBERTARIAN |
4283 | 2020 | WYOMING | WY | 56 | 83 | 68 | US PRESIDENT | PIERCE, BROCK | INDEPENDENT | False | 2208 | 278503 | 20210113 | NaN | OTHER |
4284 | 2020 | WYOMING | WY | 56 | 83 | 68 | US PRESIDENT | NaN | NaN | True | 1739 | 278503 | 20210113 | NaN | OTHER |
4285 | 2020 | WYOMING | WY | 56 | 83 | 68 | US PRESIDENT | OVERVOTES | NaN | False | 279 | 278503 | 20210113 | NaN | OTHER |
4286 | 2020 | WYOMING | WY | 56 | 83 | 68 | US PRESIDENT | UNDERVOTES | NaN | False | 1459 | 278503 | 20210113 | NaN | OTHER |
4287 rows × 15 columns
The different features in this csv are:
-year: the year of the presidential election.
-state: which state the results are for.
-state_po: an abbreviation for the state (New York -> NY).
-state_fips: the Federal Information Processing Standard (FIPS) state code. Redundant attribute for identifying the state.
-state_cen: U.S. Census state code. Another redundant state identifier.
-state_ic: ICPSR state code. Another redundant state identifier.
-office: which office the candidate ran for. This value will always be "US PRESIDENT".
-candidate: name of the candidate who ran. This will likely be unnecessary for the scope of this analysis.
-party_detailed: which party the candidate belonged to. Vital for our analysis. The scope of this analysis is limited to "DEMOCRAT" and "REPUBLICAN", so other parties will very likely be excluded.
-writein: whether the candidate was a write-in candidate.
-candidatevotes: how many votes the candidate received.
-totalvotes: total votes cast in the state that year.
-version: when the info was last updated.
-notes: additional info for the candidate - none is provided, so this will be excluded.
-party_simplified: narrows the number of parties listed (PROHIBITION in "party_detailed" is OTHER here).
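Given the notes above, a first cleaning step might keep only the major-party rows and drop the columns that are redundant or empty. The sketch below runs on three rows copied from the Alabama 1976 results rather than the full CSV; `KEEP`, `MAJOR`, and `trimmed` are names chosen for illustration.

```python
import pandas as pd

# Three rows from the dataset (Alabama, 1976), with a subset of its columns.
votes = pd.DataFrame(
    [
        (1976, "ALABAMA", "US PRESIDENT", "CARTER, JIMMY", "DEMOCRAT", 659170),
        (1976, "ALABAMA", "US PRESIDENT", "FORD, GERALD", "REPUBLICAN", 504070),
        (1976, "ALABAMA", "US PRESIDENT", "MADDOX, LESTER", "OTHER", 9198),
    ],
    columns=["year", "state", "office", "candidate",
             "party_simplified", "candidatevotes"],
)

KEEP = ["year", "state", "party_simplified", "candidatevotes"]
MAJOR = ["DEMOCRAT", "REPUBLICAN"]

# Keep major-party rows only, and only the columns the analysis needs.
trimmed = votes.loc[votes["party_simplified"].isin(MAJOR), KEEP]
trimmed = trimmed.reset_index(drop=True)
print(trimmed)
```

On the real file the same two lines would apply to the frame returned by `pd.read_csv`, assuming the column names match those listed above.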
The primary attributes we will be focusing on are year, state, candidatevotes, and party_simplified. We will use this info to determine which party won each state in each election, and we will compare this with the national winner of each election year (information that is surprisingly not in the dataset). We will then see which states were won by each winner, and divide the states into varying degrees of importance.
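Because the national winners are not in the dataset, they have to be supplied by hand. A rough sketch of how the two pieces could be joined follows; the `NATIONAL_WINNER` lookup lists the winning party for each election from 1976 to 2020, while the "EXAMPLE STATE" rows use made-up vote counts purely to show a state that did not side with the winner.

```python
import pandas as pd

# Winning party of each presidential election, 1976-2020 (entered by hand).
NATIONAL_WINNER = {
    1976: "DEMOCRAT", 1980: "REPUBLICAN", 1984: "REPUBLICAN",
    1988: "REPUBLICAN", 1992: "DEMOCRAT", 1996: "DEMOCRAT",
    2000: "REPUBLICAN", 2004: "REPUBLICAN", 2008: "DEMOCRAT",
    2012: "DEMOCRAT", 2016: "REPUBLICAN", 2020: "DEMOCRAT",
}

votes = pd.DataFrame(
    [
        # Real rows from the dataset.
        (1976, "ALABAMA", "DEMOCRAT", 659170),
        (1976, "ALABAMA", "REPUBLICAN", 504070),
        (1976, "ALABAMA", "OTHER", 9198),
        # Hypothetical rows with made-up counts, for illustration only.
        (1980, "EXAMPLE STATE", "DEMOCRAT", 100),
        (1980, "EXAMPLE STATE", "REPUBLICAN", 90),
    ],
    columns=["year", "state", "party_simplified", "candidatevotes"],
)

# Row with the most votes in each (year, state) group.
top_rows = votes.loc[votes.groupby(["year", "state"])["candidatevotes"].idxmax()]
state_winners = (
    top_rows.rename(columns={"party_simplified": "state_winner"})
    [["year", "state", "state_winner"]]
    .reset_index(drop=True)
)

# Attach the national winner and flag whether the state sided with them.
state_winners["national_winner"] = state_winners["year"].map(NATIONAL_WINNER)
state_winners["backed_winner"] = (
    state_winners["state_winner"] == state_winners["national_winner"]
)
print(state_winners)
```

Picking the single top row assumes each candidate appears once per state and year; if a candidate's votes are split across several rows, they would need to be summed first.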
We will classify states by their importance in leading their candidate to victory. Doing so will reveal how much more important certain states are to determining the winner than others.