Data in Politics

In the political climate of the U.S., where money and donations are large factors in campaigns and in who gets elected to office, I am curious to explore the patterns and relationships found in campaign donations. My motivation for this project comes from the 2020 presidential campaign, in which Bernie Sanders boasted of his campaign's high number of donors and low average amount per donation (https://www.nytimes.com/interactive/2020/02/01/us/politics/democratic-presidential-campaign-donors.html). I am curious about what the donations of the most successful campaigns look like, and I want to shed light on the more corrupt and unequal side of American democracy, where it takes money to make a difference.

In [6]:
import pandas as pd

data = pd.read_csv('dime_recipients_all_1979_2014.csv')
/var/folders/wx/y2hbvnnj5t3f4ttwr3d3t2jh0000gn/T/ipykernel_30686/2315410551.py:3: DtypeWarning: Columns (0,4,5,11,12,13,14,15,16,17,18,21,22,36,54,55,56,57,58,59,62,64,66,67,68,69,70,73) have mixed types. Specify dtype option on import or set low_memory=False.
  data = pd.read_csv('dime_recipients_all_1979_2014.csv')
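The DtypeWarning above appears because pandas reads a large file in chunks and can infer a different type for the same column in different chunks. A minimal sketch of two common ways to avoid it follows, assuming the mixed-type ID columns can safely be treated as strings. The helper name `load_recipients`, its `string_columns` parameter, and the tiny in-memory sample are hypothetical, used only so the example runs without the real file.

```python
import io
import pandas as pd

def load_recipients(source, string_columns=None):
    """Read a DIME-style CSV without chunk-by-chunk type inference.

    `load_recipients` and `string_columns` are hypothetical names for this sketch.
    """
    # dtype maps column name -> type; forcing str keeps IDs such as
    # 'S8WY00072' and purely numeric IDs in one consistent type.
    dtype = {col: str for col in (string_columns or [])}
    # low_memory=False makes pandas infer each remaining column's type
    # from the whole file at once instead of chunk by chunk.
    return pd.read_csv(source, dtype=dtype or None, low_memory=False)

# Tiny stand-in for the real file (values are illustrative only).
sample = io.StringIO(
    "Cand.ID,cycle,total.receipts\n"
    "S8WY00072,1980,1000.0\n"
    "12345,1980,250.5\n"
)
demo = load_recipients(sample, string_columns=["Cand.ID"])
print(demo.dtypes)
```

With the real file, the call might look like `load_recipients('dime_recipients_all_1979_2014.csv', string_columns=['Cand.ID', 'FEC.ID'])`; which columns to force to strings is a choice to make after inspecting the data.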

This dataset, from Harvard Dataverse (https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/O5PX0B/UQOVBZ&version=3.0), contains a list of all recipients of campaign donations from 1979 to 2014.

(If the focus of the project shifts later on, this dataset comes from a collection that contains several other datasets, and one of them definitely includes the specific data wanted.)

In [12]:
data.shape
Out[12]:
(379761, 74)

It contains 379,761 entries.

This is what it looks like:

In [13]:
data.head()
Out[13]:
election cycle fecyear Cand.ID FEC.ID NID ICPSR ICPSR2 bonica.rid bonica.cid ... igcat comtype nimsp.party nimsp.candidate.ICO.code nimsp.district nimsp.office nimsp.candidate.status before.switch.ICPSR after.switch.ICPSR party.orig
0 fdfd1980 1980 1978.0 S8WY00072 C00101774 NS8WY00072 S8WY000721980 S8WY00072 cand107439 NaN ... NaN S NaN NaN NaN NaN NaN NaN NaN 100.0
1 fdfd1980 1980 1978.0 S8WY00064 C00104539 NS8WY00064 S8WY000641980 S8WY00064 cand107438 3.822461e+09 ... NaN S NaN NaN NaN NaN NaN NaN NaN 100.0
2 fdfd1980 1980 1978.0 S8WY00056 C00102673 NS8WY00056 S8WY000561980 S8WY00056 cand107437 NaN ... NaN S NaN NaN NaN NaN NaN NaN NaN 200.0
3 fdfd1980 1980 1978.0 S8WY00049 C00103044 NS8WY00049 S8WY000491980 S8WY00049 cand107436 NaN ... NaN S NaN NaN NaN NaN NaN NaN NaN 200.0
4 fdfd1980 1980 1978.0 S8WY00031 C00090373 NS8WY00031 S8WY000311980 S8WY00031 cand107435 NaN ... NaN S NaN NaN NaN NaN NaN NaN NaN 100.0

5 rows × 74 columns

In [14]:
#Column titles:
print(data.columns)
Index(['election', 'cycle', 'fecyear', 'Cand.ID', 'FEC.ID', 'NID', 'ICPSR',
       'ICPSR2', 'bonica.rid', 'bonica.cid', 'name', 'lname', 'ffname',
       'fname', 'mname', 'nname', 'title', 'suffix', 'party', 'state', 'seat',
       'district', 'Incum.Chall', 'recipient.cfscore', 'contributor.cfscore',
       'recipient.cfscore.dyn', 'dwnom1', 'dwnom2', 'ps.dwnom1', 'ps.dwnom2',
       'dwdime', 'irt.cfscore', 'num.givers', 'num.givers.total',
       'n.data.points.personal.donations',
       'n.data.points.personal.donations.unq', 'cand.gender',
       'total.disbursements', 'total.pc.contribs', 'contribs.from.candidate',
       'unitemized', 'non.party.ind.exp.for', 'non.party.ind.exp.against',
       'ind.exp.for', 'ind.exp.against', 'comm.cost.for', 'comm.cost.against',
       'party.coord.exp', 'party.ind.exp.against', 'total.receipts',
       'total.indiv.contrib', 'total.pac.contribs', 'ran.primary',
       'ran.general', 'p.elec.stat', 's.elec.stat', 'r.elec.stat',
       'gen.elec.stat', 'gen.elect.pct', 'winner', 'district.partisanship',
       'district.pres.vs', 'candStatus', 'recipient.type', 'igcat', 'comtype',
       'nimsp.party', 'nimsp.candidate.ICO.code', 'nimsp.district',
       'nimsp.office', 'nimsp.candidate.status', 'before.switch.ICPSR',
       'after.switch.ICPSR', 'party.orig'],
      dtype='object')

Data Dictionary:

As there are 74 features, I have highlighted the most interesting ones, which ended up being most of them:

election: state code, or fd for federal, followed by the year.
cycle: number that represents the two-year election cycle during which the contributions occurred.
fecyear: year listed by the Federal Election Commission (FEC) as the campaign's target election (note that this date can be later than the cycle in which donations were collected).
Cand.ID: FEC-assigned candidate ID.
FEC.ID: FEC-assigned ID for the candidate's campaign committee.
NID: ID assigned to the candidate by the Center for Responsive Politics.
name: name of the candidate receiving the donation.
lname: last name of the candidate.
party: political party of the candidate, 100 = Dem, 200 = Rep, 328 = Ind.
state: two-letter state abbreviation.
seat: office the candidate is campaigning for.
district: district code.
Incum.Chall: incumbency status, I = candidate is incumbent, C = challenger, O = open seat.
recipient.cfscore: estimated ideology of the candidate based on donations received.
contributor.cfscore: estimated ideology of the candidate based on personal donations they have given to other candidates.
num.givers: number of distinct donors that gave to the candidate during a specific election cycle.
num.givers.total: number of distinct donors that gave to the candidate over the candidate's entire career.
n.data.points.personal.donations: number of personal contribution records made by the candidate.
n.data.points.personal.donations.unq: number of distinct recipients to whom the candidate personally donated.
total.disbursements: total campaign disbursements (payments of money from a fund), in dollars, for that election cycle.
total.pc.contribs: total receipts from party committees.
contribs.from.candidate: total receipts from candidate contributions.
unitemized: total unitemized receipts.
non.party.ind.exp.for: non-party independent expenditures made in support of the candidate.
non.party.ind.exp.against: non-party independent expenditures made against the candidate.
ind.exp.for: total independent expenditures made to support the candidate.
ind.exp.against: total independent expenditures made against the candidate.
comm.cost.for: total communication costs made on behalf of the candidate.
comm.cost.against: total communication costs made to oppose the candidate.
party.coord.exp: total party coordinated expenditures.
party.ind.exp.against: total independent expenditures made by the opposing party against the candidate.
total.receipts: total dollars raised by the candidate during the election cycle.
total.indiv.contrib: total individual receipts.
total.pac.contribs: total PAC (Political Action Committee) receipts.
gen.elect.pct: vote share in the general election.
winner: W = won election, L = lost election.
district.partisanship: Kernell's measure of district partisanship for the current election cycle.
recipient.type: cands = candidate, comm = committee.
igcat: FEC interest group category code, C = Corporation, L = Labor organization, M = Membership organization, T = Trade association, V = Cooperative, W = Corporation without capital stock.
comtype: FEC code for type of committee.
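
To make the coded fields above easier to read, and to connect them to the motivating question about average donation size, the following sketch maps the party codes to labels and divides total.indiv.contrib by num.givers. The rows are made-up illustrative values, not records from the dataset, and `party.label` and `avg.indiv.contrib` are hypothetical column names. Treating this ratio as an "average donation per donor" is itself an assumption, since the two columns may not cover exactly the same set of contributions.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; not taken from the DIME file.
toy = pd.DataFrame({
    "name": ["candidate a", "candidate b", "candidate c"],
    "party": [100, 200, 328],
    "num.givers": [200, 50, 0],
    "total.indiv.contrib": [10000.0, 25000.0, 0.0],
})

# Party codes as listed in the data dictionary.
PARTY_LABELS = {100: "Dem", 200: "Rep", 328: "Ind"}
toy["party.label"] = toy["party"].map(PARTY_LABELS)

# Avoid dividing by zero: candidates with no recorded donors get NaN.
givers = toy["num.givers"].replace(0, np.nan)
toy["avg.indiv.contrib"] = toy["total.indiv.contrib"] / givers

print(toy[["name", "party.label", "avg.indiv.contrib"]])
```

A low value with a high num.givers would correspond to the "many small donors" pattern described in the introduction.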

Data Citation:

Bonica, Adam, 2015, "Database on Ideology, Money in Politics, and Elections (DIME)", https://doi.org/10.7910/DVN/O5PX0B, Harvard Dataverse, V3

This dataset can be used to address the questions raised at the beginning. Interesting endeavors could include examining how the sizes of donations have changed over the years and graphing the skew of individual candidates' donations. However, the big project that uses ML would be clustering candidates or donations into groups and seeing what this reveals about those groups and the politics of those clusters.
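
As a rough sketch of the clustering idea, the following groups candidates by a few numeric columns using k-means from scikit-learn. The choice of k-means, the feature columns, the log transform, and k = 2 are all assumptions made for illustration, not decisions from this project, and the rows are made-up values rather than records from the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up rows for illustration; not taken from the DIME file.
toy = pd.DataFrame({
    "num.givers": [120, 150, 90, 5000, 6500, 7200],
    "total.receipts": [4e4, 5e4, 3e4, 2e6, 3e6, 2.5e6],
    "recipient.cfscore": [-0.9, 0.8, -0.2, -1.1, 1.0, 0.4],
})

features = ["num.givers", "total.receipts"]  # assumed feature choice

# Dollar amounts and donor counts are heavily skewed, so compress them
# with log1p and then standardize so each feature has equal weight.
X = np.log1p(toy[features].to_numpy(dtype=float))
X_scaled = StandardScaler().fit_transform(X)

# k=2 is an arbitrary choice for this toy example.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
toy["cluster"] = model.fit_predict(X_scaled)

# Summarize each cluster, including a column that was not used to fit it.
print(toy.groupby("cluster")[features + ["recipient.cfscore"]].mean())
```

Comparing a held-out column such as recipient.cfscore across clusters is one way to see what the groups reveal about the politics of their members; on the real data, the number of clusters would need to be chosen and justified rather than fixed in advance.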