Data in Politics

In the political climate of the U.S., where money and donations are large factors in campaigns and in who gets elected to office, I am curious to explore the patterns and relationships found in campaign donations. My motivation for this project comes from the 2020 presidential campaign, in which Bernie Sanders boasted of his campaign's high number of donors and low average amount per donation (https://www.nytimes.com/interactive/2020/02/01/us/politics/democratic-presidential-campaign-donors.html). I am curious about what the donations of the most successful campaigns look like, and I want to shed light on the more corrupt and unequal side of American democracy, where it takes money to make a difference.

In [6]:

import pandas as pd

data = pd.read_csv('dime_recipients_all_1979_2014.csv')

/var/folders/wx/y2hbvnnj5t3f4ttwr3d3t2jh0000gn/T/ipykernel_30686/2315410551.py:3: DtypeWarning: Columns (0,4,5,11,12,13,14,15,16,17,18,21,22,36,54,55,56,57,58,59,62,64,66,67,68,69,70,73) have mixed types. Specify dtype option on import or set low_memory=False.
  data = pd.read_csv('dime_recipients_all_1979_2014.csv')
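The DtypeWarning above appears because pandas infers column types chunk by chunk and finds conflicting types in some columns. One possible way to avoid it, sketched here, is to read the file in a single pass with `low_memory=False` and to pin identifier-like columns to text. The helper name `load_recipients` and the choice of `TEXT_COLUMNS` are assumptions made for illustration; positions 0, 4, 5, 21 and 22 in the warning correspond to `election`, `FEC.ID`, `NID`, `district` and `Incum.Chall` in the column list.

```python
import pandas as pd

# Assumption: these flagged columns hold identifiers or codes, so they are
# read as text instead of being left to type inference.
TEXT_COLUMNS = ["election", "FEC.ID", "NID", "district", "Incum.Chall"]


def load_recipients(path_or_buffer):
    """Read the recipients CSV in one pass, keeping ID-like columns as text."""
    return pd.read_csv(
        path_or_buffer,
        dtype={col: str for col in TEXT_COLUMNS},
        low_memory=False,  # infer remaining dtypes from the whole file at once
    )
```

With this helper, the load step would become `data = load_recipients('dime_recipients_all_1979_2014.csv')`. Reading in one pass uses more memory, which should be acceptable for a file of this size.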

This dataset, from Harvard Dataverse (https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/O5PX0B/UQOVBZ&version=3.0), contains a list of all recipients of campaign donations from 1979 to 2014.

(If the focus of the project shifts later on, this dataset came from a collection that contains several other datasets, and one of them definitely includes the specific data wanted.)

In [12]:

data.shape

Out[12]:

(379761, 74)

It contains 379,761 entries, each with 74 columns.

This is what it looks like:

In [13]:

data.head()

Out[13]:

	election	cycle	fecyear	Cand.ID	FEC.ID	NID	ICPSR	ICPSR2	bonica.rid	bonica.cid	...	igcat	comtype	nimsp.party	nimsp.candidate.ICO.code	nimsp.district	nimsp.office	nimsp.candidate.status	before.switch.ICPSR	after.switch.ICPSR	party.orig
0	fdfd1980	1980	1978.0	S8WY00072	C00101774	NS8WY00072	S8WY000721980	S8WY00072	cand107439	NaN	...	NaN	S	NaN	NaN	NaN	NaN	NaN	NaN	NaN	100.0
1	fdfd1980	1980	1978.0	S8WY00064	C00104539	NS8WY00064	S8WY000641980	S8WY00064	cand107438	3.822461e+09	...	NaN	S	NaN	NaN	NaN	NaN	NaN	NaN	NaN	100.0
2	fdfd1980	1980	1978.0	S8WY00056	C00102673	NS8WY00056	S8WY000561980	S8WY00056	cand107437	NaN	...	NaN	S	NaN	NaN	NaN	NaN	NaN	NaN	NaN	200.0
3	fdfd1980	1980	1978.0	S8WY00049	C00103044	NS8WY00049	S8WY000491980	S8WY00049	cand107436	NaN	...	NaN	S	NaN	NaN	NaN	NaN	NaN	NaN	NaN	200.0
4	fdfd1980	1980	1978.0	S8WY00031	C00090373	NS8WY00031	S8WY000311980	S8WY00031	cand107435	NaN	...	NaN	S	NaN	NaN	NaN	NaN	NaN	NaN	NaN	100.0

5 rows × 74 columns

In [14]:

#Column titles:
print(data.columns)

Index(['election', 'cycle', 'fecyear', 'Cand.ID', 'FEC.ID', 'NID', 'ICPSR',
       'ICPSR2', 'bonica.rid', 'bonica.cid', 'name', 'lname', 'ffname',
       'fname', 'mname', 'nname', 'title', 'suffix', 'party', 'state', 'seat',
       'district', 'Incum.Chall', 'recipient.cfscore', 'contributor.cfscore',
       'recipient.cfscore.dyn', 'dwnom1', 'dwnom2', 'ps.dwnom1', 'ps.dwnom2',
       'dwdime', 'irt.cfscore', 'num.givers', 'num.givers.total',
       'n.data.points.personal.donations',
       'n.data.points.personal.donations.unq', 'cand.gender',
       'total.disbursements', 'total.pc.contribs', 'contribs.from.candidate',
       'unitemized', 'non.party.ind.exp.for', 'non.party.ind.exp.against',
       'ind.exp.for', 'ind.exp.against', 'comm.cost.for', 'comm.cost.against',
       'party.coord.exp', 'party.ind.exp.against', 'total.receipts',
       'total.indiv.contrib', 'total.pac.contribs', 'ran.primary',
       'ran.general', 'p.elec.stat', 's.elec.stat', 'r.elec.stat',
       'gen.elec.stat', 'gen.elect.pct', 'winner', 'district.partisanship',
       'district.pres.vs', 'candStatus', 'recipient.type', 'igcat', 'comtype',
       'nimsp.party', 'nimsp.candidate.ICO.code', 'nimsp.district',
       'nimsp.office', 'nimsp.candidate.status', 'before.switch.ICPSR',
       'after.switch.ICPSR', 'party.orig'],
      dtype='object')

Data Dictionary:

As there are 74 features, I have highlighted the most interesting ones, which ended up being most of them:

election: state code, or fd for federal, followed by the year.
cycle: the two-year election cycle during which the contributions occurred.
fecyear: year listed by the Federal Election Commission (FEC) as the campaign's target election. (Note that this date can be later than the cycle in which donations were collected.)
Cand.ID: FEC-assigned candidate ID.
FEC.ID: FEC-assigned ID for the candidate's campaign committee.
NID: ID assigned to the candidate by the Center for Responsive Politics.
name: name of the candidate receiving the donation.
lname: last name of the candidate.
party: political party of the candidate; 100 = Dem, 200 = Rep, 328 = Ind.
state: two-letter state abbreviation.
seat: office the candidate is campaigning for.
district: district code.
Incum.Chall: incumbency status; I = candidate is incumbent, C = challenger, O = open seat.
recipient.cfscore: estimated ideology of the candidate based on donations received.
contributor.cfscore: estimated ideology of the candidate based on the personal donations they have given to other candidates.
num.givers: number of distinct donors that gave to the candidate during a specific election cycle.
num.givers.total: number of distinct donors that gave to the candidate over the candidate's entire career.
n.data.points.personal.donations: number of personal contribution records made by the candidate.
n.data.points.personal.donations.unq: number of distinct recipients to whom the candidate personally donated.
total.disbursements: total campaign disbursements (payments of money from a fund), in dollars, for that election cycle.
total.pc.contribs: total receipts from party committees.
contribs.from.candidate: total receipts from candidate contributions.
unitemized: total unitemized receipts.
non.party.ind.exp.for: non-party independent expenditures made in support of the candidate.
non.party.ind.exp.against: non-party independent expenditures made against the candidate.
ind.exp.for: total independent expenditures made to support the candidate.
ind.exp.against: total independent expenditures made against the candidate.
comm.cost.for: total communication costs made on behalf of the candidate.
comm.cost.against: total communication costs made to oppose the candidate.
party.coord.exp: total party coordinated expenditures.
party.ind.exp.against: total independent expenditures made by the opposing party against the candidate.
total.receipts: total dollars raised by the candidate during the election cycle.
total.indiv.contrib: total receipts from individuals.
total.pac.contribs: total PAC (Political Action Committee) receipts.
gen.elect.pct: vote share in the general election.
winner: W = won election, L = lost election.
district.partisanship: Kernell's measure of district partisanship for the current election cycle.
recipient.type: cands = candidate, comm = committee.
igcat: FEC interest group category code; C = Corporation, L = Labor organization, M = Membership organization, T = Trade association, V = Cooperative, W = Corporation without capital stock.
comtype: FEC code for type of committee.
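
Several of these columns store codes rather than readable values. As a minimal sketch of how the code tables in this dictionary could be applied, the helper below adds readable labels for `party` and `winner`. The function name `add_readable_labels` and the new column names `party_label` and `winner_label` are hypothetical and not part of the dataset.

```python
import pandas as pd

# Code tables copied from the data dictionary entries for party and winner.
PARTY_LABELS = {100: "Dem", 200: "Rep", 328: "Ind"}
WINNER_LABELS = {"W": "won", "L": "lost"}


def add_readable_labels(df):
    """Return a copy with hypothetical party_label and winner_label columns."""
    out = df.copy()
    # party was flagged as mixed-type on import, so coerce it to numbers first;
    # values that are not numeric become NaN and stay unlabeled.
    party_codes = pd.to_numeric(out["party"], errors="coerce")
    out["party_label"] = party_codes.map(PARTY_LABELS)
    out["winner_label"] = out["winner"].map(WINNER_LABELS)
    return out
```

Codes that are not listed in the dictionary are left as missing values instead of being guessed.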

Data Citation:

Bonica, Adam, 2015, "Database on Ideology, Money in Politics, and Elections (DIME)", https://doi.org/10.7910/DVN/O5PX0B, Harvard Dataverse, V3

This dataset can be used to address the questions raised at the beginning. Interesting endeavors could include examining how the sizes of donations have changed over the years and graphing the skew of individual candidates' donations. However, the big project, which uses ML, would be clustering candidates or donations into groups and seeing what this reveals about those groups and the politics of those clusters.
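
As a first step toward the question of how donation sizes have changed over the years, here is a rough sketch that computes, for each election cycle, the median across recipients of individual dollars raised per distinct donor. It assumes `total.indiv.contrib` and `num.givers` can be combined this way; since `num.givers` counts distinct donors and not individual donations, the result is only a proxy for the average amount per donation. The function name is hypothetical.

```python
import pandas as pd


def median_dollars_per_donor_by_cycle(df):
    """Median, per cycle, of total.indiv.contrib / num.givers across recipients."""
    cols = ["cycle", "total.indiv.contrib", "num.givers"]
    # Coerce to numbers because several columns were read with mixed types.
    sub = df[cols].apply(pd.to_numeric, errors="coerce")
    # Drop rows with missing values or no recorded donors to avoid dividing by zero.
    sub = sub.dropna()
    sub = sub[sub["num.givers"] > 0]
    per_donor = sub["total.indiv.contrib"] / sub["num.givers"]
    return per_donor.groupby(sub["cycle"]).median()
```

Calling `median_dollars_per_donor_by_cycle(data)` returns a Series indexed by cycle. The median is used here because a few very large recipients could dominate a mean.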