Digital Matchmaker

Like many other interactions, dating has largely gone digital. Single people looking for love swipe through profile after profile of generic answers and pictures of men with mullets holding fish, with no guarantee of finding someone. A phenomenon called "dating app burnout" now plagues people looking for a partner: they feel hopeless about their options, yet pressured to keep swiping in case their soulmate is hidden in the crowd.

This project uses OKCupid profiles to estimate how compatible two people are and to rate how likely a relationship is to work out. The algorithm can then be applied to an individual's own answers to the profile questions to match them with their most compatible companion, like a digital matchmaker.

The matchmaking is based on the similarity between responses, weighted by how much the individual prioritizes each category. For example, if two people are very similar except that one smokes and the other does not, and it is very important to the individual that their partner not smoke, the pair will not be ranked as very compatible. By the same logic, if two profiles are similar in the individual's most important categories but very different in the less important ones, their compatibility will be ranked higher.
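The priority-weighted comparison described above can be sketched in a few lines. This is only an illustration, not the project's final scoring method: the function name `weighted_compatibility`, the two example profiles, and the weight values are all assumptions made up for the example, and categories are compared by exact match only.

```python
def weighted_compatibility(person_a, person_b, weights):
    """Return a 0-1 score: the weighted share of categories where answers match."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    matched_weight = sum(
        weight
        for category, weight in weights.items()
        if person_a.get(category) is not None
        and person_a.get(category) == person_b.get(category)
    )
    return matched_weight / total_weight


# Hypothetical profiles that differ only in smoking
user = {"diet": "vegetarian", "drinks": "socially", "pets": "likes cats", "smokes": "no"}
candidate = {"diet": "vegetarian", "drinks": "socially", "pets": "likes cats", "smokes": "yes"}

# Equal priorities: one mismatch out of four categories
equal_weights = {"diet": 1, "drinks": 1, "pets": 1, "smokes": 1}
# Hypothetical priorities where smoking matters most to this user
smoking_matters = {"diet": 1, "drinks": 1, "pets": 1, "smokes": 5}

score_equal = weighted_compatibility(user, candidate, equal_weights)       # 3/4
score_weighted = weighted_compatibility(user, candidate, smoking_matters)  # 3/8
print(score_equal, score_weighted)
```

With equal weights the pair scores 0.75, but once smoking carries a weight of 5 the same pair drops to 0.375, which mirrors the smoking example in the text.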

In [1]:
import pandas as pd

# Load the OKCupid profiles; each row is one user's profile
df_profile = pd.read_csv('okcupid_profiles.csv')
df_profile.head()
Out[1]:
age status sex orientation body_type diet drinks drugs education ethnicity ... essay0 essay1 essay2 essay3 essay4 essay5 essay6 essay7 essay8 essay9
0 22 single m straight a little extra strictly anything socially never working on college/university asian, white ... about me: i would love to think that i was so... currently working as an international agent fo... making people laugh. ranting about a good salt... the way i look. i am a six foot half asian, ha... books: absurdistan, the republic, of mice and ... food. water. cell phone. shelter. duality and humorous things trying to find someone to hang out with. i am ... i am new to california and looking for someone... you want to be swept off your feet! you are ti...
1 35 single m straight average mostly other often sometimes working on space camp white ... i am a chef: this is what that means. 1. i am ... dedicating everyday to being an unbelievable b... being silly. having ridiculous amonts of fun w... NaN i am die hard christopher moore fan. i don't r... delicious porkness in all of its glories. my b... NaN NaN i am very open and will share just about anyth... NaN
2 38 available m straight thin anything socially NaN graduated from masters program NaN ... i'm not ashamed of much, but writing public te... i make nerdy software for musicians, artists, ... improvising in different contexts. alternating... my large jaw and large glasses are the physica... okay this is where the cultural matrix gets so... movement conversation creation contemplation t... NaN viewing. listening. dancing. talking. drinking... when i was five years old, i was known as "the... you are bright, open, intense, silly, ironic, ...
3 23 single m straight thin vegetarian socially NaN working on college/university white ... i work in a library and go to school. . . reading things written by old dead people playing synthesizers and organizing books acco... socially awkward but i do my best bataille, celine, beckett. . . lynch, jarmusch... NaN cats and german philosophy NaN NaN you feel so inclined.
4 29 single m straight athletic NaN socially never graduated from college/university asian, black, other ... hey how's it going? currently vague on the pro... work work work work + play creating imagery to look at: http://bagsbrown.... i smile a lot and my inquisitive nature music: bands, rappers, musicians at the moment... NaN NaN NaN NaN NaN

5 rows × 31 columns

Each profile (row) in the dataset includes the person's age, sex, relationship status, sexual orientation, body type, some habits, and ten short-answer essay questions.

Some of the short answer questions are:

  • Current goals
  • I could beat you at...
  • What I like to do on a Friday night

Full Data Dictionary

| Variable Name | Description |
| --- | --- |
| age | Age in years (18-68) |
| status | Relationship status (discrete) |
| sex | Sex (m or f) |
| orientation | Sexual orientation (discrete) |
| body_type | Body type/build (discrete) |
| diet | Diet (discrete) |
| drinks | Whether the person drinks alcohol or not (discrete) |
| drugs | Whether the person uses drugs or not (discrete) |
| education | Level of education (discrete) |
| ethnicity | Ethnicity (discrete, select multiple) |
| height | Height in inches (57-81) |
| income | Annual income (0-150,000) |
| job | Job title/industry (discrete) |
| last_online | Last time online as a date and time, e.g. 2012-06-28-20-30 (June 28th, 2012 at 8:30 PM) |
| location | City and state of user location (text) |
| offspring | Whether the user has or wants kids in the future (discrete) |
| pets | Whether the user has pets, wants pets, or which pets they like (discrete) |
| religion | What religion the person identifies with and how serious they are about it (discrete) |
| sign | Astrological sign (discrete) |
| smokes | Whether the user smokes or not (discrete) |
| speaks | Languages the user speaks (discrete, select multiple) |
| essay0 | About me (all essays are free text) |
| essay1 | Current goals |
| essay2 | I could beat you at... |
| essay3 | What I get the most compliments on |
| essay4 | The last book I read/Favorite book |
| essay5 | I value... |
| essay6 | What I worry about/am thinking about |
| essay7 | What I like to do on a Friday night |
| essay8 | The most private thing I'm willing to admit |
| essay9 | What I'm really looking for |
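Two of the fields above need light parsing before they can be compared: `last_online` is a dash-separated timestamp, and select-multiple fields such as `speaks` hold several values in one string. The sketch below shows one way to handle both with the standard library. The helper names `parse_last_online` and `split_multi` are hypothetical, and the comma separator is an assumption based on the `ethnicity` value `asian, white` in the sample output.

```python
from datetime import datetime


def parse_last_online(raw):
    """Parse a 'YYYY-MM-DD-HH-MM' string (24-hour clock) into a datetime."""
    return datetime.strptime(raw, "%Y-%m-%d-%H-%M")


def split_multi(raw):
    """Split a comma-separated multi-select answer into a list of clean values."""
    if not isinstance(raw, str):
        return []  # missing answers (e.g. NaN) become an empty list
    return [part.strip() for part in raw.split(",") if part.strip()]


last_seen = parse_last_online("2012-06-28-20-30")
print(last_seen)  # hour is 20, i.e. 8:30 in the evening

ethnicities = split_multi("asian, white")
print(ethnicities)
```

For a whole column, `pd.to_datetime(df_profile['last_online'], format='%Y-%m-%d-%H-%M')` should do the same conversion in one call, assuming every value follows that pattern.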

This project will use clustering to find the most similar profiles, with categories weighted to reflect the user's priorities, and will report a percentage score of how similar two profiles are.
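To make the percentage score concrete, here is a rough sketch that ranks candidates for one user. It is not the clustering step itself: it swaps in a direct weighted comparison against every candidate, which is simpler to show. The function names, candidate profiles, and weights are hypothetical; the age range of 18 to 68 comes from the data dictionary.

```python
def similarity_percent(user, other, weights, numeric_ranges):
    """Weighted similarity between two profiles, as a percentage (0-100)."""
    weighted_sum = 0.0
    total_weight = 0.0
    for field, weight in weights.items():
        a, b = user.get(field), other.get(field)
        if a is None or b is None:
            continue  # skip fields that either profile left blank
        if field in numeric_ranges:
            low, high = numeric_ranges[field]
            # Numeric fields: 1 when equal, falling linearly to 0 across the range
            field_sim = 1.0 - min(abs(a - b) / (high - low), 1.0)
        else:
            # Categorical fields: exact match or nothing
            field_sim = 1.0 if a == b else 0.0
        weighted_sum += weight * field_sim
        total_weight += weight
    return 100.0 * weighted_sum / total_weight if total_weight else 0.0


def rank_matches(user, candidates, weights, numeric_ranges, top_n=3):
    """Return the top_n (name, percent) pairs, most similar first."""
    scored = [
        (similarity_percent(user, profile, weights, numeric_ranges), name)
        for name, profile in candidates.items()
    ]
    scored.sort(reverse=True)
    return [(name, round(score, 1)) for score, name in scored[:top_n]]


# Hypothetical user, candidates, and priorities
user = {"age": 30, "diet": "vegetarian", "smokes": "no"}
candidates = {
    "a": {"age": 30, "diet": "vegetarian", "smokes": "no"},
    "b": {"age": 40, "diet": "vegetarian", "smokes": "yes"},
    "c": {"age": 30, "diet": "anything", "smokes": "no"},
}
weights = {"age": 1, "diet": 1, "smokes": 3}
numeric_ranges = {"age": (18, 68)}

ranking = rank_matches(user, candidates, weights, numeric_ranges)
print(ranking)
```

Candidate "c" outranks "b" even though "b" shares the user's diet, because the smoking mismatch carries three times the weight.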