Like many other interactions, dating has largely gone digital. Single people looking for love slog through profile after profile of generic answers and pictures of men with mullets holding fish, with no guarantee of finding someone. A phenomenon known as "dating app burnout" has been plaguing people looking for a partner: they feel hopeless about their options, yet pressured to keep swiping in case their soulmate is hidden among the crowd.
This project uses OKCupid profiles to estimate how compatible two people are and to rate how likely a relationship between them is to work out. The algorithm can then be applied to an individual's input (their answers to the profile questions) to match them with their most compatible companion, like a digital matchmaker.
The matchmaking will be based on the similarity between responses and on how the individual prioritizes different categories. For example, if two people are very similar except that one smokes and the other does not, and it is very important to the individual that their partner not smoke, the two will not be ranked as very compatible. By the same logic, if two profiles are similar in the individual's most important categories but very different in the less important ones, their compatibility will be ranked higher.
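The weighting idea can be sketched in a few lines. This is only an illustration, not the project's actual scoring method: the function name `compatibility`, the example profiles, and the weight values are all hypothetical, and the score is simply the weighted share of categories on which two profiles give the same answer.

```python
def compatibility(user, candidate, weights):
    """Return the weighted share of matching categories as a percentage.

    `weights` maps a category name to how much the user cares about it
    (hypothetical values; larger means more important).
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    matched = sum(
        weight
        for category, weight in weights.items()
        # A missing answer never counts as a match.
        if user.get(category) is not None
        and user.get(category) == candidate.get(category)
    )
    return 100.0 * matched / total


# Hypothetical user who cares most about smoking habits.
user = {"smokes": "no", "drinks": "socially", "diet": "vegetarian", "pets": "likes dogs"}
weights = {"smokes": 5, "drinks": 1, "diet": 1, "pets": 1}

# Identical to the user except for smoking.
candidate_a = {"smokes": "yes", "drinks": "socially", "diet": "vegetarian", "pets": "likes dogs"}
# Matches only on smoking.
candidate_b = {"smokes": "no", "drinks": "often", "diet": "anything", "pets": "likes cats"}

score_a = compatibility(user, candidate_a, weights)
score_b = compatibility(user, candidate_b, weights)
print(score_a, score_b)
```

With these made-up weights, the candidate who differs only on smoking scores lower than the candidate who agrees only on smoking, which is the behavior described above.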
import pandas as pd
df_profile = pd.read_csv('okcupid_profiles.csv')
df_profile.head()
| | age | status | sex | orientation | body_type | diet | drinks | drugs | education | ethnicity | ... | essay0 | essay1 | essay2 | essay3 | essay4 | essay5 | essay6 | essay7 | essay8 | essay9 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 22 | single | m | straight | a little extra | strictly anything | socially | never | working on college/university | asian, white | ... | about me: i would love to think that i was so... | currently working as an international agent fo... | making people laugh. ranting about a good salt... | the way i look. i am a six foot half asian, ha... | books: absurdistan, the republic, of mice and ... | food. water. cell phone. shelter. | duality and humorous things | trying to find someone to hang out with. i am ... | i am new to california and looking for someone... | you want to be swept off your feet! you are ti... |
1 | 35 | single | m | straight | average | mostly other | often | sometimes | working on space camp | white | ... | i am a chef: this is what that means. 1. i am ... | dedicating everyday to being an unbelievable b... | being silly. having ridiculous amonts of fun w... | NaN | i am die hard christopher moore fan. i don't r... | delicious porkness in all of its glories. my b... | NaN | NaN | i am very open and will share just about anyth... | NaN |
2 | 38 | available | m | straight | thin | anything | socially | NaN | graduated from masters program | NaN | ... | i'm not ashamed of much, but writing public te... | i make nerdy software for musicians, artists, ... | improvising in different contexts. alternating... | my large jaw and large glasses are the physica... | okay this is where the cultural matrix gets so... | movement conversation creation contemplation t... | NaN | viewing. listening. dancing. talking. drinking... | when i was five years old, i was known as "the... | you are bright, open, intense, silly, ironic, ... |
3 | 23 | single | m | straight | thin | vegetarian | socially | NaN | working on college/university | white | ... | i work in a library and go to school. . . | reading things written by old dead people | playing synthesizers and organizing books acco... | socially awkward but i do my best | bataille, celine, beckett. . . lynch, jarmusch... | NaN | cats and german philosophy | NaN | NaN | you feel so inclined. |
4 | 29 | single | m | straight | athletic | NaN | socially | never | graduated from college/university | asian, black, other | ... | hey how's it going? currently vague on the pro... | work work work work + play | creating imagery to look at: http://bagsbrown.... | i smile a lot and my inquisitive nature | music: bands, rappers, musicians at the moment... | NaN | NaN | NaN | NaN | NaN |
5 rows × 31 columns
Each profile (row) in the dataset includes the person's age, sex, relationship status, sexual orientation, body type, some habits, and a few short-answer questions.
Variable Name | Description |
---|---|
age | Age in years (18-68) |
status | Relationship status (discrete) |
sex | Sex (m or f) |
orientation | Sexual orientation (discrete) |
body_type | Body type/build (discrete) |
diet | Diet (discrete) |
drinks | Whether person drinks alcohol or not (discrete) |
drugs | Whether person uses drugs or not (discrete) |
education | Level of education (discrete) |
ethnicity | Ethnicity (discrete, select multiple) |
height | Height in inches (57-81) |
income | Annual income (0-150,000) |
job | Job title/industry (discrete) |
last_online | Last time online as a date and time, e.g. 2012-06-28-20-30 (June 28, 2012 at 8:30 PM) |
location | City and state of user location (discrete) |
offspring | Whether user has or wants kids in the future (discrete) |
pets | Whether user has pets, wants pets, or which pets they like (discrete) |
religion | What religion the person identifies with and how serious they are (discrete) |
sign | Astrological sign (discrete) |
smokes | Whether the user smokes or not (discrete) |
speaks | Languages the user speaks (discrete, select multiple) |
essay0 | About me (all essays are free text) |
essay1 | Current goals |
essay2 | I could beat you at... |
essay3 | What I get the most compliments on |
essay4 | The last book I read/Favorite book |
essay5 | I value... |
essay6 | What I worry about/am thinking about |
essay7 | What I like to do on a Friday night |
essay8 | The most private thing I'm willing to admit |
essay9 | What I'm really looking for |
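The `last_online` values in the table above are stored as hyphen-separated strings, so they need parsing before they can be compared or sorted as dates. A minimal sketch using only the standard library, assuming every value follows the `YYYY-MM-DD-HH-MM` pattern of the example (the helper name `parse_last_online` is hypothetical):

```python
from datetime import datetime

# Format implied by the example value 2012-06-28-20-30 (24-hour clock).
LAST_ONLINE_FORMAT = "%Y-%m-%d-%H-%M"


def parse_last_online(value):
    """Convert a last_online string into a datetime object."""
    return datetime.strptime(value, LAST_ONLINE_FORMAT)


ts = parse_last_online("2012-06-28-20-30")
print(ts.isoformat())  # 2012-06-28T20:30:00
```

For the whole column, `pd.to_datetime(df_profile["last_online"], format="%Y-%m-%d-%H-%M")` should do the same in one call, provided the column has no values in a different format.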
This project will use clustering to find the most similar profiles, weighting categories according to the user's priorities, and will report a percentage score for how similar two profiles are.
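The text does not say which clustering algorithm is used. As one possible illustration for categorical answers, here is a small k-modes-style sketch in plain Python, where the distance between a profile and a cluster's "mode" is the sum of the weights of the categories on which they disagree. The function names, the toy profiles, the weights, and the choice of k-modes itself are all assumptions made for this example.

```python
from collections import Counter


def weighted_mismatch(a, b, weights):
    """Sum of the weights of the categories where two profiles differ."""
    return sum(w for col, w in weights.items() if a[col] != b[col])


def k_modes(profiles, weights, k, n_iter=10):
    """Group categorical profiles into k clusters (simplified k-modes).

    Assumption: the first k profiles are used as the starting modes,
    which keeps the sketch deterministic but is not a robust choice.
    """
    modes = [dict(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(n_iter):
        # Assign each profile to its nearest mode.
        new_labels = [
            min(range(k), key=lambda j: weighted_mismatch(p, modes[j], weights))
            for p in profiles
        ]
        # Recompute each mode as the most common answer per category.
        for j in range(k):
            members = [p for p, lab in zip(profiles, new_labels) if lab == j]
            if members:
                modes[j] = {
                    col: Counter(m[col] for m in members).most_common(1)[0][0]
                    for col in weights
                }
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, modes


# Toy profiles with made-up answers; weights reflect a hypothetical user.
profiles = [
    {"smokes": "no", "drinks": "socially", "diet": "vegetarian"},
    {"smokes": "yes", "drinks": "often", "diet": "anything"},
    {"smokes": "no", "drinks": "socially", "diet": "anything"},
    {"smokes": "yes", "drinks": "often", "diet": "vegetarian"},
]
weights = {"smokes": 3.0, "drinks": 2.0, "diet": 1.0}

labels, modes = k_modes(profiles, weights, k=2)
print(labels)  # [0, 1, 0, 1]
```

Because smoking and drinking carry more weight than diet here, the profiles are grouped by those two habits even though diet cuts across the clusters.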