Match and Attraction Prediction

Motivation:

Problem

Knowing what makes people romantically compatible in an ever-changing culture and environment is difficult. Knowing yourself and what you want is similarly challenging. However, with aggregated data, we can draw broad conclusions about dating preferences. Dating is hard. Let's make it easier.

Solution

This study uses questionnaire data and the outcomes of four-minute speed dates. Before and after each date, participants rated themselves and their partner on a range of qualities (personality, attraction, interests, etc.). Combined with the date outcome and self-assessed expectations, these ratings provide a basis for studying which factors correlate with attraction. The goal of this project is to identify the factors between two individuals that predict attraction on an early date.

Impact

If successful, we will be able to predict matches from features such as similarity, personality traits, and demographics. This will reveal broader truths as well as general tips for early dates. Online dating websites could also use these conclusions to create better matches.

Negative outcome: we draw an incorrect conclusion that misleads people.

Dataset

Detail

We will use a Kaggle speed-dating dataset and observe the following features:

  • Self personality (assortment)
  • Other's personality (assortment)
  • Race
  • Religion
  • Interests (assortment)
  • Expected number of matches
  • Differences between self and partner for all of the above
has_null wave gender age age_o d_age d_d_age race race_o samerace importance_same_race importance_same_religion d_importance_same_race d_importance_same_religion field pref_o_attractive pref_o_sincere pref_o_intelligence pref_o_funny pref_o_ambitious pref_o_shared_interests d_pref_o_attractive d_pref_o_sincere d_pref_o_intelligence d_pref_o_funny d_pref_o_ambitious d_pref_o_shared_interests attractive_o sinsere_o intelligence_o funny_o ambitous_o shared_interests_o d_attractive_o d_sinsere_o d_intelligence_o d_funny_o d_ambitous_o d_shared_interests_o attractive_important sincere_important intellicence_important funny_important ambtition_important shared_interests_important d_attractive_important d_sincere_important d_intellicence_important d_funny_important d_ambtition_important d_shared_interests_important attractive sincere intelligence funny ambition d_attractive d_sincere d_intelligence d_funny d_ambition attractive_partner sincere_partner intelligence_partner funny_partner ambition_partner shared_interests_partner d_attractive_partner d_sincere_partner d_intelligence_partner d_funny_partner d_ambition_partner d_shared_interests_partner sports tvsports exercise dining museums art hiking gaming clubbing reading tv theater movies concerts music shopping yoga d_sports d_tvsports d_exercise d_dining d_museums d_art d_hiking d_gaming d_clubbing d_reading d_tv d_theater d_movies d_concerts d_music d_shopping d_yoga interests_correlate d_interests_correlate expected_happy_with_sd_people expected_num_interested_in_me expected_num_matches d_expected_happy_with_sd_people d_expected_num_interested_in_me d_expected_num_matches like guess_prob_liked d_like d_guess_prob_liked met decision decision_o match
b'' 1.0 b'female' 21.0 27.0 6.0 b'[4-6]' b'Asian/Pacific Islander/Asian-American' b'European/Caucasian-American' b'0' 2.0 4.0 b'[2-5]' b'[2-5]' b'Law' 35.0 20.0 20.0 20.0 0.0 5.0 b'[21-100]' b'[16-20]' b'[16-20]' b'[16-20]' b'[0-15]' b'[0-15]' 6.0 8.0 8.0 8.0 8.0 6.0 b'[6-8]' b'[6-8]' b'[6-8]' b'[6-8]' b'[6-8]' b'[6-8]' 15.0 20.0 20.0 15.0 15.0 15.0 b'[0-15]' b'[16-20]' b'[16-20]' b'[0-15]' b'[0-15]' b'[0-15]' 6.0 8.0 8.0 8.0 7.0 b'[6-8]' b'[6-8]' b'[6-8]' b'[6-8]' b'[6-8]' 6.0 9.0 7.0 7.0 6.0 5.0 b'[6-8]' b'[9-10]' b'[6-8]' b'[6-8]' b'[6-8]' b'[0-5]' 9.0 2.0 8.0 9.0 1.0 1.0 5.0 1.0 5.0 6.0 9.0 1.0 10.0 10.0 9.0 8.0 1.0 b'[9-10]' b'[0-5]' b'[6-8]' b'[9-10]' b'[0-5]' b'[0-5]' b'[0-5]' b'[0-5]' b'[0-5]' b'[6-8]' b'[9-10]' b'[0-5]' b'[9-10]' b'[9-10]' b'[9-10]' b'[6-8]' b'[0-5]' 0.14 b'[0-0.33]' 3.0 2.0 4.0 b'[0-4]' b'[0-3]' b'[3-5]' 7.0 6.0 b'[6-8]' b'[5-6]' 0.0 b'1' b'0' b'0'
b'' 1.0 b'female' 21.0 22.0 1.0 b'[0-1]' b'Asian/Pacific Islander/Asian-American' b'European/Caucasian-American' b'0' 2.0 4.0 b'[2-5]' b'[2-5]' b'Law' 60.0 0.0 0.0 40.0 0.0 0.0 b'[21-100]' b'[0-15]' b'[0-15]' b'[21-100]' b'[0-15]' b'[0-15]' 7.0 8.0 10.0 7.0 7.0 5.0 b'[6-8]' b'[6-8]' b'[9-10]' b'[6-8]' b'[6-8]' b'[0-5]' 15.0 20.0 20.0 15.0 15.0 15.0 b'[0-15]' b'[16-20]' b'[16-20]' b'[0-15]' b'[0-15]' b'[0-15]' 6.0 8.0 8.0 8.0 7.0 b'[6-8]' b'[6-8]' b'[6-8]' b'[6-8]' b'[6-8]' 7.0 8.0 7.0 8.0 5.0 6.0 b'[6-8]' b'[6-8]' b'[6-8]' b'[6-8]' b'[0-5]' b'[6-8]' 9.0 2.0 8.0 9.0 1.0 1.0 5.0 1.0 5.0 6.0 9.0 1.0 10.0 10.0 9.0 8.0 1.0 b'[9-10]' b'[0-5]' b'[6-8]' b'[9-10]' b'[0-5]' b'[0-5]' b'[0-5]' b'[0-5]' b'[0-5]' b'[6-8]' b'[9-10]' b'[0-5]' b'[9-10]' b'[9-10]' b'[9-10]' b'[6-8]' b'[0-5]' 0.54 b'[0.33-1]' 3.0 2.0 4.0 b'[0-4]' b'[0-3]' b'[3-5]' 7.0 5.0 b'[6-8]' b'[5-6]' 1.0 b'1' b'0' b'0'
b'' 1.0 b'female' 21.0 22.0 1.0 b'[0-1]' b'Asian/Pacific Islander/Asian-American' b'Asian/Pacific Islander/Asian-American' b'1' 2.0 4.0 b'[2-5]' b'[2-5]' b'Law' 19.0 18.0 19.0 18.0 14.0 12.0 b'[16-20]' b'[16-20]' b'[16-20]' b'[16-20]' b'[0-15]' b'[0-15]' 10.0 10.0 10.0 10.0 10.0 10.0 b'[9-10]' b'[9-10]' b'[9-10]' b'[9-10]' b'[9-10]' b'[9-10]' 15.0 20.0 20.0 15.0 15.0 15.0 b'[0-15]' b'[16-20]' b'[16-20]' b'[0-15]' b'[0-15]' b'[0-15]' 6.0 8.0 8.0 8.0 7.0 b'[6-8]' b'[6-8]' b'[6-8]' b'[6-8]' b'[6-8]' 5.0 8.0 9.0 8.0 5.0 7.0 b'[0-5]' b'[6-8]' b'[9-10]' b'[6-8]' b'[0-5]' b'[6-8]' 9.0 2.0 8.0 9.0 1.0 1.0 5.0 1.0 5.0 6.0 9.0 1.0 10.0 10.0 9.0 8.0 1.0 b'[9-10]' b'[0-5]' b'[6-8]' b'[9-10]' b'[0-5]' b'[0-5]' b'[0-5]' b'[0-5]' b'[0-5]' b'[6-8]' b'[9-10]' b'[0-5]' b'[9-10]' b'[9-10]' b'[9-10]' b'[6-8]' b'[0-5]' 0.16 b'[0-0.33]' 3.0 2.0 4.0 b'[0-4]' b'[0-3]' b'[3-5]' 7.0 b'[6-8]' b'[0-4]' 1.0 b'1' b'1' b'1'
b'' 1.0 b'female' 21.0 23.0 2.0 b'[2-3]' b'Asian/Pacific Islander/Asian-American' b'European/Caucasian-American' b'0' 2.0 4.0 b'[2-5]' b'[2-5]' b'Law' 30.0 5.0 15.0 40.0 5.0 5.0 b'[21-100]' b'[0-15]' b'[0-15]' b'[21-100]' b'[0-15]' b'[0-15]' 7.0 8.0 9.0 8.0 9.0 8.0 b'[6-8]' b'[6-8]' b'[9-10]' b'[6-8]' b'[9-10]' b'[6-8]' 15.0 20.0 20.0 15.0 15.0 15.0 b'[0-15]' b'[16-20]' b'[16-20]' b'[0-15]' b'[0-15]' b'[0-15]' 6.0 8.0 8.0 8.0 7.0 b'[6-8]' b'[6-8]' b'[6-8]' b'[6-8]' b'[6-8]' 7.0 6.0 8.0 7.0 6.0 8.0 b'[6-8]' b'[6-8]' b'[6-8]' b'[6-8]' b'[6-8]' b'[6-8]' 9.0 2.0 8.0 9.0 1.0 1.0 5.0 1.0 5.0 6.0 9.0 1.0 10.0 10.0 9.0 8.0 1.0 b'[9-10]' b'[0-5]' b'[6-8]' b'[9-10]' b'[0-5]' b'[0-5]' b'[0-5]' b'[0-5]' b'[0-5]' b'[6-8]' b'[9-10]' b'[0-5]' b'[9-10]' b'[9-10]' b'[9-10]' b'[6-8]' b'[0-5]' 0.61 b'[0.33-1]' 3.0 2.0 4.0 b'[0-4]' b'[0-3]' b'[3-5]' 7.0 6.0 b'[6-8]' b'[5-6]' 0.0 b'1' b'1' b'1'

Our project seeks to analyze the features above to predict matches.
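The sample rows above contain byte-string values such as b'female' and range labels such as b'[4-6]', which would need cleaning before modelling. The following is a minimal sketch of that cleaning step using only the standard library. The helper names `clean_value` and `parse_bin` are hypothetical, and the example values are copied from the first sample row.

```python
import re

# Matches range labels such as "[4-6]" or "[0-0.33]". A leading minus sign is
# allowed on either bound in case a bin has a negative edge.
BIN_PATTERN = re.compile(r"^\[(-?\d+(?:\.\d+)?)-(-?\d+(?:\.\d+)?)\]$")


def clean_value(value):
    """Decode byte strings to text; pass other values through unchanged."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value


def parse_bin(label):
    """Turn a range label like '[4-6]' into a (low, high) tuple of floats.

    Returns None when the value is not a range label.
    """
    text = clean_value(label)
    match = BIN_PATTERN.match(text) if isinstance(text, str) else None
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# A few values copied from the first sample row above.
raw_row = {"gender": b"female", "age": 21.0, "d_d_age": b"[4-6]",
           "d_interests_correlate": b"[0-0.33]", "match": b"0"}

cleaned = {name: clean_value(value) for name, value in raw_row.items()}
print(cleaned["gender"])              # female
print(parse_bin(raw_row["d_d_age"]))  # (4.0, 6.0)
```

With a pandas DataFrame the same decoding would normally be applied column by column rather than row by row; the per-value functions here just keep the sketch dependency-free.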

Potential Problems

The dataset is large, and comparing all attributes may be beyond the scope of this assignment. In addition, the data may be skewed by the self-perception bias inherent in questionnaires: people are rarely accurate when rating themselves. We will assume the data is accurate, but may weight the partner's perception more heavily than the self-assessment.

The results also won't necessarily carry over to a normal dating environment, since four-minute speed dates are very short.
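One way to gauge the self-perception bias is to compare a participant's self-rating with the ratings their partners gave them. The sketch below does this for attractiveness, using the `attractive` and `attractive_o` values from the four sample rows shown earlier (one participant, four partners). The function name `self_perception_gap` is hypothetical, and four rows are far too few to conclude anything; this only illustrates the calculation.

```python
from statistics import mean

# Self-rating and partner ratings for attractiveness, copied from the four
# sample rows (one participant, four different partners).
self_rating = 6.0                        # column `attractive`
partner_ratings = [6.0, 7.0, 10.0, 7.0]  # column `attractive_o`


def self_perception_gap(self_score, scores_from_partners):
    """Self-rating minus the mean partner rating.

    A positive value means the participant rates themselves higher than
    their partners do; a negative value means lower.
    """
    return self_score - mean(scores_from_partners)


gap = self_perception_gap(self_rating, partner_ratings)
print(gap)  # -1.5
```

Here the participant rates herself 1.5 points below the average rating she received, which is the kind of discrepancy that motivates leaning on partner ratings.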

In [21]:
column_meanings = {'has_null': 'Missing values (binary)',
 'wave': 'Group',
 'gender': 'Gender of self',
 'age': 'Age of self',
 'age_o': 'Age of partner',
 'd_age': 'Difference in age',
 'd_d_age': 'Difference in age, binned into ranges (e.g. [4-6])',
 'race': 'Race of self',
 'race_o': 'Race of partner',
 'samerace': 'whether the two have the same race',
 'importance_same_race': 'How important is it that partner is of the same race?',
 'importance_same_religion': 'How important is it that partner has same religion?',
 'd_importance_same_race': 'Importance of same race (binned)',
 'd_importance_same_religion': 'Importance of same religion (binned)',
 'field': 'Field of study',
 'pref_o_attractive': 'How important does partner rate attractiveness',
 'pref_o_sincere': 'How important does partner rate sincerity',
 'pref_o_intelligence': 'How important does partner rate intelligence',
 'pref_o_funny': 'How important does partner rate funny',
 'pref_o_ambitious': 'How important does partner rate ambition',
 'pref_o_shared_interests': 'How important does partner rate shared interest',
 'd_pref_o_attractive': 'How important partner rates attractiveness (binned)',
 'd_pref_o_sincere': 'How important partner rates sincerity (binned)',
 'd_pref_o_intelligence': 'How important partner rates intelligence (binned)',
 'd_pref_o_funny': 'How important partner rates being funny (binned)',
 'd_pref_o_ambitious': 'How important partner rates ambition (binned)',
 'd_pref_o_shared_interests': 'How important partner rates shared interests (binned)',
 'attractive_o': 'Rating by partner (about me) at night of event on attractiveness',
 'sinsere_o': 'Rating by partner (about me) at night of event on sincerity',
 'intelligence_o': 'Rating by partner (about me) at night of event on intelligence',
 'funny_o': 'Rating by partner (about me) at night of event on funny',
 'ambitous_o': 'Rating by partner (about me) at night of event on ambitious',
 'shared_interests_o': 'Rating by partner (about me) at night of event of shared interest',
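 'd_attractive_o': 'Rating by partner (about me) on attractiveness (binned)',
 'd_sinsere_o': 'Rating by partner (about me) on sincerity (binned)',
 'd_intelligence_o': 'Rating by partner (about me) on intelligence (binned)',
 'd_funny_o': 'Rating by partner (about me) on being funny (binned)',
 'd_ambitous_o': 'Rating by partner (about me) on ambition (binned)',
 'd_shared_interests_o': 'Rating by partner (about me) on shared interests (binned)',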
 'attractive_important': 'What do you look for in a partner - attractiveness',
 'sincere_important': 'What do you look for in a partner - sincerity',
 'intellicence_important': 'What do you look for in a partner - intelligence',
 'funny_important': 'What do you look for in a partner - being funny',
 'ambtition_important': 'What do you look for in a partner - ambition',
 'shared_interests_important': 'What do you look for in a partner - shared interests',
 'd_attractive_important': 'What do you look for in a partner - attractiveness (binned)',
 'd_sincere_important': 'What do you look for in a partner - sincerity (binned)',
 'd_intellicence_important': 'What do you look for in a partner - intelligence (binned)',
 'd_funny_important': 'What do you look for in a partner - being funny (binned)',
 'd_ambtition_important': 'What do you look for in a partner - ambition (binned)',
 'd_shared_interests_important': 'What do you look for in a partner - shared interests (binned)',
 'attractive': 'Rate yourself - attractiveness',
 'sincere': 'Rate yourself - sincerity',
 'intelligence': 'Rate yourself - intelligence',
 'funny': 'Rate yourself - funny',
 'ambition': 'Rate yourself - ambition',
 'd_attractive': 'Rate yourself - attractiveness (binned)',
 'd_sincere': 'Rate yourself - sincerity (binned)',
 'd_intelligence': 'Rate yourself - intelligence (binned)',
 'd_funny': 'Rate yourself - funny (binned)',
 'd_ambition': 'Rate yourself - ambition (binned)',
 'attractive_partner': 'Rate your partner - attractiveness',
 'sincere_partner': 'Rate your partner - sincerity',
 'intelligence_partner': 'Rate your partner - intelligence',
 'funny_partner': 'Rate your partner - funny',
 'ambition_partner': 'Rate your partner - ambition',
 'shared_interests_partner': 'Rate your partner - shared interests',
 'd_attractive_partner': 'Rate your partner - attractiveness (binned)',
 'd_sincere_partner': 'Rate your partner - sincerity (binned)',
 'd_intelligence_partner': 'Rate your partner - intelligence (binned)',
 'd_funny_partner': 'Rate your partner - funny (binned)',
 'd_ambition_partner': 'Rate your partner - ambition (binned)',
 'd_shared_interests_partner': 'Rate your partner - shared interests (binned)',
 'sports': '(1-10) Your interest in sports',
 'tvsports': '(1-10) Your interest in tvsports',
 'exercise': '(1-10) Your interest in exercise',
 'dining': '(1-10) Your interest in dining',
 'museums': '(1-10) Your interest in museums (who likes museums?)',
 'art': '(1-10) Your interest in art',
 'hiking': '(1-10) Your interest in hiking',
 'gaming': '(1-10) Your interest in gaming',
 'clubbing': '(1-10) Your interest in clubbing',
 'reading': '(1-10) Your interest in reading',
 'tv': '(1-10) Your interest in tv',
 'theater': '(1-10) Your interest in theater',
 'movies': '(1-10) Your interest in movies',
 'concerts': '(1-10) Your interest in concerts',
 'music': '(1-10) Your interest in music',
 'shopping': '(1-10) Your interest in shopping',
 'yoga': '(1-10) Your interest in yoga',
 'd_sports': 'Your interest in sports (binned)',
 'd_tvsports': 'Your interest in tvsports (binned)',
 'd_exercise': 'Your interest in exercise (binned)',
 'd_dining': 'Your interest in dining (binned)',
 'd_museums': 'Your interest in museums (binned)',
 'd_art': 'Your interest in art (binned)',
 'd_hiking': 'Your interest in hiking (binned)',
 'd_gaming': 'Your interest in gaming (binned)',
 'd_clubbing': 'Your interest in clubbing (binned)',
 'd_reading': 'Your interest in reading (binned)',
 'd_tv': 'Your interest in tv (binned)',
 'd_theater': 'Your interest in theater (binned)',
 'd_movies': 'Your interest in movies (binned)',
 'd_concerts': 'Your interest in concerts (binned)',
 'd_music': 'Your interest in music (binned)',
 'd_shopping': 'Your interest in shopping (binned)',
 'd_yoga': 'Your interest in yoga (binned)',
 'interests_correlate': 'Correlation between participant’s and partner’s ratings of interests',
 'd_interests_correlate': 'Correlation of interest ratings, binned into ranges (e.g. [0-0.33])',
 'expected_happy_with_sd_people': 'How happy do you expect to be with the people you meet during the speed-dating event?',
 'expected_num_interested_in_me': 'Out of the 20 people you will meet, how many do you expect will be interested in dating you?',
 'expected_num_matches': 'How many matches do you expect to get?',
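 'd_expected_happy_with_sd_people': 'Expected happiness with the people you meet (binned)',
 'd_expected_num_interested_in_me': 'Expected number of people interested in you (binned)',
 'd_expected_num_matches': 'Expected number of matches (binned)',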
 'like': '(1-10) did you like your partner?',
 'guess_prob_liked': '(1-10) did you think your partner liked you?',
 'd_like': 'How much you liked your partner (binned)',
 'd_guess_prob_liked': 'How likely you think it is that your partner liked you (binned)',
 'met': 'Have you met your partner before?',
 'decision': 'Did you choose to match?',
 'decision_o': 'Did partner choose to match?',
 'match': 'did both people say yes?'}
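In the sample rows, most columns with a `d_` prefix hold range labels such as b'[4-6]' that repeat the information in the matching unprefixed column. A model would usually keep only one of each pair. The sketch below separates the two groups by name, under the assumption that `d_X` is a binned copy whenever `X` is also a column. The helper `binned_columns` and the `NOT_BINNED` set are hypothetical names, and the column list is a small hand-picked subset.

```python
# A hand-picked subset of the column names listed above.
columns = ["age", "age_o", "d_age", "d_d_age", "dining", "d_dining",
           "decision", "match"]

# `d_age` is a genuine numeric difference even though `age` is also a column,
# so it is listed as an explicit exception to the prefix rule.
NOT_BINNED = {"d_age"}


def binned_columns(names):
    """Return the columns that look like binned copies of another column."""
    known = set(names)
    return [name for name in names
            if name.startswith("d_")
            and name[2:] in known
            and name not in NOT_BINNED]


binned = binned_columns(columns)
kept = [name for name in columns if name not in binned]
print(binned)  # ['d_d_age', 'd_dining']
print(kept)    # ['age', 'age_o', 'd_age', 'dining', 'decision', 'match']
```

Note that a plain "starts with d" test would wrongly catch `dining` and `decision`, which is why the check requires the `d_` prefix and a matching base column.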

Method:

We propose logistic regression to determine which factors are predictive of a match. Comparing the fitted coefficients will also reveal the relative impact of different traits.
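To make the idea concrete, here is a minimal sketch of logistic regression fitted by batch gradient descent, written with the standard library only. The data is a toy set invented for illustration, not the speed-dating data: the feature names `like_scaled` and `alternating` are hypothetical, with the first built to drive the label and the second carrying little signal. In practice a library implementation such as scikit-learn's `LogisticRegression` would likely replace the hand-written fit.

```python
import math


def sigmoid(z):
    """Logistic function, mapping a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, learning_rate=0.1, steps=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            score = bias + sum(w * v for w, v in zip(weights, x))
            error = sigmoid(score) - y  # predicted probability minus label
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


# Toy data (invented): 40 dates with two features each.
rows, labels = [], []
for i in range(40):
    like_scaled = (i - 19.5) / 10            # informative feature
    alternating = 1.0 if i % 2 == 0 else -1.0  # carries little signal
    rows.append([like_scaled, alternating])
    labels.append(1 if like_scaled > 0 else 0)
# Flip two labels so the classes are not perfectly separable.
labels[17] = 1
labels[23] = 0

weights, bias = fit_logistic(rows, labels)
for name, weight in zip(["like_scaled", "alternating"], weights):
    print(f"{name}: {weight:.2f}")
```

The informative feature ends up with the larger coefficient in magnitude. Coefficients are only directly comparable when features share a scale, so the real features would need standardizing before their weights are read as relative importance.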