Movie Recommendation System¶

Problem¶

Many popular movie and TV streaming services, such as Netflix, offer a recommendation system that recommends additional movies to the user based on other movies they have watched and liked. However, these systems often seem inaccurate and give recommendations that don't quite match the other films the user enjoyed.

Additional articles that talk about this issue:

https://thesundae.net/2019/11/03/the-problem-with-your-netflix-recommendations/

https://mashable.com/article/algorithms-netflix-youtube-spotify

This project attempts to use machine learning to find the titles in a given dataset that most closely match a movie supplied by the user.

The Data¶

We can use this Kaggle dataset to collect a list of the top 1000 highest-grossing movies.

In [3]:
import pandas as pd

pd.read_csv('Top_1000_Highest_Grossing_Movies_Of_All_Time.csv').head()
Out[3]:
  | Movie Title | Year of Realease | Genre | Movie Rating | Duration | Gross | Worldwide LT Gross | Metascore | Votes | Logline
0 | Avatar | 2009 | Action,Adventure,Fantasy | 7.8 | 162 | $760.51M | $2,847,397,339 | 83 | 1,236,962 | A paraplegic Marine dispatched to the moon Pan...
1 | Avengers: Endgame | 2019 | Action,Adventure,Drama | 8.4 | 181 | $858.37M | $2,797,501,328 | 78 | 1,108,641 | After the devastating events of Avengers: Infi...
2 | Titanic | 1997 | Drama,Romance | 7.9 | 194 | $659.33M | $2,201,647,264 | 75 | 1,162,142 | A seventeen-year-old aristocrat falls in love ...
3 | Star Wars: Episode VII - The Force Awakens | 2015 | Action,Adventure,Sci-Fi | 7.8 | 138 | $936.66M | $2,069,521,700 | 80 | 925,551 | As a new threat to the galaxy rises, Rey, a de...
4 | Avengers: Infinity War | 2018 | Action,Adventure,Sci-Fi | 8.4 | 149 | $678.82M | $2,048,359,754 | 68 | 1,062,517 | The Avengers and their allies must be willing ...

Features of the dataset¶

  • Movie title: the title of the film
  • Year of release: the year the film was released
  • Genre: genres that the film falls under; one film can have multiple genres
  • Movie Rating: IMDb rating on a scale from 1 to 10
  • Duration: length of the film in minutes
  • Gross: how much money was earned in dollars
  • Worldwide LT Gross: worldwide lifetime gross in dollars
  • Metascore: weighted average of critic reviews on a scale from 0 to 100
  • Votes: number of votes from IMDb users
  • Logline: short description of the film
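Several of these columns (Gross, Worldwide LT Gross, Votes) appear in the preview as formatted strings rather than numbers, so they would likely need converting before any distance can be computed on them. The helper below is a minimal sketch of that step. `parse_number` is a hypothetical name, and reading the `M` suffix as "millions" is an assumption based on the sample values shown above.

```python
def parse_number(text):
    """Convert strings like '$760.51M', '$2,847,397,339' or '1,236,962' to a float.

    Assumption: a trailing 'M' means millions of dollars.
    """
    # Drop the currency symbol and thousands separators.
    cleaned = text.strip().replace("$", "").replace(",", "")
    multiplier = 1.0
    if cleaned.endswith("M"):
        multiplier = 1_000_000.0
        cleaned = cleaned[:-1]
    return float(cleaned) * multiplier


# Values copied from the first row of the preview (Avatar).
votes = parse_number("1,236,962")
worldwide_gross = parse_number("$2,847,397,339")
gross = parse_number("$760.51M")
```

With pandas, the same function could be applied to a whole column, for example with `Series.map(parse_number)`.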

These features can be used to match a given movie to the most similar movies based on a few of its attributes (e.g., similar genre and rating).
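To compare movies on genre and rating, each film first has to become a list of numbers. One possible encoding, sketched here in plain Python, is a 0/1 flag per genre plus the rating scaled to the 0 to 1 range. The function names and the choice to divide the rating by 10 are illustrative assumptions, not something the dataset prescribes; the sample rows are copied from the preview above.

```python
# (title, genre string, IMDb rating) taken from the five preview rows.
SAMPLE_ROWS = [
    ("Avatar", "Action,Adventure,Fantasy", 7.8),
    ("Avengers: Endgame", "Action,Adventure,Drama", 8.4),
    ("Titanic", "Drama,Romance", 7.9),
    ("Star Wars: Episode VII - The Force Awakens", "Action,Adventure,Sci-Fi", 7.8),
    ("Avengers: Infinity War", "Action,Adventure,Sci-Fi", 8.4),
]


def build_genre_vocabulary(rows):
    """Collect every genre that appears in the rows, in sorted order."""
    genres = set()
    for _title, genre_string, _rating in rows:
        genres.update(genre_string.split(","))
    return sorted(genres)


def encode_movie(genre_string, rating, vocabulary):
    """Return one 0/1 flag per genre in the vocabulary, followed by rating / 10."""
    movie_genres = set(genre_string.split(","))
    genre_flags = [1.0 if genre in movie_genres else 0.0 for genre in vocabulary]
    return genre_flags + [rating / 10.0]


GENRE_VOCABULARY = build_genre_vocabulary(SAMPLE_ROWS)
FEATURES = {
    title: encode_movie(genre_string, rating, GENRE_VOCABULARY)
    for title, genre_string, rating in SAMPLE_ROWS
}
```

Scaling the rating keeps it on a similar footing to the genre flags, so that no single feature dominates a distance calculation purely because of its units.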

Solution¶

We will represent the movies by their features and then use a k-nearest neighbors algorithm to find the movies that are "closest" to the film the user provides. The specified "k" controls how many of the closest movies are returned as recommendations.
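The idea can be sketched with a small brute-force nearest-neighbor search. This is only an illustration under stated assumptions: the feature vectors are hand-written for the five preview rows (genre flags plus rating divided by 10), Euclidean distance is one choice of "closeness" among several, and `recommend` is a hypothetical helper name. A full notebook would more likely build the vectors from the CSV and might use a library implementation instead.

```python
import math

# Hypothetical vectors: [Action, Adventure, Drama, Fantasy, Romance, Sci-Fi, rating / 10]
MOVIE_VECTORS = {
    "Avatar": [1, 1, 0, 1, 0, 0, 0.78],
    "Avengers: Endgame": [1, 1, 1, 0, 0, 0, 0.84],
    "Titanic": [0, 0, 1, 0, 1, 0, 0.79],
    "Star Wars: Episode VII - The Force Awakens": [1, 1, 0, 0, 0, 1, 0.78],
    "Avengers: Infinity War": [1, 1, 0, 0, 0, 1, 0.84],
}


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend(title, vectors, k=2):
    """Return the k titles whose vectors are closest to the given title's vector."""
    if title not in vectors:
        raise KeyError(f"Unknown movie: {title}")
    query = vectors[title]
    # Measure the distance to every other movie, skipping the query itself.
    distances = [
        (euclidean_distance(query, vector), other)
        for other, vector in vectors.items()
        if other != title
    ]
    distances.sort()
    return [other for _distance, other in distances[:k]]


RECOMMENDATIONS = recommend("Avengers: Infinity War", MOVIE_VECTORS, k=2)
print(RECOMMENDATIONS)
```

On these five rows, the closest match to "Avengers: Infinity War" is the Star Wars film, which shares all three genres and differs only slightly in rating. Raising `k` returns more titles, at the cost of including progressively less similar ones.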

In [ ]: