Movie Recommendation System¶

Problem¶

Many popular movie and TV streaming services offer a recommendation system (Netflix's algorithm is a well-known example) in which additional movies are recommended to the user based on other movies they have watched and liked. However, these systems often seem inaccurate and give recommendations that don't quite match the other films the user enjoyed.

Additional articles that talk about this issue:

https://thesundae.net/2019/11/03/the-problem-with-your-netflix-recommendations/

https://mashable.com/article/algorithms-netflix-youtube-spotify

This project attempts to use machine learning to pick, from a given dataset, the titles that most closely match a movie supplied by the user.

The Data¶

We can use a Kaggle dataset listing the top 1000 highest-grossing movies of all time.

In [3]:

import pandas as pd

pd.read_csv('Top_1000_Highest_Grossing_Movies_Of_All_Time.csv').head()

Out[3]:

	Movie Title	Year of Realease	Genre	Movie Rating	Duration	Gross	Worldwide LT Gross	Metascore	Votes	Logline
0	Avatar	2009	Action,Adventure,Fantasy	7.8	162	$760.51M	$2,847,397,339	83	1,236,962	A paraplegic Marine dispatched to the moon Pan...
1	Avengers: Endgame	2019	Action,Adventure,Drama	8.4	181	$858.37M	$2,797,501,328	78	1,108,641	After the devastating events of Avengers: Infi...
2	Titanic	1997	Drama,Romance	7.9	194	$659.33M	$2,201,647,264	75	1,162,142	A seventeen-year-old aristocrat falls in love ...
3	Star Wars: Episode VII - The Force Awakens	2015	Action,Adventure,Sci-Fi	7.8	138	$936.66M	$2,069,521,700	80	925,551	As a new threat to the galaxy rises, Rey, a de...
4	Avengers: Infinity War	2018	Action,Adventure,Sci-Fi	8.4	149	$678.82M	$2,048,359,754	68	1,062,517	The Avengers and their allies must be willing ...

Features of the dataset¶

Movie title: the title of the film
Year of release: the year the film was released
Genre: genres that the film fits under; one film can have multiple genres
Movie Rating: ratings on a scale from 1-10 according to IMDb
Duration: length of the film in minutes
Gross: how much money was earned in dollars
Worldwide LT Gross: worldwide lifetime gross in dollars
Metascore: weighted average of critic reviews from 1-100
Votes: number of votes from IMDb users
Logline: short description of the film

These features can be used to match a given movie to the most similar movies based on a few of its attributes (e.g., similar genre and rating).
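Before these features can be compared numerically, some of them need cleaning. The sketch below is one possible approach, assuming Gross and Votes are stored as strings in the form shown in the preview ("$760.51M", "1,236,962") and that genres are comma-separated. The `clean_features` helper and the two-row `sample` frame are illustrative only and are not part of the original notebook.

```python
import pandas as pd

# Two rows copied from the preview above; the real notebook would load the CSV.
sample = pd.DataFrame({
    "Movie Title": ["Avatar", "Titanic"],
    "Genre": ["Action,Adventure,Fantasy", "Drama,Romance"],
    "Movie Rating": [7.8, 7.9],
    "Gross": ["$760.51M", "$659.33M"],
    "Votes": ["1,236,962", "1,162,142"],
})

def clean_features(df):
    """Return a copy with numeric Gross/Votes and one 0/1 column per genre."""
    out = df.copy()
    # "$760.51M" -> 760.51 (millions of dollars)
    out["Gross"] = out["Gross"].str.replace(r"[$M]", "", regex=True).astype(float)
    # "1,236,962" -> 1236962
    out["Votes"] = out["Votes"].str.replace(",", "", regex=False).astype(int)
    # "Drama,Romance" -> Drama=1, Romance=1, every other genre column 0
    genres = out["Genre"].str.get_dummies(sep=",")
    return pd.concat([out.drop(columns="Genre"), genres], axis=1)

cleaned = clean_features(sample)
print(cleaned)
```

Splitting Genre into one indicator column per genre lets a film with several genres be compared on each of them separately.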

Solution¶

We will group the movies based on their features and then use a k-nearest neighbors algorithm to find the movies that are "closest" to the input film. The specified "k" controls how many of the closest matches are returned as recommendations.
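As a rough illustration of the idea, the sketch below runs a brute-force nearest-neighbor search with pandas and NumPy on five movies from the preview. It is not the notebook's final implementation: the `recommend` function, the choice of Euclidean distance on standardized features, and the hand-filled genre flags (only three genres are included) are all assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Toy rows taken from the preview above; genre flags are filled in by hand
# and cover only three genres to keep the example small.
movies = pd.DataFrame(
    {
        "Movie Rating": [7.8, 8.4, 7.9, 7.8, 8.4],
        "Action": [1, 1, 0, 1, 1],
        "Drama": [0, 1, 1, 0, 0],
        "Sci-Fi": [0, 0, 0, 1, 1],
    },
    index=[
        "Avatar",
        "Avengers: Endgame",
        "Titanic",
        "Star Wars: Episode VII - The Force Awakens",
        "Avengers: Infinity War",
    ],
)

def recommend(title, features, k=2):
    """Return the k titles closest to `title` by Euclidean distance."""
    # Standardize each column so no single feature dominates the distance.
    std = features.std(ddof=0).replace(0, 1)
    scaled = (features - features.mean()) / std
    # Distance from the query movie to every movie in the table.
    diffs = scaled - scaled.loc[title]
    distances = np.sqrt((diffs ** 2).sum(axis=1))
    # Drop the query itself, then keep the k smallest distances.
    return distances.drop(title).nsmallest(k).index.tolist()

result = recommend("Avengers: Infinity War", movies, k=2)
print(result)
```

On the full dataset, a library implementation of nearest-neighbor search would likely be a better fit than this brute-force loop over all rows, but the distance-then-sort logic is the same.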

In [ ]: