Many popular movie and TV streaming services offer a recommendation system (click here to learn about Netflix's algorithm) where additional movies are recommended to the user based on other movies they have watched and liked. However, these systems often seem inaccurate, giving recommendations that don't quite match the other films the user has enjoyed.
Additional articles that talk about this issue:
https://thesundae.net/2019/11/03/the-problem-with-your-netflix-recommendations/
https://mashable.com/article/algorithms-netflix-youtube-spotify
This project attempts to use machine learning to pick the titles from a given dataset that most closely match a movie supplied by the user.
We can use this Kaggle dataset to collect a list of the top 1,000 highest-grossing movies:
import pandas as pd
pd.read_csv('Top_1000_Highest_Grossing_Movies_Of_All_Time.csv').head()
 | Movie Title | Year of Realease | Genre | Movie Rating | Duration | Gross | Worldwide LT Gross | Metascore | Votes | Logline |
---|---|---|---|---|---|---|---|---|---|---|
0 | Avatar | 2009 | Action,Adventure,Fantasy | 7.8 | 162 | $760.51M | $2,847,397,339 | 83 | 1,236,962 | A paraplegic Marine dispatched to the moon Pan... |
1 | Avengers: Endgame | 2019 | Action,Adventure,Drama | 8.4 | 181 | $858.37M | $2,797,501,328 | 78 | 1,108,641 | After the devastating events of Avengers: Infi... |
2 | Titanic | 1997 | Drama,Romance | 7.9 | 194 | $659.33M | $2,201,647,264 | 75 | 1,162,142 | A seventeen-year-old aristocrat falls in love ... |
3 | Star Wars: Episode VII - The Force Awakens | 2015 | Action,Adventure,Sci-Fi | 7.8 | 138 | $936.66M | $2,069,521,700 | 80 | 925,551 | As a new threat to the galaxy rises, Rey, a de... |
4 | Avengers: Infinity War | 2018 | Action,Adventure,Sci-Fi | 8.4 | 149 | $678.82M | $2,048,359,754 | 68 | 1,062,517 | The Avengers and their allies must be willing ... |
These columns can help match a given movie to the most similar movies based on a few of their attributes (e.g., similar genre and rating).
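Before movies can be compared, attributes such as genre and rating need to be turned into numbers. The following is a minimal sketch of one way to do that, not necessarily the encoding used later in this project. It uses a small hand-built DataFrame with values copied from the preview rows, so the variable names (`movies`, `genre_flags`, `features`) and the choice to divide the rating by 10 are assumptions made for illustration.

```python
import pandas as pd

# Tiny stand-in for the dataset: values copied from the preview rows.
movies = pd.DataFrame({
    "Movie Title": ["Avatar", "Avengers: Endgame", "Titanic"],
    "Genre": ["Action,Adventure,Fantasy", "Action,Adventure,Drama", "Drama,Romance"],
    "Movie Rating": [7.8, 8.4, 7.9],
})

# One 0/1 column per genre, splitting the Genre string on commas.
genre_flags = movies["Genre"].str.get_dummies(sep=",")

# Assumption: ratings are on a 10-point scale, so dividing by 10 puts them
# on roughly the same 0-1 range as the genre flags.
rating = movies["Movie Rating"] / 10.0

# Combine into one numeric feature table, indexed by title.
features = genre_flags.assign(rating=rating)
features.index = movies["Movie Title"]

print(features)
```

Each row of `features` is now a numeric vector, so two movies that share genres and have similar ratings end up with similar rows.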
We will group the movies based on their features and then use a k-nearest neighbors algorithm to find the movies that are "closest" to the input film. The specified "k" controls how many of the closest matches are returned as recommendations.
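To make the idea of "closest" concrete, here is a minimal sketch of a nearest-neighbor lookup written in plain Python so the distance computation is visible. The feature vectors are hypothetical: genres and ratings come from the five preview rows, but the encoding (one 0/1 flag per genre plus rating divided by 10), the Euclidean distance, and the helper name `nearest_movies` are assumptions for illustration. On the full dataset, a library implementation such as scikit-learn's `NearestNeighbors` would be the more practical choice.

```python
import math

# Hypothetical feature vectors:
# [Action, Adventure, Drama, Fantasy, Romance, Sci-Fi, rating / 10]
movies = {
    "Avatar": [1, 1, 0, 1, 0, 0, 0.78],
    "Avengers: Endgame": [1, 1, 1, 0, 0, 0, 0.84],
    "Titanic": [0, 0, 1, 0, 1, 0, 0.79],
    "Star Wars: Episode VII - The Force Awakens": [1, 1, 0, 0, 0, 1, 0.78],
    "Avengers: Infinity War": [1, 1, 0, 0, 0, 1, 0.84],
}


def nearest_movies(title, k):
    """Return the k titles whose feature vectors are closest to `title`."""
    query = movies[title]
    # Euclidean distance from the query movie to every other movie.
    distances = [
        (math.dist(query, vector), other)
        for other, vector in movies.items()
        if other != title
    ]
    distances.sort()  # smallest distance first
    return [other for _, other in distances[:k]]


print(nearest_movies("Avengers: Infinity War", k=2))
```

With these toy vectors, the two nearest neighbors of "Avengers: Infinity War" are "Star Wars: Episode VII - The Force Awakens" (same genres, slightly different rating) followed by "Avengers: Endgame" (same rating, one genre swapped). Raising `k` returns more titles, but the later ones are progressively less similar.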