Movie Recommendation

Description

A problem I often face is deciding what movie to watch. I regularly spend 20 to 30 minutes scrolling through streaming platforms, which is counterproductive and a waste of time, especially when I am looking for something to watch late at night. A personalized recommendation tool would help. Many streaming platforms offer a version of this, but their recommendations are never fully personalized: they take into account every movie watched on the account, which may include viewing by multiple different users. These platforms also lack a numerical rating system, so they recommend new movies based solely on what the user has watched, not on what they actually enjoyed.

I want to create a tool that lets users choose which movies are included in the analysis of what they have liked. This will make the recommendations more relevant and, in turn, more effective.
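As a minimal sketch of the selection step, the function below keeps only the movies a user explicitly picks, matching titles against the dataset's `movie title` column after trimming whitespace and ignoring case, and reports any titles it cannot find. The function name `select_liked_movies` is hypothetical, and exact title matching is an assumption; a real tool might need fuzzy matching, or the IMDb ID in the `path` column, to tell apart movies that share a title.

```python
import pandas as pd


def select_liked_movies(movies, liked_titles, title_col="movie title"):
    """Keep only the movies the user picked and list titles that were not found.

    Hypothetical helper: titles are compared after stripping whitespace and
    case-folding, so 'top gun' matches 'Top Gun' but not 'Top Gun: Maverick'.
    """
    normalized = movies[title_col].astype(str).str.strip().str.casefold()
    wanted = {title.strip().casefold() for title in liked_titles}
    mask = normalized.isin(wanted)
    found = set(normalized[mask])
    # Preserve the user's original spelling when reporting unmatched titles
    missing = [t for t in liked_titles if t.strip().casefold() not in found]
    return movies[mask], missing
```

Returning the unmatched titles instead of silently dropping them lets the tool ask the user to correct a typo before the analysis runs.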

Data

import pandas as pd

# Load the 25k IMDb movie dataset from CSV
movies = pd.read_csv('25k IMDb movie Dataset.csv')

# Preview the first six rows
movies.head(6)
| | movie title | Run Time | Rating | User Rating | Generes | Overview | Plot Kyeword | Director | Top 5 Casts | Writer | year | path |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Top Gun: Maverick | $170,000,000 (estimated) | 8.6 | 187K | ['Action', 'Drama'] | After more than thirty years of service as one... | ['fighter jet', 'sequel', 'u.s. navy', 'fighte... | Joseph Kosinski | ['Jack Epps Jr.', 'Peter Craig', 'Tom Cruise',... | Jim Cash | -2022 | /title/tt1745960/ |
| 1 | Jurassic World Dominion | 2 hours 27 minutes | 6 | 56K | ['Action', 'Adventure', 'Sci-Fi'] | Four years after the destruction of Isla Nubla... | ['dinosaur', 'jurassic park', 'tyrannosaurus r... | Colin Trevorrow | ['Colin Trevorrow', 'Derek Connolly', 'Chris P... | Emily Carmichael | -2022 | /title/tt8041270/ |
| 2 | Top Gun | $15,000,000 (estimated) | 6.9 | 380K | ['Action', 'Drama'] | As students at the United States Navy's elite ... | ['pilot', 'male camaraderie', 'u.s. navy', 'gr... | Tony Scott | ['Jack Epps Jr.', 'Ehud Yonay', 'Tom Cruise', ... | Jim Cash | -1986 | /title/tt0092099/ |
| 3 | Lightyear | $71,101,257 | 5.2 | 32K | ['Animation', 'Action', 'Adventure'] | While spending years attempting to return home... | ['galaxy', 'spaceship', 'robot', 'rocket', 'sp... | Angus MacLane | ['Jason Headley', 'Matthew Aldrich', 'Chris Ev... | Angus MacLane | -2022 | /title/tt10298810/ |
| 4 | Spiderhead | not-released | 5.4 | 23K | ['Action', 'Crime', 'Drama'] | In the near future, convicts are offered the c... | ['discover', 'medical', 'test', 'reality', 'fi... | Joseph Kosinski | ['Rhett Reese', 'Paul Wernick', 'Chris Hemswor... | George Saunders | -2022 | /title/tt9783600/ |
| 5 | Everything Everywhere All at Once | 2 hours 19 minutes | 8.3 | 124K | ['Action', 'Adventure', 'Comedy'] | An aging Chinese immigrant is swept up in an i... | ['multiverse', 'saving the world', 'mother dau... | Dan Kwan | ['Dan Kwan', 'Daniel Scheinert', 'Michelle Yeo... | Daniel Scheinert | -2022 | /title/tt6710474/ |
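The sample rows show that several columns are stored as text rather than numbers: `Run Time` mixes durations such as `2 hours 27 minutes` with budget figures and `not-released`, `User Rating` holds vote counts like `187K`, and `year` appears with a leading minus sign (`-2022`). A minimal sketch of converting them follows, assuming these formats are representative of the whole file. The helper names and the new columns (`runtime_min`, `num_votes`, `release_year`) are hypothetical, and the `M` suffix for millions is an assumption. Values that do not match an expected format become `NaN` rather than being guessed.

```python
import re

import numpy as np
import pandas as pd


def _text(value):
    """Return the value as a string, or '' for missing values."""
    return "" if pd.isna(value) else str(value)


def parse_runtime(value):
    """'2 hours 27 minutes' -> 147.0; values without hours/minutes -> NaN."""
    text = _text(value)
    hours = re.search(r"(\d+)\s*hours?", text)
    minutes = re.search(r"(\d+)\s*minutes?", text)
    if hours is None and minutes is None:
        return np.nan  # e.g. budget strings or 'not-released'
    h = int(hours.group(1)) if hours else 0
    m = int(minutes.group(1)) if minutes else 0
    return float(60 * h + m)


def parse_count(value):
    """'187K' -> 187000.0; a bare number is returned as a float; else NaN."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KM]?)\s*", _text(value))
    if match is None:
        return np.nan
    number, suffix = match.groups()
    # Assumed suffixes: K = thousands, M = millions
    return float(number) * {"": 1, "K": 1_000, "M": 1_000_000}[suffix]


def parse_year(value):
    """'-2022' (or the integer -2022) -> 2022.0; no 4-digit year -> NaN."""
    match = re.search(r"(\d{4})", _text(value))
    return float(match.group(1)) if match else np.nan


def clean_movies(movies):
    """Return a copy of the DataFrame with numeric versions of text columns."""
    cleaned = movies.copy()
    cleaned["runtime_min"] = cleaned["Run Time"].map(parse_runtime)
    cleaned["num_votes"] = cleaned["User Rating"].map(parse_count)
    cleaned["release_year"] = cleaned["year"].map(parse_year)
    cleaned["Rating"] = pd.to_numeric(cleaned["Rating"], errors="coerce")
    return cleaned
```

`parse_year` converts its input with `str()` first, so it works whether `read_csv` loaded `year` as text or inferred it as a negative integer.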
import pprint as pp

# Descriptions of the dataset's columns. Keys match the CSV headers exactly,
# including the dataset's own spellings 'Generes' and 'Plot Kyeword'.
feat_dict = {'movie title': 'The title of the movie',
             'Run Time': 'Total run time of the movie',
             'Rating': 'Average user rating',
             'User Rating': 'Total number of users who rated the movie',
             'Generes': 'Genres of the movie',
             'Overview': 'A short overview of the movie',
             'Plot Kyeword': "Keywords describing the movie's plot",
             'Director': "Name of the movie's director",
             'Top 5 Casts': 'Names of the top five cast members',
             'Writer': "Name of the movie's writer"}

pp.pprint(feat_dict)
{'Director': "Name of the movie's director",
 'Generes': 'Genres of the movie',
 'Overview': 'A short overview of the movie',
 'Plot Kyeword': "Keywords describing the movie's plot",
 'Rating': 'Average user rating',
 'Run Time': 'Total run time of the movie',
 'Top 5 Casts': 'Names of the top five cast members',
 'User Rating': 'Total number of users who rated the movie',
 'Writer': "Name of the movie's writer",
 'movie title': 'The title of the movie'}

The dataset contains multiple attributes that may contribute to a movie's quality. After some cleaning, these attributes should work well as features in a linear regression analysis.
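Regression needs numeric inputs, so list-valued columns such as `Generes`, which the CSV stores as strings like `"['Action', 'Drama']"`, must be encoded first. One common option, sketched here as an assumption rather than a settled design, is multi-hot encoding: one 0/1 indicator column per genre. The helper names `parse_list` and `genre_features` are hypothetical; `ast.literal_eval` parses the list strings without executing arbitrary code, and pandas' `Series.str.get_dummies` builds the indicator columns.

```python
import ast

import pandas as pd


def parse_list(value):
    """Parse a string such as "['Action', 'Drama']" into a Python list."""
    if isinstance(value, list):
        return value
    if not isinstance(value, str):
        return []  # missing values (NaN) become an empty list
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return []  # malformed strings are treated as "no genres"
    return list(parsed) if isinstance(parsed, (list, tuple)) else []


def genre_features(movies, column="Generes"):
    """Multi-hot encode a list-valued column: one 0/1 column per genre."""
    genre_lists = movies[column].map(parse_list)
    # Join each list with '|' so get_dummies can split it back into tags
    dummies = genre_lists.str.join("|").str.get_dummies(sep="|")
    return dummies.add_prefix("genre_")
```

The same two helpers could encode `Plot Kyeword`, although it likely has far more distinct values than `Generes`, so keeping only the most frequent keywords would be a reasonable refinement.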

How this data will be used to fix the problem

I want to use linear regression to find out which attributes best predict a movie's rating. The user must be able to input the specific movies they want the recommendations to be based on, which will make the recommendations more accurate and personalized. I also plan to run a word-frequency or n-gram analysis and include its output as an additional attribute in the regression.
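A minimal sketch of this plan follows, under several assumptions: the user gives a personal rating to each movie they select (a pandas Series indexed by title), movie titles are unique, and the n-gram attribute is a count of the most frequent word bigrams in the `Overview` column. Ordinary least squares is fit with `numpy.linalg.lstsq` in place of a library regression class, and every helper name is hypothetical. The model is fit on the selected movies only and then ranks the remaining movies by predicted rating.

```python
import re
from collections import Counter

import numpy as np
import pandas as pd


def word_ngrams(text, n=2):
    """Lowercase the text, split it into word tokens, and return its n-grams."""
    if not isinstance(text, str):
        return []
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngram_features(overviews, n=2, k=20):
    """Per-movie counts of the k most frequent n-grams across all overviews."""
    totals = Counter()
    for text in overviews:
        totals.update(word_ngrams(text, n))
    vocab = [gram for gram, _ in totals.most_common(k)]
    rows = []
    for text in overviews:
        counts = Counter(word_ngrams(text, n))
        rows.append([counts[gram] for gram in vocab])
    return pd.DataFrame(rows, index=overviews.index,
                        columns=[f"ngram_{gram}" for gram in vocab])


def add_intercept(features):
    """Convert features to a float matrix with a leading column of ones."""
    X = features.to_numpy(dtype=float)
    return np.column_stack([np.ones(len(X)), X])


def recommend(features, user_ratings, top_n=5):
    """Fit OLS on the user's rated movies and rank the movies they have not rated.

    features: DataFrame indexed by movie title (titles assumed unique).
    user_ratings: Series mapping the selected titles to the user's own ratings.
    """
    X = add_intercept(features.loc[user_ratings.index])
    y = user_ratings.to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    unseen = features.drop(index=user_ratings.index)
    predicted = pd.Series(add_intercept(unseen) @ coef, index=unseen.index)
    return predicted.sort_values(ascending=False).head(top_n)
```

For example, `recommend(top_ngram_features(movies.set_index("movie title")["Overview"]), my_ratings)` would rank movies by overview bigrams, where `my_ratings` is a hypothetical Series of the user's ratings keyed by title; other encoded attributes, such as genre indicators or cleaned numeric columns, could be added with `pd.concat([...], axis=1)`. With only a handful of rated movies and many features, the system is underdetermined and `lstsq` returns the minimum-norm solution, which can overfit, so keeping `k` small or switching to a regularized method such as ridge regression would be a sensible next step.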