Movie Recommendation¶

Problem¶

One thing I hate is wasting 30-45 minutes trying to find a movie I like. I wish I could skip this process, but I often end up spending that extra 30 minutes searching in the hope of not wasting 1.5-2.5 hours on a subpar movie.

Solution¶

Using the dataset provided by Kaggle, a movie's popularity and likability will be estimated from the attributes it contains. Here we will see whether any trends exist.

Impact¶

The data analysis may reveal trends, such as popular casts, directors, or writers, that can serve as indicators of whether a movie is going to be worth watching. It can also measure how successful a certain director or writer is. On a larger scale, it may show what society's favorite genre is and which era had the best movies.

Dataset¶

We are going to be using a dataset from Kaggle containing IMDb's top 250 movies:

https://www.kaggle.com/datasets/rajugc/imdb-top-250-movies-dataset

Features include:

  • rank - Rank of the movie
  • name - Name of the movie
  • year - Release year
  • rating - Rating of the movie
  • genre - Genre of the movie
  • certificate - Certificate of the movie
  • run_time - Total movie run time
  • tagline - Tagline of the movie
  • budget - Budget of the movie
  • box_office - Total box office collection across the world
  • casts - All cast members of the movie
  • directors - Director of the movie
  • writers - Writer of the movie
In [1]:
import pandas as pd
mov_df = pd.read_csv('IMDB Top 250 Movies.csv')
mov_df.head()
Out[1]:
rank name year rating genre certificate run_time tagline budget box_office casts directors writers
0 1 The Shawshank Redemption 1994 9.3 Drama R 2h 22m Fear can hold you prisoner. Hope can set you f... 25000000 28884504 Tim Robbins,Morgan Freeman,Bob Gunton,William ... Frank Darabont Stephen King,Frank Darabont
1 2 The Godfather 1972 9.2 Crime,Drama R 2h 55m An offer you can't refuse. 6000000 250341816 Marlon Brando,Al Pacino,James Caan,Diane Keato... Francis Ford Coppola Mario Puzo,Francis Ford Coppola
2 3 The Dark Knight 2008 9.0 Action,Crime,Drama PG-13 2h 32m Why So Serious? 185000000 1006234167 Christian Bale,Heath Ledger,Aaron Eckhart,Mich... Christopher Nolan Jonathan Nolan,Christopher Nolan,David S. Goyer
3 4 The Godfather Part II 1974 9.0 Crime,Drama R 3h 22m All the power on earth can't change destiny. 13000000 47961919 Al Pacino,Robert De Niro,Robert Duvall,Diane K... Francis Ford Coppola Francis Ford Coppola,Mario Puzo
4 5 12 Angry Men 1957 9.0 Crime,Drama Approved 1h 36m Life Is In Their Hands -- Death Is On Their Mi... 350000 955 Henry Fonda,Lee J. Cobb,Martin Balsam,John Fie... Sidney Lumet Reginald Rose
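
The preview shows that some columns are stored as text: run_time looks like "2h 22m" and genre holds several comma-separated values. Before plotting, these would probably need to be converted. The sketch below is one possible way to do that, not a step from the original notebook. The helper name `runtime_to_minutes` and the `sample` frame (three rows copied from the preview above) are illustrative assumptions.

```python
import re
import pandas as pd

def runtime_to_minutes(text):
    """Convert a string such as '2h 22m' to total minutes (None if unparseable)."""
    match = re.fullmatch(r"\s*(?:(\d+)\s*h)?\s*(?:(\d+)\s*m)?\s*", str(text))
    if not match or not any(match.groups()):
        return None
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes

# Three rows copied from the preview above, used only for illustration.
sample = pd.DataFrame({
    "name": ["The Shawshank Redemption", "The Godfather", "12 Angry Men"],
    "genre": ["Drama", "Crime,Drama", "Crime,Drama"],
    "run_time": ["2h 22m", "2h 55m", "1h 36m"],
})

# Numeric run time in minutes.
sample["run_minutes"] = sample["run_time"].map(runtime_to_minutes)

# One row per (movie, genre) pair so genres can be counted individually.
genre_rows = sample.assign(genre=sample["genre"].str.split(",")).explode("genre")
genre_counts = genre_rows["genre"].value_counts()
```

The same split-and-explode step should also work for the casts, directors, and writers columns, since the preview shows them comma-separated as well.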

Obstacles¶

Because the data reaches back to years as early as 1957, as seen above, attributes like "budget" may not be useful because of changes in the economy and the film landscape. The same principle applies to the "box_office" attribute: the 5th-ranked movie shows only 955, while the 4th-ranked movie shows 47961919. This gap most likely has to do with the release year, so using these attributes may be difficult and may produce irrelevant results.
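
One possible way to work around this, sketched below, is to compare money columns only within the same decade and to look at the ratio of box office to budget instead of raw amounts. This is an assumption about how the problem could be handled, not part of the original analysis. The `sample` frame copies the five preview rows, the values are stored as strings in case the CSV columns load as text, and the 0.01 cutoff for flagging suspicious rows is arbitrary.

```python
import pandas as pd

# Five rows copied from the preview above, used only for illustration.
sample = pd.DataFrame({
    "name": ["The Shawshank Redemption", "The Godfather", "The Dark Knight",
             "The Godfather Part II", "12 Angry Men"],
    "year": [1994, 1972, 2008, 1974, 1957],
    "budget": ["25000000", "6000000", "185000000", "13000000", "350000"],
    "box_office": ["28884504", "250341816", "1006234167", "47961919", "955"],
})

# Coerce money columns to numbers; unparseable entries become NaN.
for col in ("budget", "box_office"):
    sample[col] = pd.to_numeric(sample[col], errors="coerce")

# Group movies by decade so comparisons stay within a similar economy.
sample["decade"] = (sample["year"] // 10) * 10

# Box office relative to budget is unitless, so it is less tied to the year.
sample["return_ratio"] = sample["box_office"] / sample["budget"]

# Assumed cutoff: a ratio this low more likely signals a data problem.
suspicious = sample[sample["return_ratio"] < 0.01]
```

On these five rows, only 12 Angry Men falls below the cutoff, which suggests its box_office value may be worth checking before it is used in any plot.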

Method¶

Graphs with a best-fit line will be used to look for correlations and similarities. Here we are measuring attributes against the movie's rank, so several graphs will be constructed to see where the different relationships lie.
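
A minimal sketch of this approach follows, assuming a straight-line least-squares fit from `numpy.polyfit` and pandas' built-in scatter plot (which needs matplotlib installed). The function names `fit_line` and `plot_with_fit` are hypothetical, and the `sample` frame copies the rank and rating values from the preview above.

```python
import numpy as np
import pandas as pd

def fit_line(df, x_col, y_col):
    """Return slope, intercept, and Pearson r for a least-squares line."""
    x = df[x_col].to_numpy(dtype=float)
    y = df[y_col].to_numpy(dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r

def plot_with_fit(df, x_col, y_col):
    """Scatter plot of two columns with the fitted line drawn on top."""
    slope, intercept, r = fit_line(df, x_col, y_col)
    ax = df.plot.scatter(x=x_col, y=y_col)  # pandas plotting requires matplotlib
    xs = np.linspace(df[x_col].min(), df[x_col].max(), 100)
    ax.plot(xs, slope * xs + intercept, color="red", label=f"fit (r = {r:.2f})")
    ax.legend()
    return ax

# Rank and rating copied from the five preview rows, for illustration only.
sample = pd.DataFrame({
    "rank": [1, 2, 3, 4, 5],
    "rating": [9.3, 9.2, 9.0, 9.0, 9.0],
})
slope, intercept, r = fit_line(sample, "rank", "rating")
```

On the full data, something like `plot_with_fit(mov_df, "year", "rank")` would draw one of the graphs described above. A correlation coefficient near zero would suggest that the attribute says little about a movie's placement.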