One thing I hate is wasting 30-45 minutes trying to find a movie I like. It's a process I wish I could skip, but I often find myself spending that extra 30 minutes searching in hopes of not wasting 1.5-2.5 hours on a subpar movie.
Using a dataset from Kaggle, we will try to estimate how popular and well liked a movie is based on the attributes it has, and check whether certain trends exist.
The analysis may reveal trends, such as popular casts, directors, or writers, that can serve as indicators of whether a movie is going to be worth watching. It can also measure how successful a particular director or writer is. On a larger scale, it may show society's favorite genre and which era produced the best movies.
We are going to use a Kaggle dataset containing IMDb's top 250 movies:
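A minimal sketch of how a few of these trends could be measured in pandas, assuming `genre` and `directors` hold comma-separated names exactly as they appear in the `mov_df.head()` output further down. The `movies` frame copies those five rows so the snippet runs without the CSV; `count_list_column` is a hypothetical helper, and averaging `rating` per decade is only one possible definition of the "best era".

```python
import pandas as pd

# Five rows copied from the mov_df.head() output; with the full file,
# use pd.read_csv('IMDB Top 250 Movies.csv') instead.
movies = pd.DataFrame({
    "rank": [1, 2, 3, 4, 5],
    "name": ["The Shawshank Redemption", "The Godfather", "The Dark Knight",
             "The Godfather Part II", "12 Angry Men"],
    "year": [1994, 1972, 2008, 1974, 1957],
    "rating": [9.3, 9.2, 9.0, 9.0, 9.0],
    "genre": ["Drama", "Crime,Drama", "Action,Crime,Drama", "Crime,Drama", "Crime,Drama"],
    "directors": ["Frank Darabont", "Francis Ford Coppola", "Christopher Nolan",
                  "Francis Ford Coppola", "Sidney Lumet"],
})


def count_list_column(df: pd.DataFrame, column: str) -> pd.Series:
    """Hypothetical helper: split a comma-separated column and count each value."""
    return (
        df[column]
        .str.split(",")   # "Crime,Drama" -> ["Crime", "Drama"]
        .explode()        # one row per list element
        .str.strip()      # drop stray whitespace around names
        .value_counts()
    )


genre_counts = count_list_column(movies, "genre")
director_counts = count_list_column(movies, "directors")

# Bucket release years into decades (1957 -> 1950) and average the rating per decade.
movies["decade"] = (movies["year"] // 10) * 10
rating_by_decade = movies.groupby("decade")["rating"].mean()

print(genre_counts)
print(director_counts)
print(rating_by_decade)
```

On these five rows Drama appears 5 times and Francis Ford Coppola twice; the same helper should work for `casts` and `writers` on the full file (the `head()` preview truncates the cast lists).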
https://www.kaggle.com/datasets/rajugc/imdb-top-250-movies-dataset
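Features include rank, name, year, rating, genre, certificate, run_time, tagline, budget, box_office, casts, directors, and writers, as shown in the first rows below:

```python
import pandas as pd

# Load the Kaggle CSV (expected in the working directory) and preview the first five rows
mov_df = pd.read_csv('IMDB Top 250 Movies.csv')
mov_df.head()
```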
| | rank | name | year | rating | genre | certificate | run_time | tagline | budget | box_office | casts | directors | writers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | The Shawshank Redemption | 1994 | 9.3 | Drama | R | 2h 22m | Fear can hold you prisoner. Hope can set you f... | 25000000 | 28884504 | Tim Robbins,Morgan Freeman,Bob Gunton,William ... | Frank Darabont | Stephen King,Frank Darabont |
| 1 | 2 | The Godfather | 1972 | 9.2 | Crime,Drama | R | 2h 55m | An offer you can't refuse. | 6000000 | 250341816 | Marlon Brando,Al Pacino,James Caan,Diane Keato... | Francis Ford Coppola | Mario Puzo,Francis Ford Coppola |
| 2 | 3 | The Dark Knight | 2008 | 9.0 | Action,Crime,Drama | PG-13 | 2h 32m | Why So Serious? | 185000000 | 1006234167 | Christian Bale,Heath Ledger,Aaron Eckhart,Mich... | Christopher Nolan | Jonathan Nolan,Christopher Nolan,David S. Goyer |
| 3 | 4 | The Godfather Part II | 1974 | 9.0 | Crime,Drama | R | 3h 22m | All the power on earth can't change destiny. | 13000000 | 47961919 | Al Pacino,Robert De Niro,Robert Duvall,Diane K... | Francis Ford Coppola | Francis Ford Coppola,Mario Puzo |
| 4 | 5 | 12 Angry Men | 1957 | 9.0 | Crime,Drama | Approved | 1h 36m | Life Is In Their Hands -- Death Is On Their Mi... | 350000 | 955 | Henry Fonda,Lee J. Cobb,Martin Balsam,John Fie... | Sidney Lumet | Reginald Rose |
Because the data goes back to early years such as 1957, as seen above, attributes like budget may not be useful given how much the economy and the film landscape have changed. The same reasoning applies to box_office: the 5th-ranked movie (12 Angry Men, 1957) has a box office of only 955, while the 4th-ranked movie (The Godfather Part II, 1974) has 47,961,919. This gap likely reflects the difference in release year, so these attributes may be difficult to use and may produce irrelevant results.
Graphs with best-fit lines will be used to look for correlations. Since each attribute is measured against the movie's rank, many graphs will be constructed to see where different relationships lie.
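One way to check this concern is a rank (Spearman) correlation between `year` and each money column. The sketch uses only the five rows shown above, which is far too few to conclude anything; it is meant to show the mechanics. The `pd.to_numeric(..., errors="coerce")` step is included on the assumption that some entries in the full CSV may not be plain numbers, and `spearman` is a hypothetical helper that computes Spearman's rho as the Pearson correlation of the ranks.

```python
import pandas as pd

# year, budget and box_office for the five rows shown in mov_df.head()
sample = pd.DataFrame({
    "name": ["The Shawshank Redemption", "The Godfather", "The Dark Knight",
             "The Godfather Part II", "12 Angry Men"],
    "year": [1994, 1972, 2008, 1974, 1957],
    "budget": [25000000, 6000000, 185000000, 13000000, 350000],
    "box_office": [28884504, 250341816, 1006234167, 47961919, 955],
})

# Assumption: some entries in the full CSV may not be plain numbers.
# errors="coerce" turns those into NaN instead of raising an error.
for col in ("budget", "box_office"):
    sample[col] = pd.to_numeric(sample[col], errors="coerce")


def spearman(x: pd.Series, y: pd.Series) -> float:
    """Hypothetical helper: Spearman's rho as the Pearson correlation of the ranks."""
    # Series.corr skips pairs where either value is NaN
    return x.rank().corr(y.rank())


year_vs_budget = spearman(sample["year"], sample["budget"])
year_vs_box_office = spearman(sample["year"], sample["box_office"])
print(f"year vs budget:     {year_vs_budget:.2f}")      # 1.00
print(f"year vs box_office: {year_vs_box_office:.2f}")  # 0.60
```

On these rows the budget rises with every later release year (rho of 1.00), and box_office also trends upward with year (rho of 0.60), which is consistent with the worry that both columns partly track the era rather than the movie.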
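A minimal sketch of fitting a best-fit line of an attribute against rank with `numpy.polyfit` (degree 1, i.e. an ordinary least-squares straight line). It again uses the five rows from the `mov_df.head()` output; `run_time_to_minutes` and `best_fit_line` are hypothetical helpers, and the `'2h 22m'` run-time format is assumed from those rows.

```python
import numpy as np
import pandas as pd

# rank, rating and run_time for the five rows shown in mov_df.head()
sample = pd.DataFrame({
    "rank": [1, 2, 3, 4, 5],
    "rating": [9.3, 9.2, 9.0, 9.0, 9.0],
    "run_time": ["2h 22m", "2h 55m", "2h 32m", "3h 22m", "1h 36m"],
})


def run_time_to_minutes(run_time: pd.Series) -> pd.Series:
    """Hypothetical helper: convert strings like '2h 22m' to total minutes."""
    parts = run_time.str.extract(r"(?:(?P<h>\d+)h)?\s*(?:(?P<m>\d+)m)?")
    hours = pd.to_numeric(parts["h"]).fillna(0)    # a missing hour part counts as 0
    minutes = pd.to_numeric(parts["m"]).fillna(0)  # a missing minute part counts as 0
    return (hours * 60 + minutes).astype(int)


sample["run_minutes"] = run_time_to_minutes(sample["run_time"])


def best_fit_line(x: pd.Series, y: pd.Series) -> tuple[float, float]:
    """Hypothetical helper: least-squares line y = slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)


fits = {
    attribute: best_fit_line(sample["rank"], sample[attribute])
    for attribute in ("rating", "run_minutes")
}
for attribute, (slope, intercept) in fits.items():
    print(f"{attribute}: slope={slope:.3f}, intercept={intercept:.3f}")
```

On these rows the fitted slopes are about -0.08 rating points and -6.5 minutes per rank position. To draw the graphs, the same slope and intercept could be passed to matplotlib (`plt.scatter` for the points, `plt.plot` for the line); plotting is left out here so the snippet runs without matplotlib installed.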