Movie Rating Prediction¶

Motivation:¶

Problem¶

Making a movie that earns a high box-office gross is difficult. What makes a movie successful: the runtime, the genre, or the rating? People often have strong feelings about movies, and some movies have gained legendary status, but does a high rating translate into higher profits, or does some other factor drive ratings?

Solution¶

IMDB is an extensive movie database that serves as the main hub for professional and amateur movie critics. Many use the IMDB movie rating scale as the main benchmark when ranking movies. The IMDB movie scale goes from 1 to 10, with 1 being the worst rating and 10 being the best. The database also includes other data about the movies such as budget, genre, release date, actors, etc.

IMDB Ratings FAQ

The goal of this project is to identify the relationship between a movie's features and the success of the movie.

Impact¶

If the prediction works, it could help smaller moviemakers succeed. It could also raise the overall quality of movies, since moviemakers could see which features to focus on to make a movie that people will like.

A negative outcome could be that moviemakers rely on what has worked before instead of thinking outside the box and introducing new, creative ways of making movies.

Dataset¶

Detail¶

We will use a Kaggle Dataset of 5000 Movies on IMDB to observe the following features for each movie:

['color', 'director_name', 'num_critic_for_reviews', 'duration', 'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name', 'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name', 'movie_title', 'num_voted_users', 'cast_total_facebook_likes', 'actor_3_name', 'facenumber_in_poster', 'plot_keywords', 'movie_imdb_link', 'num_user_for_reviews', 'language', 'country', 'content_rating', 'budget', 'title_year', 'actor_2_facebook_likes', 'imdb_score', 'aspect_ratio', 'movie_facebook_likes']

We plan to exclude features from this dataset that we consider irrelevant, such as color, actors, Facebook likes, and title_year.

We plan to focus on these features:

  • duration: length of the movie in minutes
  • gross: how much money the movie made
  • genres: what type of movie is it
  • language: what is the main language of the movie
  • budget: amount of money that went into making the movie
  • aspect_ratio: aspect ratio of the movie
  • imdb_score: IMDB rating of the movie

We want to examine how these features relate to the IMDB score and how changing them might raise a movie's rating.
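As a first look at how the numeric features relate to imdb_score, a minimal sketch like the following could compute pairwise correlations. It assumes the column names listed above. The small `sample` frame is only a stand-in built from the five rows that `df_movies.head()` prints in this notebook, so the numbers it produces say nothing about the full dataset; `sample` and `FOCUS_COLUMNS` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in data: the five rows printed by df_movies.head() in this notebook.
# Replace `sample` with df_movies to run this on the real file.
sample = pd.DataFrame({
    "duration": [178.0, 169.0, 148.0, 164.0, np.nan],
    "gross": [760505847.0, 309404152.0, 200074175.0, 448130642.0, np.nan],
    "genres": ["Action|Adventure|Fantasy|Sci-Fi", "Action|Adventure|Fantasy",
               "Action|Adventure|Thriller", "Action|Thriller", "Documentary"],
    "language": ["English", "English", "English", "English", None],
    "budget": [237000000.0, 300000000.0, 245000000.0, 250000000.0, np.nan],
    "aspect_ratio": [1.78, 2.35, 2.35, 2.35, np.nan],
    "imdb_score": [7.9, 7.1, 6.8, 8.5, 7.1],
})

# Hypothetical constant: the features this project plans to focus on.
FOCUS_COLUMNS = ["duration", "gross", "genres", "language",
                 "budget", "aspect_ratio", "imdb_score"]

focus = sample[FOCUS_COLUMNS]

# Pearson correlation only applies to numeric columns; rows with missing
# values are skipped pairwise by DataFrame.corr.
numeric = focus.select_dtypes(include="number")
corr_with_score = numeric.corr()["imdb_score"].drop("imdb_score")

print(corr_with_score.sort_values(ascending=False))
```

Correlation only captures linear association, so this is a screening step rather than evidence that changing a feature would change the rating.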

Potential Problems¶

Different people like different genres: someone who likes action might not like drama and might therefore give a drama a lower score. Ratings are thus subjective and can be misleading. However, we assume that each rating is averaged over many people and is backed up by professional critics, whom we assume to be unbiased.

The dataset also mixes qualitative and quantitative features, so we need to find the best way to determine which ones matter most for a movie's success.
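One common way to put qualitative features on the same footing as quantitative ones is one-hot encoding. The sketch below is an assumption about how that could look here, not a step the project has committed to. It relies on the `|` separator visible in the genres column; the `sample` frame holds only the genres and language values from the rows printed by `df_movies.head()`, and `genre_flags` and `language_flags` are illustrative names.

```python
import pandas as pd

# Stand-in data: genres and language from the rows printed by df_movies.head().
sample = pd.DataFrame({
    "genres": ["Action|Adventure|Fantasy|Sci-Fi", "Action|Adventure|Fantasy",
               "Action|Adventure|Thriller", "Action|Thriller", "Documentary"],
    "language": ["English", "English", "English", "English", None],
})

# A movie can belong to several genres, so split on "|" and create one
# 0/1 indicator column per genre.
genre_flags = sample["genres"].str.get_dummies(sep="|")

# language holds a single value per movie; a missing language becomes
# a row with no indicator set.
language_flags = pd.get_dummies(sample["language"], prefix="language")

encoded = pd.concat([genre_flags, language_flags], axis=1)
print(encoded)
```

The resulting indicator columns can sit next to duration or budget in a single numeric table, at the cost of one extra column per distinct genre or language.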

Furthermore, this dataset does not take into account storytelling, which many consider an important aspect of movies.

In [4]:
import pandas as pd

df_movies = pd.read_csv("movie_data.csv")

df_movies.head()
Out[4]:
   color  director_name      num_critic_for_reviews  duration  director_facebook_likes  actor_3_facebook_likes  actor_2_name      actor_1_facebook_likes  gross        genres                           ...  num_user_for_reviews  language  country  content_rating  budget       title_year  actor_2_facebook_likes  imdb_score  aspect_ratio  movie_facebook_likes
0  Color  James Cameron      723.0                   178.0     0.0                      855.0                   Joel David Moore  1000.0                  760505847.0  Action|Adventure|Fantasy|Sci-Fi  ...  3054.0                English   USA      PG-13           237000000.0  2009.0      936.0                   7.9         1.78          33000
1  Color  Gore Verbinski     302.0                   169.0     563.0                    1000.0                  Orlando Bloom     40000.0                 309404152.0  Action|Adventure|Fantasy         ...  1238.0                English   USA      PG-13           300000000.0  2007.0      5000.0                  7.1         2.35          0
2  Color  Sam Mendes         602.0                   148.0     0.0                      161.0                   Rory Kinnear      11000.0                 200074175.0  Action|Adventure|Thriller        ...  994.0                 English   UK       PG-13           245000000.0  2015.0      393.0                   6.8         2.35          85000
3  Color  Christopher Nolan  813.0                   164.0     22000.0                  23000.0                 Christian Bale    27000.0                 448130642.0  Action|Thriller                  ...  2701.0                English   USA      PG-13           250000000.0  2012.0      23000.0                 8.5         2.35          164000
4  NaN    Doug Walker        NaN                     NaN       131.0                    NaN                     Rob Walker        131.0                   NaN          Documentary                      ...  NaN                   NaN       NaN      NaN             NaN          NaN         12.0                    7.1         NaN           0

5 rows × 28 columns

Method:¶

We believe several methods apply to this problem. We can start by clustering movies by genre, language, and age rating to get a better understanding of how each group performs. We can then use regression on the quantitative features, such as movie length or budget, to see which values increase or decrease the IMDB rating.
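The two steps could be sketched roughly as follows. This is only an illustration under stated assumptions: "clustering" is read here as a plain group-by on content_rating (the dataset's age-rating column) rather than a clustering algorithm, and the regression is an ordinary least-squares fit done with NumPy. The `sample` frame again holds just the rows printed by `df_movies.head()`, so the fitted coefficients are not meaningful results; `mean_by_rating`, `complete`, and `coef` are illustrative names.

```python
import numpy as np
import pandas as pd

# Stand-in data: selected columns from the rows printed by df_movies.head().
sample = pd.DataFrame({
    "content_rating": ["PG-13", "PG-13", "PG-13", "PG-13", None],
    "duration": [178.0, 169.0, 148.0, 164.0, np.nan],
    "budget": [237000000.0, 300000000.0, 245000000.0, 250000000.0, np.nan],
    "imdb_score": [7.9, 7.1, 6.8, 8.5, 7.1],
})

# Step 1: average IMDB score per age rating (rows with a missing
# content_rating are left out of the groups by default).
mean_by_rating = sample.groupby("content_rating")["imdb_score"].mean()
print(mean_by_rating)

# Step 2: least-squares regression of imdb_score on duration and budget.
# Rows with missing predictors are dropped; budget is rescaled to millions
# so both predictors have comparable magnitudes.
complete = sample.dropna(subset=["duration", "budget", "imdb_score"])
X = np.column_stack([
    np.ones(len(complete)),                   # intercept
    complete["duration"].to_numpy(),          # minutes
    complete["budget"].to_numpy() / 1e6,      # millions
])
y = complete["imdb_score"].to_numpy()

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
predictions = X @ coef

print(dict(zip(["intercept", "duration", "budget_millions"], coef)))
```

The sign of each fitted coefficient indicates whether that feature is associated with a higher or lower rating once the other predictor is held fixed, which is the question the regression step is meant to answer.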
