It is hard to tell whether a movie will make money. Giant blockbusters commonly flop on their opening weekend, reducing the returns to nothing. A recent example is "Turning Red": according to this article, the budget for the film was 175 million USD, yet it has so far grossed only 20.1 million USD. On the other hand, small productions sometimes blow up and gross a lot of money. An example is "The Blair Witch Project", which, according to this article, had a budget of only 60,000 USD but has grossed over 246 million USD.
Kaggle has a dataset containing IMDB's top 250 movies, with a number of features for each movie. Using these features, the goal is to accurately predict a movie's box office gross.
If we could correctly predict a movie's success based on its features and its budget, it could help producers and film studios choose their projects better and avoid losing money. However, one issue is that this could lead to films being made only for profit while ignoring the art form.
Features that the Kaggle dataset has:
Some of these features are nominal rather than quantitative. We can still use them: if certain directors, writers, or even director-writer combinations lead to a higher chance of a high box office, they are still useful indicators. Another potential issue is that, because these are the top 250 movies on IMDB, they may all have performed well at the box office. However, we can still compare how much each movie made in proportion to its budget to determine what makes a movie earn the most.
| | rank | name | year | rating | genre | certicate | run_time | tag_line | budget | boxoffice | actors | directors | writers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | The Shawshank Redemption | 1994 | 9.3 | Drama | R | 2h 22m | Fear can hold you prisoner. Hope can set you f... | 25000000 | 28884504 | Tim Robbins,Morgan Freeman,Bob Gunton,William ... | Frank Darabont | Stephen King,Frank Darabont |
| 1 | 2 | The Godfather | 1972 | 9.2 | Crime,Drama | R | 2h 55m | An offer you can't refuse. | 6000000 | 250341816 | Marlon Brando,Al Pacino,James Caan,Diane Keato... | Francis Ford Coppola | Mario Puzo,Francis Ford Coppola |
| 2 | 3 | The Dark Knight | 2008 | 9.0 | Action,Crime,Drama | PG-13 | 2h 32m | Why So Serious? | 185000000 | 1006234167 | Christian Bale,Heath Ledger,Aaron Eckhart,Mich... | Christopher Nolan | Jonathan Nolan,Christopher Nolan,David S. Goyer |
| 3 | 4 | The Godfather Part II | 1974 | 9.0 | Crime,Drama | R | 3h 22m | All the power on earth can't change destiny. | 13000000 | 47961919 | Al Pacino,Robert De Niro,Robert Duvall,Diane K... | Francis Ford Coppola | Francis Ford Coppola,Mario Puzo |
| 4 | 5 | 12 Angry Men | 1957 | 9.0 | Crime,Drama | Approved | 1h 36m | Life Is In Their Hands -- Death Is On Their Mi... | 350000 | 955 | Henry Fonda,Lee J. Cobb,Martin Balsam,John Fie... | Sidney Lumet | Reginald Rose |
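Before any regression can use them, the nominal and text columns in the table need a numeric form. The following is a minimal sketch of one possible approach, not a settled design: it turns a comma-separated cell such as `genre` into 0/1 indicators and converts `run_time` strings into minutes. The helper names `parse_run_time` and `multi_hot` are hypothetical, and the sample values are copied from the five rows shown above.

```python
import re


def parse_run_time(text):
    """Convert a run_time string such as '2h 22m' into total minutes."""
    match = re.fullmatch(r"\s*(?:(\d+)h)?\s*(?:(\d+)m)?\s*", text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised run_time: {text!r}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes


def multi_hot(cell, vocabulary):
    """Encode a comma-separated cell as 0/1 indicators, one per vocabulary entry."""
    present = {item.strip() for item in cell.split(",")}
    return [1 if label in present else 0 for label in vocabulary]


# Genre strings copied from the five rows shown in the table.
genres = ["Drama", "Crime,Drama", "Action,Crime,Drama", "Crime,Drama", "Crime,Drama"]

# The vocabulary is every distinct genre seen in the data, in a fixed order.
vocabulary = sorted({g.strip() for cell in genres for g in cell.split(",")})

encoded = [multi_hot(cell, vocabulary) for cell in genres]

print(vocabulary)                # ['Action', 'Crime', 'Drama']
print(encoded[2])                # [1, 1, 1] for The Dark Knight
print(parse_run_time("2h 22m"))  # 142
```

The same `multi_hot` helper could be applied to `directors` or `writers`, although with only 250 movies most names would appear once or twice, so those indicators may carry little signal on their own.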
We will use regression to estimate the box office amount in proportion to the budget, given the data. The dataset has no column for this ratio, but we can create one by dividing boxoffice by budget. Doing so will allow us to make the same estimate for future movies given the same features.
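As a rough illustration of that plan, the sketch below derives the box-office-to-budget ratio for the five rows in the table and fits a one-variable least-squares line to it. Everything beyond the data values is an assumption made for illustration: the names `return_ratio` and `fit_line` are hypothetical, `rating` is an arbitrary choice of predictor, and five rows are far too few to say anything about real movies.

```python
# (name, budget, boxoffice, rating) copied from the five rows in the table.
rows = [
    ("The Shawshank Redemption", 25_000_000, 28_884_504, 9.3),
    ("The Godfather", 6_000_000, 250_341_816, 9.2),
    ("The Dark Knight", 185_000_000, 1_006_234_167, 9.0),
    ("The Godfather Part II", 13_000_000, 47_961_919, 9.0),
    ("12 Angry Men", 350_000, 955, 9.0),
]


def return_ratio(budget, boxoffice):
    """Box office gross expressed as a multiple of the budget."""
    return boxoffice / budget


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# The derived target column: one ratio per movie.
ratios = [return_ratio(budget, boxoffice) for _, budget, boxoffice, _ in rows]
ratings = [rating for _, _, _, rating in rows]

slope, intercept = fit_line(ratings, ratios)

for (name, *_), ratio in zip(rows, ratios):
    print(f"{name}: {ratio:.3f}x budget")


def predict(rating):
    """Predicted ratio for a hypothetical new movie with the given rating."""
    return intercept + slope * rating
```

With the full dataset, the single predictor would be replaced by several features at once (for example the encoded genre indicators and the run time in minutes), which is where a regression library becomes the more practical choice.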