(1%) Describes and motivates a real-world problem where data science may provide helpful insights. Your description should be easily understood by a casual reader and include citations to motivating sources or relevant information (e.g. news articles, further reading links … Wikipedia makes for a poor reference but the links it cites are usually promising).

(1%) Explicitly load and show your dataset. Provide a data dictionary which explains the meaning of each feature present. Demonstrate that this data is sufficient to make progress on your real-world problem described above.

(1%) Write one or two sentences about how the data will be used to solve the problem. Earlier in the semester, we won’t have studied the Machine Learning methods just yet but you should have a general idea of what the ML will set out to do.

Movie Sequels and Their Success

Motivation:

Problem

Determining whether a movie is going to be successful can be difficult. How do production companies determine a movie's budget? Is there a correlation between a movie's budget and its success (measured by IMDb rating and gross profit)?

Solution

IMDb is a website that contains details about movies, TV shows, and other forms of media. These details include cast, production crew and actor biographies, plot summaries, budget, gross profit, ratings, and reviews. Using this data on past movies, production companies may be able to predict the success of their future movies. The goal of this project is to identify a relationship between movie budget, gross profit, and IMDb rating.

Impact

If successful, this project may be able to predict the IMDb rating and gross profit of a movie. This kind of predictor could help production companies determine how much to allot to a movie's budget.

Data Set:

We will use the IMDB_Movie_CaseStudy dataset from Kaggle to observe the following features for each movie:

  • Title
  • title_year
  • budget
  • Gross
  • actor_1_name
  • actor_2_name
  • actor_3_name
  • actor_1_facebook_likes
  • actor_2_facebook_likes
  • actor_3_facebook_likes
  • IMDb_rating
  • genre_1
  • genre_2
  • genre_3
  • MetaCritic
  • Runtime
  • CVotes10
  • CVotes09
  • CVotes08
  • CVotes07
  • CVotes06
  • CVotes05
  • CVotes04
  • CVotes03
  • CVotes02
  • CVotes01
  • CVotesMale
  • CVotesFemale
  • CVotesU18
  • CVotesU18M
  • CVotesU18F
  • CVotes1829
  • CVotes1829M
  • CVotes1829F
  • CVotes3044
  • CVotes3044M
  • CVotes3044F
  • CVotes45A
  • CVotes45AM
  • CVotes45AF
  • CVotes1000
  • CVotesUS
  • CVotesnUS
  • VotesM
  • VotesF
  • VotesU18
  • VotesU18M
  • VotesU18F
  • Votes1829
  • Votes1829M
  • Votes1829F
  • Votes3044
  • Votes3044M
  • Votes3044F
  • Votes45A
  • Votes45AM
  • Votes45AF
  • Votes1000
  • VotesUS
  • VotesnUS
  • content_rating
  • Country
Title title_year budget Gross actor_1_name actor_2_name actor_3_name actor_1_facebook_likes actor_2_facebook_likes actor_3_facebook_likes IMDb_rating genre_1 genre_2 genre_3 MetaCritic Runtime CVotes10 CVotes09 CVotes08 CVotes07 CVotes06 CVotes05 CVotes04 CVotes03 CVotes02 CVotes01 CVotesMale CVotesFemale CVotesU18 CVotesU18M CVotesU18F CVotes1829 CVotes1829M CVotes1829F CVotes3044 CVotes3044M CVotes3044F CVotes45A CVotes45AM CVotes45AF CVotes1000 CVotesUS CVotesnUS VotesM VotesF VotesU18 VotesU18M VotesU18F Votes1829 Votes1829M Votes1829F Votes3044 Votes3044M Votes3044F Votes45A Votes45AM Votes45AF Votes1000 VotesUS VotesnUS content_rating Country
La La Land 2016 30000000 151101803 Ryan Gosling Emma Stone Amiée Conn 14000 19000 8.2 Comedy Drama Music 93 128 74245 71191 64640 38831 17377 8044 3998 2839 2407 6802 157693 56713 2675 1784 868 113008 78998 32730 66058 50835 14165 15765 12148 3302 454 33360 117987 8.2 8.1 8.9 9 8.7 8.4 8.4 8.2 7.9 7.9 7.8 7.6 7.6 7.5 7.1 8.3 8.1 PG-13 USA
Zootopia 2016 150000000 341268248 Ginnifer Goodwin Jason Bateman Idris Elba 2800 28000 27000 8.1 Animation Adventure Comedy 78 108 53626 70912 102352 57261 16719 4539 1467 733 496 1386 176202 52345 2362 1641 706 119637 87499 30813 75474 61358 13034 12353 9959 2151 518 35975 122844 8 8.3 8.4 8.3 8.7 8.2 8.1 8.4 7.8 7.8 8.1 7.8 7.8 8.1 7.6 8 8 PG USA
Lion 2016 12000000 51738905 Dev Patel Nicole Kidman Rooney Mara 33000 96000 9800 8.1 Biography Drama 69 118 23325 29830 40564 20296 5842 1669 558 309 182 493 68921 24977 702 477 220 42962 29729 12780 34297 26384 7413 9054 6714 2184 298 13478 53931 8 8.4 8.3 8.2 8.7 8.1 8 8.4 8 7.9 8.2 8 7.9 8.4 7.1 8.1 8 PG-13 Australia
Arrival 2016 47000000 100546139 Amy Adams Jeremy Renner Forest Whitaker 35000 5300 8 Drama Mystery Sci-Fi 81 116 55533 87850 109536 65440 26913 10556 5057 3083 2194 4734 237437 46272 1943 1544 376 126301 101741 23163 111985 95005 15227 24027 20118 3440 537 42062 163774 7.9 8 8.6 8.6 8.4 8.2 8.2 8.1 7.8 7.8 7.8 7.6 7.6 7.7 7.3 8 7.9 PG-13 USA
Manchester by the Sea 2016 9000000 47695371 Casey Affleck Michelle Williams Kyle Chandler 518 71000 3300 7.9 Drama 96 137 18191 33532 46596 29626 11879 4539 1976 1233 888 1834 92452 22834 855 681 166 55475 43467 11378 40645 32983 7053 11361 8862 2306 402 20287 65837 7.9 7.7 8.5 8.5 8.1 8 8.1 7.8 7.7 7.7 7.7 7.6 7.6 7.6 7.1 7.9 7.8 R USA
Hell or High Water 2016 12000000 27007844 Chris Pine Jeff Bridges Ben Foster 19000 12000 9000 7.7 Crime Drama Thriller 88 102 8445 19789 45260 35212 11130 3102 932 475 233 471 88398 10427 564 519 43 41898 37112 4370 40564 36251 3817 10696 9091 1425 403 18746 57907 7.7 7.4 8.1 8.1 7.7 7.7 7.8 7.4 7.5 7.6 7.4 7.6 7.6 7.7 7.3 7.9 7.5 R USA
Doctor Strange 2016 165000000 232641920 Benedict Cumberbatch Chiwetel Ejiofor Rachel McAdams 19000 46000 7.6 Action Adventure Fantasy 72 115 38952 51465 102744 83322 32430 10744 3786 1854 1038 2667 202386 42203 2526 1970 540 117060 93330 22484 87961 74305 12327 17122 14163 2629 545 36644 133095 7.5 7.8 8 8 8.3 7.6 7.6 7.8 7.4 7.4 7.7 7.5 7.4 7.8 7.1 7.6 7.4 PG-13 USA
Tangled 2010 260000000 200807262 Brad Garrett Donna Murphy M.C. Gainey 799 553 284 7.8 Animation Adventure Comedy 71 124 56575 54688 97207 70947 26805 8530 3043 1396 805 1606 166088 97213 1950 1048 885 144744 81897 61390 89588 63534 24912 15318 11277 3805 622 47643 148024 7.6 8.2 7.8 7.4 8.3 7.9 7.7 8.2 7.6 7.5 8 7.7 7.6 7.9 6.9 7.9 7.7 PG USA
The Dark Knight Rises 2012 250000000 448130642 Tom Hardy Christian Bale Joseph Gordon-Levitt 27000 23000 23000 8.4 Action Thriller 78 164 380589 341965 281426 134959 50406 20106 9589 5713 4073 11988 842343 143070 4726 4023 672 509635 425041 79826 348324 299862 43434 55689 46968 7741 840 160533 501687 8.5 8.4 8.6 8.5 8.6 8.7 8.7 8.6 8.3 8.3 8.2 7.9 7.9 7.9 7.8 8.4 8.4 PG-13 USA
Captain America: Civil War 2016 250000000 407197282 Robert Downey Jr. Scarlett Johansson Chris Evans 21000 19000 11000 7.9 Action Adventure Sci-Fi 75 147 81893 90156 117188 79377 32782 12322 5095 2994 1989 7786 264239 43818 3572 2865 683 148991 124124 23355 105069 91345 12135 19151 16351 2459 593 48777 153638 7.8 7.9 8.3 8.3 8.6 8 8 8 7.7 7.7 7.8 7.6 7.6 7.9 7.5 8.1 7.7 PG-13 USA
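
As a minimal sketch of loading and showing the data, the snippet below parses a small CSV excerpt with Python's standard `csv` module. The excerpt is an assumption made for illustration: it holds only five of the columns listed above, with values copied from the first three rows of the table, and it is embedded as a string so the example runs without the Kaggle file. The helper name `load_movies` is hypothetical. With the real download, the same `DictReader` could be pointed at the opened CSV file, whose name and exact layout are not confirmed here.

```python
import csv
import io

# Illustrative excerpt only: five columns, values copied from the table above.
SAMPLE_CSV = """Title,title_year,budget,Gross,IMDb_rating
La La Land,2016,30000000,151101803,8.2
Zootopia,2016,150000000,341268248,8.1
Lion,2016,12000000,51738905,8.1
"""


def load_movies(text):
    """Parse CSV text into a list of dicts, converting numeric fields."""
    movies = []
    for row in csv.DictReader(io.StringIO(text)):
        movies.append({
            "Title": row["Title"],
            "title_year": int(row["title_year"]),
            "budget": int(row["budget"]),
            "Gross": int(row["Gross"]),
            "IMDb_rating": float(row["IMDb_rating"]),
        })
    return movies


movies = load_movies(SAMPLE_CSV)
for movie in movies:
    print(movie)
```

Converting the numeric fields at load time keeps later steps (such as fitting a line to budget and rating) free of string handling.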

Potential Problems:

The columns above about Facebook likes are not an accurate representation of an actor's popularity now. In 2023, an actor's popularity would not be determined solely by Facebook likes. The columns about each actor's genre are also not relevant, as actors have bodies of work across multiple genres.

Categorizing and analyzing the IMDb votes in so many categories will not be helpful in the analysis, especially when it is not very clear what each category of votes is communicating.
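
One way to act on these concerns is to filter the problematic columns out before any analysis. The sketch below is only an illustration: the `columns` list is a representative subset of the feature names listed earlier, the helper `keep_column` is hypothetical, and the choice of which columns to drop is an assumption rather than a settled decision. The genre columns could be filtered in the same way if they turn out not to be useful.

```python
# Representative subset of the column names from the feature list above.
columns = [
    "Title", "title_year", "budget", "Gross",
    "actor_1_name", "actor_1_facebook_likes",
    "IMDb_rating", "genre_1", "MetaCritic", "Runtime",
    "CVotes10", "CVotesMale", "VotesM", "VotesUS",
    "content_rating", "Country",
]


def keep_column(name):
    """Return False for Facebook-like counts and per-group vote breakdowns."""
    if name.endswith("_facebook_likes"):
        return False
    if name.startswith("CVotes") or name.startswith("Votes"):
        return False
    return True


kept = [name for name in columns if keep_column(name)]
print(kept)
```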

Method

We pose our problem as a regression (line of best fit) problem: using some of the columns above, we seek to estimate the IMDb rating and gross profit of each movie based on its budget.
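
To make the line-of-best-fit idea concrete, here is a minimal sketch of ordinary least squares written in plain Python. It assumes a single predictor (budget, expressed in millions of dollars) and a single target (IMDb rating), and uses only the ten sample rows shown in the table, so the fitted numbers are illustrative and far too few to support any conclusion. The function name `fit_line` and the 100-million-dollar example budget are hypothetical. The same function could be applied to the Gross column as the target.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Budget (millions of dollars) and IMDb rating for the ten sample rows above.
budgets = [30, 150, 12, 47, 9, 12, 165, 260, 250, 250]
ratings = [8.2, 8.1, 8.1, 8.0, 7.9, 7.7, 7.6, 7.8, 8.4, 7.9]

slope, intercept = fit_line(budgets, ratings)
print(f"rating = {slope:.5f} * budget_millions + {intercept:.3f}")

# Predicted rating for a hypothetical 100-million-dollar budget.
predicted = slope * 100 + intercept
print(f"predicted rating: {predicted:.2f}")
```

Scaling the budget to millions keeps the slope readable; fitting on raw dollar amounts would give the same line with a much smaller slope value.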