(1%) Describes and motivates a real-world problem where data science may provide helpful insights. Your description should be easily understood by a casual reader and include citations to motivating sources or relevant information (e.g. news articles, further reading links … Wikipedia makes for a poor reference but the links it cites are usually promising).
(1%) Explicitly load and show your dataset. Provide a data dictionary which explains the meaning of each feature present. Demonstrate that this data is sufficient to make progress on your real-world problem described above.
(1%) Write one or two sentences about how the data will be used to solve the problem. Earlier in the semester, we won’t have studied the Machine Learning methods just yet but you should have a general idea of what the ML will set out to do.
Determining whether a movie is going to be successful can be difficult. How do production companies determine a movie's budget? Is there a correlation between a movie's budget and its success (measured by IMDb rating and gross profit)?
IMDb is a website that contains details about movies, TV shows, and other forms of media. These details include cast, production crew and actor biographies, plot summaries, budget, gross profit, ratings, and reviews. Using this data on past movies, production companies may be able to predict the success of their future movies. The goal of this project is to identify a relationship between movie budget, gross profit, and IMDb rating.
If successful, this model may be able to predict the IMDb rating and gross profit of a movie. This kind of predictor could help production companies determine how much to allot to a movie's budget.
We will use the Kaggle dataset IMDB_Movie_CaseStudy to observe the following features for each movie:
Title | title_year | budget | Gross | actor_1_name | actor_2_name | actor_3_name | actor_1_facebook_likes | actor_2_facebook_likes | actor_3_facebook_likes | IMDb_rating | genre_1 | genre_2 | genre_3 | MetaCritic | Runtime | CVotes10 | CVotes09 | CVotes08 | CVotes07 | CVotes06 | CVotes05 | CVotes04 | CVotes03 | CVotes02 | CVotes01 | CVotesMale | CVotesFemale | CVotesU18 | CVotesU18M | CVotesU18F | CVotes1829 | CVotes1829M | CVotes1829F | CVotes3044 | CVotes3044M | CVotes3044F | CVotes45A | CVotes45AM | CVotes45AF | CVotes1000 | CVotesUS | CVotesnUS | VotesM | VotesF | VotesU18 | VotesU18M | VotesU18F | Votes1829 | Votes1829M | Votes1829F | Votes3044 | Votes3044M | Votes3044F | Votes45A | Votes45AM | Votes45AF | Votes1000 | VotesUS | VotesnUS | content_rating | Country |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
La La Land | 2016 | 30000000 | 151101803 | Ryan Gosling | Emma Stone | Amiée Conn | 14000 | 19000 | 8.2 | Comedy | Drama | Music | 93 | 128 | 74245 | 71191 | 64640 | 38831 | 17377 | 8044 | 3998 | 2839 | 2407 | 6802 | 157693 | 56713 | 2675 | 1784 | 868 | 113008 | 78998 | 32730 | 66058 | 50835 | 14165 | 15765 | 12148 | 3302 | 454 | 33360 | 117987 | 8.2 | 8.1 | 8.9 | 9 | 8.7 | 8.4 | 8.4 | 8.2 | 7.9 | 7.9 | 7.8 | 7.6 | 7.6 | 7.5 | 7.1 | 8.3 | 8.1 | PG-13 | USA | |
Zootopia | 2016 | 150000000 | 341268248 | Ginnifer Goodwin | Jason Bateman | Idris Elba | 2800 | 28000 | 27000 | 8.1 | Animation | Adventure | Comedy | 78 | 108 | 53626 | 70912 | 102352 | 57261 | 16719 | 4539 | 1467 | 733 | 496 | 1386 | 176202 | 52345 | 2362 | 1641 | 706 | 119637 | 87499 | 30813 | 75474 | 61358 | 13034 | 12353 | 9959 | 2151 | 518 | 35975 | 122844 | 8 | 8.3 | 8.4 | 8.3 | 8.7 | 8.2 | 8.1 | 8.4 | 7.8 | 7.8 | 8.1 | 7.8 | 7.8 | 8.1 | 7.6 | 8 | 8 | PG | USA |
Lion | 2016 | 12000000 | 51738905 | Dev Patel | Nicole Kidman | Rooney Mara | 33000 | 96000 | 9800 | 8.1 | Biography | Drama | | 69 | 118 | 23325 | 29830 | 40564 | 20296 | 5842 | 1669 | 558 | 309 | 182 | 493 | 68921 | 24977 | 702 | 477 | 220 | 42962 | 29729 | 12780 | 34297 | 26384 | 7413 | 9054 | 6714 | 2184 | 298 | 13478 | 53931 | 8 | 8.4 | 8.3 | 8.2 | 8.7 | 8.1 | 8 | 8.4 | 8 | 7.9 | 8.2 | 8 | 7.9 | 8.4 | 7.1 | 8.1 | 8 | PG-13 | Australia |
Arrival | 2016 | 47000000 | 100546139 | Amy Adams | Jeremy Renner | Forest Whitaker | 35000 | 5300 | 8 | Drama | Mystery | Sci-Fi | 81 | 116 | 55533 | 87850 | 109536 | 65440 | 26913 | 10556 | 5057 | 3083 | 2194 | 4734 | 237437 | 46272 | 1943 | 1544 | 376 | 126301 | 101741 | 23163 | 111985 | 95005 | 15227 | 24027 | 20118 | 3440 | 537 | 42062 | 163774 | 7.9 | 8 | 8.6 | 8.6 | 8.4 | 8.2 | 8.2 | 8.1 | 7.8 | 7.8 | 7.8 | 7.6 | 7.6 | 7.7 | 7.3 | 8 | 7.9 | PG-13 | USA | |
Manchester by the Sea | 2016 | 9000000 | 47695371 | Casey Affleck | Michelle Williams | Kyle Chandler | 518 | 71000 | 3300 | 7.9 | Drama | | | 96 | 137 | 18191 | 33532 | 46596 | 29626 | 11879 | 4539 | 1976 | 1233 | 888 | 1834 | 92452 | 22834 | 855 | 681 | 166 | 55475 | 43467 | 11378 | 40645 | 32983 | 7053 | 11361 | 8862 | 2306 | 402 | 20287 | 65837 | 7.9 | 7.7 | 8.5 | 8.5 | 8.1 | 8 | 8.1 | 7.8 | 7.7 | 7.7 | 7.7 | 7.6 | 7.6 | 7.6 | 7.1 | 7.9 | 7.8 | R | USA |
Hell or High Water | 2016 | 12000000 | 27007844 | Chris Pine | Jeff Bridges | Ben Foster | 19000 | 12000 | 9000 | 7.7 | Crime | Drama | Thriller | 88 | 102 | 8445 | 19789 | 45260 | 35212 | 11130 | 3102 | 932 | 475 | 233 | 471 | 88398 | 10427 | 564 | 519 | 43 | 41898 | 37112 | 4370 | 40564 | 36251 | 3817 | 10696 | 9091 | 1425 | 403 | 18746 | 57907 | 7.7 | 7.4 | 8.1 | 8.1 | 7.7 | 7.7 | 7.8 | 7.4 | 7.5 | 7.6 | 7.4 | 7.6 | 7.6 | 7.7 | 7.3 | 7.9 | 7.5 | R | USA |
Doctor Strange | 2016 | 165000000 | 232641920 | Benedict Cumberbatch | Chiwetel Ejiofor | Rachel McAdams | 19000 | 46000 | 7.6 | Action | Adventure | Fantasy | 72 | 115 | 38952 | 51465 | 102744 | 83322 | 32430 | 10744 | 3786 | 1854 | 1038 | 2667 | 202386 | 42203 | 2526 | 1970 | 540 | 117060 | 93330 | 22484 | 87961 | 74305 | 12327 | 17122 | 14163 | 2629 | 545 | 36644 | 133095 | 7.5 | 7.8 | 8 | 8 | 8.3 | 7.6 | 7.6 | 7.8 | 7.4 | 7.4 | 7.7 | 7.5 | 7.4 | 7.8 | 7.1 | 7.6 | 7.4 | PG-13 | USA | |
Tangled | 2010 | 260000000 | 200807262 | Brad Garrett | Donna Murphy | M.C. Gainey | 799 | 553 | 284 | 7.8 | Animation | Adventure | Comedy | 71 | 124 | 56575 | 54688 | 97207 | 70947 | 26805 | 8530 | 3043 | 1396 | 805 | 1606 | 166088 | 97213 | 1950 | 1048 | 885 | 144744 | 81897 | 61390 | 89588 | 63534 | 24912 | 15318 | 11277 | 3805 | 622 | 47643 | 148024 | 7.6 | 8.2 | 7.8 | 7.4 | 8.3 | 7.9 | 7.7 | 8.2 | 7.6 | 7.5 | 8 | 7.7 | 7.6 | 7.9 | 6.9 | 7.9 | 7.7 | PG | USA |
The Dark Knight Rises | 2012 | 250000000 | 448130642 | Tom Hardy | Christian Bale | Joseph Gordon-Levitt | 27000 | 23000 | 23000 | 8.4 | Action | Thriller | | 78 | 164 | 380589 | 341965 | 281426 | 134959 | 50406 | 20106 | 9589 | 5713 | 4073 | 11988 | 842343 | 143070 | 4726 | 4023 | 672 | 509635 | 425041 | 79826 | 348324 | 299862 | 43434 | 55689 | 46968 | 7741 | 840 | 160533 | 501687 | 8.5 | 8.4 | 8.6 | 8.5 | 8.6 | 8.7 | 8.7 | 8.6 | 8.3 | 8.3 | 8.2 | 7.9 | 7.9 | 7.9 | 7.8 | 8.4 | 8.4 | PG-13 | USA |
Captain America: Civil War | 2016 | 250000000 | 407197282 | Robert Downey Jr. | Scarlett Johansson | Chris Evans | 21000 | 19000 | 11000 | 7.9 | Action | Adventure | Sci-Fi | 75 | 147 | 81893 | 90156 | 117188 | 79377 | 32782 | 12322 | 5095 | 2994 | 1989 | 7786 | 264239 | 43818 | 3572 | 2865 | 683 | 148991 | 124124 | 23355 | 105069 | 91345 | 12135 | 19151 | 16351 | 2459 | 593 | 48777 | 153638 | 7.8 | 7.9 | 8.3 | 8.3 | 8.6 | 8 | 8 | 8 | 7.7 | 7.7 | 7.8 | 7.6 | 7.6 | 7.9 | 7.5 | 8.1 | 7.7 | PG-13 | USA |
The Facebook-likes columns above are not an accurate representation of an actor's popularity now. In 2023, an actor's popularity would not be determined solely by Facebook likes. The genre columns are also not relevant, as actors have bodies of work across multiple genres.
Splitting the IMDb votes into so many categories will not help the analysis, especially when it is not clear what each category of votes is communicating.
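The column selection described above can be sketched as a simple filter over the column names in the table header. This is only an illustration: the helper `is_excluded` and the lists `kept` and `dropped` are names we made up for this sketch, and the rule assumes that every column starting with `CVotes`, `Votes`, or `genre_`, or ending in `_facebook_likes`, is one we want to set aside.

```python
# Column names copied from the table header above.
columns = [
    "Title", "title_year", "budget", "Gross",
    "actor_1_name", "actor_2_name", "actor_3_name",
    "actor_1_facebook_likes", "actor_2_facebook_likes", "actor_3_facebook_likes",
    "IMDb_rating", "genre_1", "genre_2", "genre_3", "MetaCritic", "Runtime",
    "CVotes10", "CVotes09", "CVotes08", "CVotes07", "CVotes06",
    "CVotes05", "CVotes04", "CVotes03", "CVotes02", "CVotes01",
    "CVotesMale", "CVotesFemale",
    "CVotesU18", "CVotesU18M", "CVotesU18F",
    "CVotes1829", "CVotes1829M", "CVotes1829F",
    "CVotes3044", "CVotes3044M", "CVotes3044F",
    "CVotes45A", "CVotes45AM", "CVotes45AF",
    "CVotes1000", "CVotesUS", "CVotesnUS",
    "VotesM", "VotesF",
    "VotesU18", "VotesU18M", "VotesU18F",
    "Votes1829", "Votes1829M", "Votes1829F",
    "Votes3044", "Votes3044M", "Votes3044F",
    "Votes45A", "Votes45AM", "Votes45AF",
    "Votes1000", "VotesUS", "VotesnUS",
    "content_rating", "Country",
]


def is_excluded(name):
    """Return True for the Facebook-likes, genre, and vote-breakdown columns."""
    return name.endswith("_facebook_likes") or name.startswith(
        ("CVotes", "Votes", "genre_")
    )


# Split the header into the columns we keep and the ones we set aside.
kept = [c for c in columns if not is_excluded(c)]
dropped = [c for c in columns if is_excluded(c)]

print("kept:", kept)
print("number dropped:", len(dropped))
```

If the dataset were loaded into a pandas DataFrame, the `dropped` list could be passed to `DataFrame.drop(columns=dropped)` to apply the same selection.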
We pose our problem as a regression (line of best fit) problem: using some of the columns above, we seek to estimate the IMDb rating and gross profit of each movie based on its budget.
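As a rough sketch of what this line of best fit could look like, the snippet below fits two ordinary least squares lines, one for the `Gross` column against `budget` and one for `IMDb_rating` against `budget`. It uses only the ten sample rows shown in the table, so the numbers it prints are illustrative and say nothing conclusive about the full dataset. The function `fit_line` and the list `movies` are hypothetical names introduced for this example.

```python
# (Title, budget, Gross, IMDb_rating) copied from the ten sample rows above.
movies = [
    ("La La Land", 30000000, 151101803, 8.2),
    ("Zootopia", 150000000, 341268248, 8.1),
    ("Lion", 12000000, 51738905, 8.1),
    ("Arrival", 47000000, 100546139, 8.0),
    ("Manchester by the Sea", 9000000, 47695371, 7.9),
    ("Hell or High Water", 12000000, 27007844, 7.7),
    ("Doctor Strange", 165000000, 232641920, 7.6),
    ("Tangled", 260000000, 200807262, 7.8),
    ("The Dark Knight Rises", 250000000, 448130642, 8.4),
    ("Captain America: Civil War", 250000000, 407197282, 7.9),
]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Work in millions so the coefficients are easier to read.
budgets = [budget / 1e6 for _, budget, _, _ in movies]
grosses = [gross / 1e6 for _, _, gross, _ in movies]
ratings = [rating for _, _, _, rating in movies]

gross_slope, gross_intercept = fit_line(budgets, grosses)
rating_slope, rating_intercept = fit_line(budgets, ratings)

print(f"Gross  = {gross_slope:.3f} * budget + {gross_intercept:.1f} (both in millions)")
print(f"Rating = {rating_slope:.5f} * budget + {rating_intercept:.2f} (budget in millions)")
```

With the full dataset, the same fit would more likely be done with a library such as scikit-learn, and the quality of each line would need to be checked before using it for prediction.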