Best Streaming Platform Prediction¶

Motivation:¶

Problem¶

Shows are scattered across many different platforms, and not everyone can afford to subscribe to all of them. So for a (presumably broke) college student, what is the single best streaming platform to get?

Solution¶

Shows already come with ratings and age restrictions. Assuming that a college student wishes to watch the best-rated shows that are also age appropriate, we can use data on age restrictions, IMDb ratings, and Rotten Tomatoes ratings to compare shows. Then, a relationship can be found between the streaming platform and the overall rating of its shows to determine which streaming platform has the "best" shows.

Impact¶

If successful, this may decrease the amount of money college students spend unnecessarily on streaming platforms. If most users shift to one platform rather than another, it would also help producers of shows know which platform is best to go to. This may also cause some streaming platforms to go out of business or have significantly fewer users.

Dataset¶

Detail¶

We will use this data set from Kaggle to observe the ratings and determine the best shows.

This data set includes:

  • Title
  • Year released
  • Target age group
  • IMDb rating
  • Rotten Tomatoes rating
  • Which streaming platforms it belongs to

The project will use these ratings and target age group to determine the "best" streaming platform.

In [1]:
import pandas as pd

shows = pd.read_csv('tv_shows.csv')
shows.head()
Out[1]:
   Unnamed: 0  ID             Title  Year  Age    IMDb  Rotten Tomatoes  Netflix  Hulu  Prime Video  Disney+  Type
0           0   1      Breaking Bad  2008  18+  9.4/10          100/100        1     0            0        0     1
1           1   2   Stranger Things  2016  16+  8.7/10           96/100        1     0            0        0     1
2           2   3   Attack on Titan  2013  18+  9.0/10           95/100        1     1            0        0     1
3           3   4  Better Call Saul  2015  18+  8.8/10           94/100        1     0            0        0     1
4           4   5              Dark  2017  16+  8.8/10           93/100        1     0            0        0     1
In [2]:
shows_cols_dict = {'Unnamed: 0': 'row index',
                  'ID': 'unique show ID',
                  'Title': 'title of show',
                  'Year': 'year of show release',
                  'Age': 'target age group',
                  'IMDb': 'IMDb rating of the show, out of 10',
                  'Rotten Tomatoes': 'rotten tomatoes rating of the show out of 100',
                  'Netflix': '1 if show is on netflix',
                  'Hulu': '1 if show is on hulu',
                  'Prime Video': '1 if show is on prime video',
                  'Disney+': '1 if show is on disney+'}

shows_cols_dict
Out[2]:
{'Unnamed: 0': 'row index',
 'ID': 'unique show ID',
 'Title': 'title of show',
 'Year': 'year of show release',
 'Age': 'target age group',
 'IMDb': 'IMDb rating of the show, out of 10',
 'Rotten Tomatoes': 'rotten tomatoes rating of the show out of 100',
 'Netflix': '1 if show is on netflix',
 'Hulu': '1 if show is on hulu',
 'Prime Video': '1 if show is on prime video',
 'Disney+': '1 if show is on disney+'}
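
The preview above shows the ratings stored as text such as "9.4/10" and "100/100", so they need to be converted to numbers before they can be compared. The following is a minimal sketch of one way to do that, not the project's actual cleaning step. The helper name parse_rating, the new column names, and the equal-weight combined_score are assumptions made for illustration, and the small sample frame only repeats values from the preview.

```python
import pandas as pd

# Tiny stand-in for tv_shows.csv, using values copied from the preview above.
sample = pd.DataFrame({
    "Title": ["Breaking Bad", "Stranger Things", "Attack on Titan"],
    "IMDb": ["9.4/10", "8.7/10", "9.0/10"],
    "Rotten Tomatoes": ["100/100", "96/100", "95/100"],
})


def parse_rating(column: pd.Series) -> pd.Series:
    """Return the number before the '/' as a float.

    Missing or unparseable entries become NaN. (Hypothetical helper.)
    """
    numerator = column.str.extract(r"^\s*([\d.]+)\s*/")[0]
    return pd.to_numeric(numerator, errors="coerce")


sample["imdb_score"] = parse_rating(sample["IMDb"])            # scale 0-10
sample["rt_score"] = parse_rating(sample["Rotten Tomatoes"])   # scale 0-100

# Assumption: put both ratings on a 0-100 scale and weight them equally.
sample["combined_score"] = (sample["imdb_score"] * 10 + sample["rt_score"]) / 2

print(sample[["Title", "imdb_score", "rt_score", "combined_score"]])
```

Equal weighting is only one possible definition of an "overall rating"; a different weighting could change which shows look best.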

Potential Problems¶

One big problem with this data set is that it does not include the genres of the shows. To deal with this, it could be joined on show title with a different data set that includes genres, even if that one lacks ratings. Another problem is that not all popular streaming platforms are included here, which raises questions about how these four platforms were selected. The final problem is that people who get streaming platforms for movies rather than shows will not find this very applicable, since some platforms might be better for movies and others for shows. To deal with this, similarly structured data could be found for movies.
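
Joining on show title could be sketched as below. This is an illustration under assumptions: the genres frame, its Genre column, and the genre labels are hypothetical stand-ins for whatever second data set is eventually chosen, and title_key is a made-up helper. Titles rarely match exactly across sources, so the sketch normalizes case and whitespace first and uses the merge indicator to list shows that found no match.

```python
import pandas as pd

# Stand-in for a few rows of the streaming data set.
shows_sample = pd.DataFrame({
    "Title": ["Breaking Bad", "Stranger Things", "Dark"],
    "Netflix": [1, 1, 1],
})

# Hypothetical genre data set; column name and labels are illustrative only.
genres = pd.DataFrame({
    "Title": ["breaking bad ", "Stranger Things"],
    "Genre": ["Drama", "Sci-Fi"],
})


def title_key(titles: pd.Series) -> pd.Series:
    """Normalize titles so small formatting differences do not block a match."""
    return titles.str.strip().str.lower()


left = shows_sample.assign(title_key=title_key(shows_sample["Title"]))
right = genres.assign(title_key=title_key(genres["Title"]))[["title_key", "Genre"]]

# A left join keeps every show; _merge records whether a genre was found.
merged = left.merge(right, on="title_key", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "Title"].tolist()

print(merged[["Title", "Genre"]])
print("No genre found for:", unmatched)
```

Title alone is not a unique key (remakes and reboots can share a name), so matching on title plus release year would likely be safer if the second data set has a year column.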

Method:¶

We will look at the relationship between the ratings, age groups, and streaming platforms (and ideally genre) to determine which streaming platforms are best for different audiences, for example a child versus a college student versus a family with young children and adults. Year can also tell us which platforms have the most new shows.
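
As a rough sketch of this comparison, the code below averages the IMDb rating of the shows on each platform, optionally restricted to certain age labels. It is one possible starting point rather than the final method. The function name platform_summary and its allowed_ages parameter are assumptions, the IMDb column is assumed to be already converted to numbers, and the five rows only repeat the preview shown earlier.

```python
import pandas as pd

PLATFORMS = ["Netflix", "Hulu", "Prime Video", "Disney+"]

# The five preview rows, with IMDb assumed to be already numeric.
sample = pd.DataFrame({
    "Title": ["Breaking Bad", "Stranger Things", "Attack on Titan",
              "Better Call Saul", "Dark"],
    "Age": ["18+", "16+", "18+", "18+", "16+"],
    "IMDb": [9.4, 8.7, 9.0, 8.8, 8.8],
    "Netflix": [1, 1, 1, 1, 1],
    "Hulu": [0, 0, 1, 0, 0],
    "Prime Video": [0, 0, 0, 0, 0],
    "Disney+": [0, 0, 0, 0, 0],
})


def platform_summary(df: pd.DataFrame, allowed_ages=None) -> pd.DataFrame:
    """Count shows and average IMDb per platform (hypothetical helper).

    allowed_ages: optional list of Age labels to keep, e.g. ["16+"].
    """
    if allowed_ages is not None:
        df = df[df["Age"].isin(allowed_ages)]
    rows = []
    for platform in PLATFORMS:
        on_platform = df[df[platform] == 1]
        rows.append({
            "platform": platform,
            "n_shows": len(on_platform),
            "mean_imdb": on_platform["IMDb"].mean(),  # NaN when no shows
        })
    summary = pd.DataFrame(rows).set_index("platform")
    return summary.sort_values("mean_imdb", ascending=False)


summary = platform_summary(sample)
teen_summary = platform_summary(sample, allowed_ages=["16+"])
print(summary)
```

On this tiny sample Hulu ranks first on the strength of a single show, which is why n_shows is reported next to the mean: a platform's average should be read together with how many shows it is based on.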