Netflix vs Amazon vs Hulu -- which platform is my novel TV show/movie more likely to get onto + when?

Goals

The first goal of this project is to analyze a dataset of all the titles on Netflix, Hulu, and Amazon Prime as of mid-2021 and look for trends: how long after their release titles are added to a platform, whether there are networks of directors and casts that show up frequently on one platform or another (or on all of them), whether movies are more often added to a platform close to their release date than TV shows are, and whether one platform covers more genres than another.

The second goal is to build a machine learning model that predicts whether a new show or movie is likely to be added to a streaming platform, based on its release date, cast, director, etc., and which platform would be most likely to buy it.

Motivation

This project may give some insight into which types of content each platform prioritizes, which would help me choose a platform to subscribe to when Netflix cracks down on password sharing and I have to start paying for my own account :) It would also be cool to see whether a new show is likely to get added to one of these platforms based on what is already on them!

Dataset

The dataset combines 3 data sets found on Kaggle.com:

  • Netflix
  • Amazon
  • Hulu

There are 12 columns in each data set:

  • show_id (str, unique show id)
  • type (str, Movie or TV Show)
  • title (str)
  • director (str)
  • cast (str, comma-separated list of cast members)
  • country (str, country where it was produced)
  • date_added (str, date added to the platform)
  • release_year (int, actual year of release to the public)
  • rating (str, content rating, e.g. PG-13, R, TV-MA)
  • duration (str, number of seasons for TV shows or minutes for movies)
  • listed_in (str, comma-separated list of categories, e.g. comedy, documentary)
  • description (str)
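
Because the files are read from CSV, the list-like columns (cast, listed_in) come in as comma-separated strings and date_added comes in as text such as "September 25, 2021". Below is a minimal sketch of how these columns might be parsed before analysis. It runs on a two-row toy frame in the same format rather than on the real files, and the helper name `parse_columns` and the derived column names (`date_added_parsed`, `cast_list`, `listed_in_list`, `duration_value`, `duration_unit`) are hypothetical, not part of the dataset.

```python
import pandas as pd


def parse_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with typed versions of the text columns (hypothetical helper)."""
    out = df.copy()

    # "September 25, 2021" -> Timestamp. Missing values, or dates written in a
    # different format, become NaT because of errors="coerce".
    out["date_added_parsed"] = pd.to_datetime(
        out["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
    )

    # Comma-separated strings -> lists of stripped names; missing values stay missing.
    for col in ["cast", "listed_in"]:
        out[col + "_list"] = out[col].str.split(",").apply(
            lambda xs: [x.strip() for x in xs] if isinstance(xs, list) else xs
        )

    # "90 min" / "2 Seasons" -> a number plus its unit.
    parts = out["duration"].str.extract(r"(?P<value>\d+)\s*(?P<unit>\w+)")
    out["duration_value"] = pd.to_numeric(parts["value"])
    out["duration_unit"] = parts["unit"]
    return out


# Toy rows in the dataset's format (not the real files).
toy = pd.DataFrame(
    {
        "cast": ["Ama Qamata, Khosi Ngema", None],
        "date_added": ["September 24, 2021", "September 25, 2021"],
        "duration": ["2 Seasons", "90 min"],
        "listed_in": ["International TV Shows, TV Dramas", "Documentaries"],
    }
)
parsed = parse_columns(toy)
print(parsed[["date_added_parsed", "cast_list", "duration_value", "duration_unit"]])
```

Keeping the original columns and writing the parsed values to new ones makes it easy to check the parsing against the raw text.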

A column "platform" was added to each data set to indicate the platform it corresponds to, and the 3 data sets were concatenated into 1 usable data set (see below!). It will likely need further processing, including excluding rows which have NaN values.

In [8]:
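import pandas as pd

# Load each platform's titles with show_id (the first column) as the index,
# and tag every row with the platform it came from.
df_netflix = pd.read_csv("netflix_titles.csv", index_col=0)
df_netflix["platform"] = "Netflix"
df_hulu = pd.read_csv("hulu_titles.csv", index_col=0)
df_hulu["platform"] = "Hulu"
df_amazon = pd.read_csv("amazon_prime_titles.csv", index_col=0)
df_amazon["platform"] = "Amazon Prime"

# Stack the three frames into one. The index is kept as-is, so show_id values
# may repeat across platforms.
df = pd.concat([df_netflix, df_hulu, df_amazon])
df.head()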
Out[8]:
show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description | platform
s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | NaN | United States | September 25, 2021 | 2020 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm... | Netflix
s2 | TV Show | Blood & Water | NaN | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... | Netflix
s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... | Netflix
s4 | TV Show | Jailbirds New Orleans | NaN | NaN | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo... | Netflix
s5 | TV Show | Kota Factory | NaN | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | India | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I... | Netflix
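
Every one of the five rows shown above has a NaN in director, cast, or country, so dropping every row with any missing value would remove all of them. Here is a hedged sketch of one alternative for the further processing: count the missing values per column, drop rows only when the columns a particular analysis needs are missing, and then compute how many years passed between release and being added. The toy frame reuses values from rows s1, s2, and s4 above plus one invented row ("x1") with no date_added. The helper name `added_lag_years` and the column `lag_years` are hypothetical.

```python
import pandas as pd


def added_lag_years(df: pd.DataFrame) -> pd.DataFrame:
    """Years between public release and arrival on the platform (hypothetical helper)."""
    # Keep only rows where the columns this analysis needs are present.
    out = df.dropna(subset=["date_added", "release_year"]).copy()
    added = pd.to_datetime(
        out["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
    )
    out["lag_years"] = added.dt.year - out["release_year"]
    # Dates in any other format were coerced to NaT above, so drop those rows too.
    return out.dropna(subset=["lag_years"])


toy = pd.DataFrame(
    {
        "type": ["Movie", "TV Show", "TV Show", "Movie"],
        "director": ["Kirsten Johnson", None, None, None],
        "date_added": [
            "September 25, 2021",
            "September 24, 2021",
            "September 24, 2021",
            None,
        ],
        "release_year": [2020, 2021, 2021, 2019],
        "platform": ["Netflix", "Netflix", "Netflix", "Hulu"],
    },
    index=["s1", "s2", "s4", "x1"],  # "x1" is an invented row, for illustration only
)

missing_per_column = toy.isna().sum()
lagged = added_lag_years(toy)
median_lag = lagged.groupby(["platform", "type"])["lag_years"].median()

print(missing_per_column)
print(median_lag)
```

Grouping by both platform and type gives one number per combination, which is the comparison the first goal asks for (movies vs TV shows, per platform).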