(1%) Describes and motivates a real-world problem where data science may provide helpful insights. Your description should be easily understood by a casual reader and include citations to motivating sources or relevant information (e.g. news articles, further reading links … Wikipedia makes for a poor reference but the links it cites are usually promising).

(1%) Explicitly load and show your dataset. Provide a data dictionary which explains the meaning of each feature present. Demonstrate that this data is sufficient to make progress on your real-world problem described above.

(1%) Write one or two sentences about how the data will be used to solve the problem. Earlier in the semester, we won’t have studied the Machine Learning methods just yet but you should have a general idea of what the ML will set out to do. For example:

        “We’ll cluster the movies into sets of movies which are often watched by the same users. Doing so allows us to discover if there is a more natural grouping of movies rather than the traditional genres: horror, comedy, romantic-comedy, etc”.

YouTube Video Success Prediction

Problem

Many content creators on YouTube struggle to come up with ideas for what to produce, or aren't sure why a specific piece of content did well.

Solution

The goal of this project is to identify the most relevant features that drive a YouTube video to succeed.

Impact

The classifier created by this project could help content creators make the videos with the best chance of success. However, it could also encourage many similar, formulaic videos, driving down average quality.

Dataset

We use Kaggle's Trending YouTube Video Statistics dataset and focus on the following features of each video:

  • title
  • channel_title
  • publish_time
  • tags
  • views
  • likes
  • dislikes
  • comments_disabled
  • ratings_disabled
  • description
In [5]:
import pandas as pd

df_us_videos = pd.read_csv("USvideos.csv")
df_us_videos.head()
Out[5]:
video_id trending_date title channel_title category_id publish_time tags views likes dislikes comment_count thumbnail_link comments_disabled ratings_disabled video_error_or_removed description
0 2kyS6SvSYSE 17.14.11 WE WANT TO TALK ABOUT OUR MARRIAGE CaseyNeistat 22 2017-11-13T17:13:01.000Z SHANtell martin 748374 57527 2966 15954 https://i.ytimg.com/vi/2kyS6SvSYSE/default.jpg False False False SHANTELL'S CHANNEL - https://www.youtube.com/s...
1 1ZAPwfrtAFY 17.14.11 The Trump Presidency: Last Week Tonight with J... LastWeekTonight 24 2017-11-13T07:30:00.000Z last week tonight trump presidency|"last week ... 2418783 97185 6146 12703 https://i.ytimg.com/vi/1ZAPwfrtAFY/default.jpg False False False One year after the presidential election, John...
2 5qpjK5DgCt4 17.14.11 Racist Superman | Rudy Mancuso, King Bach & Le... Rudy Mancuso 23 2017-11-12T19:05:24.000Z racist superman|"rudy"|"mancuso"|"king"|"bach"... 3191434 146033 5339 8181 https://i.ytimg.com/vi/5qpjK5DgCt4/default.jpg False False False WATCH MY PREVIOUS VIDEO ▶ \n\nSUBSCRIBE ► http...
3 puqaWrEC7tY 17.14.11 Nickelback Lyrics: Real or Fake? Good Mythical Morning 24 2017-11-13T11:00:04.000Z rhett and link|"gmm"|"good mythical morning"|"... 343168 10172 666 2146 https://i.ytimg.com/vi/puqaWrEC7tY/default.jpg False False False Today we find out if Link is a Nickelback amat...
4 d380meD0W0M 17.14.11 I Dare You: GOING BALD!? nigahiga 24 2017-11-12T18:01:41.000Z ryan|"higa"|"higatv"|"nigahiga"|"i dare you"|"... 2095731 132235 1989 17518 https://i.ytimg.com/vi/d380meD0W0M/default.jpg False False False I know it's been a while since we did this sho...
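
Before the time columns can be used as features, they need to be parsed. The sketch below is one possible way to do that, not a fixed part of the pipeline. It assumes trending_date is stored as YY.DD.MM (inferred from the value 17.14.11 appearing next to a 2017-11-13 publish time) and that publish_time is an ISO 8601 UTC timestamp. The two rows are copied from the output above; the derived columns publish_hour and days_to_trend are hypothetical names introduced for illustration.

```python
import pandas as pd

# Two rows copied from the output above; only the date columns are kept.
sample = pd.DataFrame({
    "video_id": ["2kyS6SvSYSE", "5qpjK5DgCt4"],
    "trending_date": ["17.14.11", "17.14.11"],
    "publish_time": ["2017-11-13T17:13:01.000Z", "2017-11-12T19:05:24.000Z"],
})

# Assumption: trending_date is YY.DD.MM, so the format is given explicitly.
sample["trending_date"] = pd.to_datetime(sample["trending_date"], format="%y.%d.%m")

# publish_time ends in "Z" (UTC); utc=True produces timezone-aware timestamps.
sample["publish_time"] = pd.to_datetime(sample["publish_time"], utc=True)

# Hypothetical derived features: hour of publication (UTC) and the number of
# calendar days between the publish date and the trending date.
sample["publish_hour"] = sample["publish_time"].dt.hour
publish_day = sample["publish_time"].dt.tz_localize(None).dt.normalize()
sample["days_to_trend"] = (sample["trending_date"] - publish_day).dt.days

print(sample[["video_id", "publish_hour", "days_to_trend"]])
```

The same two to_datetime calls could be applied to df_us_videos directly once the format assumption has been checked against the full file.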

We will analyze the effect of the title, tags, publish time, and description on the number of views, likes, and dislikes a video receives, and determine which factors matter most for a video's success.
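
As a rough illustration of how the text and time columns might become numeric inputs for such an analysis, here is a minimal sketch. The function extract_features, its output column names, and the toy rows are all hypothetical and made up for this example; they are not taken from the dataset. The sketch assumes tags are separated by "|" as in the output above, and it uses log-scaled views as one possible success target.

```python
import numpy as np
import pandas as pd


def extract_features(df: pd.DataFrame) -> pd.DataFrame:
    """Turn raw text and time columns into simple numeric features (illustrative)."""
    features = pd.DataFrame(index=df.index)
    features["title_length"] = df["title"].str.len()
    # Share of alphabetic title characters that are upper case
    # (NaN if the title contains no letters).
    letters = df["title"].str.count(r"[A-Za-z]")
    uppers = df["title"].str.count(r"[A-Z]")
    features["title_upper_ratio"] = uppers / letters.where(letters > 0)
    # Assumption: tags are separated by "|" within a single string.
    features["tag_count"] = df["tags"].str.split("|").str.len()
    # Missing descriptions are treated as empty.
    features["description_length"] = df["description"].fillna("").str.len()
    features["publish_hour"] = pd.to_datetime(df["publish_time"], utc=True).dt.hour
    return features


# Toy rows invented for illustration only.
toy = pd.DataFrame({
    "title": ["MY BIG NEWS", "Quiet study music"],
    "tags": ['news|"vlog"|"family"', "music"],
    "description": ["Thanks for watching", None],
    "publish_time": ["2017-11-13T17:13:01.000Z", "2017-11-12T08:00:00.000Z"],
    "views": [748374, 1200],
})

features = extract_features(toy)
# View counts are heavily skewed, so a log scale is one reasonable target.
target = np.log1p(toy["views"])
print(features)
```

A feature table like this could later be passed to whichever model the project settles on, with the model's feature importances or coefficients used to rank the factors.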