(1%) Describes and motivates a real-world problem where data science may provide helpful insights. Your description should be easily understood by a casual reader and include citations to motivating sources or relevant information (e.g. news articles, further reading links … Wikipedia makes for a poor reference but the links it cites are usually promising).
(1%) Explicitly load and show your dataset. Provide a data dictionary which explains the meaning of each feature present. Demonstrate that this data is sufficient to make progress on your real-world problem described above.
(1%) Write one or two sentences about how the data will be used to solve the problem. Earlier in the semester, we won’t have studied the Machine Learning methods just yet but you should have a general idea of what the ML will set out to do. For example:
“We’ll cluster the movies into sets of movies which are often watched by the same users. Doing so allows us to discover if there is a more natural grouping of movies rather than the traditional genres: horror, comedy, romantic-comedy, etc”.
Many content creators on YouTube struggle to come up with ideas for what to produce, or are unsure why a specific piece of content did well.
The goal of this project is to identify the features that most strongly drive a YouTube video's success.
The classifier created by this project could help content creators make the videos with the best chance of success. However, it could also lead to many similar, formulaic videos being made, driving down average quality.
We will use Kaggle's Trending YouTube Video Statistics dataset to observe the following features for each video:
import pandas as pd
df_us_videos = pd.read_csv("USvideos.csv")
df_us_videos.head()
|  | video_id | trending_date | title | channel_title | category_id | publish_time | tags | views | likes | dislikes | comment_count | thumbnail_link | comments_disabled | ratings_disabled | video_error_or_removed | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2kyS6SvSYSE | 17.14.11 | WE WANT TO TALK ABOUT OUR MARRIAGE | CaseyNeistat | 22 | 2017-11-13T17:13:01.000Z | SHANtell martin | 748374 | 57527 | 2966 | 15954 | https://i.ytimg.com/vi/2kyS6SvSYSE/default.jpg | False | False | False | SHANTELL'S CHANNEL - https://www.youtube.com/s... |
| 1 | 1ZAPwfrtAFY | 17.14.11 | The Trump Presidency: Last Week Tonight with J... | LastWeekTonight | 24 | 2017-11-13T07:30:00.000Z | last week tonight trump presidency\|"last week ... | 2418783 | 97185 | 6146 | 12703 | https://i.ytimg.com/vi/1ZAPwfrtAFY/default.jpg | False | False | False | One year after the presidential election, John... |
| 2 | 5qpjK5DgCt4 | 17.14.11 | Racist Superman \| Rudy Mancuso, King Bach & Le... | Rudy Mancuso | 23 | 2017-11-12T19:05:24.000Z | racist superman\|"rudy"\|"mancuso"\|"king"\|"bach"... | 3191434 | 146033 | 5339 | 8181 | https://i.ytimg.com/vi/5qpjK5DgCt4/default.jpg | False | False | False | WATCH MY PREVIOUS VIDEO ▶ \n\nSUBSCRIBE ► http... |
| 3 | puqaWrEC7tY | 17.14.11 | Nickelback Lyrics: Real or Fake? | Good Mythical Morning | 24 | 2017-11-13T11:00:04.000Z | rhett and link\|"gmm"\|"good mythical morning"\|"... | 343168 | 10172 | 666 | 2146 | https://i.ytimg.com/vi/puqaWrEC7tY/default.jpg | False | False | False | Today we find out if Link is a Nickelback amat... |
| 4 | d380meD0W0M | 17.14.11 | I Dare You: GOING BALD!? | nigahiga | 24 | 2017-11-12T18:01:41.000Z | ryan\|"higa"\|"higatv"\|"nigahiga"\|"i dare you"\|"... | 2095731 | 132235 | 1989 | 17518 | https://i.ytimg.com/vi/d380meD0W0M/default.jpg | False | False | False | I know it's been a while since we did this sho... |
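Before any analysis, the date and tag columns in the preview would likely need parsing. The sketch below is one possible way to do that, not a fixed part of the project. It assumes `trending_date` uses a YY.DD.MM layout (inferred from `17.14.11` sitting next to November 2017 publish times) and that `tags` is pipe-separated with optional double quotes. The helper name `parse_video_columns`, the new `tag_list` column, and the two-row `sample` frame are hypothetical; the second tag string is only the visible prefix of a truncated cell.

```python
import pandas as pd


def parse_video_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with dates parsed and tags split into lists (hypothetical helper)."""
    out = df.copy()
    # Assumption: trending_date is YY.DD.MM, e.g. "17.14.11" -> 2017-11-14.
    out["trending_date"] = pd.to_datetime(out["trending_date"], format="%y.%d.%m")
    # publish_time is an ISO 8601 timestamp ending in "Z" (UTC).
    out["publish_time"] = pd.to_datetime(out["publish_time"], utc=True)
    # Assumption: tags are separated by "|" and may be wrapped in double quotes.
    out["tag_list"] = (
        out["tags"]
        .fillna("")
        .str.split("|")
        .apply(lambda tags: [t.strip('"') for t in tags if t])
    )
    return out


# Tiny stand-in for USvideos.csv, built from values visible in the preview.
sample = pd.DataFrame(
    {
        "trending_date": ["17.14.11", "17.14.11"],
        "publish_time": ["2017-11-13T17:13:01.000Z", "2017-11-12T18:01:41.000Z"],
        "tags": ["SHANtell martin", 'ryan|"higa"|"higatv"|"nigahiga"'],
    }
)

parsed = parse_video_columns(sample)
print(parsed[["trending_date", "publish_time", "tag_list"]])
```

The same helper could be applied to `df_us_videos` once the full file is loaded, provided the format assumptions hold for every row.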
We will analyze the effect of the title, tags, publish time, and description on the number of views, likes, and dislikes each video receives. We will then determine which factors are most important to the success of a video.
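As a rough first pass at this analysis, simple numeric features could be derived from the text and time columns and ranked by how strongly they track views. The sketch below is only an illustration under stated assumptions: the helper names `add_simple_features` and `rank_features`, the derived column names, and the choice of Spearman rank correlation are not part of the original plan, and the `toy` frame holds made-up values that stand in for `USvideos.csv`.

```python
import pandas as pd


def add_simple_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive basic numeric features from text and time columns (hypothetical helper)."""
    out = df.copy()
    out["title_length"] = out["title"].str.len()
    # Assumption: tags are pipe-separated, as they appear in the preview.
    out["tag_count"] = out["tags"].str.split("|").str.len()
    out["publish_hour"] = pd.to_datetime(out["publish_time"], utc=True).dt.hour
    out["description_length"] = out["description"].fillna("").str.len()
    return out


def rank_features(df: pd.DataFrame, features: list, target: str = "views") -> pd.Series:
    """Rank features by absolute Spearman correlation with the target column."""
    corr = df[features + [target]].corr(method="spearman")[target].drop(target)
    return corr.abs().sort_values(ascending=False)


# Made-up toy rows for illustration only; these are not real dataset values.
toy = pd.DataFrame(
    {
        "title": [
            "Short title",
            "A somewhat longer video title",
            "Mid length title!",
            "Tiny",
            "An extremely long and descriptive video title",
        ],
        "tags": ['a|"b"', 'a|"b"|"c"|"d"', "a", 'a|"b"|"c"', 'a|"b"|"c"|"d"|"e"'],
        "publish_time": [
            "2017-11-13T17:13:01.000Z",
            "2017-11-13T07:30:00.000Z",
            "2017-11-12T19:05:24.000Z",
            "2017-11-13T11:00:04.000Z",
            "2017-11-12T18:01:41.000Z",
        ],
        "description": ["desc", None, "longer description text", "", "another description"],
        "views": [100, 500, 50, 300, 900],
    }
)

feature_names = ["title_length", "tag_count", "publish_hour", "description_length"]
featured = add_simple_features(toy)
ranking = rank_features(featured, feature_names, target="views")
print(ranking)
```

Correlation only flags association, so a ranking like this would be a starting point for the later machine learning step rather than evidence of what causes a video to succeed. The same functions could be pointed at `likes` or `dislikes` by changing `target`.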