Predicting mean comments :(

If you're a frequent user of social media sites like Reddit, Facebook, or YouTube, you're probably well aware that people can be pretty mean and nasty in comment sections. Perhaps because of the anonymity, people seem much more likely to share how they feel about things over the internet than in person. That tendency to comment exactly how they feel about a video leads me to my first question to be answered by data science:

Can we predict the average sentiment of a video's comment section given the video's likes, dislikes, genre, title, etc.? Can we do it vice versa, predicting the number of likes given the comment count and sentiment?

In [35]:
import pandas as pd
comments = pd.read_csv('comments.csv')
videos = pd.read_csv('videos-stats.csv')
comments.head(10)
Out[35]:
Unnamed: 0 Video ID Comment Likes Sentiment
0 0 wAZZ-UWGVHI Let's not forget that Apple Pay in 2014 requir... 95.0 1.0
1 1 wAZZ-UWGVHI Here in NZ 50% of retailers don’t even have co... 19.0 0.0
2 2 wAZZ-UWGVHI I will forever acknowledge this channel with t... 161.0 2.0
3 3 wAZZ-UWGVHI Whenever I go to a place that doesn’t take App... 8.0 0.0
4 4 wAZZ-UWGVHI Apple Pay is so convenient, secure, and easy t... 34.0 2.0
5 5 wAZZ-UWGVHI We’ve been hounding my bank to adopt Apple pay... 8.0 1.0
6 6 wAZZ-UWGVHI We only got Apple Pay in South Africa in 2020/... 29.0 2.0
7 7 wAZZ-UWGVHI For now, I need both Apple Pay and the physica... 7.0 1.0
8 8 wAZZ-UWGVHI In the United States, we have an abundance of ... 2.0 2.0
9 9 wAZZ-UWGVHI In Cambodia, we have a universal QR code syste... 28.0 1.0

The top 10 comments of each video are listed in consecutive rows, one comment per row. Each row gives the comment's text as a string, its likes as a float, and its sentiment score as a float.

In [33]:
videos.head()
Out[33]:
Unnamed: 0 Title Video ID Published At Keyword Likes Comments Views
0 0 Apple Pay Is Killing the Physical Wallet After... wAZZ-UWGVHI 2022-08-23 tech 3407.0 672.0 135612.0
1 1 The most EXPENSIVE thing I own. b3x28s61q3c 2022-08-24 tech 76779.0 4306.0 1758063.0
2 2 My New House Gaming Setup is SICK! 4mgePWWCAmA 2022-08-23 tech 63825.0 3338.0 1564007.0
3 3 Petrol Vs Liquid Nitrogen | Freezing Experimen... kXiYSI7H2b0 2022-08-23 tech 71566.0 1426.0 922918.0
4 4 Best Back to School Tech 2022! ErMwWXQxHp0 2022-08-08 tech 96513.0 5155.0 1855644.0

The videos-stats file describes each video in broader detail: the title as a string, the publishing date as YYYY-MM-DD, and a keyword string that categorizes the video as 'tech', 'news', 'gaming', 'sports', 'how-to', etc. The total likes, comments, and views of each video are given as floats.
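Since both tables share a Video ID column, one way to get a single row per video is to average the comment-level columns and join them onto the video stats. The sketch below is only an illustration of that idea: the function name `build_video_table` and the new columns `avg_sentiment` and `avg_comment_likes` are made-up names, not part of the dataset.

```python
import pandas as pd


def build_video_table(comments: pd.DataFrame, videos: pd.DataFrame) -> pd.DataFrame:
    """Return one row per video: its stats plus averages over its comments."""
    # Average the sentiment and likes of the comments listed for each video.
    # `avg_sentiment` and `avg_comment_likes` are hypothetical column names.
    per_video = (
        comments.groupby("Video ID")
        .agg(
            avg_sentiment=("Sentiment", "mean"),
            avg_comment_likes=("Likes", "mean"),
        )
        .reset_index()
    )
    # Drop repeated video rows, if any, so the result stays one row per video.
    stats = videos.drop_duplicates(subset="Video ID")
    # An inner join keeps only videos that have stats and at least one comment.
    return stats.merge(per_video, on="Video ID", how="inner")
```

With the frames loaded above, this would be called as `build_video_table(comments, videos)`.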

Using these datasets, we can train an ML model to predict comment sentiment from a video's statistics, and vice versa. Just as we've done in class so far, we can evaluate the model on this dataset with k-fold cross-validation. Many other video datasets also exist, and we can test against them to see how well the model holds up.
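As a rough sketch of what that evaluation could look like, the function below runs k-fold cross-validation with scikit-learn. It assumes a table with one row per video that has the Likes, Comments, Views, and Keyword columns plus an average-sentiment column, here hypothetically named `avg_sentiment`. The random forest is just one reasonable choice of model; nothing above commits to a particular one.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder


def cross_validate_sentiment(video_table, target="avg_sentiment", k=5, seed=0):
    """Return the k per-fold mean absolute errors for predicting `target`."""
    numeric = ["Likes", "Comments", "Views"]
    categorical = ["Keyword"]
    # Drop videos with a missing feature or target value.
    data = video_table.dropna(subset=numeric + categorical + [target])
    X, y = data[numeric + categorical], data[target]

    # One-hot encode the keyword; numeric columns pass through unchanged.
    features = ColumnTransformer(
        [("keyword", OneHotEncoder(handle_unknown="ignore"), categorical)],
        remainder="passthrough",
    )
    model = make_pipeline(
        features,
        RandomForestRegressor(n_estimators=100, random_state=seed),
    )

    # Shuffle before splitting, since videos of one keyword may be grouped together.
    folds = KFold(n_splits=k, shuffle=True, random_state=seed)
    scores = cross_val_score(
        model, X, y, cv=folds, scoring="neg_mean_absolute_error"
    )
    return -scores  # scikit-learn reports negated errors; flip the sign
```

The returned array holds one error per fold, so its mean gives a single summary number. For the reverse question, predicting likes, the same structure should work with a different target, as long as Likes is removed from the feature list.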
