If you're a frequent user of social media sites like Reddit, Facebook, or YouTube, you already know that people can be pretty mean and nasty in comment sections. It is probably the anonymity that makes people much more likely to share how they feel about things over the internet than in person. Either way, people's tendency to comment exactly how they feel about a video leads me to my first question to be answered by data science:
Can we predict the average sentiment of a video's comment section given the video's likes, dislikes, genre, title, etc.? Can we also do it in reverse, predicting the number of likes given the number of comments and their sentiment?
import pandas as pd
comments = pd.read_csv('comments.csv')
videos = pd.read_csv('videos-stats.csv')
comments.head(10)
 | Unnamed: 0 | Video ID | Comment | Likes | Sentiment |
---|---|---|---|---|---|
0 | 0 | wAZZ-UWGVHI | Let's not forget that Apple Pay in 2014 requir... | 95.0 | 1.0 |
1 | 1 | wAZZ-UWGVHI | Here in NZ 50% of retailers don’t even have co... | 19.0 | 0.0 |
2 | 2 | wAZZ-UWGVHI | I will forever acknowledge this channel with t... | 161.0 | 2.0 |
3 | 3 | wAZZ-UWGVHI | Whenever I go to a place that doesn’t take App... | 8.0 | 0.0 |
4 | 4 | wAZZ-UWGVHI | Apple Pay is so convenient, secure, and easy t... | 34.0 | 2.0 |
5 | 5 | wAZZ-UWGVHI | We’ve been hounding my bank to adopt Apple pay... | 8.0 | 1.0 |
6 | 6 | wAZZ-UWGVHI | We only got Apple Pay in South Africa in 2020/... | 29.0 | 2.0 |
7 | 7 | wAZZ-UWGVHI | For now, I need both Apple Pay and the physica... | 7.0 | 1.0 |
8 | 8 | wAZZ-UWGVHI | In the United States, we have an abundance of ... | 2.0 | 2.0 |
9 | 9 | wAZZ-UWGVHI | In Cambodia, we have a universal QR code syste... | 28.0 | 1.0 |
Each row holds one comment, with the comment's text as a string, its likes as a float, and its sentiment score as a float. The ten rows shown above are the top 10 comments of a single video, which is why they all share the same Video ID.
videos.head()
 | Unnamed: 0 | Title | Video ID | Published At | Keyword | Likes | Comments | Views |
---|---|---|---|---|---|---|---|---|
0 | 0 | Apple Pay Is Killing the Physical Wallet After... | wAZZ-UWGVHI | 2022-08-23 | tech | 3407.0 | 672.0 | 135612.0 |
1 | 1 | The most EXPENSIVE thing I own. | b3x28s61q3c | 2022-08-24 | tech | 76779.0 | 4306.0 | 1758063.0 |
2 | 2 | My New House Gaming Setup is SICK! | 4mgePWWCAmA | 2022-08-23 | tech | 63825.0 | 3338.0 | 1564007.0 |
3 | 3 | Petrol Vs Liquid Nitrogen \| Freezing Experimen... | kXiYSI7H2b0 | 2022-08-23 | tech | 71566.0 | 1426.0 | 922918.0 |
4 | 4 | Best Back to School Tech 2022! | ErMwWXQxHp0 | 2022-08-08 | tech | 96513.0 | 5155.0 | 1855644.0 |
In videos-stats.csv, we see each video in broader detail: the title is a string, the publishing date is formatted as YYYY-MM-DD, and the keyword is a string category such as 'tech', 'news', 'gaming', 'sports', or 'how-to'. The total likes, comments, and views of the video are given as floats.
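Since the two tables share a Video ID column, one way to line them up is to average the comment sentiment per video and join the result onto the video statistics. The following is only a minimal sketch of that idea: the function name `build_video_table` and the output columns `mean_sentiment` and `mean_comment_likes` are made up for illustration, and it assumes the column names shown in the two previews above.

```python
import pandas as pd


def build_video_table(comments: pd.DataFrame, videos: pd.DataFrame) -> pd.DataFrame:
    """Attach per-video comment summaries to the video statistics (illustrative helper)."""
    # Average the sentiment and likes of the sampled comments for each video.
    per_video = (
        comments.groupby("Video ID")
        .agg(
            mean_sentiment=("Sentiment", "mean"),
            mean_comment_likes=("Likes", "mean"),
        )
        .reset_index()
    )
    # Assumption: a Video ID may appear more than once in the video table,
    # so keep only its first occurrence.
    unique_videos = videos.drop_duplicates(subset="Video ID")
    # An inner join keeps only videos that have at least one sampled comment.
    return unique_videos.merge(per_video, on="Video ID", how="inner")
```

Calling `build_video_table(comments, videos)` with the two DataFrames loaded earlier would give one row per video, with the averaged sentiment sitting next to that video's likes, comments, and views.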
Using these datasets, we can train an ML model to predict comment sentiment from a video's statistics, and/or vice versa. Just as we've done in class so far, we can evaluate the model on this dataset with k-fold cross-validation. Many other video datasets also exist, and we can test against them to see how well the model holds up on data it was not built from.
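To make the k-fold idea concrete, here is a rough sketch written with NumPy only. It is not the model this project settles on: ordinary least squares is used purely as a stand-in, and the function name `kfold_mse` and its parameters are assumptions made for illustration. In practice, scikit-learn's `KFold` and `cross_val_score` would be the more usual route.

```python
import numpy as np


def kfold_mse(X, y, k=5, seed=0):
    """Return the held-out mean squared error of a linear least-squares fit for each of k folds."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Shuffle the row indices once, then split them into k nearly equal folds.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th one.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Add an intercept column and fit on the training rows only.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        # Score the fit on the rows that were held out.
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        scores.append(float(np.mean((A_test @ coef - y[test_idx]) ** 2)))
    return scores
```

Given a hypothetical DataFrame `table` with one row per video and an averaged sentiment column `mean_sentiment`, a call such as `kfold_mse(table[["Likes", "Comments", "Views"]], table["mean_sentiment"])` would return five error values, one per fold. Rows with missing values would need to be dropped first.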