Many factors make this difficult, including time, market sentiment, and future cash flows or earnings growth. I'm sure there are various metrics that can each get you closer to a fair price.
But if it were easy to figure out, we would all be rich, and we can't all be rich, right? Or can we?
In doing research, I found that there is a way to estimate sentiment from tweets. This could be helpful: you can add up all of the sentiment scores for a day and see whether the price goes up or down that day. https://towardsdatascience.com/can-we-beat-the-stock-market-using-twitter-ef8465fd12e2
The data set below contains tweets from 2015 to 2020 that mention the following stocks: Apple, Google Inc, Amazon.com, Tesla Inc, and Microsoft.
import pandas as pd
df_main = pd.read_csv("/Users/Emre/Desktop/DS2501/Tweet.csv")
df_company = pd.read_csv("/Users/Emre/Desktop/DS2501/Company_Tweet.csv")
Description of Company_Tweet.csv: tweet_id (int): unique identifier used to match a tweet with its ticker | ticker_symbol (str): ticker symbol of the company
df_company.head()
| | tweet_id | ticker_symbol |
|---|---|---|
| 0 | 550803612197457920 | AAPL |
| 1 | 550803610825928706 | AAPL |
| 2 | 550803225113157632 | AAPL |
| 3 | 550802957370159104 | AAPL |
| 4 | 550802855129382912 | AAPL |
df_company.shape
(4336445, 2)
Description of Tweet.csv: tweet_id (int): unique identifier used to match a tweet with its ticker | writer: account that posted the tweet | post_date (int): posting time as a Unix epoch timestamp in seconds | body: text of the tweet | comment_num: number of comments | retweet_num: number of retweets | like_num: number of likes
df_main.head()
| | tweet_id | writer | post_date | body | comment_num | retweet_num | like_num |
|---|---|---|---|---|---|---|---|
| 0 | 550441509175443456 | VisualStockRSRC | 1420070457 | lx21 made $10,008 on $AAPL -Check it out! htt... | 0 | 0 | 1 |
| 1 | 550441672312512512 | KeralaGuy77 | 1420070496 | Insanity of today weirdo massive selling. $aap... | 0 | 0 | 0 |
| 2 | 550441732014223360 | DozenStocks | 1420070510 | S&P100 #Stocks Performance $HD $LOW $SBUX $TGT... | 0 | 0 | 0 |
| 3 | 550442977802207232 | ShowDreamCar | 1420070807 | $GM $TSLA: Volkswagen Pushes 2014 Record Recal... | 0 | 0 | 1 |
| 4 | 550443807834402816 | i_Know_First | 1420071005 | Swing Trading: Up To 8.91% Return In 14 Days h... | 0 | 0 | 1 |
df_main.shape
(3717964, 7)
First, the data set will need some manipulation: we will need to get the stock price that goes with each tweet.
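As a rough idea of what that manipulation might look like, here is a minimal sketch that joins each tweet to its ticker(s) and turns the epoch `post_date` into a calendar date. The toy frames stand in for `df_main` and `df_company`, their values are made up, and the helper name `attach_tickers` is hypothetical. A price table keyed on ticker and date would still have to come from somewhere else, since it is not part of the two files shown here.

```python
import pandas as pd

# Toy stand-ins for df_main and df_company (values are made up for illustration).
tweets = pd.DataFrame({
    "tweet_id": [1, 2, 3],
    "post_date": [1420070457, 1420070496, 1420156800],  # Unix epoch, in seconds
    "body": ["$AAPL up", "$AAPL down", "$TSLA $MSFT news"],
})
tickers = pd.DataFrame({
    "tweet_id": [1, 2, 3, 3],  # one tweet can map to more than one ticker
    "ticker_symbol": ["AAPL", "AAPL", "TSLA", "MSFT"],
})

def attach_tickers(tweets, tickers):
    """Join each tweet to its ticker(s) and add a calendar-date column (UTC)."""
    merged = tweets.merge(tickers, on="tweet_id", how="inner")
    merged["date"] = pd.to_datetime(merged["post_date"], unit="s").dt.date
    return merged

merged = attach_tickers(tweets, tickers)
print(merged[["tweet_id", "ticker_symbol", "date"]])
```

With a `date` column in place, a daily price table could later be merged on `ticker_symbol` and `date`.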
Additionally, VADER (Valence Aware Dictionary and sEntiment Reasoner) will be useful for determining the sentiment of every tweet, but we must assign a score to each of the roughly 3.7 million tweets.
Next, I was thinking of using logistic regression to determine whether or not we should buy on a given day.
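A minimal sketch of the scoring step might look like the following. It assumes the third-party `vaderSentiment` package is installed and that the tweets already carry `ticker_symbol` and `date` columns. The helper names `make_vader_scorer` and `daily_sentiment` are hypothetical. Using VADER's `compound` score as the single per-tweet number is one possible choice, not something settled above.

```python
import pandas as pd

def make_vader_scorer():
    """Return a function mapping text -> VADER compound score in [-1, 1].

    Assumes the third-party package is available: pip install vaderSentiment
    """
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    analyzer = SentimentIntensityAnalyzer()
    return lambda text: analyzer.polarity_scores(text)["compound"]

def daily_sentiment(df, scorer):
    """Score each tweet body, then sum the scores per ticker and date."""
    scored = df.assign(sentiment=df["body"].map(scorer))
    return scored.groupby(["ticker_symbol", "date"], as_index=False)["sentiment"].sum()

# Usage, assuming a frame `merged` with body, ticker_symbol and date columns:
# daily = daily_sentiment(merged, make_vader_scorer())
```

Scoring one tweet at a time in Python will likely be slow on millions of rows, so it may be worth running this on a small sample first.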
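To make the idea concrete, here is a small sketch of logistic regression with one feature (the day's summed sentiment) and one label (1 if the price went up that day, else 0). It is fitted by plain gradient descent in NumPy so that it stays self-contained. In practice, scikit-learn's `LogisticRegression` would be the more usual choice. The data, the function names, the learning rate, and the 0.5 threshold are all made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(x, y, lr=0.1, steps=2000):
    """Fit p(up) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = sigmoid(w * x + b)
        w -= lr * np.mean((p - y) * x)  # gradient of the mean log loss w.r.t. w
        b -= lr * np.mean(p - y)        # gradient of the mean log loss w.r.t. b
    return w, b

def predict_buy(x, w, b, threshold=0.5):
    """Return True where the predicted probability of an up day meets the threshold."""
    return sigmoid(w * x + b) >= threshold

# Made-up daily sentiment sums and whether the price went up (1) or not (0).
sentiment = np.array([-3.0, -1.5, -0.5, 0.5, 1.0, 2.5])
went_up = np.array([0, 0, 0, 1, 1, 1])

w, b = fit_logistic(sentiment, went_up)
print(predict_buy(np.array([-2.0, 2.0]), w, b))
```

Real price data is unlikely to separate this cleanly, so the fitted model should be checked on days it was not trained on.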