Which part of Apple's income statement, balance/sheet, and etc ... is most directly related to overall increase in uplift of Apple stock valuation. https://www.kaggle.com/datasets/thedevastator/apple-s-historical-financials
apple stock historical data. https://www.kaggle.com/datasets/mukhazarahmad/apple-stock-data-12121980-1132022
regression model prediction of Apple stock: https://medium.com/analytics-vidhya/using-linear-regression-to-predict-aapl-apple-stock-prices-in-python-1a629fbea15b
Machine learning module for stock price prediction has been the newest focus of the finance field. A new finance field called 'The quantitative Analyst position has recently opened up and is recognized as a crucial aspect of finance. Oftentimes, machine learning module that is capable of predicting stock price contains highly complex equation, code, and data set that can not easily be obtained or achieved by non-professionals (college students). Even the most excellent finance firm that uses highly accurate machine learning modules to predict stock prices hides its secrete from the public. Formulas range from CAPM (Capital Asset pricing module) to black-scholes module, which is a differential equation and mostly taught during graduate school.
'Stock Market Prediction from WSJ: Text Mining via Sparse Matrix Factorization' is a paper written by Felix Ming Fai Wong, Zhenming Liu, and Mung Chiang from Princeton University. This module uniquely includes word counts from WSJ and finds the correlation in stock price data, articles from WSJ, and between each. Rather than using the conventional machine learning module, the module approaches by adding new x values.
I want to create a machine learning module that predicts positive or negative percentage change in a single firm's stock price (Apple) based on its income statement. This is because the income statement of a publicly operating firm is required by the law to provide information to the public. This means even non-professionals can approach the data set and attempt to create a machine learning module. On top of the availability of the data set, I wanted to make a module out of the income statement because it is an important indicator for stock buyers. I want to see if the change in each category in the income statement affects the stock price.
This is why I have prepared a yearly percent change of the stock price of Apple from 2013 to 2021. I used yfinance to import Apple's past stock price data set. Found the percent change based on Adj Close price that can be seen below. I downloaded past income statement data of Apple from kaggle. I have set the dataset range from 2013 to 2021 and calculated the percent change of each category. Due to some data sets having 0 in some years, percent change values were found to be inf. To make these values useful, I used the sigmoid function to place pct change values in between 0 and 1.
I am planning to use the linear regression module. y_true will be pct_change_yearly and x 1~n would be income statement values. After model.fit(x,y) we will get a module for predicting the percent change of the stock price of Apple yearly. We can test it by putting the income statement values of Apple and predicting the percent change in stock price. To avoid values being used again in prediction, we can use other firms' data (preferably S&P 500 companies as Apple is in it) to test. To make the module more convincing, we can give multiple S&P 500 companies' income statements to build the module. This will increase the accuracy of the model.
We'll cluster multiple firm's income statements. Doing so allows us to discover the correlation between percent change of each category in income statement with percent change of the firm's stock price. Rather than finding the exact stock price of a firm, the module will focus more on positive change or negative change of a firm's stock price per year. We'll find it using linear regression module.
import yfinance as yf
import pandas as pd
df = yf.download(tickers = "AAPL", # list of tickers
period = "23y", # time period
interval = "1mo", # trading interval
ignore_tz = True, # ignore timezone when aligning data from different exchanges?
prepost = False) # download pre/post market hours data?
df
[*********************100%***********************] 1 of 1 completed
Open | High | Low | Close | Adj Close | Volume | |
---|---|---|---|---|---|---|
Date | ||||||
2000-03-01 | 1.058594 | 1.342634 | 1.017857 | 1.212612 | 1.032076 | 8698356800 |
2000-04-01 | 1.209821 | 1.245536 | 0.936384 | 1.107701 | 0.942785 | 8662404800 |
2000-05-01 | 1.114955 | 1.127232 | 0.729911 | 0.750000 | 0.638339 | 9807750400 |
2000-06-01 | 0.729911 | 1.029018 | 0.717634 | 0.935268 | 0.796024 | 8105204800 |
2000-07-01 | 0.930804 | 1.082589 | 0.837054 | 0.907366 | 0.772276 | 5746770400 |
... | ... | ... | ... | ... | ... | ... |
2022-10-01 | 138.210007 | 157.500000 | 134.369995 | 153.339996 | 152.852661 | 1868139700 |
2022-11-01 | 155.080002 | 155.449997 | 134.380005 | 148.029999 | 147.559555 | 1724847700 |
2022-12-01 | 148.210007 | 150.919998 | 125.870003 | 129.929993 | 129.731918 | 1675657600 |
2023-01-01 | 130.279999 | 147.229996 | 124.169998 | 144.289993 | 144.070023 | 1443218300 |
2023-02-01 | 143.970001 | 157.380005 | 141.320007 | 147.919998 | 147.919998 | 1256371743 |
276 rows × 6 columns
import yfinance as yf
import pandas as pd
# set the ticker symbol and date range
ticker = "AAPL"
start_date = "2012-01-01"
end_date = "2021-12-31"
# download the data using yfinance
data = yf.download(ticker, start=start_date, end=end_date, interval='1mo')
# resample the data to get the yearly averages
data_yearly = data.resample('1Y').mean()
# calculate the percentage change of the adjusted closing price each year
pct_change_yearly = data_yearly['Adj Close'].pct_change() * 100
pct_change_yearly.dropna(inplace=True)
# print the percentage change each year
print(pct_change_yearly)
[*********************100%***********************] 1 of 1 completed Date 2013-12-31 -17.230957 2014-12-31 42.168712 2015-12-31 29.795524 2016-12-31 -10.931802 2017-12-31 47.749385 2018-12-31 25.529302 2019-12-31 15.546031 2020-12-31 84.295736 2021-12-31 46.920317 Freq: A-DEC, Name: Adj Close, dtype: float64
import numpy as np
def sigmoid(x):
return 1/(1+np.exp(-x))
#https://linuxhint.com/numpy-sigmoid/
def sigmoid_vectorized(x):
return np.vectorize(sigmoid)(x)
# load the data
apple_income_statement = pd.read_csv('apple_income_statement.csv', index_col=0).transpose()
# convert the string values to numerical values
apple_income_statement = apple_income_statement.apply(pd.to_numeric, errors='coerce')
# compute the percentage change for each column
pct_change_df = apple_income_statement.pct_change()
pct_arr = sigmoid_vectorized(pct_change_df)
pct = pd.DataFrame(data=pct_arr, index = pct_change_df.index, columns = pct_change_df.columns)
# pct = pd.DataFrame(data=pct_change_df, index = pct_change_df.index, columns = pct_change_df.columns)
pct = pct.iloc[1:]
pct.dropna(axis=1, inplace=True)
pct
Cash & Equivalents | Cash and Short Term Investments | Accounts Receivable - Trade, Net | Total Receivables, Net | Total Inventory | Total Current Assets | Property/Plant/Equipment, Total - Gross | Accumulated Depreciation, Total | Property/Plant/Equipment, Total - Net | Other Long Term Assets, Total | ... | Long Term Debt | Total Long Term Debt | Total Debt | Other Liabilities, Total | Total Liabilities | Common Stock, Total | Unrealized Gain (Loss) | Other Equity, Total | Total Equity | Total Liabilities & Shareholders' Equity | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2013-12-31 | 0.526887 | 0.526068 | 0.485708 | 0.718112 | 0.453151 | 0.477453 | 0.268941 | 0.268941 | 0.509324 | 0.413101 | ... | 0.491330 | 0.490695 | 0.485101 | 0.426371 | 0.474291 | 0.512718 | 0.559419 | 0.404310 | 0.546622 | 0.488554 |
2014-12-31 | 0.551616 | 0.550930 | 0.481182 | 0.494690 | 0.491684 | 0.503469 | 1.000000 | 0.000000 | 0.492709 | 0.583360 | ... | 0.509168 | 0.509963 | 0.510164 | 0.554571 | 0.515015 | 0.516293 | 0.500000 | 0.653910 | 0.451322 | 0.499367 |
2015-12-31 | 0.470033 | 0.470033 | 0.466863 | 0.484192 | 0.448868 | 0.465584 | 0.503207 | 0.514867 | 0.492656 | 0.578033 | ... | 0.476488 | 0.476488 | 0.491906 | 0.496966 | 0.483806 | 0.511063 | 0.364887 | 0.335834 | 0.471115 | 0.481294 |
2016-12-31 | 0.527372 | 0.527372 | 0.471649 | 0.480432 | 0.472040 | 0.488505 | 0.499711 | 0.511501 | 0.488069 | 0.493531 | ... | 0.476683 | 0.476665 | 0.491911 | 0.510272 | 0.492264 | 0.501861 | 0.466268 | 0.499879 | 0.471804 | 0.488393 |
2017-12-31 | 0.538047 | 0.538047 | 0.561230 | 0.529258 | 0.540658 | 0.533399 | 0.496854 | 0.511507 | 0.480968 | 0.456703 | ... | 0.507252 | 0.511272 | 0.487055 | 0.497548 | 0.506894 | 0.514966 | 0.320821 | 0.396380 | 0.510655 | 0.507556 |
2018-12-31 | 0.487776 | 0.487807 | 0.547007 | 0.519887 | 0.537636 | 0.516266 | 0.486076 | 0.483093 | 0.489740 | 0.520797 | ... | 0.512108 | 0.512085 | 0.512004 | 0.475720 | 0.504762 | 0.510458 | 0.053403 | 0.596483 | 0.506242 | 0.505025 |
2019-12-31 | 0.513583 | 0.513550 | 0.491931 | 0.505784 | 0.494297 | 0.503821 | 0.500504 | 0.501635 | 0.499153 | 0.525481 | ... | 0.516383 | 0.512807 | 0.507550 | 0.493556 | 0.497654 | 0.504633 | 0.088400 | 0.487720 | 0.509775 | 0.499822 |
2020-12-31 | 0.532186 | 0.532186 | 0.465666 | 0.485414 | 0.503018 | 0.501729 | 0.500276 | 0.508535 | 0.490318 | 0.535286 | ... | 0.497543 | 0.497317 | 0.496720 | 0.507050 | 0.496587 | 0.512424 | 0.845535 | 0.399769 | 0.512730 | 0.499589 |
2021-12-31 | 0.497380 | 0.497380 | 0.539551 | 0.508864 | 0.557541 | 0.525261 | 0.496837 | 0.498681 | 0.494444 | 0.531504 | ... | 0.500560 | 0.500327 | 0.504211 | 0.481464 | 0.513157 | 0.506741 | 0.347595 | 0.661470 | 0.518793 | 0.514261 |
9 rows × 26 columns