Predicting Apple's yearly stock price percent change using income statement published at the end each year¶

Which part of Apple's income statement, balance/sheet, and etc ... is most directly related to overall increase in uplift of Apple stock valuation. https://www.kaggle.com/datasets/thedevastator/apple-s-historical-financials

apple stock historical data. https://www.kaggle.com/datasets/mukhazarahmad/apple-stock-data-12121980-1132022

regression model prediction of Apple stock: https://medium.com/analytics-vidhya/using-linear-regression-to-predict-aapl-apple-stock-prices-in-python-1a629fbea15b

Real World Probelm that can be solved by Data Science¶

Machine learning module for stock price prediction has been the newest focus of the finance field. A new finance field called 'The quantitative Analyst position has recently opened up and is recognized as a crucial aspect of finance. Oftentimes, machine learning module that is capable of predicting stock price contains highly complex equation, code, and data set that can not easily be obtained or achieved by non-professionals (college students). Even the most excellent finance firm that uses highly accurate machine learning modules to predict stock prices hides its secrete from the public. Formulas range from CAPM (Capital Asset pricing module) to black-scholes module, which is a differential equation and mostly taught during graduate school.

'Stock Market Prediction from WSJ: Text Mining via Sparse Matrix Factorization' is a paper written by Felix Ming Fai Wong, Zhenming Liu, and Mung Chiang from Princeton University. This module uniquely includes word counts from WSJ and finds the correlation in stock price data, articles from WSJ, and between each. Rather than using the conventional machine learning module, the module approaches by adding new x values.

I want to create a machine learning module that predicts positive or negative percentage change in a single firm's stock price (Apple) based on its income statement. This is because the income statement of a publicly operating firm is required by the law to provide information to the public. This means even non-professionals can approach the data set and attempt to create a machine learning module. On top of the availability of the data set, I wanted to make a module out of the income statement because it is an important indicator for stock buyers. I want to see if the change in each category in the income statement affects the stock price.

This is why I have prepared a yearly percent change of the stock price of Apple from 2013 to 2021. I used yfinance to import Apple's past stock price data set. Found the percent change based on Adj Close price that can be seen below. I downloaded past income statement data of Apple from kaggle. I have set the dataset range from 2013 to 2021 and calculated the percent change of each category. Due to some data sets having 0 in some years, percent change values were found to be inf. To make these values useful, I used the sigmoid function to place pct change values in between 0 and 1.

I am planning to use the linear regression module. y_true will be pct_change_yearly and x 1~n would be income statement values. After model.fit(x,y) we will get a module for predicting the percent change of the stock price of Apple yearly. We can test it by putting the income statement values of Apple and predicting the percent change in stock price. To avoid values being used again in prediction, we can use other firms' data (preferably S&P 500 companies as Apple is in it) to test. To make the module more convincing, we can give multiple S&P 500 companies' income statements to build the module. This will increase the accuracy of the model.

How the data will be used to solve the problem¶

We'll cluster multiple firm's income statements. Doing so allows us to discover the correlation between percent change of each category in income statement with percent change of the firm's stock price. Rather than finding the exact stock price of a firm, the module will focus more on positive change or negative change of a firm's stock price per year. We'll find it using linear regression module.

In [1]:
import yfinance as yf
import pandas as pd
In [2]:
df = yf.download(tickers = "AAPL",  # list of tickers
            period = "23y",         # time period
            interval = "1mo",       # trading interval
            ignore_tz = True,      # ignore timezone when aligning data from different exchanges?
            prepost = False)       # download pre/post market hours data?

df
[*********************100%***********************]  1 of 1 completed
Out[2]:
Open High Low Close Adj Close Volume
Date
2000-03-01 1.058594 1.342634 1.017857 1.212612 1.032076 8698356800
2000-04-01 1.209821 1.245536 0.936384 1.107701 0.942785 8662404800
2000-05-01 1.114955 1.127232 0.729911 0.750000 0.638339 9807750400
2000-06-01 0.729911 1.029018 0.717634 0.935268 0.796024 8105204800
2000-07-01 0.930804 1.082589 0.837054 0.907366 0.772276 5746770400
... ... ... ... ... ... ...
2022-10-01 138.210007 157.500000 134.369995 153.339996 152.852661 1868139700
2022-11-01 155.080002 155.449997 134.380005 148.029999 147.559555 1724847700
2022-12-01 148.210007 150.919998 125.870003 129.929993 129.731918 1675657600
2023-01-01 130.279999 147.229996 124.169998 144.289993 144.070023 1443218300
2023-02-01 143.970001 157.380005 141.320007 147.919998 147.919998 1256371743

276 rows × 6 columns

In [3]:
import yfinance as yf
import pandas as pd

# set the ticker symbol and date range
ticker = "AAPL"
start_date = "2012-01-01"
end_date = "2021-12-31"

# download the data using yfinance
data = yf.download(ticker, start=start_date, end=end_date, interval='1mo')

# resample the data to get the yearly averages
data_yearly = data.resample('1Y').mean()

# calculate the percentage change of the adjusted closing price each year
pct_change_yearly = data_yearly['Adj Close'].pct_change() * 100
pct_change_yearly.dropna(inplace=True)


# print the percentage change each year
print(pct_change_yearly)
[*********************100%***********************]  1 of 1 completed
Date
2013-12-31   -17.230957
2014-12-31    42.168712
2015-12-31    29.795524
2016-12-31   -10.931802
2017-12-31    47.749385
2018-12-31    25.529302
2019-12-31    15.546031
2020-12-31    84.295736
2021-12-31    46.920317
Freq: A-DEC, Name: Adj Close, dtype: float64
In [6]:
import numpy as np

def sigmoid(x):
    return 1/(1+np.exp(-x))

#https://linuxhint.com/numpy-sigmoid/

def sigmoid_vectorized(x):
    return np.vectorize(sigmoid)(x)

# load the data
apple_income_statement = pd.read_csv('apple_income_statement.csv', index_col=0).transpose()

# convert the string values to numerical values
apple_income_statement = apple_income_statement.apply(pd.to_numeric, errors='coerce')

# compute the percentage change for each column
pct_change_df = apple_income_statement.pct_change()

pct_arr = sigmoid_vectorized(pct_change_df)

pct = pd.DataFrame(data=pct_arr, index = pct_change_df.index, columns = pct_change_df.columns)

# pct = pd.DataFrame(data=pct_change_df, index = pct_change_df.index, columns = pct_change_df.columns)

pct = pct.iloc[1:]

pct.dropna(axis=1, inplace=True)

pct
Out[6]:
Cash & Equivalents Cash and Short Term Investments Accounts Receivable - Trade, Net Total Receivables, Net Total Inventory Total Current Assets Property/Plant/Equipment, Total - Gross Accumulated Depreciation, Total Property/Plant/Equipment, Total - Net Other Long Term Assets, Total ... Long Term Debt Total Long Term Debt Total Debt Other Liabilities, Total Total Liabilities Common Stock, Total Unrealized Gain (Loss) Other Equity, Total Total Equity Total Liabilities & Shareholders' Equity
2013-12-31 0.526887 0.526068 0.485708 0.718112 0.453151 0.477453 0.268941 0.268941 0.509324 0.413101 ... 0.491330 0.490695 0.485101 0.426371 0.474291 0.512718 0.559419 0.404310 0.546622 0.488554
2014-12-31 0.551616 0.550930 0.481182 0.494690 0.491684 0.503469 1.000000 0.000000 0.492709 0.583360 ... 0.509168 0.509963 0.510164 0.554571 0.515015 0.516293 0.500000 0.653910 0.451322 0.499367
2015-12-31 0.470033 0.470033 0.466863 0.484192 0.448868 0.465584 0.503207 0.514867 0.492656 0.578033 ... 0.476488 0.476488 0.491906 0.496966 0.483806 0.511063 0.364887 0.335834 0.471115 0.481294
2016-12-31 0.527372 0.527372 0.471649 0.480432 0.472040 0.488505 0.499711 0.511501 0.488069 0.493531 ... 0.476683 0.476665 0.491911 0.510272 0.492264 0.501861 0.466268 0.499879 0.471804 0.488393
2017-12-31 0.538047 0.538047 0.561230 0.529258 0.540658 0.533399 0.496854 0.511507 0.480968 0.456703 ... 0.507252 0.511272 0.487055 0.497548 0.506894 0.514966 0.320821 0.396380 0.510655 0.507556
2018-12-31 0.487776 0.487807 0.547007 0.519887 0.537636 0.516266 0.486076 0.483093 0.489740 0.520797 ... 0.512108 0.512085 0.512004 0.475720 0.504762 0.510458 0.053403 0.596483 0.506242 0.505025
2019-12-31 0.513583 0.513550 0.491931 0.505784 0.494297 0.503821 0.500504 0.501635 0.499153 0.525481 ... 0.516383 0.512807 0.507550 0.493556 0.497654 0.504633 0.088400 0.487720 0.509775 0.499822
2020-12-31 0.532186 0.532186 0.465666 0.485414 0.503018 0.501729 0.500276 0.508535 0.490318 0.535286 ... 0.497543 0.497317 0.496720 0.507050 0.496587 0.512424 0.845535 0.399769 0.512730 0.499589
2021-12-31 0.497380 0.497380 0.539551 0.508864 0.557541 0.525261 0.496837 0.498681 0.494444 0.531504 ... 0.500560 0.500327 0.504211 0.481464 0.513157 0.506741 0.347595 0.661470 0.518793 0.514261

9 rows × 26 columns