I would like to analyze the various sectors within the S&P500 over time. This could be used to get a better idea of how markets and stocks are moving, and could be used by stock traders to make more efficient investment decisions. By having a better understanding of financial metrics of companies within sectors and how that impact stock price historically, they would be better suited to make investment decisions going forward.
Here is a report from S&P Global analyzing the relationship between sector and S&P500 returns: https://www.spglobal.com/spdji/en/documents/research/research-sector-effects-in-the-sp-500.pdf I would like to get more granular and examine how similar (financially) the stocks within each are and to examine which financial metrics exhibit the strongest relationship with stock price change.
Here is article from Investopedia about which metrics are the best for valuing a stock: https://www.investopedia.com/articles/fundamental-analysis/09/five-must-have-metrics-value-investors.asp A quick google search will yield hundreds of articles (this is one of the better ones) telling you how to accurately value a stock. However, most of the articles I clicked on were 'get rich quick' schemes and very few seemed to cite actual research or analysis. By analyzing the S&P500 we can get a better idea of which metrics were, historically, the most related to future stock movement and how those metrics vary by sector.
Firstly, I would have to get a list of all the stocks in the S&P500 as well as their respective sectors. Since the composition of the S&P500 changes over time, it would make sense to remove stocks that weren't in the S&P500 for the selected analysis period. Since the composition of the S&P500 tends to stay roughly the same, this could be achieved manually with relative ease. Alternatively, we could find the composition of the S&P500 at the start of the analysis period and then track the movement of those stocks throughout the period, regardless of whether or not they stay in the index. For now, I have only linked data for the current S&P500, but historical data is incredibly easy to find and is listed on the S&P500 website.
#Import pandas
import pandas as pd
#Get data for stocks and sectors
#Given that is is a relatively small dataset, it would probably be more efficient to just download the csv for the project
url="https://datahub.io/core/s-and-p-500-companies-financials/r/constituents.csv"
c=pd.read_csv(url)
c.head()
Symbol | Name | Sector | |
---|---|---|---|
0 | MMM | 3M Company | Industrials |
1 | AOS | A.O. Smith Corp | Industrials |
2 | ABT | Abbott Laboratories | Health Care |
3 | ABBV | AbbVie Inc. | Health Care |
4 | ACN | Accenture plc | Information Technology |
Columns: 3 [Symbol, Name, Sector] Rows: 505
Each row contains the stock ticker (Symbol), company name (Name), and respective sector(Sector) for a company in the S&P500. There are 505 rows, because several companies in the S&P500 have multiple classes of shares.
Next I would like to get historical financial data and stock movement for each of these companies. This used to be really easy, but Google discontinued their API in 2018. Another solution would be scraping data from Yahoo Finance using the yfinance library.
#Import yfinance
import yfinance as yf
#Get historic stock data
aapl = yf.Ticker("AAPL")
stock_data = aapl.history()
stock_data.head()
Open | High | Low | Close | Volume | Dividends | Stock Splits | |
---|---|---|---|---|---|---|---|
Date | |||||||
2023-01-30 00:00:00-05:00 | 144.739016 | 145.328113 | 142.632232 | 142.781998 | 64015300 | 0.0 | 0.0 |
2023-01-31 00:00:00-05:00 | 142.482450 | 144.119949 | 142.063092 | 144.070023 | 65874500 | 0.0 | 0.0 |
2023-02-01 00:00:00-05:00 | 143.750517 | 146.386491 | 141.104563 | 145.208282 | 77663600 | 0.0 | 0.0 |
2023-02-02 00:00:00-05:00 | 148.673001 | 150.949524 | 147.944119 | 150.590088 | 118339000 | 0.0 | 0.0 |
2023-02-03 00:00:00-05:00 | 147.804328 | 157.140080 | 147.604636 | 154.264465 | 154279900 | 0.0 | 0.0 |
If it works properly this should show the Date (as the data index) as well as the corresponding Open, High, Low, Close, Adj. Close, Volume (how many shares were traded in the day), Dividends, and Stock Splits. The most important column is Adj.Close as it takes stock splits and dividends into account to create a historically comparable
#Get historic income statement data
financial_data = aapl.income_stmt
financial_data
https://s.yimg.com/uc/finance/dd-site/js/main.00a1b2c52cf830cd64e7.min.js WARNING: No decryption keys could be extracted from JS file. Falling back to backup decrypt methods.
--------------------------------------------------------------------------- Exception Traceback (most recent call last) ~\AppData\Local\Temp/ipykernel_64996/2499431118.py in <module> 1 #Get historic income statement data ----> 2 financial_data = aapl.income_stmt 3 financial_data ~\anaconda3\lib\site-packages\yfinance\ticker.py in income_stmt(self) 156 @property 157 def income_stmt(self) -> _pd.DataFrame: --> 158 return self.get_income_stmt(pretty=True) 159 160 @property ~\anaconda3\lib\site-packages\yfinance\base.py in get_income_stmt(self, proxy, as_dict, pretty, freq, legacy) 1567 data = self._fundamentals.financials.get_income_scrape(freq=freq, proxy=proxy) 1568 else: -> 1569 data = self._fundamentals.financials.get_income_time_series(freq=freq, proxy=proxy) 1570 1571 if pretty: ~\anaconda3\lib\site-packages\yfinance\scrapers\fundamentals.py in get_income_time_series(self, freq, proxy) 111 res = self._income_time_series 112 if freq not in res: --> 113 res[freq] = self._fetch_time_series("income", freq, proxy=None) 114 return res[freq] 115 ~\anaconda3\lib\site-packages\yfinance\scrapers\fundamentals.py in _fetch_time_series(self, name, timescale, proxy) 140 141 try: --> 142 statement = self._create_financials_table(name, timescale, proxy) 143 144 if statement is not None: ~\anaconda3\lib\site-packages\yfinance\scrapers\fundamentals.py in _create_financials_table(self, name, timescale, proxy) 153 name = "financials" 154 --> 155 keys = self._get_datastore_keys(name, proxy) 156 try: 157 return self.get_financials_time_series(timescale, keys, proxy) ~\anaconda3\lib\site-packages\yfinance\scrapers\fundamentals.py in _get_datastore_keys(self, sub_page, proxy) 160 161 def _get_datastore_keys(self, sub_page, proxy) -> list: --> 162 data_stores = self._data.get_json_data_stores(sub_page, proxy) 163 164 # Step 1: get the keys: ~\anaconda3\lib\site-packages\yfinance\data.py in wrapped(*args, **kwargs) 39 args = tuple([tuple(arg) if isinstance(arg, list) else arg for arg in args]) 40 kwargs = {k: tuple(v) if isinstance(v, list) else v for k, v in kwargs.items()} ---> 41 return func(*args, **kwargs) 42 43 # copy over the lru_cache extra methods to this wrapper to be able to access them ~\anaconda3\lib\site-packages\yfinance\data.py in get_json_data_stores(self, sub_page, proxy) 310 311 # Decrypt! --> 312 stores = decrypt_cryptojs_aes_stores(data, keys) 313 if stores is None: 314 # Maybe Yahoo returned old format, not encrypted ~\anaconda3\lib\site-packages\yfinance\data.py in decrypt_cryptojs_aes_stores(data, keys) 160 pass 161 if not success: --> 162 raise Exception("yfinance failed to decrypt Yahoo data response") 163 164 decoded_stores = json.loads(plaintext) Exception: yfinance failed to decrypt Yahoo data response
This is supposed to pull data from the selected company's income statement. Namely it's revenue, various expenses, and profit. This feature is currently broken (it was working a few days ago), so for now this isn't a good data source.
Unfortunately, yfinance and Yahoo Finance are fighting, with Yahoo Finance trying to actively inhibit web scrapers like yfinance. The library is actively managed and they've been pretty good about updating the decrypt keys, but it is not super reliable as of yet. This would be by far the easiest way to get financial and stock data, but given that this is a relatively recent problem I'm unsure if this is a viable method. As of now using the fundamentalanalysis website API works well and while it only lets you pull data for one stock at a time, we can just loop through all the tickers in the list and store them.
#Import funamentalanalysis to access the fundamental analysis website api
import fundamentalanalysis as fa
#Establish ticker and api_key
#Loop through all tickers in S&P500 to get all data
ticker = "AAPL"
api_key = "6eb8c70712a889b4f7efbf8d8ad902d4"
#Get historical stock price data
stock_data = fa.stock_data(ticker, period="10y", interval="1mo")
stock_data.head()
low | volume | close | open | high | adjclose | |
---|---|---|---|---|---|---|
2013-03-01 | 14.964286 | 9176876800 | 15.809286 | 15.642857 | 16.783930 | 13.654180 |
2013-04-01 | 13.753571 | 10963489600 | 15.813571 | 15.782143 | 15.901786 | 13.657886 |
2013-05-01 | 14.960714 | 9447530400 | 16.061787 | 15.873571 | 16.633928 | 13.872261 |
2013-06-01 | 13.888214 | 7018536000 | 14.161786 | 16.097500 | 16.229643 | 12.312227 |
2013-07-01 | 14.329286 | 6538114800 | 16.161785 | 14.381786 | 16.333570 | 14.051022 |
Columns: 6 ['low', 'volume', 'close', 'open', 'high', 'adjclose'] Rows: 121
This gets the stock data of a company for each month. The total time period and the frequency of the data can be changed, but for now it is set to monthly stock data for 10 years which leads to 121 rows of data. The data includes the stocks high, low, open and close price for that month. It also has share volume and the adjclose which would be the most important metric for analysis. To make the analysis more data efficient we can only call for the "adjclose" column since the others are unlikely to be relevant.
#Get historical financial ratios and metrics
financial_metrics = fa.key_metrics(ticker, api_key, period="annual").transpose()
financial_metrics.head()
period | revenuePerShare | netIncomePerShare | operatingCashFlowPerShare | freeCashFlowPerShare | cashPerShare | bookValuePerShare | tangibleBookValuePerShare | shareholdersEquityPerShare | interestDebtPerShare | ... | averagePayables | averageInventory | daysSalesOutstanding | daysPayablesOutstanding | daysOfInventoryOnHand | receivablesTurnover | payablesTurnover | inventoryTurnover | roe | capexPerShare | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2022 | FY | 24.317273 | 6.154614 | 7.532763 | 6.872426 | 2.978793 | 3.124822 | 3.124822 | 3.124822 | 7.585118 | ... | 59439000000 | 5763000000 | 56.400205 | 104.685277 | 8.075698 | 6.471608 | 3.486641 | 45.197331 | 1.969589 | -0.660337 |
2021 | FY | 21.903541 | 5.669029 | 6.229346 | 5.565624 | 3.750553 | 3.777557 | 3.777557 | 3.777557 | 7.626006 | ... | 48529500000 | 5320500000 | 51.390969 | 93.851071 | 11.276593 | 7.102415 | 3.88914 | 32.367933 | 1.500713 | -0.663722 |
2020 | FY | 15.820258 | 3.308587 | 4.64923 | 4.228014 | 5.241031 | 3.765477 | 3.765477 | 3.765477 | 6.64524 | ... | 44266000000 | 4083500000 | 49.787534 | 91.04819 | 8.741883 | 7.331152 | 4.008866 | 41.753016 | 0.878664 | -0.421217 |
2019 | FY | 14.085283 | 2.991446 | 3.756685 | 3.188508 | 5.443948 | 4.898834 | 4.898834 | 4.898834 | 6.043039 | ... | 51062000000 | 4031000000 | 64.258765 | 104.314077 | 9.263639 | 5.680159 | 3.499048 | 39.401364 | 0.610645 | -0.568178 |
2018 | FY | 13.399334 | 3.003354 | 3.906565 | 3.23492 | 3.344902 | 5.405593 | 5.405593 | 5.405593 | 5.939155 | ... | 52468500000 | 4405500000 | 67.332499 | 124.570214 | 8.817631 | 5.420859 | 2.930074 | 41.394338 | 0.555601 | -0.671644 |
5 rows × 58 columns
Index: Year (representing the fiscal year of the companies financial fillings) Columns: 58 ['period', 'revenuePerShare', 'netIncomePerShare', 'operatingCashFlowPerShare', 'freeCashFlowPerShare', 'cashPerShare', 'bookValuePerShare', 'tangibleBookValuePerShare', 'shareholdersEquityPerShare', 'interestDebtPerShare', 'marketCap', 'enterpriseValue', 'peRatio', 'priceToSalesRatio', 'pocfratio', 'pfcfRatio', 'pbRatio', 'ptbRatio', 'evToSales', 'enterpriseValueOverEBITDA', 'evToOperatingCashFlow', 'evToFreeCashFlow', 'earningsYield', 'freeCashFlowYield', 'debtToEquity', 'debtToAssets', 'netDebtToEBITDA', 'currentRatio', 'interestCoverage', 'incomeQuality', 'dividendYield', 'payoutRatio', 'salesGeneralAndAdministrativeToRevenue', 'researchAndDdevelopementToRevenue', 'intangiblesToTotalAssets', 'capexToOperatingCashFlow', 'capexToRevenue', 'capexToDepreciation', 'stockBasedCompensationToRevenue', 'grahamNumber', 'roic', 'returnOnTangibleAssets', 'grahamNetNet', 'workingCapital', 'tangibleAssetValue', 'netCurrentAssetValue', 'investedCapital', 'averageReceivables', 'averagePayables', 'averageInventory', 'daysSalesOutstanding', 'daysPayablesOutstanding', 'daysOfInventoryOnHand', 'receivablesTurnover', 'payablesTurnover', 'inventoryTurnover', 'roe', 'capexPerShare'] Rows: 38
This represents the financial profile of a company in terms of its 58 key financial metrics. Over the course of the analysis we are like to remove some columns that do not add value to our analysis. Additionally, we would only pull data for the relevant analysis year rather than every year that is recorded (in this case 1985-2022)
One thing we could do is use agglomerative clustering to group companies by their financial metrics. We could then compare these groups to the sector groups defined by the S&P500. If there is a high level of similarity, it would show that companies in the same sector tend to have similar financial profiles. This would be important to know, because it would allow a stock trader to better understand and identify the financial metrics of a particular sector. Alternatively, if there is no link between metrics and sectors, it would show that serving similar end markets does not mean that companies have the same business model and financial make-up. Additionally, I could take financial data from a base year, say 2015, and then train a model using the metrics as the x variable and stock data from say 2016-2020 as the y variable. Then could apply feature selection to identify which metrics are the best indicator for a stocks price change over the following years and if those features vary by sector. Finally, we could test the finished model by feeding in metrics from 2020, forecasting stock movement, and then testing it against actual movement from 2021-2023.