Can machine learning accurately predict and model stock prices?

If so, which industry in the S&P seems to be the most profitable?

Quantitative methods of data analysis are shaping the future of investing, and several notable investment firms have incorporated ML into their strategies. Can we train a model to forecast stock prices, and how accurate and reliable is that process?

In [23]:
import pandas as pd
import pprint

all_stocks_df = pd.read_csv("all_stocks_5yr.csv")

data_dictionary = {'date': 'The date at which metrics were recorded for respective company',
                   'open': 'Price of the respective stock at the beginning of the day',
                   'high': 'Highest price for respective stock over the duration of the day',
                   'low': 'Lowest stock price for respective stock over the duration of the day',
                   'close': 'Price of the respective stock at the end of the day',
                   'volume': 'Total amount of respective stocks traded on that day',
                   'Name': 'Ticker symbol of the respective stock'
                  }

pp = pprint.PrettyPrinter(width=70, compact=True)
pp.pprint(data_dictionary)

all_stocks_df.head()
{'Name': 'Ticker symbol of the respective stock',
 'close': 'Price of the respective stock at the end of the day',
 'date': 'The date at which metrics were recorded for respective '
         'company',
 'high': 'Highest price for respective stock over the duration of '
         'the day',
 'low': 'Lowest stock price for respective stock over the duration '
        'of the day',
 'open': 'Price of the respective stock at the beginning of the day',
 'volume': 'Total amount of respective stocks traded on that day'}
Out[23]:
date open high low close volume Name
0 2013-02-08 15.07 15.12 14.63 14.75 8407500 AAL
1 2013-02-11 14.89 15.01 14.26 14.46 8882000 AAL
2 2013-02-12 14.45 14.51 14.10 14.27 8126000 AAL
3 2013-02-13 14.30 14.94 14.25 14.66 10259500 AAL
4 2013-02-14 14.94 14.96 13.16 13.99 31879900 AAL
In [29]:
# A boolean mask on the 'Name' column selects the 5-year history of any given stock
aapl_mask = all_stocks_df['Name'] == 'AAPL'
all_stocks_df[aapl_mask]
Out[29]:
date open high low close volume Name
1259 2013-02-08 67.7142 68.4014 66.8928 67.8542 158168416 AAPL
1260 2013-02-11 68.0714 69.2771 67.6071 68.5614 129029425 AAPL
1261 2013-02-12 68.5014 68.9114 66.8205 66.8428 151829363 AAPL
1262 2013-02-13 66.7442 67.6628 66.1742 66.7156 118721995 AAPL
1263 2013-02-14 66.3599 67.3771 66.2885 66.6556 88809154 AAPL
... ... ... ... ... ... ... ...
2513 2018-02-01 167.1650 168.6200 166.7600 167.7800 47230787 AAPL
2514 2018-02-02 166.0000 166.8000 160.1000 160.5000 86593825 AAPL
2515 2018-02-05 159.1000 163.8800 156.0000 156.4900 72738522 AAPL
2516 2018-02-06 154.8300 163.7200 154.0000 163.0300 68243838 AAPL
2517 2018-02-07 163.0850 163.4000 159.0685 159.5400 51608580 AAPL

1259 rows × 7 columns
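
Since the same filter is needed for every ticker, it can be wrapped in a small helper. The sketch below is one possible version: `get_history` is a hypothetical name that does not appear in the original notebook, and parsing and sorting by the `date` column is an assumption about what later modeling steps will need.

```python
import pandas as pd


def get_history(stocks_df: pd.DataFrame, ticker: str) -> pd.DataFrame:
    """Return one ticker's rows, indexed by parsed date in chronological order.

    Hypothetical helper: assumes `stocks_df` has the 'Name' and 'date'
    columns shown in the table above.
    """
    # Boolean mask on the ticker column; copy so the caller's frame is untouched
    history = stocks_df[stocks_df["Name"] == ticker].copy()
    # The CSV stores dates as strings, so convert them to real timestamps
    history["date"] = pd.to_datetime(history["date"])
    return history.set_index("date").sort_index()
```

Calling `get_history(all_stocks_df, 'AAPL')` would return the same AAPL rows shown above, with the date moved into the index.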

We will train predictive models for each company on its 1-, 2-, 3-, and 4-year price histories to see how accurately each model predicts the last recorded statistics for that stock. The goal is to gauge how reliable machine learning models are at predicting stock prices and, according to those models, which stocks and industries are the most profitable, the least volatile, and so on.
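
One way to set up this experiment is sketched below. It rests on an assumption: the notebook does not say how the windows are defined, so each "N-year history" is taken here to mean the N years immediately before the last recorded day, and that last day is held out as the prediction target. `training_windows` is a hypothetical name, and the input is expected to be a single ticker's frame with a sorted `DatetimeIndex`.

```python
import pandas as pd


def training_windows(history: pd.DataFrame, years=(1, 2, 3, 4)):
    """Hold out the last row and build trailing N-year training windows.

    Hypothetical helper. Assumes `history` covers a single ticker and has
    a DatetimeIndex sorted in ascending order.
    """
    target = history.iloc[-1]      # last recorded day: what the models must predict
    past = history.iloc[:-1]       # everything before it is available for training
    last_date = history.index[-1]

    windows = {}
    for n in years:
        # Calendar-aware offset, so "1 year back" lands on the same month and day
        start = last_date - pd.DateOffset(years=n)
        windows[n] = past.loc[past.index >= start]
    return target, windows
```

Each entry of `windows` can then be passed to whichever model is being evaluated, and that model's prediction compared against `target`. Keeping the held-out row out of every window avoids leaking the answer into training.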
