In this article, we will see how to predict Amazon's stock price with machine learning, using an LSTM network built with TensorFlow.
import pandas as pd
I have used five years of historical data for Amazon.com, Inc. (AMZN). You can download the data from the following link: Amazon.com, Inc. (AMZN)
inputFolder = "input/"
filePath = inputFolder + "AMZN.csv"
filePath
pandas.read_csv(): Read a comma-separated values (csv) file into DataFrame.
df = pd.read_csv(filePath)
df
df.shape
The data has 1258 rows and 7 columns.
DataFrame.head(n=5):
Return the first n rows.
This function returns the first n rows of the object based on position. It is useful for quickly checking whether your object has the right type of data in it. The default number of rows is 5.
For negative values of n, this function returns all rows except the last |n| rows, equivalent to df[:n].
df.head()
DataFrame.tail(n=5)
Return the last n rows.
This function returns the last n rows of the object based on position. It is useful for quickly verifying data, for example, after sorting or appending rows. The default number of rows is 5.
For negative values of n, this function returns all rows except the first |n| rows, equivalent to df[|n|:].
df.tail()
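The positive and negative forms of n described above can be checked on a small frame. This is a minimal sketch; the `toy` frame and its values are made up for illustration and are not part of the AMZN data.

```python
import pandas as pd

# Toy frame with made-up values, used only to illustrate the slicing rules.
toy = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 13.0, 14.0]})

first_two = toy.head(2)            # rows 0 and 1
all_but_last_two = toy.head(-2)    # rows 0, 1, 2 (same as toy[:-2])
last_two = toy.tail(2)             # rows 3 and 4
all_but_first_two = toy.tail(-2)   # rows 2, 3, 4 (same as toy[2:])

print(all_but_last_two.index.tolist())
print(all_but_first_two.index.tolist())
```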
Create a new dataframe with two columns, 'Date' and 'Close'. For this prediction we need only the date and the closing price. The new dataframe gets an integer index from 0 to len(df) - 1, so it has one row for each row of the original dataframe.
class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
Two-dimensional, size-mutable, potentially heterogeneous tabular data.
Data structure also contains labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.
Parameters
data: ndarray, Iterable, dict, or DataFrame
Dict can contain Series, arrays, constants, dataclass or list-like objects. If data is a dict, column order follows insertion-order.
index: Index or array-like
Index to use for the resulting frame. Defaults to RangeIndex if the input data carries no indexing information and no index is provided.
columns: Index or array-like
Column labels to use for resulting frame. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.
dtype: dtype, default None
Data type to force. Only a single dtype is allowed. If None, infer.
copy: bool, default False
Copy data from inputs. Only affects DataFrame / 2d ndarray input.
new_df = pd.DataFrame(index = range(0,len(df)), columns=['Date', 'Close'])
new_df
DataFrame.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)
Sort object by labels (along an axis).
Returns a new DataFrame sorted by label if inplace argument is False, otherwise updates the original DataFrame and returns None.
df = df.sort_index(ascending = True, axis = 0)
df
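The difference between the two return behaviours of sort_index can be seen on a small example. This is a sketch with a made-up, deliberately shuffled frame named `toy`; it is not the AMZN data.

```python
import pandas as pd

# Made-up frame whose index labels are out of order.
toy = pd.DataFrame({"Close": [3.0, 1.0, 2.0]}, index=[2, 0, 1])

# Default (inplace=False): a new, sorted DataFrame is returned.
sorted_toy = toy.sort_index(ascending=True, axis=0)
order_after_copy_sort = toy.index.tolist()   # the original is unchanged

# inplace=True: the original is sorted and the call returns None.
result = toy.sort_index(inplace=True)
order_after_inplace_sort = toy.index.tolist()

print(sorted_toy.index.tolist(), order_after_copy_sort, result)
```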
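# Copy the date and closing price row by row.
# .loc avoids chained assignment, which may write to a temporary copy.
for i in range(0, len(df)):
    new_df.loc[i, 'Date'] = df.loc[i, 'Date']
    new_df.loc[i, 'Close'] = df.loc[i, 'Close']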
new_df
new_df.index = new_df.Date
new_df
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Drop specified labels from rows or columns.
Remove rows or columns by specifying label names and corresponding axis, or by specifying directly index or column names.
Parameters
labels: single label or list-like
Index or column labels to drop.
axis: {0 or ‘index’, 1 or ‘columns’}, default 0
Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).
index: single label or list-like
Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).
columns: single label or list-like
Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).
inplace: bool, default False
If False, return a copy. Otherwise, do operation inplace and return None.
Returns
DataFrame or None. DataFrame without the removed index or column labels or None if inplace=True.
new_df.drop('Date', axis=1, inplace=True)
new_df
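The same result, a frame indexed by date with a single 'Close' column, can also be reached without a row-by-row loop. The following is a sketch of that alternative, not the approach used in this article; the `raw` frame and its values are made up to stand in for the downloaded CSV.

```python
import pandas as pd

# Made-up rows standing in for the downloaded file (illustrative values only).
raw = pd.DataFrame({
    "Date": ["2020-01-02", "2020-01-03", "2020-01-06"],
    "Open": [1.0, 2.0, 3.0],
    "Close": [1.5, 2.5, 3.5],
})

# Select the two columns, then move 'Date' from a column into the index.
# set_index removes the column by default, so no separate drop is needed.
compact = raw[["Date", "Close"]].set_index("Date")

print(compact)
print(compact.values.shape)
```

One difference worth noting: here 'Close' keeps its float64 dtype, whereas a frame created empty and filled cell by cell holds its values as generic objects.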
dataset = new_df.values
dataset[:10]
It is important to scale features before training a neural network. Normalization is a common way of doing this scaling.
A way to normalize the input features/variables is the Min-Max scaler. By doing so, all features will be transformed into the range [0,1] meaning that the minimum and maximum value of a feature/variable is going to be 0 and 1, respectively.
from sklearn.preprocessing import MinMaxScaler
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
X: array-like of shape (n_samples, n_features)
Input samples.
y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
**fit_params: dict
Additional fit parameters.
Returns
X_new: ndarray of shape (n_samples, n_features_new)
Transformed array.
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(dataset)
scaled_data[:10]
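For the range [0, 1], Min-Max scaling amounts to (x - min) / (max - min) applied per column, and the inverse is x_scaled * (max - min) + min. The sketch below writes this out with NumPy on made-up prices to show what the scaler computes; it is an illustration, not a replacement for MinMaxScaler.

```python
import numpy as np

# Made-up closing prices in a single column, shape (n_samples, 1).
prices = np.array([[100.0], [150.0], [200.0], [125.0]])

col_min = prices.min(axis=0)
col_max = prices.max(axis=0)

# Forward transform: the minimum maps to 0 and the maximum maps to 1.
scaled = (prices - col_min) / (col_max - col_min)

# Inverse transform: recover the original prices from the scaled values.
restored = scaled * (col_max - col_min) + col_min

print(scaled.ravel())
print(restored.ravel())
```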
We split the data into a training set and a validation set. Of the 1258 records, the first 700 (index 0 to 699) are used for training and the remaining 558 (index 700 to the end) for validation.
train = dataset[0:700,:]
valid = dataset[700:,:]
train.shape
valid.shape
train[:5]
valid[:5]
len(train)
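Because this is a time series, the split is chronological: the earlier rows train the model and the later rows validate it, with no shuffling. A small sketch of the resulting sizes, using a placeholder array named `series` in place of the real dataset:

```python
import numpy as np

n_rows, split_at = 1258, 700

# Placeholder column of numbers standing in for the real closing prices.
series = np.arange(n_rows, dtype=float).reshape(-1, 1)

train_part = series[:split_at]   # rows 0 .. 699
valid_part = series[split_at:]   # rows 700 .. 1257

print(train_part.shape, valid_part.shape)
```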
X_train, y_train = [], []
# Each sample is the previous 60 scaled prices; the target is the price that follows.
for i in range(60, len(train)):
    X_train.append(scaled_data[i-60: i, 0])
    y_train.append(scaled_data[i,0])
X_train[0]
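The windowing loop is easier to follow on a short series. Here is a minimal sketch with a window of 3 instead of 60; `make_windows` is a hypothetical helper name and the values are made up.

```python
import numpy as np

def make_windows(values, window):
    """Pair each run of `window` consecutive values with the value that follows it."""
    X, y = [], []
    for i in range(window, len(values)):
        X.append(values[i - window:i])  # the `window` values before position i
        y.append(values[i])             # the value to predict
    return np.array(X), np.array(y)

series = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
X, y = make_windows(series, window=3)

print(X)        # three windows of three values each
print(y)        # the value following each window
```

A series of length n yields n - window samples, which is why 700 training rows with a window of 60 give 640 samples.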
import numpy as np
X_train, y_train = np.array(X_train), np.array(y_train)
print(X_train[1])
print(y_train[1])
X_train.shape[0]
X_train.shape[1]
X_train.shape
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_train.shape
Now X_train is ready to be used as input (X) to the LSTM with an input_shape of (60, 1).
import tensorflow as tf
model = tf.keras.Sequential()
The input to every LSTM layer must be three-dimensional.
The three dimensions of this input are:
Samples. One sequence is one sample. A batch is comprised of one or more samples.
Time Steps. One time step is one point of observation in the sample.
Features. One feature is one observation at a time step.
This means that the input layer expects a 3D array of data when fitting the model and when making predictions, even if specific dimensions of the array contain a single value, e.g. one sample or one feature.
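As a sketch of the three dimensions, the snippet below reshapes a small 2D array of windows into (samples, time steps, features), and does the same for a single window. The array contents are placeholders.

```python
import numpy as np

# 4 samples, each a window of 3 time steps (placeholder values).
windows_2d = np.arange(12, dtype=float).reshape(4, 3)

# Add a trailing axis of length 1: one feature (the closing price) per time step.
windows_3d = windows_2d.reshape(windows_2d.shape[0], windows_2d.shape[1], 1)

# Even a single window must be 3D: 1 sample, 3 time steps, 1 feature.
single = windows_2d[0].reshape(1, -1, 1)

print(windows_3d.shape, single.shape)
```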
Units: The dimensionality of the layer's output (its hidden state), often loosely described as the number of "neurons" or "cells" in the layer.
The LSTM input layer is defined by the input_shape argument on the first hidden layer. The input_shape argument takes a tuple of two values that define the number of time steps and features.
Hidden layer 1: 50 units/ 50 neurons
Hidden layer 2: 50 units/ 50 neurons
Last layer: 1 unit
model.add(tf.keras.layers.LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1)))
model.add(tf.keras.layers.LSTM(units = 50))
model.add(tf.keras.layers.Dense(1))
model.summary()
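The parameter counts reported by the summary can be reproduced by hand. A standard LSTM layer with biases has four weight sets (input, forget, and output gates plus the candidate cell state), each with input weights, recurrent weights, and a bias vector. The helper names below are hypothetical, and the sketch assumes the default Keras LSTM and Dense configurations.

```python
def lstm_param_count(units, input_features):
    # 4 weight sets, each: input weights + recurrent weights + bias
    return 4 * (units * (input_features + units) + units)

def dense_param_count(units, input_features):
    # One weight per input per unit, plus one bias per unit
    return units * input_features + units

layer1 = lstm_param_count(units=50, input_features=1)    # first LSTM sees 1 feature
layer2 = lstm_param_count(units=50, input_features=50)   # second LSTM sees 50 outputs
output = dense_param_count(units=1, input_features=50)
total = layer1 + layer2 + output

print(layer1, layer2, output, total)  # 10400 20200 51 30651
```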
The mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors—that is, the average squared difference between the estimated values and the actual value.
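Written out, the definition is the mean of (actual - predicted)^2 over all samples. A minimal NumPy sketch on made-up numbers, with `mean_squared_error` defined locally for illustration:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Errors are 0, -0.5 and 1, so the squared errors are 0, 0.25 and 1.
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
print(mse)
```

Squaring makes every error positive and penalizes large errors more heavily than small ones.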
model.compile(loss = 'mean_squared_error', optimizer = 'adam')
history = model.fit(X_train, y_train, epochs = 100, batch_size=10)
history.history['loss'][:10]
print(len(new_df))
print(len(valid))
test_inputs = new_df[len(new_df) - len(valid) - 60:].values
test_inputs[:10]
test_inputs = test_inputs.reshape(-1,1)
test_inputs = scaler.transform(test_inputs)
test_inputs[:10]
X_test = []
# Build one 60-step window for every row of the validation set.
for i in range(60, test_inputs.shape[0]):
    X_test.append(test_inputs[i-60:i, 0])
X_test = np.array(X_test)
print(X_test)
print(X_test.shape)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
print(X_test.shape)
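The slice that builds test_inputs starts 60 rows before the validation set because the first validation day needs the 60 closing prices that precede it. The arithmetic can be sketched as follows, using the row counts from this article:

```python
total_rows, valid_rows, window = 1258, 558, 60

# First row included in test_inputs: 60 rows before the validation set begins.
start = total_rows - valid_rows - window

# Rows in test_inputs, and the number of 60-step windows they produce.
n_inputs = total_rows - start
n_windows = n_inputs - window

print(start, n_inputs, n_windows)  # 640 618 558
```

The number of windows equals the number of validation rows, so the model produces exactly one prediction per validation day.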
closing_price = model.predict(X_test)
closing_price[:10]
closing_price = scaler.inverse_transform(closing_price)
closing_price[:10]
import matplotlib.pyplot as plt
train = new_df[:700]
valid = new_df[700:].copy()  # copy so that adding a column does not write to a slice of new_df
valid['Predictions'] = closing_price
plt.figure(figsize=(16,8))
plt.plot(valid['Close'], color = 'green', label = 'Actual Amazon Inc. Stock Price',ls='--')
plt.plot(valid['Predictions'], color = 'red', label = 'Predicted Amazon Inc. Stock Price',ls='-')
plt.title('Predicted Amazon Inc. Stock Price')
plt.xlabel('Time in days')
plt.ylabel('Stock Price')
plt.legend()
plt.figure(figsize=(16,8))
plt.plot(train['Close'], color = 'blue')
plt.plot(valid[['Close','Predictions']])
plt.title('Amazon Inc. Stock Price')
plt.xlabel('Time in days')
plt.ylabel('Stock Price')