Code
%config InlineBackend.figure_format ='retina'
%load_ext autoreload
%autoreload 2
%matplotlib inlineExecute the following code blocks to configure the session and import relevant modules.
%config InlineBackend.figure_format ='retina'
%load_ext autoreload
%autoreload 2
%matplotlib inlineimport os
import sys
import math
import numpy as np
import random
import pandas as pd
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, Subset
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as pltFor reproducibility we can set seeds:
SEED=22
torch.manual_seed(SEED)
random.seed(SEED)
np.random.seed(SEED)
torch.cuda.manual_seed_all(SEED)We add utility classes and functions for plotting and training.
from typing import Optional
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import torchmetrics
class LivePlot():
def __init__(self, left_label="Loss", right_label="Accuracy"):
self.fig = go.FigureWidget(
make_subplots(specs=[[{"secondary_y": True}]])
)
self.fig.update_yaxes(title_text=left_label, secondary_y=False)
self.fig.update_yaxes(title_text=right_label, secondary_y=True)
self.plot_indices = {}
self.trace_secondary = {}
display(self.fig)
self.limits = [0, 0]
self.current_x = 0
def report(self, name: str, value: float, secondary_y: bool = False):
try:
plot_index = self.plot_indices[name]
except KeyError:
plot_index = len(self.fig.data)
self.fig.add_scatter(
y=[], x=[], name=name,
secondary_y=secondary_y
)
self.plot_indices[name] = plot_index
self.trace_secondary[name] = secondary_y
self.fig.data[plot_index].y += (value,)
self.fig.data[plot_index].x += (self.current_x,)
def increment(self, n_ticks: int):
self.limits[1] += n_ticks
self.fig.update_layout(xaxis_range=self.limits)
def set_limit(self, n_ticks: int):
self.limits[1] = n_ticks
self.fig.update_layout(xaxis_range=self.limits)
def tick(self, n_ticks: Optional[int] = None):
if n_ticks is None:
n_ticks = 1
self.current_x += n_ticks
def train(*,
model: torch.nn.Module,
train_loader: DataLoader,
dev_loader: DataLoader,
optimizer: torch.optim.Optimizer,
criterion: torch.nn.Module,
max_epochs: int,
metric: Optional[torchmetrics.metric] = None,
device: Optional[torch.device] = None,
liveplot: Optional[LivePlot]=None):
if device is None:
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)
for epoch in range(max_epochs):
training_loss_acc = 0
training_examples = 0
model.train()
for i, batch in enumerate(train_loader):
optimizer.zero_grad()
x_batch, y_batch = batch
x_batch = x_batch.to(device)
y_hat = model(x_batch)
loss = criterion(y_hat, y_batch.to(device))
loss.backward()
optimizer.step()
training_loss_acc += loss.item()
training_examples += x_batch.size(0)
model.eval()
with torch.no_grad():
dev_loss_acc = 0
dev_examples = 0
dev_accuracy = 0
for i, batch in enumerate(dev_loader):
x_batch, y_batch = batch
x_batch = x_batch.to(device)
y_hat = model(x_batch)
dev_loss_acc += criterion(y_hat, y_batch.to(device)).item()
dev_examples += x_batch.size(0)
if metric:
dev_accuracy += metric(torch.argmax(y_hat, -1), y_batch)
if liveplot is not None:
liveplot.tick() # Update the liveplot time
liveplot.report("Training loss", training_loss_acc / training_examples)
liveplot.report("Development loss", dev_loss_acc / dev_examples)
if metric:
liveplot.report("Development accuracy", dev_accuracy / (i+1), secondary_y=True)In this lab the idea is to try out different RNN models on the Box & Jenkins monthly airline passengers dataset. The dataset is a monthly time series of airline passengers recorded in the 50’ies and 60’ies. Your task is to build a model to make a future prediction of the number of passengers given a number of observations.
You will download data and prepare it for later analyses. More specifically, you will partition the data into a training and test set. In order to create input / label pairs (X/Y), the data is split into time slices, where a slice corresponds to the input (X) and the consecutive time point the (known) output (Y).
To help you along the way, many steps have been prepared in advance, but in some cases, your task is to complete missing code, indicated by a comment such as # Add code or similar:
# Add code to print "hello world"These code blocks will be followed by a folded code block mark with # solution_hidden that contains a suggested solution. To expand the code block, simply click on it. Try to complete the code yourself with the aid of documentation before looking at the solution!
# solution_hidden: Hello world
print("hello world")Don’t hesitate to change parameter settings and experiment with the model architectures. Also, make sure to examine the contents of variables by printing them. Things to try:
See if you can improve on the model presented in the lecture.
Start by downloading the data and loading it into a pandas dataframe:
%%bash
if [ ! -e airline-passengers.csv ]; then
wget https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv --no-check-certificate
fiWe modify the data somewhat for easier processing. df.head() simply shows you the first entries of the data frame.
df = pd.read_csv('airline-passengers.csv', names=['time','passengers'], header=0)
df['time'] = pd.to_datetime(df['time'], format='%Y-%m')
df.head()Plot the data for overview:
plt.plot(df.time, df.passengers)PyTorch provides two data primitives (Dataset and DataLoader) that aim to make processing and handling of data easier. We will create a custom dataset class that will store data and scale data to a given range (typically between 0 and 1); many machine learning algorithms work better when features are on similar scales. However, since we want to partition our data for training and testing, we need to take care to apply the same scaling for the test data as we did for the train data. Here is a dataset class with this functionality:
class PassengerDataset(Dataset):
def __init__(self, data, time,
window_size=1, step_size=1,
scaler=None):
self.unscaled = data # (time points, 1)
if scaler is None:
self.scaler = MinMaxScaler(feature_range=(0, 1)).fit(self.unscaled)
else:
self.scaler = scaler
self.time = time.reset_index(drop=True)
self.window_size = window_size
self.step_size = step_size
self.transform()
self.X = torch.from_numpy(np.array(
[self.data[ind] for ind in self.x_indices]
))
self.Y = torch.tensor([self.data[i] for i in self.y_indices]).unsqueeze(-1)
def __getitem__(self, idx):
"""Return X, Y pairs, where X has length window_size"""
return self.X[idx], self.Y[idx]
def __len__(self):
return len(self.X)
@property
def y_indices(self):
for j in np.arange(self.window_size, len(self.data), self.step_size):
yield j
@property
def x_indices(self):
j = -self.step_size
for _, _ in enumerate(self.y_indices):
j = j + self.step_size
yield list(range(j, j + self.window_size))
def transform(self):
scaled = self.scaler.transform(self.unscaled)
self.data = torch.from_numpy(scaled).type(torch.float32)Next, we partition the data into training and test data sets. We first define the size of the training set and the index at which to split the data.
train_fraction = 0.7
split_index = int(df.shape[0] * train_fraction)Finally, we create two datasets based on split_index. Our dataset class defines a function __getitem__ that ensures we obtain input/output (X/Y) data points for training. The window_size parameter can be thought of as the size of the time slice backwards in time that we model the next time point on. Since the time unit is months and there likely is a recurrent yearly seasonality in the data it makes sense to use 12 time steps, but this is a parameter you could modify to see what effect it has on the end results. In particular, if you increase this number, you would look further back in the past when making predictions.
data = torch.tensor(df.passengers, dtype=torch.float32).unsqueeze(-1)
train_data = PassengerDataset(
data[:split_index],
time=df.time[:split_index],
window_size=12
)
test_data = PassengerDataset(
data[split_index:],
time=df.time[split_index:],
window_size=12, scaler=train_data.scaler
)Plot the datasets to visualize the split. Notice in particular the magnitude of the y-values for both datasets.
plt.plot(train_data.time, train_data.data), plt.plot(test_data.time, test_data.data);Complete the model below to include a vanilla RNN layer and a Linear output layer.
If you look at the documentation for the RNN class, you will find that inputs is a 3D tensor ([L,N,Hin] or (seq, batch, feature)), where feature is the size of the input feature (option input_size). Since we are using univariate data (i.e. one feature, namely the number of passengers, per time step), input_size=1. Since most PyTorch operations use batch_size as the leading dimension ((batch, seq, feature)), you should also set batch_size=True
The RNN output is a tensor tuple (output, h_n) corresponding to the output features of the last layer and the final hidden state. The output tensor has shape (batch, seq, output size) and the hidden state tensor has shape (num_layers, batch, output size). In the RNN constructor call, the output is equivalent to hidden_size.
In the model forward pass, you connect the final output to one or multiple Linear layers, whose final outcome is a passenger prediction for the next time point. The final output should therefore have output size 1.
Consult the documentation or lecture notes for more information.
class AirlineRNN(...):
def __init__(self, ...):
# Add one RNN layer and 1-2 Linear layers
# Parametrize hidden and output sizes in function call
super().__init__()
# Input tensor shape: (batch_size, time_steps, features)
self.rnn = nn.RNN(
input_size=...,
hidden_size=...,
num_layers=...,
batch_first=...,
nonlinearity=...
)
# Linear layers have dimension in_features, out_features
self.fc1 = nn.Linear(...)
# Possibly add more layers
def forward(self, x):
# Add forward step; connect layers and return output
# RNN returns (output, h_n)
rnn_out, h_n = self.rnn(...)
rnn_out = rnn_out[:, -1, :] # Select final timestep
x = self.fc1(...)
# Additional Linear layer
return ...
def predict(self, x):
# Add predict function; set model to evaluation mode and
# returns model output for given input
...# solution_hidden: Vanilla RNN model
class AirlineRNN(nn.Module):
def __init__(self, hidden_size=20, output_size=8):
super().__init__()
self.rnn = nn.RNN(
input_size=1,
hidden_size=hidden_size,
num_layers=1,
batch_first=True,
nonlinearity="relu"
)
self.fc1 = nn.Linear(hidden_size, output_size)
self.fc2 = nn.Linear(output_size, 1)
def forward(self, x):
rnn_out, h_n = self.rnn(x)
# rnn_out shape: (batch_size, time_steps, hidden_size)
# For single-layer RNN, take the last timestep output
rnn_out = rnn_out[:, -1, :] # Select final timestep
x = self.fc1(rnn_out)
return self.fc2(x)
def predict(self, x):
self.eval()
with torch.no_grad():
return self(x)Once you are happy with the configuration, initialize the model:
model = AirlineRNN(hidden_size=20)Now that we have the model components in place we can start training and testing our model. We make use of the utility functions defined in the preamble. In each training iteration, we let the model generate predictions and calculate the loss with respect to a loss function. The gradient of the loss with respect to the model parameters is then calculated with backward propagation, which is then passed on to and utilized by the optimizer to update weights that will hopefully improve network performance in the next epoch. To avoid overfitting, the loop adds a step that evaluates model performance on the test dataset.
Before training the model, we define dataloaders for train and test datasets.
train_loader = DataLoader(train_data)
test_loader = DataLoader(test_data)Next we choose an appropriate loss function and assign to variable criterion:
criterion = nn...Loss()# solution_hidden: criterion
criterion = nn.MSELoss()Finally, set the optimizer to SGD.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)With loss and optimizer in place, we can train the network.
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print(f"Using {device} device")
epochs = 400
liveplot = LivePlot()
liveplot.fig.update_layout(width=1200, height=800, font_size=18)
liveplot.increment(epochs)
train(model=model,
train_loader=train_loader,
dev_loader=test_loader,
optimizer=optimizer,
criterion=criterion,
max_epochs=epochs,
liveplot=liveplot,
device=device)We finish by defining a function to plot (rescaled) predictions and plot them.
def plot_pred(train_data, test_data, model, figsize=(14, 8), rescale=True):
"""Plot model predictions for train and test data, possibly rescaling y-axis"""
f = train_data.scaler.inverse_transform if rescale else lambda x: x
fig, ax = plt.subplots(figsize=figsize)
ax.plot(train_data.time, f(train_data.data), color="blue")
ax.plot(test_data.time, f(test_data.data), color="orange")
ax.plot(train_data.time[list(train_data.y_indices)],
f(model.predict(train_data.X.to(device)).cpu()).flatten(),
"-o", color="green")
ax.plot(test_data.time[list(test_data.y_indices)],
f(model.predict(test_data.X.to(device)).cpu()).flatten(),
"-*", color="red")
ax.set_title("Model prediction")
plt.show()plot_pred(train_data, test_data, model)Depending on how your model is defined, you will get different outcomes, but it is likely that training has been unsuccessful. Can you think of ways to improve network learning?
# solution_hidden: Improving network learning
#
# optimizer: Stochastic Gradient Descent is generally a poor choice
# for RNNs for various reasons, including 1) vanishing gradient
# problem 2) no adaptive learning rates
#
# learning rate: learning rate selection is critical for RNNs, where
# too high a rate may produce oscillating behaviour or explosion of
# loss
#Revise the previous code blocks and see if you can improve training performance.
# solution_hidden: Improved training
model = AirlineRNN()
train_loader = DataLoader(train_data)
test_loader = DataLoader(test_data)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print(f"Using {device} device")
epochs = 400
liveplot = LivePlot()
liveplot.fig.update_layout(width=1200, height=800, font_size=18)
liveplot.increment(epochs)
train(model=model,
train_loader=train_loader,
dev_loader=test_loader,
optimizer=optimizer,
criterion=criterion,
max_epochs=epochs,
liveplot=liveplot,
device=device)plot_pred(train_data, test_data, model)In order to get a better understanding of the network topology and weights, here we go through how to extract the relevant parameters from the model. The model consists of several layers that we can inspect by printing the model (the numbers are based on the suggested solution Vanilla RNN model above):
modelThe first RNN layer has input_size=1 (one feature) and hidden_size=20, the number of features in the hidden state.
We can view parameters either with the parameters() function or named_parameters() that also provides the parameter name, either for the model:
print('\n\nModel params:')
for name, param in model.named_parameters():
print(name, param.shape)or just one layer:
print('\n\nRNN layer params:')
for name, param in model.rnn.named_parameters():
print(name, param.shape)Each input with one feature is multiplied by 20 weights (\(W_{ih}\), one for each RNN layer. The hidden state consists of 20 features and is processed by a (20, 20) size weight matrix \(W_{hh}\). The final output vector \(y\) is generated by separate layer(s), here model.fc1 and model.fc2, where the latter outputs the predicted next time point.
Building on session 1, analyse the data set using LSTM layers. Here is a tentative model setup to get you started. Here you could try using multiple layers, in which case you need to return the sequences for all but the last layer (cf Stacked Long Short-Term Memory Networks). If you have time, you can also try out the GRU layers for comparison. You may also want to increase the size of the hidden layers.
class AirlineLSTM(...):
def __init__(...):
# Add LSTM layer(s) and 1-2 Linear layers# solution_hidden: Simple LSTM model
class AirlineLSTM(nn.Module):
def __init__(self, hidden_size=20, output_size=8, num_layers=1):
super().__init__()
self.lstm = nn.LSTM(
input_size=1,
hidden_size=hidden_size,
num_layers=num_layers,
batch_first=True,
)
self.fc1 = nn.Linear(hidden_size, output_size)
self.fc2 = nn.Linear(output_size, 1)
def forward(self, x):
lstm_out, hidden = self.lstm(x)
# lstm_out shape: (batch_size, time_steps, hidden_size)
# For multiple-layer LSTM, take the last timestep output
lstm_out = lstm_out[:, -1, :] # Select final timestep
x = self.fc1(lstm_out)
return self.fc2(x)
def predict(self, x):
self.eval()
with torch.no_grad():
return self(x)Once you’re satisfied, initialize the model, choose loss function and optimizer and train.
model = AirlineLSTM(hidden_size=20, num_layers=1)criterion = ...# solution_hidden: criterion
criterion = nn.MSELoss()optimizer = ...# solution_hidden: Optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print(f"Using {device} device")
epochs = 100
liveplot = LivePlot()
liveplot.fig.update_layout(width=1200, height=800, font_size=18)
liveplot.increment(epochs)
train(model=model,
train_loader=train_loader,
dev_loader=test_loader,
optimizer=optimizer,
criterion=criterion,
max_epochs=epochs,
liveplot=liveplot,
device=device)plot_pred(train_data, test_data, model)Recall that the default setting for time steps is 12. If you increase this number, you would look further back in time, which in principle should play to the strength of LSTMs. Note that this need not necessarily lead to better predictions.
As you may have noticed, it can be quite tricky to tune the model to make good predictions. The small training data size doesn’t help either; a simple univariate model would perform just as well in this case. Still, this simple example has hopefully provided some insight into time series prediction with recurrent neural networks.
The airplanes dataset is a commonly used example to introduce time series forecasting with Recurrent Neural Networks, and there are numerous blog posts on the topic. There is also a library called Darts that is dedicated to time series analyses in Python. On the Darts website you can find a worked example that highlights some additional aspects of time series forecasting, such as making use of the known covariates month and year.