Lab session: predicting time series

Airplanes lab

Author: Per Unneberg
Affiliation: NBIS

Preparation

Configuration

Execute the following code blocks to configure the session and import relevant modules.

Code
%config InlineBackend.figure_format ='retina'
%load_ext autoreload
%autoreload 2
%matplotlib inline
Code
import os
import sys
import math 
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, Subset
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

Lab session: predicting airline passengers

Aims

In this lab the idea is to try out different RNN models on the Box & Jenkins monthly airline passengers dataset, a monthly time series of airline passenger numbers recorded in the 1950s and 1960s. Your task is to build a model that predicts the future number of passengers given a number of past observations.

You will download the data and prepare it for later analyses. More specifically, you will partition the data into a training set and a test set. To create input/label pairs (X/Y), the data is split into time slices, where a slice corresponds to the input (X) and the consecutive time point to the (known) output (Y).
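As a toy illustration of this slicing (separate from the dataset class used later in the lab), a series can be cut into overlapping windows like so:

```python
import numpy as np

# Toy sketch of the sliding-window split: each input X is a slice of
# window_size consecutive values, and the label Y is the value that follows.
series = np.arange(10, 20)  # 10, 11, ..., 19
window_size = 3
X = np.array([series[i:i + window_size]
              for i in range(len(series) - window_size)])
Y = series[window_size:]
print(X[0], Y[0])  # first pair: [10 11 12] -> 13
```

Each row of X overlaps the previous one by all but one element, so a series of length 10 yields 7 such pairs.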

To help you along the way, some of the steps have been prepared in advance, but in most cases, your task is to complete missing code. Don’t hesitate to change parameter settings and experiment with the model architectures. Also, make sure to examine the contents of variables by printing them. Things to try:

  • change the number of time steps
  • change the number of epochs
  • experiment with the network topology (e.g. number of units in the hidden layer)

See if you can improve on the model presented in the lecture.

Session 1: Vanilla RNN

Download data

Start by downloading the data and loading it into a pandas dataframe:

Code
%%bash
if [ ! -e airline-passengers.csv ]; then
    wget https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv --no-check-certificate
fi

We modify the data somewhat for easier processing. df.head() simply shows you the first entries of the data frame.

Code
df = pd.read_csv('airline-passengers.csv', names=['time','passengers'], header=0)
df['time'] = pd.to_datetime(df['time'], format='%Y-%m')
df.head()

Plot the data for overview:

Code
plt.plot(df.time, df.passengers)

Custom dataset class

PyTorch provides two data primitives (Dataset and DataLoader) that aim to make processing and handling of data easier. We will create a custom dataset class that stores the data and scales it to a given range (typically between 0 and 1); many machine learning algorithms work better when features are on similar scales. However, since we partition our data into training and test sets, we need to take care to apply the same scaling to the test data as we did to the training data. Here is a dataset class with this functionality:

Code
class PassengerDataset(Dataset):
    def __init__(self, data, time,
                 window_size=1, step_size=1,
                 scaler=None):
        self.unscaled = data  # (time points, 1)
        if scaler is None:
            self.scaler = MinMaxScaler(feature_range=(0, 1)).fit(self.unscaled)
        else:
            self.scaler = scaler
        self.time = time.reset_index(drop=True)
        self.window_size = window_size
        self.step_size = step_size
        self.transform()
        # X: (samples, window_size, 1), Y: (samples, 1)
        self.X = torch.stack([self.data[ind] for ind in self.x_indices])
        self.Y = torch.stack([self.data[i] for i in self.y_indices])

    def __getitem__(self, idx):
        return self.X[idx], self.Y[idx]

    def __len__(self):
        return len(self.X)

    @property
    def y_indices(self):
        for j in np.arange(self.window_size, len(self.data), self.step_size):
            yield j        

    @property
    def x_indices(self):
        j = -self.step_size
        for _, _ in enumerate(self.y_indices):
            j = j + self.step_size
            yield list(range(j, j + self.window_size))

    def transform(self):
        scaled = self.scaler.transform(self.unscaled)
        self.data = torch.from_numpy(scaled).type(torch.float32)
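To see why the test set must reuse the scaler fitted on the training data, here is a minimal sketch (independent of the class above) with made-up values:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Fit the scaler on training values only, then reuse it on test values.
train_vals = np.array([[100.0], [200.0], [300.0]])
test_vals = np.array([[400.0]])

scaler = MinMaxScaler(feature_range=(0, 1)).fit(train_vals)
print(scaler.transform(train_vals).ravel())  # [0.  0.5 1. ]
print(scaler.transform(test_vals).ravel())   # [1.5]
```

Test values outside the training range map outside [0, 1], which is expected; refitting the scaler on the test set instead would silently change the scale the model was trained on.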

Next, we partition the data into training and test data sets. We first define the size of the training set and the index at which to split the data.

Code
train_fraction = 0.7
split_index = int(df.shape[0] * train_fraction)

Finally, we create two datasets based on split_index. Our dataset class defines a __getitem__ function that ensures we obtain input/output (X/Y) data points for training. The window_size parameter can be thought of as the size of the time slice, backwards in time, on which we model the next time point. Since the time unit is months and there is likely a yearly seasonality in the data, it makes sense to use 12 time steps, but this is a parameter you could modify to see what effect it has on the end result. In particular, if you increase this number, you look further back into the past when making predictions.

Code
data = torch.tensor(df.passengers, dtype=torch.float32).unsqueeze(-1)
train = PassengerDataset(
    data[:split_index],
    time=df.time[:split_index],
    window_size=12
)
test = PassengerDataset(
    data[split_index:],
    time=df.time[split_index:],
    window_size=12, scaler=train.scaler
)

Plot the datasets to visualize the split. Notice in particular the magnitude of the y-values for both datasets.

Code
plt.plot(train.time, train.data)
plt.plot(test.time, test.data);

The model

Complete the model below to include an RNN layer and a Linear output layer. If you look at the RNN documentation, you will find that with batch_first=True the input is a 3D tensor of shape (batch, seq_len, input_size). Since we are using univariate data (i.e. one feature, namely the number of passengers, per time step), input_size=1. Consult the documentation or lecture notes for more information.
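Before filling in the class, a quick standalone shape check (not part of the lab model) may help clarify what nn.RNN consumes and produces:

```python
import torch
import torch.nn as nn

# Standalone shape check: a batch of 4 sequences, 12 timesteps, 1 feature
rnn = nn.RNN(input_size=1, hidden_size=20, num_layers=1, batch_first=True)
x = torch.randn(4, 12, 1)
out, h = rnn(x)
print(out.shape)  # torch.Size([4, 12, 20]) - hidden state at every timestep
print(h.shape)    # torch.Size([1, 4, 20]) - final hidden state per layer
```

Note that `out` contains the hidden state at every timestep; for a single next-value prediction you typically keep only the last one.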

Code
class AirlineRNN(...):
    def __init__(...):
        # Add one RNN layer and 1-2 Linear layers
Code
# source_hidden
# Suggested solution: simple RNN model
class AirlineRNN(nn.Module):
    def __init__(self, hidden_size=20, output_size=8):
        super().__init__()
        self.rnn = nn.RNN(
            input_size=1,
            hidden_size=hidden_size,
            num_layers=1,
            batch_first=True,
            nonlinearity="tanh"
        )
        self.fc1 = nn.Linear(hidden_size, output_size)
        self.fc2 = nn.Linear(output_size, 1)

    def forward(self, x):
        rnn_out, hidden = self.rnn(x)
        # rnn_out shape: (batch_size, time_steps, hidden_size)
        # For single-layer RNN, take the last timestep output
        rnn_out = rnn_out[:, -1, :]  # Select final timestep
        x = self.fc1(rnn_out)
        return self.fc2(x)

    def predict(self, x):
        self.eval()
        with torch.no_grad():
            return self(x)

Once you are happy with the configuration, initialize the model and choose an appropriate loss function and optimizer.
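If you settle on mean squared error, it can help to see what nn.MSELoss actually computes; a minimal sketch with made-up values:

```python
import torch
import torch.nn as nn

# MSELoss is the mean of the squared differences between
# predictions and targets.
pred = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.0, 2.0, 5.0])
loss = nn.MSELoss()(pred, target)
print(loss.item())  # (0 + 0 + 4) / 3 = 1.3333...
```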

Code
device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"
print(f"Using {device} device")
Code
model = AirlineRNN()
model.to(device)
Code
# source_hidden
# Solution: loss function
loss_fn = nn.MSELoss()
Code
# source_hidden
# Solution: optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

Training

Now that we have the model components in place, we define a training loop that iterates over our training samples. In each iteration, we let the model generate predictions and calculate the loss with respect to a loss function. Backpropagation then computes the gradient of the loss with respect to the model parameters, and the optimizer uses this gradient to update the weights, hopefully improving network performance in the next epoch. To this end, we define an explicit training function:
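Stripped to its bare parts, one pass through this update cycle looks as follows (a toy sketch on a one-parameter linear model, not the lab's RNN):

```python
import torch
import torch.nn as nn

# One optimization step on a toy model: predict, compute loss,
# backpropagate, update weights.
model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.tensor([[1.0], [2.0]])
y = torch.tensor([[2.0], [4.0]])  # target relationship: y = 2x

loss_before = loss_fn(model(x), y)
optimizer.zero_grad()   # clear gradients from the previous step
loss_before.backward()  # compute d(loss)/d(parameters)
optimizer.step()        # let the optimizer update the weights
loss_after = loss_fn(model(x), y)
print(loss_before.item(), "->", loss_after.item())
```

A single step already reduces the loss on this toy problem; the training loop below simply repeats this cycle over batches and epochs.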

Code
def train_loop(dataloader, model, loss_fn, optimizer, device, loginterval=None):
    """Train model on data in dataloader subject to loss defined by
    loss_fn and optimize parameters with optimizer."""
    model.train()  # Set model to training mode
    size = len(dataloader.dataset)
    total_loss = 0

    for batch, (X, y) in enumerate(dataloader):
        X = X.to(device)
        y = y.to(device)
        output = model(X)
        loss = loss_fn(output, y)

        optimizer.zero_grad()
        # Backpropagation
        loss.backward()
        # Optimization
        optimizer.step()

        total_loss += loss.item()

        if loginterval is not None and batch % loginterval == 0:
            current = batch * dataloader.batch_size + len(X)
            print(f"loss: {loss.item():>7f}  [{current:>5d}/{size:>5d}]")

    avg_loss = total_loss / len(dataloader)
    return avg_loss

To avoid overfitting, we also add a test function that evaluates model performance on the test dataset.
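The effect of torch.no_grad() can be seen in isolation: outputs computed inside the context carry no gradient history, which saves memory and compute during evaluation. A minimal sketch:

```python
import torch
import torch.nn as nn

# Outside no_grad, outputs track gradients (parameters require grad);
# inside, gradient tracking is disabled.
model = nn.Linear(1, 1)
x = torch.tensor([[1.0]])

print(model(x).requires_grad)      # True
with torch.no_grad():
    print(model(x).requires_grad)  # False
```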

Code
def test_loop(dataloader, model, loss_fn, device, loginterval=None):
    """Evaluate model performance on dataset provided by dataloader."""
    model.eval()  # Set model to evaluation mode
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss = 0

    # Make sure no gradients are calculated during test mode
    with torch.no_grad():
        for X, y in dataloader:
            X = X.to(device)
            y = y.to(device)
            output = model(X)
            test_loss += loss_fn(output, y).item()  # Access directly since we don't apply loss.backward()

    test_loss /= num_batches
    return test_loss

Before training the model, we define dataloaders for train and test datasets.

Code
train_dataloader = DataLoader(train)
test_dataloader = DataLoader(test)
Code
epochs = 200
epoch_loginterval = 10
# Dictionary to store validation metrics; use same keys as Keras
metrics = {'accuracy': [], 'loss': [], 'val_accuracy': [], 'val_loss': []}
for t in range(epochs):
    if (t+1) % epoch_loginterval == 0:
        print(f"Epoch {t+1}/{epochs}\n-------------------------------")
    loss = train_loop(train_dataloader, model, loss_fn, optimizer, device)
    metrics["loss"].append(loss)
    if (t+1) % epoch_loginterval == 0:
        print(f"  Train: Loss: {loss:.4f}")
    val_loss = test_loop(test_dataloader, model, loss_fn, device)
    metrics["val_loss"].append(val_loss)
    if (t+1) % epoch_loginterval == 0:
        print(f"  Test: Val loss: {val_loss:>8f}")
print("Done!")

Plotting

We define a plotting function to visualize the training metrics.

Code
def plot_loss(metrics, figsize=(12, 6)):
    """Plot loss metrics"""
    fig, ax = plt.subplots(figsize=figsize)
    ax.plot(metrics["loss"], label="train loss", color="red", alpha=0.8)
    ax.plot(metrics["val_loss"], label="val loss", color="orange", alpha=0.8)
    ax.set_title("model loss")
    ax.set_xlabel("epoch")
    ax.set_ylabel("loss")
    ax.legend()
    plt.show()
Code
plot_loss(metrics)

We also would like to visualize the predictions and original data.

Code
def plot_pred(train, test, model, figsize=(14, 8), rescale=True):
    """Plot model predictions for train and test data, possibly rescaling y-axis"""
    f = train.scaler.inverse_transform if rescale else lambda x: x
    fig, ax = plt.subplots(figsize=figsize)
    ax.plot(train.time, f(train.data), color="blue")
    ax.plot(test.time, f(test.data), color="orange")
    ax.plot(train.time[list(train.y_indices)],
            f(model.predict(train.X.to(device)).cpu()).flatten(),
            "-o", color="green")
    ax.plot(test.time[list(test.y_indices)],
            f(model.predict(test.X.to(device)).cpu()).flatten(),
            "-*", color="red")
    ax.set_title("Model prediction")
    plt.show()
Code
plot_pred(train, test, model)

Digression: RNN weights and network topology

In order to get a better understanding of the network topology and weights, here we go through how to extract the relevant parameters from the model. The model consists of several layers that we can inspect by printing the model:

Code
model

The first RNN layer has input_size=1 (one feature) and hidden_size=20, the number of features in the hidden state.

We can view parameters either with the parameters() function or named_parameters() that also provides the parameter name, either for the model:

Code
print('\n\nModel params:')
for name, param in model.named_parameters():
    print(name, param.shape)

or just one layer:

Code
print('\n\nRNN layer params:')
for name, param in model.rnn.named_parameters():
    print(name, param.shape)

Each input, with one feature, is multiplied by 20 weights (\(W_{ih}\)), one per hidden unit. The hidden state consists of 20 features and is processed by a (20, 20) weight matrix \(W_{hh}\). The final output vector \(y\) is generated by separate linear layer(s), here model.fc1 and model.fc2, where the latter outputs the predicted next time point.
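These shapes can be verified on a freshly constructed RNN layer in isolation:

```python
import torch.nn as nn

# Standalone check of the weight shapes discussed above
rnn = nn.RNN(input_size=1, hidden_size=20, num_layers=1, batch_first=True)
shapes = {name: tuple(p.shape) for name, p in rnn.named_parameters()}
print(shapes)
# {'weight_ih_l0': (20, 1), 'weight_hh_l0': (20, 20),
#  'bias_ih_l0': (20,), 'bias_hh_l0': (20,)}
```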

Session 2: LSTM (and optionally GRU)

Building on session 1, analyse the dataset using LSTM layers. Below is a tentative model setup to get you started. You could try using multiple layers; a single nn.LSTM with num_layers > 1 passes the full output sequence between layers automatically, whereas if you stack separate LSTM modules yourself, you need to feed the full sequence output of each layer (not just the last timestep) to the next one (cf. Stacked Long Short-Term Memory Networks). If you have time, you can also try out GRU layers for comparison. Do you notice any difference?
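As a standalone sketch, the input and output shapes for a two-layer LSTM look like this:

```python
import torch
import torch.nn as nn

# A stacked (two-layer) LSTM: the full sequence output of layer 1 is
# fed to layer 2 internally; `out` holds the last layer's outputs.
lstm = nn.LSTM(input_size=1, hidden_size=20, num_layers=2, batch_first=True)
x = torch.randn(4, 12, 1)  # (batch, timesteps, features)
out, (h, c) = lstm(x)
print(out.shape)  # torch.Size([4, 12, 20])
print(h.shape)    # torch.Size([2, 4, 20]) - one final hidden state per layer
```

Compared to a plain RNN, the LSTM additionally returns a cell state `c` with the same shape as `h`.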

Code
class AirlineLSTM(...):
    def __init__(...):
        # Add LSTM layer(s) and 1-2 Linear layers
Code
# source_hidden
# Suggested solution: simple LSTM model
class AirlineLSTM(nn.Module):
    def __init__(self, hidden_size=20, output_size=8, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=1,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
        )
        self.fc1 = nn.Linear(hidden_size, output_size)
        self.fc2 = nn.Linear(output_size, 1)

    def forward(self, x):
        lstm_out, hidden = self.lstm(x)
        # lstm_out shape: (batch_size, time_steps, hidden_size)
        # For multiple-layer LSTM, take the last timestep output
        lstm_out = lstm_out[:, -1, :]  # Select final timestep
        x = self.fc1(lstm_out)
        return self.fc2(x)

    def predict(self, x):
        self.eval()
        with torch.no_grad():
            return self(x)

Once you’re satisfied, initialize the model, choose a loss function and an optimizer, and train.

Code
model = AirlineLSTM()
model.to(device)
Code
# source_hidden
# Solution: loss function
loss_fn = nn.MSELoss()
Code
# source_hidden
# Solution: optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
Code
epochs = 200
epoch_loginterval = 10
# Dictionary to store validation metrics; use same keys as Keras
metrics = {'accuracy': [], 'loss': [], 'val_accuracy': [], 'val_loss': []}
for t in range(epochs):
    if (t+1) % epoch_loginterval == 0:
        print(f"Epoch {t+1}/{epochs}\n-------------------------------")
    loss = train_loop(train_dataloader, model, loss_fn, optimizer, device)
    metrics["loss"].append(loss)
    if (t+1) % epoch_loginterval == 0:
        print(f"  Train: Loss: {loss:.4f}")
    val_loss = test_loop(test_dataloader, model, loss_fn, device)
    metrics["val_loss"].append(val_loss)
    if (t+1) % epoch_loginterval == 0:
        print(f"  Test: Val loss: {val_loss:>8f}")
print("Done!")
Code
plot_loss(metrics)
Code
plot_pred(train, test, model)

Recall that the default setting for the number of time steps is 12. If you increase this number, you look further back in time, which in principle should play to the strengths of LSTMs. Note that this does not necessarily lead to better predictions, though.

Resources

The airline passengers dataset is a commonly used example for introducing time series forecasting with recurrent neural networks, and there are numerous blog posts on the topic. There is also a Python library called Darts that is dedicated to time series analysis. On the Darts website you can find a worked example that highlights some additional aspects of time series forecasting, such as making use of the known covariates month and year.