Lab session: predicting time series

Airplanes lab
Author
Affiliation

Per Unneberg

NBIS

Published

May 6, 2026

Preparation

Configuration

Execute the following code blocks to configure the session and import relevant modules.

Code
%config InlineBackend.figure_format ='retina'
%load_ext autoreload
%autoreload 2
%matplotlib inline
Code
import os
import sys
import math 
import numpy as np
import random
import pandas as pd
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, Subset
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

For reproducibility we can set seeds:

Code
SEED=22
torch.manual_seed(SEED)
random.seed(SEED)
np.random.seed(SEED)
torch.cuda.manual_seed_all(SEED)

Utility classes and functions

We add utility classes and functions for plotting and training.

Code
from typing import Optional
import plotly.graph_objects as go
from plotly.subplots import make_subplots

import torchmetrics

class LivePlot():
    def __init__(self, left_label="Loss", right_label="Accuracy"):
        self.fig = go.FigureWidget(
            make_subplots(specs=[[{"secondary_y": True}]])
        )
        self.fig.update_yaxes(title_text=left_label,  secondary_y=False)
        self.fig.update_yaxes(title_text=right_label, secondary_y=True)

        self.plot_indices = {}
        self.trace_secondary = {}
        display(self.fig)
        self.limits = [0, 0]
        self.current_x = 0

    def report(self, name: str, value: float, secondary_y: bool = False):
        try:
            plot_index = self.plot_indices[name]
        except KeyError:
            plot_index = len(self.fig.data)
            self.fig.add_scatter(
                y=[], x=[], name=name,
                secondary_y=secondary_y
            )
            self.plot_indices[name] = plot_index
            self.trace_secondary[name] = secondary_y
        self.fig.data[plot_index].y += (value,)
        self.fig.data[plot_index].x += (self.current_x,)

    def increment(self, n_ticks: int):
        self.limits[1] += n_ticks
        self.fig.update_layout(xaxis_range=self.limits)

    def set_limit(self, n_ticks: int):
        self.limits[1] = n_ticks
        self.fig.update_layout(xaxis_range=self.limits)

    def tick(self, n_ticks: Optional[int] = None):
        if n_ticks is None:
            n_ticks = 1
        self.current_x += n_ticks

def train(*,
          model: torch.nn.Module,
          train_loader: DataLoader,
          dev_loader: DataLoader,
          optimizer: torch.optim.Optimizer,
          criterion: torch.nn.Module,
          max_epochs: int,
          metric: Optional[torchmetrics.metric] = None,
          device: Optional[torch.device] = None,
          liveplot: Optional[LivePlot]=None):
    if device is None:
        device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

    model.to(device)

    for epoch in range(max_epochs):
        training_loss_acc = 0
        training_examples = 0
        model.train()

        for i, batch in enumerate(train_loader):
            optimizer.zero_grad()

            x_batch, y_batch = batch
            x_batch = x_batch.to(device)
            y_hat = model(x_batch)

            loss = criterion(y_hat, y_batch.to(device))
            loss.backward()

            optimizer.step()
            training_loss_acc += loss.item()
            training_examples += x_batch.size(0)

        model.eval()
        with torch.no_grad():
            dev_loss_acc = 0
            dev_examples = 0
            dev_accuracy = 0
            for i, batch in enumerate(dev_loader):
                x_batch, y_batch = batch
                x_batch = x_batch.to(device)
                y_hat = model(x_batch)
                dev_loss_acc += criterion(y_hat, y_batch.to(device)).item()
                dev_examples += x_batch.size(0)
                if metric:
                    dev_accuracy += metric(torch.argmax(y_hat, -1), y_batch)

        if liveplot is not None:
            liveplot.tick() # Update the liveplot time
            liveplot.report("Training loss", training_loss_acc / training_examples)
            liveplot.report("Development loss", dev_loss_acc / dev_examples)
            if metric:
                liveplot.report("Development accuracy", dev_accuracy / (i+1), secondary_y=True)

Lab session: predicting airline passengers

Aims

In this lab the idea is to try out different RNN models on the Box & Jenkins monthly airline passengers dataset. The dataset is a monthly time series of airline passengers recorded in the 50’ies and 60’ies. Your task is to build a model to make a future prediction of the number of passengers given a number of observations.

You will download data and prepare it for later analyses. More specifically, you will partition the data into a training and test set. In order to create input / label pairs (X/Y), the data is split into time slices, where a slice corresponds to the input (X) and the consecutive time point the (known) output (Y).

To help you along the way, many steps have been prepared in advance, but in some cases, your task is to complete missing code, indicated by a comment such as # Add code or similar:

Code
# Add code to print "hello world"

These code blocks will be followed by a folded code block mark with # solution_hidden that contains a suggested solution. To expand the code block, simply click on it. Try to complete the code yourself with the aid of documentation before looking at the solution!

Code
# solution_hidden: Hello world
print("hello world")

Don’t hesitate to change parameter settings and experiment with the model architectures. Also, make sure to examine the contents of variables by printing them. Things to try:

  • change the number of time steps
  • change the number of epoch
  • experiment with the network topology (e.g., number of units in the hidden layer)

See if you can improve on the model presented in the lecture.

Session 1: Vanilla RNN

Download data

Start by downloading the data and loading it into a pandas dataframe:

Code
%%bash
if [ ! -e airline-passengers.csv ]; then
    wget https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv --no-check-certificate
fi

We modify the data somewhat for easier processing. df.head() simply shows you the first entries of the data frame.

Code
df = pd.read_csv('airline-passengers.csv', names=['time','passengers'], header=0)
df['time'] = pd.to_datetime(df['time'], format='%Y-%m')
df.head()

Plot the data for overview:

Code
plt.plot(df.time, df.passengers)

Custom dataset class

PyTorch provides two data primitives (Dataset and DataLoader) that aim to make processing and handling of data easier. We will create a custom dataset class that will store data and scale data to a given range (typically between 0 and 1); many machine learning algorithms work better when features are on similar scales. However, since we want to partition our data for training and testing, we need to take care to apply the same scaling for the test data as we did for the train data. Here is a dataset class with this functionality:

Code
class PassengerDataset(Dataset):
    def __init__(self, data, time,
                 window_size=1, step_size=1,
                 scaler=None):
        self.unscaled = data  # (time points, 1)
        if scaler is None:
            self.scaler = MinMaxScaler(feature_range=(0, 1)).fit(self.unscaled)
        else:
            self.scaler = scaler
        self.time = time.reset_index(drop=True)
        self.window_size = window_size
        self.step_size = step_size
        self.transform()
        self.X = torch.from_numpy(np.array(
            [self.data[ind] for ind in self.x_indices]
        ))
        self.Y = torch.tensor([self.data[i] for i in self.y_indices]).unsqueeze(-1)

    def __getitem__(self, idx):
        """Return X, Y pairs, where X has length window_size"""
        return self.X[idx], self.Y[idx]

    def __len__(self):
        return len(self.X)

    @property
    def y_indices(self):
        for j in np.arange(self.window_size, len(self.data), self.step_size):
            yield j        

    @property
    def x_indices(self):
        j = -self.step_size
        for _, _ in enumerate(self.y_indices):
            j = j + self.step_size
            yield list(range(j, j + self.window_size))

    def transform(self):
        scaled = self.scaler.transform(self.unscaled)
        self.data = torch.from_numpy(scaled).type(torch.float32)

Next, we partition the data into training and test data sets. We first define the size of the training set and the index at which to split the data.

Code
train_fraction = 0.7
split_index = int(df.shape[0] * train_fraction)

Finally, we create two datasets based on split_index. Our dataset class defines a function __getitem__ that ensures we obtain input/output (X/Y) data points for training. The window_size parameter can be thought of as the size of the time slice backwards in time that we model the next time point on. Since the time unit is months and there likely is a recurrent yearly seasonality in the data it makes sense to use 12 time steps, but this is a parameter you could modify to see what effect it has on the end results. In particular, if you increase this number, you would look further back in the past when making predictions.

Code
data = torch.tensor(df.passengers, dtype=torch.float32).unsqueeze(-1)
train_data = PassengerDataset(
    data[:split_index],
    time=df.time[:split_index],
    window_size=12
)
test_data = PassengerDataset(
    data[split_index:],
    time=df.time[split_index:],
    window_size=12, scaler=train_data.scaler
)

Plot the datasets to visualize the split. Notice in particular the magnitude of the y-values for both datasets.

Code
plt.plot(train_data.time, train_data.data), plt.plot(test_data.time, test_data.data);

The model

Complete the model below to include a vanilla RNN layer and a Linear output layer.

If you look at the documentation for the RNN class, you will find that inputs is a 3D tensor ([L,N,Hin] or (seq, batch, feature)), where feature is the size of the input feature (option input_size). Since we are using univariate data (i.e. one feature, namely the number of passengers, per time step), input_size=1. Since most PyTorch operations use batch_size as the leading dimension ((batch, seq, feature)), you should also set batch_size=True

The RNN output is a tensor tuple (output, h_n) corresponding to the output features of the last layer and the final hidden state. The output tensor has shape (batch, seq, output size) and the hidden state tensor has shape (num_layers, batch, output size). In the RNN constructor call, the output is equivalent to hidden_size.

In the model forward pass, you connect the final output to one or multiple Linear layers, whose final outcome is a passenger prediction for the next time point. The final output should therefore have output size 1.

Consult the documentation or lecture notes for more information.

Code
class AirlineRNN(...):
    def __init__(self, ...):
        # Add one RNN layer and 1-2 Linear layers
        # Parametrize hidden and output sizes in function call
        super().__init__()
        # Input tensor shape: (batch_size, time_steps, features)
        self.rnn = nn.RNN(
            input_size=...,
            hidden_size=...,
            num_layers=...,
            batch_first=...,
            nonlinearity=...
        )
        # Linear layers have dimension in_features, out_features
        self.fc1 = nn.Linear(...)
        # Possibly add more layers

    def forward(self, x):
        # Add forward step; connect layers and return output
        # RNN returns (output, h_n)
        rnn_out, h_n = self.rnn(...)
        rnn_out = rnn_out[:, -1, :] # Select final timestep
        x = self.fc1(...)
        # Additional Linear layer
        return ...

    def predict(self, x):
        # Add predict function; set model to evaluation mode and
        # returns model output for given input
        ...
Code
# solution_hidden: Vanilla RNN model
class AirlineRNN(nn.Module):
    def __init__(self, hidden_size=20, output_size=8):
        super().__init__()
        self.rnn = nn.RNN(
            input_size=1,
            hidden_size=hidden_size,
            num_layers=1,
            batch_first=True,
            nonlinearity="relu"
        )
        self.fc1 = nn.Linear(hidden_size, output_size)
        self.fc2 = nn.Linear(output_size, 1)

    def forward(self, x):
        rnn_out, h_n = self.rnn(x)
        # rnn_out shape: (batch_size, time_steps, hidden_size)
        # For single-layer RNN, take the last timestep output
        rnn_out = rnn_out[:, -1, :]  # Select final timestep
        x = self.fc1(rnn_out)
        return self.fc2(x)

    def predict(self, x):
        self.eval()
        with torch.no_grad():
            return self(x)

Once you are happy with the configuration, initialize the model:

Code
model = AirlineRNN(hidden_size=20)

Training

Now that we have the model components in place we can start training and testing our model. We make use of the utility functions defined in the preamble. In each training iteration, we let the model generate predictions and calculate the loss with respect to a loss function. The gradient of the loss with respect to the model parameters is then calculated with backward propagation, which is then passed on to and utilized by the optimizer to update weights that will hopefully improve network performance in the next epoch. To avoid overfitting, the loop adds a step that evaluates model performance on the test dataset.

Before training the model, we define dataloaders for train and test datasets.

Code
train_loader = DataLoader(train_data)
test_loader = DataLoader(test_data)

Next we choose an appropriate loss function and assign to variable criterion:

Code
criterion = nn...Loss()
Code
# solution_hidden: criterion
criterion = nn.MSELoss()

Finally, set the optimizer to SGD.

Code
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

With loss and optimizer in place, we can train the network.

Code
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print(f"Using {device} device")
epochs = 400
liveplot = LivePlot()
liveplot.fig.update_layout(width=1200, height=800, font_size=18)
liveplot.increment(epochs)

train(model=model,
      train_loader=train_loader,
      dev_loader=test_loader,
      optimizer=optimizer,
      criterion=criterion,
      max_epochs=epochs,
      liveplot=liveplot,
      device=device)

We finish by defining a function to plot (rescaled) predictions and plot them.

Code
def plot_pred(train_data, test_data, model, figsize=(14, 8), rescale=True):
    """Plot model predictions for train and test data, possibly rescaling y-axis"""
    f = train_data.scaler.inverse_transform if rescale else lambda x: x
    fig, ax = plt.subplots(figsize=figsize)
    ax.plot(train_data.time, f(train_data.data), color="blue")
    ax.plot(test_data.time, f(test_data.data), color="orange")
    ax.plot(train_data.time[list(train_data.y_indices)],
            f(model.predict(train_data.X.to(device)).cpu()).flatten(),
            "-o", color="green")
    ax.plot(test_data.time[list(test_data.y_indices)],
            f(model.predict(test_data.X.to(device)).cpu()).flatten(),
            "-*", color="red")
    ax.set_title("Model prediction")
    plt.show()
Code
plot_pred(train_data, test_data, model)

Depending on how your model is defined, you will get different outcomes, but it is likely that training has been unsuccessful. Can you think of ways to improve network learning?

Code
# solution_hidden: Improving network learning
#
# optimizer: Stochastic Gradient Descent is generally a poor choice
# for RNNs for various reasons, including 1) vanishing gradient
# problem 2) no adaptive learning rates
#
# learning rate: learning rate selection is critical for RNNs, where
# too high a rate may produce oscillating behaviour or explosion of
# loss
#

Improve training performance

Revise the previous code blocks and see if you can improve training performance.

Code
# solution_hidden: Improved training
model = AirlineRNN()
train_loader = DataLoader(train_data)
test_loader = DataLoader(test_data)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print(f"Using {device} device")
epochs = 400
liveplot = LivePlot()
liveplot.fig.update_layout(width=1200, height=800, font_size=18)
liveplot.increment(epochs)

train(model=model,
      train_loader=train_loader,
      dev_loader=test_loader,
      optimizer=optimizer,
      criterion=criterion,
      max_epochs=epochs,
      liveplot=liveplot,
      device=device)
Code
plot_pred(train_data, test_data, model)

Digression: RNN weights and network topology

In order to get a better understanding of the network topology and weights, here we go through how to extract the relevant parameters from the model. The model consists of several layers that we can inspect by printing the model (the numbers are based on the suggested solution Vanilla RNN model above):

Code
model

The first RNN layer has input_size=1 (one feature) and hidden_size=20, the number of features in the hidden state.

We can view parameters either with the parameters() function or named_parameters() that also provides the parameter name, either for the model:

Code
print('\n\nModel params:')
for name, param in model.named_parameters():
    print(name, param.shape)

or just one layer:

Code
print('\n\nRNN layer params:')
for name, param in model.rnn.named_parameters():
    print(name, param.shape)

Each input with one feature is multiplied by 20 weights (\(W_{ih}\), one for each RNN layer. The hidden state consists of 20 features and is processed by a (20, 20) size weight matrix \(W_{hh}\). The final output vector \(y\) is generated by separate layer(s), here model.fc1 and model.fc2, where the latter outputs the predicted next time point.

Session 2: LSTM (and optionally GRU)

Building on session 1, analyse the data set using LSTM layers. Here is a tentative model setup to get you started. Here you could try using multiple layers, in which case you need to return the sequences for all but the last layer (cf Stacked Long Short-Term Memory Networks). If you have time, you can also try out the GRU layers for comparison. You may also want to increase the size of the hidden layers.

Code
class AirlineLSTM(...):
    def __init__(...):
        # Add LSTM layer(s) and 1-2 Linear layers
Code
# solution_hidden: Simple LSTM model
class AirlineLSTM(nn.Module):
    def __init__(self, hidden_size=20, output_size=8, num_layers=1):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=1,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
        )
        self.fc1 = nn.Linear(hidden_size, output_size)
        self.fc2 = nn.Linear(output_size, 1)

    def forward(self, x):
        lstm_out, hidden = self.lstm(x)
        # lstm_out shape: (batch_size, time_steps, hidden_size)
        # For multiple-layer LSTM, take the last timestep output
        lstm_out = lstm_out[:, -1, :]  # Select final timestep
        x = self.fc1(lstm_out)
        return self.fc2(x)

    def predict(self, x):
        self.eval()
        with torch.no_grad():
            return self(x)

Once you’re satisfied, initialize the model, choose loss function and optimizer and train.

Code
model = AirlineLSTM(hidden_size=20, num_layers=1)
Code
criterion = ...
Code
# solution_hidden: criterion
criterion = nn.MSELoss()
Code
optimizer = ...
Code
# solution_hidden: Optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
Code
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print(f"Using {device} device")
epochs = 100
liveplot = LivePlot()
liveplot.fig.update_layout(width=1200, height=800, font_size=18)
liveplot.increment(epochs)

train(model=model,
      train_loader=train_loader,
      dev_loader=test_loader,
      optimizer=optimizer,
      criterion=criterion,
      max_epochs=epochs,
      liveplot=liveplot,
      device=device)
Code
plot_pred(train_data, test_data, model)

Recall that the default setting for time steps is 12. If you increase this number, you would look further back in time, which in principle should play to the strength of LSTMs. Note that this need not necessarily lead to better predictions.

Conclusions

As you may have noticed, it can be quite tricky to tune the model to make good predictions. The small training data size doesn’t help either; a simple univariate model would perform just as well in this case. Still, this simple example has hopefully provided some insight into time series prediction with recurrent neural networks.

Resources

The airplanes dataset is a commonly used example to introduce time series forecasting with Recurrent Neural Networks, and there are numerous blog posts on the topic. There is also a library called Darts that is dedicated to time series analyses in Python. On the Darts website you can find a worked example that highlights some additional aspects of time series forecasting, such as making use of the known covariates month and year.