
Multi-layer perceptron activity

In this activity, you will make a multi-layer perceptron (MLP) model in the PyTorch deep learning package to perform classification of hand-written digits in the classic MNIST dataset.

There are 5 tasks for you to complete in this example. Cells have clearly marked # TODO and ##### comments between which you should insert your code. Variables assigned to None should keep the same name but be assigned their proper implementation.

  1. Complete the implementation of the MLP model by filling in the __init__ and forward methods.
  2. Set up the optimizer and loss function for training.
  3. Fill in the missing steps in the train and test loop.
  4. Train the model for 5 epochs.
  5. Visualize the model predictions on some examples.
# TODO: Run this cell to import relevant packages

import torch  # Main torch import for torch tensors (arrays)
import torch.nn as nn  # Neural network module for building deep learning models
import torch.nn.functional as F  # Functional module, includes activation functions
import torch.optim as optim  # Optimization module
import torchvision  # Vision / image processing package built on top of torch

from matplotlib import pyplot as plt  # Plotting and visualization
from sklearn.metrics import accuracy_score  # Computing accuracy metric
# TODO: Run this cell to download the data and set up the pre-processing pipeline

# Common practice to normalize input data to neural networks (0 mean, unit variance)
transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),  # All inputs to PyTorch neural networks must be torch.Tensor
    torchvision.transforms.Normalize(mean=0.1307, std=0.3081)  # Subtracts mean and divides by std. Note that the raw data is between [0, 1]
])

# Download the MNIST data and lazily apply the transformation pipeline
train_data = torchvision.datasets.MNIST('./datafiles/', train=True, download=True, transform=transform)
test_data = torchvision.datasets.MNIST('./datafiles/', train=False, download=True, transform=transform)

# Setup data loaders
# Note: Iterating through the dataloader yields batches of (inputs, targets)
# where inputs is a torch.Tensor of shape (batch, 1, 28, 28) and targets is a torch.Tensor of shape (batch,)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=1000)
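
The mean 0.1307 and standard deviation 0.3081 used above are the pixel statistics of the raw MNIST training images. As an optional sanity check (a sketch, not part of the activity), they can be recomputed from the un-normalized data:

# Optional sketch: recompute the normalization statistics from the raw images.
# The raw pixel values lie in [0, 1] after ToTensor.
raw_data = torchvision.datasets.MNIST('./datafiles/', train=True, download=True,
                                      transform=torchvision.transforms.ToTensor())
pixels = torch.stack([img for img, _ in raw_data])  # shape (60000, 1, 28, 28)
print(pixels.mean().item(), pixels.std().item())  # approximately 0.1307 and 0.3081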
# TODO: Run this cell to visualize 20 examples from the test dataset

fig, axs = plt.subplots(4, 5, figsize=(5, 6))

plot_images = []
plot_labels = []

for i, ax in enumerate(axs.flatten(), start=1000):  # i indexes test examples 1000-1019
    (image, label) = test_data[i]

    # Save this data for later
    plot_images.append(image)
    plot_labels.append(label)

    # Plot each image
    ax.imshow(image.squeeze(), cmap="viridis")
    ax.set_title(f"Label: {label}")
    ax.axis("off")
plt.show()

plot_images = torch.cat(plot_images)  # Combine all the images into a single batch for later

print(f"Each image is a torch.Tensor and has shape {image.shape}.")
print(f"The labels are the integers 0 to 9, representing the digits.")

[Figure: a 4 × 5 grid of test images, each titled with its label]

Each image is a torch.Tensor and has shape torch.Size([1, 28, 28]).
The labels are the integers 0 to 9, representing the digits.

1. Complete the implementation of the MLP model

Although we draw diagrams of hidden layers as neurons with incoming and outgoing connections, in practice, we implement this with two linear layers (also called “dense layers”) and a pointwise non-linearity in between. The first layer is a linear transform (matrix multiplication) from the input dimension to the hidden dimension. The second layer is a linear transform from the hidden dimension to the output dimension.

For this model, the input dimension is (28*28) as we will flatten the 2D images into a 1D vector. The hidden dimension is 100. The output dimension is 10, since we have 10 classes. We will use the ReLU non-linearity.

In PyTorch, a model is defined by subclassing the nn.Module class, and its behaviour is defined in two methods. In the __init__ method, we set up the model architecture: the number of layers, the size of each layer, etc. In the forward() method, we define the operations performed by the model’s layers on the input data to produce outputs.

Note: You do not need to apply a softmax to the outputs as this is automatically done with the appropriate loss function.
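
As a minimal illustration (not part of the activity), cross entropy loss on raw logits is equivalent to applying a log-softmax followed by the negative log-likelihood loss:

# Sketch: nn.CrossEntropyLoss expects raw logits and applies log-softmax internally
logits = torch.randn(4, 10)           # fake model outputs for a batch of 4 examples
targets = torch.tensor([3, 1, 7, 0])  # integer class labels
loss_a = nn.CrossEntropyLoss()(logits, targets)
loss_b = F.nll_loss(F.log_softmax(logits, dim=-1), targets)
print(torch.allclose(loss_a, loss_b))  # True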

Relevant documentation:

class MultiLayerPerceptron(nn.Module):
    def __init__(self):
        super().__init__()
        # TODO: Assign self.hidden to a torch linear layer of the correct size
        self.hidden = nn.Linear(28 * 28, 100)
        # TODO: Assign self.output to a torch linear layer of the correct size
        self.output = nn.Linear(100, 10)
        #####

    def forward(self, x):
        """
        Forward pass implementation for the network

        :param x: torch.Tensor of shape (batch, 1, 28, 28) or (batch, 28, 28), input images

        :returns: torch.Tensor of shape (batch, 10), output logits
        """
        x = torch.flatten(x, 1)  # shape (batch, 28*28)
        # TODO: Process x through self.hidden, relu, and self.output and return the result
        x = self.hidden(x)
        x = F.relu(x)
        x = self.output(x)
        return x
        #####
# TODO: Run this cell to test your implementation. You should expect an output tensor of shape (2, 10)

mlp = MultiLayerPerceptron()
x = torch.randn(2, 28, 28)
z = mlp(x)
z.shape
torch.Size([2, 10])
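
As a further sanity check (a sketch, not part of the activity), the parameter count should match the two linear layers: a 784 × 100 weight matrix plus 100 biases for the hidden layer, and a 100 × 10 weight matrix plus 10 biases for the output layer.

# Sketch: count the trainable parameters of the MLP
n_params = sum(p.numel() for p in mlp.parameters())
expected = (28 * 28 * 100 + 100) + (100 * 10 + 10)
print(n_params, expected)  # both 79510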

2. Set up the optimizer and loss function

The current standard optimizer in deep learning is the Adam optimizer. Use a learning rate of $1\times 10^{-2}$.

The task we are performing is multiclass classification (10 mutually exclusive classes, one for each digit). The loss function to use for this task is cross entropy loss.

Relevant documentation:

# TODO: Instantiate your model and setup the optimizer

model = MultiLayerPerceptron()
optimizer = optim.Adam(model.parameters(), lr=1e-2)
# TODO: Setup the cross entropy loss function

loss_fn = nn.CrossEntropyLoss()

3. Fill in the missing steps of the train and test loop

During the training loop, we perform the following steps:

  1. Fetch the next batch of inputs and targets from the dataloader
  2. Zero the parameter gradients
  3. Compute the model output predictions from the inputs
  4. Compute the loss between the model outputs and the targets
  5. Compute the parameter gradients with backpropagation
  6. Perform a gradient descent step with the optimizer to update the model parameters

Relevant documentation:

def train(model, train_loader, loss_fn, optimizer, epoch=-1):
    """
    Trains a model for one epoch (one pass through the entire training data).

    :param model: PyTorch model
    :param train_loader: PyTorch Dataloader for training data
    :param loss_fn: PyTorch loss function
    :param optimizer: PyTorch optimizer, initialized with model parameters
    :kwarg epoch: Integer epoch to use when printing loss and accuracy
    """
    total_loss = 0
    all_predictions = []
    all_targets = []

    model.train()  # Set model in training mode
    for i, (inputs, targets) in enumerate(train_loader):  # 1. Fetch next batch of data
        # TODO: Fill in the rest of the training loop
        optimizer.zero_grad()             # 2. Zero parameter gradients
        outputs = model(inputs)           # 3. Compute model outputs
        loss = loss_fn(outputs, targets)  # 4. Compute loss between outputs and targets
        loss.backward()                   # 5. Backpropagation for parameter gradients
        optimizer.step()                  # 6. Gradient descent step
        #####

        # Track some values to compute statistics
        total_loss += loss.item()
        preds = torch.argmax(
            outputs, dim=-1
        )  # Take the class with the highest output as the prediction
        all_predictions.extend(preds.tolist())
        all_targets.extend(targets.tolist())

        # Print some statistics every 100 batches
        if i % 100 == 0:
            running_loss = total_loss / (i + 1)
            print(f"Epoch {epoch + 1}, batch {i + 1}: loss = {running_loss:.2f}")

    # TODO: Compute the overall accuracy
    acc = accuracy_score(all_targets, all_predictions)
    #####

    # Print average loss and accuracy
    print(
        f"Epoch {epoch + 1} done. Average train loss = {total_loss / len(train_loader):.2f}, average train accuracy = {acc * 100:.3f}%"
    )
def test(model, test_loader, loss_fn, epoch=-1):
    """
    Tests a model for one epoch of test data.

    Note:
        In testing and evaluation, we do not perform gradient descent optimization, so steps 2, 5, and 6 are not needed.
        For performance, we also tell torch not to track gradients by using the `with torch.no_grad()` context.

    :param model: PyTorch model
    :param test_loader: PyTorch Dataloader for test data
    :param loss_fn: PyTorch loss function
    :kwarg epoch: Integer epoch to use when printing loss and accuracy
    """
    total_loss = 0
    all_predictions = []
    all_targets = []

    model.eval()  # Set model in evaluation mode
    for i, (inputs, targets) in enumerate(test_loader):  # 1. Fetch next batch of data
        with torch.no_grad():
            # TODO: Compute the model outputs and loss only. Do not update using the optimizer
            outputs = model(inputs)           # 3. Compute model outputs
            loss = loss_fn(outputs, targets)  # 4. Compute loss between outputs and targets
            #####

            # Track some values to compute statistics
            total_loss += loss.item()
            preds = torch.argmax(outputs, dim=-1)  # Take the class with the highest output as the prediction
            all_predictions.extend(preds.tolist())
            all_targets.extend(targets.tolist())

    # TODO: Compute the overall accuracy        
    acc = accuracy_score(all_targets, all_predictions)
    #####

    # Print average loss and accuracy
    print(f"Epoch {epoch + 1} done. Average test loss = {total_loss / len(test_loader):.2f}, average test accuracy = {acc * 100:.3f}%")

4. Train the model for 5 epochs

# TODO: Copy the setup for the model, optimizer, and loss function from Section 2 to here
# Then, run this cell to train the model for 5 epochs
model = MultiLayerPerceptron()
optimizer = optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
#####

for epoch in range(5):
    # TODO: Fill in the rest of the arguments to the train and test functions
    train(
        model,
        loss_fn=loss_fn,
        optimizer=optimizer,
        train_loader=train_loader,
        epoch=epoch,
    )
    test(
        model,
        loss_fn=loss_fn,
        epoch=epoch,
    )
    #####
Epoch 1, batch 1: loss = 2.32
Epoch 1, batch 101: loss = 0.59
Epoch 1, batch 201: loss = 0.44
Epoch 1, batch 301: loss = 0.38
Epoch 1, batch 401: loss = 0.35
Epoch 1, batch 501: loss = 0.32
Epoch 1, batch 601: loss = 0.31
Epoch 1, batch 701: loss = 0.30
Epoch 1, batch 801: loss = 0.29
Epoch 1, batch 901: loss = 0.28




5. Visually compare the model predictions

Lastly, we will look at the trained model’s predictions on the 20 examples we visualized at the beginning.

# TODO: Run this cell to visualize the data

# Evaluate the model on the plot_images
model.eval()

with torch.no_grad():
    plot_outputs = model(plot_images)
    plot_preds = torch.argmax(plot_outputs, dim=-1)

# Plot and show the labels
fig, axs = plt.subplots(4, 5, figsize=(7, 8))

for i, ax in enumerate(axs.flatten()):
    image = plot_images[i]
    label = plot_labels[i]
    pred = plot_preds[i]

    ax.imshow(image.squeeze(), cmap="viridis")
    ax.set_title(f"Prediction: {pred}\nLabel: {label}")
    ax.axis("off")
plt.show()