Tuesday, July 9, 2024
Came up with an interesting idea for a keyboard-free interface: write letters with a finger on the hand, and a neural network recognizes them and assembles words and sentences. That could make for an interesting input method, and a fairly secure one, better than speech. The two could also be combined: speech plus drawing letters on the palm. Speech for non-sensitive input, the palm for sensitive input or when you want the conversation to stay private.
Finished the third part of the PyTorch course. Here is the notebook.
PyTorch Computer Vision
0. Computer vision libraries in PyTorch
- torchvision - base domain library for PyTorch computer vision
- torchvision.datasets - get datasets and data loading functions for computer vision
- torchvision.models - get pretrained computer vision models that you can leverage for your own problems
- torchvision.transforms - functions for manipulating your vision data (images) to be suitable for use with an ML model
- torch.utils.data.Dataset - base dataset class for PyTorch
- torch.utils.data.DataLoader - creates a Python iterable over a dataset
import torch
from torch import nn
import torchvision
from torchvision import datasets
from torchvision import transforms
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt
print(torch.__version__)
print(torchvision.__version__)
2.1.2
0.16.2
1. Getting a dataset
The dataset we'll be using is FashionMNIST from torchvision.datasets
# Setup training data
train_data = datasets.FashionMNIST(
    root="data", # where to download data to
    train=True, # do we want the training dataset
    download=True, # do we want to download data to computer
    transform=ToTensor(), # how do we want to transform data
    target_transform=None # how do we want to transform labels
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
    target_transform=None
)
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz
100%|██████████| 26421880/26421880 [00:08<00:00, 3261010.08it/s]
Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz
100%|██████████| 29515/29515 [00:00<00:00, 268531.46it/s]
Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz
100%|██████████| 4422102/4422102 [00:00<00:00, 5086863.45it/s]
Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████| 5148/5148 [00:00<00:00, 9507827.83it/s]
Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw
len(train_data), len(test_data)
(60000, 10000)
# See the first training example
image, label = train_data[0]
image, label
(tensor([[[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0039, 0.0000, 0.0000, 0.0510,
0.2863, 0.0000, 0.0000, 0.0039, 0.0157, 0.0000, 0.0000, 0.0000,
0.0000, 0.0039, 0.0039, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0118, 0.0000, 0.1412, 0.5333,
0.4980, 0.2431, 0.2118, 0.0000, 0.0000, 0.0000, 0.0039, 0.0118,
0.0157, 0.0000, 0.0000, 0.0118],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0235, 0.0000, 0.4000, 0.8000,
0.6902, 0.5255, 0.5647, 0.4824, 0.0902, 0.0000, 0.0000, 0.0000,
0.0000, 0.0471, 0.0392, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.6078, 0.9255,
0.8118, 0.6980, 0.4196, 0.6118, 0.6314, 0.4275, 0.2510, 0.0902,
0.3020, 0.5098, 0.2824, 0.0588],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0039, 0.0000, 0.2706, 0.8118, 0.8745,
0.8549, 0.8471, 0.8471, 0.6392, 0.4980, 0.4745, 0.4784, 0.5725,
0.5529, 0.3451, 0.6745, 0.2588],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0039, 0.0039, 0.0039, 0.0000, 0.7843, 0.9098, 0.9098,
0.9137, 0.8980, 0.8745, 0.8745, 0.8431, 0.8353, 0.6431, 0.4980,
0.4824, 0.7686, 0.8980, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.7176, 0.8824, 0.8471,
0.8745, 0.8941, 0.9216, 0.8902, 0.8784, 0.8706, 0.8784, 0.8667,
0.8745, 0.9608, 0.6784, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.7569, 0.8941, 0.8549,
0.8353, 0.7765, 0.7059, 0.8314, 0.8235, 0.8275, 0.8353, 0.8745,
0.8627, 0.9529, 0.7922, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0039, 0.0118, 0.0000, 0.0471, 0.8588, 0.8627, 0.8314,
0.8549, 0.7529, 0.6627, 0.8902, 0.8157, 0.8549, 0.8784, 0.8314,
0.8863, 0.7725, 0.8196, 0.2039],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0235, 0.0000, 0.3882, 0.9569, 0.8706, 0.8627,
0.8549, 0.7961, 0.7765, 0.8667, 0.8431, 0.8353, 0.8706, 0.8627,
0.9608, 0.4667, 0.6549, 0.2196],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0157, 0.0000, 0.0000, 0.2157, 0.9255, 0.8941, 0.9020,
0.8941, 0.9412, 0.9098, 0.8353, 0.8549, 0.8745, 0.9176, 0.8510,
0.8510, 0.8196, 0.3608, 0.0000],
[0.0000, 0.0000, 0.0039, 0.0157, 0.0235, 0.0275, 0.0078, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.9294, 0.8863, 0.8510, 0.8745,
0.8706, 0.8588, 0.8706, 0.8667, 0.8471, 0.8745, 0.8980, 0.8431,
0.8549, 1.0000, 0.3020, 0.0000],
[0.0000, 0.0118, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.2431, 0.5686, 0.8000, 0.8941, 0.8118, 0.8353, 0.8667,
0.8549, 0.8157, 0.8275, 0.8549, 0.8784, 0.8745, 0.8588, 0.8431,
0.8784, 0.9569, 0.6235, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0706, 0.1725, 0.3216, 0.4196,
0.7412, 0.8941, 0.8627, 0.8706, 0.8510, 0.8863, 0.7843, 0.8039,
0.8275, 0.9020, 0.8784, 0.9176, 0.6902, 0.7373, 0.9804, 0.9725,
0.9137, 0.9333, 0.8431, 0.0000],
[0.0000, 0.2235, 0.7333, 0.8157, 0.8784, 0.8667, 0.8784, 0.8157,
0.8000, 0.8392, 0.8157, 0.8196, 0.7843, 0.6235, 0.9608, 0.7569,
0.8078, 0.8745, 1.0000, 1.0000, 0.8667, 0.9176, 0.8667, 0.8275,
0.8627, 0.9098, 0.9647, 0.0000],
[0.0118, 0.7922, 0.8941, 0.8784, 0.8667, 0.8275, 0.8275, 0.8392,
0.8039, 0.8039, 0.8039, 0.8627, 0.9412, 0.3137, 0.5882, 1.0000,
0.8980, 0.8667, 0.7373, 0.6039, 0.7490, 0.8235, 0.8000, 0.8196,
0.8706, 0.8941, 0.8824, 0.0000],
[0.3843, 0.9137, 0.7765, 0.8235, 0.8706, 0.8980, 0.8980, 0.9176,
0.9765, 0.8627, 0.7608, 0.8431, 0.8510, 0.9451, 0.2549, 0.2863,
0.4157, 0.4588, 0.6588, 0.8588, 0.8667, 0.8431, 0.8510, 0.8745,
0.8745, 0.8784, 0.8980, 0.1137],
[0.2941, 0.8000, 0.8314, 0.8000, 0.7569, 0.8039, 0.8275, 0.8824,
0.8471, 0.7255, 0.7725, 0.8078, 0.7765, 0.8353, 0.9412, 0.7647,
0.8902, 0.9608, 0.9373, 0.8745, 0.8549, 0.8314, 0.8196, 0.8706,
0.8627, 0.8667, 0.9020, 0.2627],
[0.1882, 0.7961, 0.7176, 0.7608, 0.8353, 0.7725, 0.7255, 0.7451,
0.7608, 0.7529, 0.7922, 0.8392, 0.8588, 0.8667, 0.8627, 0.9255,
0.8824, 0.8471, 0.7804, 0.8078, 0.7294, 0.7098, 0.6941, 0.6745,
0.7098, 0.8039, 0.8078, 0.4510],
[0.0000, 0.4784, 0.8588, 0.7569, 0.7020, 0.6706, 0.7176, 0.7686,
0.8000, 0.8235, 0.8353, 0.8118, 0.8275, 0.8235, 0.7843, 0.7686,
0.7608, 0.7490, 0.7647, 0.7490, 0.7765, 0.7529, 0.6902, 0.6118,
0.6549, 0.6941, 0.8235, 0.3608],
[0.0000, 0.0000, 0.2902, 0.7412, 0.8314, 0.7490, 0.6863, 0.6745,
0.6863, 0.7098, 0.7255, 0.7373, 0.7412, 0.7373, 0.7569, 0.7765,
0.8000, 0.8196, 0.8235, 0.8235, 0.8275, 0.7373, 0.7373, 0.7608,
0.7529, 0.8471, 0.6667, 0.0000],
[0.0078, 0.0000, 0.0000, 0.0000, 0.2588, 0.7843, 0.8706, 0.9294,
0.9373, 0.9490, 0.9647, 0.9529, 0.9569, 0.8667, 0.8627, 0.7569,
0.7490, 0.7020, 0.7137, 0.7137, 0.7098, 0.6902, 0.6510, 0.6588,
0.3882, 0.2275, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.1569,
0.2392, 0.1725, 0.2824, 0.1608, 0.1373, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000]]]),
9)
class_names = train_data.classes
class_names
['T-shirt/top',
'Trouser',
'Pullover',
'Dress',
'Coat',
'Sandal',
'Shirt',
'Sneaker',
'Bag',
'Ankle boot']
class_to_idx = train_data.class_to_idx
class_to_idx
{'T-shirt/top': 0,
'Trouser': 1,
'Pullover': 2,
'Dress': 3,
'Coat': 4,
'Sandal': 5,
'Shirt': 6,
'Sneaker': 7,
'Bag': 8,
'Ankle boot': 9}
# Check the shape
print(f"image shape: {image.shape} -> [color_channels, height, width], label: {class_names[label]}")
image shape: torch.Size([1, 28, 28]) -> [color_channels, height, width], label: Ankle boot
1.2 Visualizing our data
import matplotlib.pyplot as plt
image, label = train_data[0]
print(f"Image shape: {image.shape}")
plt.imshow(image.squeeze())
plt.title(label)
Image shape: torch.Size([1, 28, 28])
Text(0.5, 1.0, '9')
="gray")
plt.imshow(image.squeeze(), cmap
plt.title(class_names[label])False) plt.axis(
(-0.5, 27.5, 27.5, -0.5)
# Plot more images
torch.manual_seed(42)
fig = plt.figure(figsize=(9, 9))
rows, cols = 4, 4
for i in range(1, rows*cols+1):
    random_idx = torch.randint(0, len(train_data), size=[1]).item()
    img, label = train_data[random_idx]
    fig.add_subplot(rows, cols, i)
    plt.imshow(img.squeeze(), cmap="gray")
    plt.title(class_names[label])
    plt.axis(False)
Do you think these items of clothing (images) could be modelled with pure linear lines? Or do you think we'll need non-linearities?
2. Prepare DataLoader
Right now, our data is in the form of PyTorch Datasets.
DataLoader turns our dataset into a Python iterable
More specifically, we want to turn data into batches (or mini-batches).
Why would we do this?
- It is more computationally efficient, as in, your computer hardware may not be able to look (store in memory) at 60000 images in one hit. So we break it down to 32 images at a time (batch size of 32)
- It gives our neural network more chances to update its gradients per epoch.
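As a quick sanity check of the batching arithmetic (my own sketch, not part of the course notebook): with 60,000 training images and 10,000 test images at a batch size of 32, we expect ceil(60000/32) = 1875 train batches and ceil(10000/32) = 313 test batches, matching the dataloader lengths printed below.
import math

# A DataLoader (with drop_last=False) yields ceil(num_samples / batch_size) batches per epoch
print(math.ceil(60000 / 32)) # 1875 train batches
print(math.ceil(10000 / 32)) # 313 test batches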
train_data, test_data
(Dataset FashionMNIST
Number of datapoints: 60000
Root location: data
Split: Train
StandardTransform
Transform: ToTensor(),
Dataset FashionMNIST
Number of datapoints: 10000
Root location: data
Split: Test
StandardTransform
Transform: ToTensor())
from torch.utils.data import DataLoader
# Setup the batch size hyperparameter
BATCH_SIZE = 32

# Turn datasets into iterables (batches)
train_dataloader = DataLoader(
    dataset=train_data,
    batch_size=BATCH_SIZE,
    shuffle=True
)

test_dataloader = DataLoader(
    dataset=test_data,
    batch_size=BATCH_SIZE,
    shuffle=False
)
train_dataloader, test_dataloader
(<torch.utils.data.dataloader.DataLoader at 0x7e1fa818a5c0>,
<torch.utils.data.dataloader.DataLoader at 0x7e1fa818b0d0>)
# Let's check out what we've created
print(f"DataLoaders: {train_dataloader, test_dataloader}")
print(f"Length of train_dataloader: {len(train_dataloader)} batches of {BATCH_SIZE}")
print(f"Length of test_dataloader: {len(test_dataloader)} of batch size {BATCH_SIZE}")
DataLoaders: (<torch.utils.data.dataloader.DataLoader object at 0x7e1fa818a5c0>, <torch.utils.data.dataloader.DataLoader object at 0x7e1fa818b0d0>)
Length of train_dataloader: 1875 batches of 32
Length of test_dataloader: 313 batches of 32
train_features_batch, train_labels_batch = next(iter(train_dataloader))
train_features_batch.shape, train_labels_batch.shape
(torch.Size([32, 1, 28, 28]), torch.Size([32]))
# Show a sample
torch.manual_seed(42)
random_idx = torch.randint(0, len(train_features_batch), size=[1]).item()
img, label = train_features_batch[random_idx], train_labels_batch[random_idx]
plt.imshow(img.squeeze(), cmap="gray")
plt.title(class_names[label])
plt.axis(False)
print(f"Image size: {img.shape}")
print(f"Label: {label}, label size: {label.shape}")
Image size: torch.Size([1, 28, 28])
Label: 6, label size: torch.Size([])
3. Model 0: Build a baseline model
When starting to build a series of machine learning experiments, it's best practice to start with a baseline model.
A baseline model is a simple model you will try to improve upon with subsequent models/experiments.
In other words: start simply and add complexity when necessary.
# Create a flatten layer
flatten_model = nn.Flatten()

# Get a single sample
x = train_features_batch[0]

# Flatten the sample
output = flatten_model(x) # perform forward pass

# Print out what happened
print(f"Shape before flattening: {x.shape} -> [color_channels, height, width]")
print(f"Shape after flattening: {output.shape} -> [color_channels, height*width]")
Shape before flattening: torch.Size([1, 28, 28]) -> [color_channels, height, width]
Shape after flattening: torch.Size([1, 784]) -> [color_channels, height*width]
from torch import nn
class FashionMNISTModelV0(nn.Module):
    def __init__(
        self,
        input_shape: int,
        hidden_units: int,
        output_shape: int
    ):
        super().__init__()
        self.layer_stack = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=input_shape, out_features=hidden_units),
            nn.Linear(in_features=hidden_units, out_features=output_shape)
        )

    def forward(self, x):
        return self.layer_stack(x)
torch.manual_seed(42)

# Setup model with input parameters
model_0 = FashionMNISTModelV0(
    input_shape=784,
    hidden_units=10,
    output_shape=len(class_names)
).to("cpu")

model_0
FashionMNISTModelV0(
(layer_stack): Sequential(
(0): Flatten(start_dim=1, end_dim=-1)
(1): Linear(in_features=784, out_features=10, bias=True)
(2): Linear(in_features=10, out_features=10, bias=True)
)
)
dummy_x = torch.rand([1, 1, 28, 28])
model_0(dummy_x)
tensor([[-0.0315, 0.3171, 0.0531, -0.2525, 0.5959, 0.2112, 0.3233, 0.2694,
-0.1004, 0.0157]], grad_fn=<AddmmBackward0>)
3.1 Setup loss, optimizer and evaluation metrics
- Loss function - since we're working with multi-class data, our loss function will be nn.CrossEntropyLoss()
- Optimizer - our optimizer will be torch.optim.SGD() (stochastic gradient descent)
- Evaluation metric - since we're working on a classification problem, let's use accuracy as our evaluation metric (a sketch of such a function follows below)
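The accuracy_fn imported a couple of cells below comes from the course's helper_functions.py; as a rough sketch (an assumption about its behavior, not the file's verbatim contents), it can be as simple as:
import torch

def accuracy_fn(y_true: torch.Tensor, y_pred: torch.Tensor) -> float:
    # Percentage of predictions that match the true labels
    correct = torch.eq(y_true, y_pred).sum().item()
    return (correct / len(y_pred)) * 100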
import requests
from pathlib import Path
# Download helper functions from Learn PyTorch repo
if Path("helper_functions.py").is_file():
    print("already exists")
else:
    print("downloading file")
    request = requests.get("https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py")
    with open("helper_functions.py", "wb") as f:
        f.write(request.content)
downloading file
# Import accuracy metric
from helper_functions import accuracy_fn
# Setup loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.1)
3.2 Create a function to time our experiments
Machine learning is very experimental.
Two of the main things you'll often want to track are:
- Model's performance (loss and accuracy values etc)
- How fast it runs
from timeit import default_timer as timer
def print_train_time(start: float, end: float, device: torch.device = None):
    """Prints difference between start and end time."""
    total_time = end - start
    print(f"Train time on {device}: {total_time:.3f} seconds")
    return total_time

start_time = timer()
end_time = timer()
print_train_time(start=start_time, end=end_time, device="cpu")
Train time on cpu: 0.000 seconds
2.6355000045441557e-05
3.3 Creating a training loop and training a model on batches of data
- Loop through epochs.
- Loop through training batches, perform training steps, calculate the training loss per batch.
- Loop through testing batches, perform testing steps, calculate the test loss per batch.
- Print out what's happening.
- Time it all (for fun).
# Import tqdm for progress bar
from tqdm.auto import tqdm
# Set the seed and start the timer
torch.manual_seed(42)
train_time_start_on_cpu = timer()

# Set the number of epochs (we'll keep it small for faster training loop)
epochs = 3

# Create training and test loop
for epoch in tqdm(range(epochs)):
    print(f"Epoch: {epoch}\n------")
    # Training
    train_loss = 0
    # Add a loop to loop through the training batches
    for batch, (X, y) in enumerate(train_dataloader):
        model_0.train()
        # 1. Forward pass
        y_pred = model_0(X)

        # 2. Calculate the loss
        loss = loss_fn(y_pred, y)
        train_loss += loss # accumulate train loss

        # 3. Optimizer zero grad
        optimizer.zero_grad()

        # 4. Loss backward
        loss.backward()

        # 5. Optimizer step
        optimizer.step()

        # Print out what's happening
        if batch % 400 == 0:
            print(f"Looked at {batch * len(X)}/{len(train_dataloader.dataset)} examples")

    # Divide total train loss by length of train dataloader
    train_loss /= len(train_dataloader)

    ### Testing
    test_loss, test_acc = 0, 0
    model_0.eval()
    with torch.inference_mode():
        for X_test, y_test in test_dataloader:
            # 1. Forward pass
            test_pred = model_0(X_test)

            # 2. Calculate the loss (accumulatively)
            test_loss += loss_fn(test_pred, y_test)

            # 3. Calculate accuracy
            test_acc += accuracy_fn(y_true=y_test, y_pred=test_pred.argmax(dim=1))

        # Calculate the test loss average per batch
        test_loss /= len(test_dataloader)

        # Calculate the test acc average per batch
        test_acc /= len(test_dataloader)

    # Print out what's happening
    print(f"\nTrain loss: {train_loss:.4f} | test loss: {test_loss:.4f} | test accuracy: {test_acc:.2f}")

# Calculate train time
train_time_end_on_cpu = timer()
total_train_time_model_0 = print_train_time(
    start=train_time_start_on_cpu,
    end=train_time_end_on_cpu,
    device=str(next(model_0.parameters()).device)
)
{"model_id":"975bdc5c00264619829c79525eaa6d78","version_major":2,"version_minor":0}
Epoch: 0
------
Looked at 0/60000 examples
Looked at 12800/60000 examples
Looked at 25600/60000 examples
Looked at 38400/60000 examples
Looked at 51200/60000 examples
Train loss: 0.5904 | test loss: 0.5095 | test accuracy: 82.04
Epoch: 1
Looked at 0/60000 examples
Looked at 12800/60000 examples
Looked at 25600/60000 examples
Looked at 38400/60000 examples
Looked at 51200/60000 examples
Train loss: 0.4763 | test loss: 0.4799 | test accuracy: 83.20
Epoch: 2
Looked at 0/60000 examples
Looked at 12800/60000 examples
Looked at 25600/60000 examples
Looked at 38400/60000 examples
Looked at 51200/60000 examples
Train loss: 0.4550 | test loss: 0.4766 | test accuracy: 83.43
Train time on cpu: 29.723 seconds
4. Make predictions and get Model 0 results
torch.manual_seed(42)
def eval_model(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               accuracy_fn,
               device="cpu"):
    """Returns a dictionary containing the results of model predicting on data_loader."""
    loss, acc = 0, 0
    model.eval()
    with torch.inference_mode():
        for X, y in tqdm(data_loader):
            # Make our data device agnostic
            X, y = X.to(device), y.to(device)
            # Make predictions
            y_pred = model(X)

            # Accumulate the loss and acc values per batch
            loss += loss_fn(y_pred, y)
            acc += accuracy_fn(y_true=y,
                               y_pred=y_pred.argmax(dim=1))

        # Scale loss and acc to find the average loss/acc per batch
        loss /= len(data_loader)
        acc /= len(data_loader)

    return {"model_name": model.__class__.__name__,
            "model_loss": loss.item(),
            "model_acc": acc}

# Calculate model 0 results on test dataset
model_0_results = eval_model(model=model_0,
                             data_loader=test_dataloader,
                             loss_fn=loss_fn,
                             accuracy_fn=accuracy_fn,
                             device="cpu")
model_0_results
{"model_id":"edb7f60a1161425eb01c5c6d8b1c9da7","version_major":2,"version_minor":0}
{'model_name': 'FashionMNISTModelV0',
'model_loss': 0.47663894295692444,
'model_acc': 83.42651757188499}
5. Setup device-agnostic code (for using a GPU if there is one)
device = "cuda" if torch.cuda.is_available() else "cpu"
device
'cuda'
6. Model 1: Building a better model with non-linearity
We learned about the power of non-linearity
# Create a model with linear and non-linear layers
class FashionMNISTModelV1(nn.Module):
    def __init__(self,
                 input_shape: int,
                 hidden_units: int,
                 output_shape: int):
        super().__init__()
        self.layer_stack = nn.Sequential(
            # Flatten input into a single vector
            nn.Flatten(),
            nn.Linear(in_features=input_shape, out_features=hidden_units),
            nn.ReLU(),
            nn.Linear(in_features=hidden_units, out_features=hidden_units),
            nn.ReLU(),
            nn.Linear(in_features=hidden_units, out_features=output_shape)
        )

    def forward(self, x):
        return self.layer_stack(x)
device
'cuda'
!nvidia-smi
Tue Jul 9 15:28:06 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 48C P8 12W / 70W | 3MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla T4 Off | 00000000:00:05.0 Off | 0 |
| N/A 51C P8 11W / 70W | 3MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
# Create an instance of model
model_1 = FashionMNISTModelV1(input_shape=28*28,
                              hidden_units=30,
                              output_shape=len(class_names)).to(device)
model_1
FashionMNISTModelV1(
(layer_stack): Sequential(
(0): Flatten(start_dim=1, end_dim=-1)
(1): Linear(in_features=784, out_features=30, bias=True)
(2): ReLU()
(3): Linear(in_features=30, out_features=30, bias=True)
(4): ReLU()
(5): Linear(in_features=30, out_features=10, bias=True)
)
)
next(model_1.parameters()).device
device(type='cuda', index=0)
next(model_0.parameters()).device
device(type='cpu')
6.1 Setup loss, optimizer and evaluation metrics
from helper_functions import accuracy_fn
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model_1.parameters(), lr=0.1)
6.2 Functionizing training and evaluation/testing loops
Let's create a function for:
- training loop - train_step()
- testing loop - test_step()
def train_step(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               accuracy_fn,
               device: torch.device = device):
    """Performs a training step with model trying to learn on data_loader."""
    # Training
    train_loss, train_acc = 0, 0

    # Put model into training mode
    model.train()

    # Add a loop to loop through the training batches
    for batch, (X, y) in enumerate(data_loader):
        # Put data on target device
        X, y = X.to(device), y.to(device)

        # 1. Forward pass
        y_pred = model(X)

        # 2. Calculate the loss and accuracy
        loss = loss_fn(y_pred, y)
        train_loss += loss # accumulate train loss
        train_acc += accuracy_fn(y_true=y, y_pred=y_pred.argmax(dim=1))

        # 3. Optimizer zero grad
        optimizer.zero_grad()

        # 4. Loss backward
        loss.backward()

        # 5. Optimizer step
        optimizer.step()

    # Divide total train loss and accuracy by length of train dataloader
    train_loss /= len(data_loader)
    train_acc /= len(data_loader)
    print(f"Train loss: {train_loss:.3f} | Train accuracy: {train_acc:.2f}%")
def test_step(model: torch.nn.Module,
              data_loader: torch.utils.data.DataLoader,
              loss_fn: torch.nn.Module,
              accuracy_fn,
              device: torch.device = device):
    """Performs a testing step with model being evaluated on data_loader."""
    # Testing
    test_loss, test_acc = 0, 0

    # Put model into evaluation mode
    model.eval()

    # Turn on inference mode context manager
    with torch.inference_mode():
        # Add a loop to loop through the testing batches
        for X, y in data_loader:
            # Put data on target device
            X, y = X.to(device), y.to(device)

            # 1. Forward pass
            y_pred = model(X)

            # 2. Calculate the loss and accuracy
            loss = loss_fn(y_pred, y)
            test_loss += loss # accumulate test loss
            test_acc += accuracy_fn(y_true=y, y_pred=y_pred.argmax(dim=1))

        # Divide total test loss and accuracy by length of test dataloader
        test_loss /= len(data_loader)
        test_acc /= len(data_loader)
        print(f"Test loss: {test_loss:.3f} | Test accuracy: {test_acc:.2f}%\n")
# Import tqdm for progress bar
from tqdm.auto import tqdm
# Set the seed and start the timer
torch.manual_seed(42)
train_time_start_on_gpu = timer()

# Set the number of epochs (we'll keep it small for faster training loop)
epochs = 3

# Create training and test loop
for epoch in tqdm(range(epochs)):
    print(f"Epoch: {epoch}\n------")
    train_step(
        model=model_1,
        data_loader=train_dataloader,
        loss_fn=loss_fn,
        optimizer=optimizer,
        accuracy_fn=accuracy_fn,
        device=device
    )
    test_step(
        model=model_1,
        data_loader=test_dataloader,
        loss_fn=loss_fn,
        accuracy_fn=accuracy_fn,
        device=device
    )

# Calculate train time
train_time_end_on_gpu = timer()
total_train_time_model_1 = print_train_time(
    start=train_time_start_on_gpu,
    end=train_time_end_on_gpu,
    device=str(next(model_1.parameters()).device)
)
{"model_id":"6d39c82dcf764aae8a2034637ebffa80","version_major":2,"version_minor":0}
Epoch: 0
------
Train loss: 0.618 | Train accuracy: 77.49%
Test loss: 0.517 | Test accuracy: 81.36%
Epoch: 1
Train loss: 0.427 | Train accuracy: 84.23%
Test loss: 0.442 | Test accuracy: 84.05%
Epoch: 2
Train loss: 0.389 | Train accuracy: 85.66%
Test loss: 0.419 | Test accuracy: 84.49%
Train time on cuda:0: 31.800 seconds
model_0_results
{'model_name': 'FashionMNISTModelV0',
'model_loss': 0.47663894295692444,
'model_acc': 83.42651757188499}
Note: Sometimes, depending on your data/hardware, you might find that your model trains faster on CPU than GPU.
Why is this?
- It could be that the overhead of copying data/model to and from the GPU outweighs the compute benefits offered by the GPU (a rough way to see the copy cost is sketched below).
- The hardware you're using may have a better CPU in terms of compute capabilities than its GPU.
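As a hedged illustration (my own sketch, not from the course): timing just the host-to-GPU copy of a single batch gives a feel for that overhead.
from timeit import default_timer as timer

# Time only the CPU -> GPU transfer of one batch (needs a CUDA device)
X, _ = next(iter(train_dataloader))
if torch.cuda.is_available():
    start = timer()
    X_gpu = X.to("cuda")
    torch.cuda.synchronize() # wait for any async work before reading the clock
    print(f"Host-to-GPU copy of one batch: {timer() - start:.6f} seconds")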
# Get model_1 results dictionary
model_1_results = eval_model(model=model_1,
                             data_loader=test_dataloader,
                             loss_fn=loss_fn,
                             accuracy_fn=accuracy_fn,
                             device=device)
model_1_results
{"model_id":"5f1ffe01f1054d8db65eda9b8185b32d","version_major":2,"version_minor":0}
{'model_name': 'FashionMNISTModelV1',
'model_loss': 0.4194523096084595,
'model_acc': 84.49480830670926}
model_0_results
{'model_name': 'FashionMNISTModelV0',
'model_loss': 0.47663894295692444,
'model_acc': 83.42651757188499}
7. Model 2: Building a Convolutional Neural Network (CNN)
CNNs are also known as ConvNets.
CNNs are known for their ability to find patterns in visual data.
# Create a convolutional neural network
class FashionMNISTModelV2(nn.Module):
    """
    Model architecture that replicates the TinyVGG
    model from the CNN Explainer website.
    """
    def __init__(self, input_shape: int,
                 hidden_units: int,
                 output_shape: int):
        super().__init__()
        self.conv_block_1 = nn.Sequential(
            # Create a conv layer
            nn.Conv2d(in_channels=input_shape,
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1), # values we can set ourselves in our NNs are called hyperparameters
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden_units,
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.conv_block_2 = nn.Sequential(
            nn.Conv2d(in_channels=hidden_units,
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden_units,
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=hidden_units*7*7, # there is a trick to calculate this (see the shape check after this cell)
                      out_features=output_shape)
        )

    def forward(self, x):
        x = self.conv_block_1(x)
        #print(x.shape)
        x = self.conv_block_2(x)
        #print(x.shape)
        x = self.classifier(x)
        #print(x.shape)
        return x
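A quick way to verify that hidden_units*7*7 value (my own sanity-check sketch, not part of the course code): each nn.MaxPool2d(kernel_size=2) halves the 28x28 input, 28 -> 14 -> 7, so the flattened size is hidden_units * 7 * 7 = 10 * 7 * 7 = 490. A dummy forward pass through a copy of the conv blocks confirms it:
import torch
from torch import nn

# Mirror the two conv blocks with hidden_units=10 and inspect the output shape
blocks = nn.Sequential(
    nn.Conv2d(1, 10, kernel_size=3, stride=1, padding=1), nn.ReLU(),
    nn.Conv2d(10, 10, kernel_size=3, stride=1, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2), # 28x28 -> 14x14
    nn.Conv2d(10, 10, kernel_size=3, stride=1, padding=1), nn.ReLU(),
    nn.Conv2d(10, 10, kernel_size=3, stride=1, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2), # 14x14 -> 7x7
)
print(blocks(torch.randn(1, 1, 28, 28)).shape) # torch.Size([1, 10, 7, 7])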
torch.manual_seed(42)
model_2 = FashionMNISTModelV2(input_shape=1,
                              hidden_units=10,
                              output_shape=len(class_names)).to(device)
model_2
FashionMNISTModelV2(
(conv_block_1): Sequential(
(0): Conv2d(1, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU()
(2): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU()
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(conv_block_2): Sequential(
(0): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU()
(2): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU()
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(classifier): Sequential(
(0): Flatten(start_dim=1, end_dim=-1)
(1): Linear(in_features=490, out_features=10, bias=True)
)
)
="gray") plt.imshow(image.squeeze(), cmap
<matplotlib.image.AxesImage at 0x7e1fa81a1cf0>
rand_image_tensor = torch.randn(size=(1, 28, 28)).to(device)
# Pass image through model
model_2(rand_image_tensor.unsqueeze(0))
tensor([[ 0.0366, -0.0940, 0.0686, -0.0485, 0.0068, 0.0290, 0.0132, 0.0084,
-0.0030, -0.0185]], device='cuda:0', grad_fn=<AddmmBackward0>)
7.1 Stepping through nn.Conv2d()
torch.manual_seed(42)

# Create a batch of images
images = torch.randn(size=(32, 3, 64, 64))
test_image = images[0]
print(f"Image batch shape: {images.shape}")
print(f"Single image shape: {test_image.shape}")
Image batch shape: torch.Size([32, 3, 64, 64])
Single image shape: torch.Size([3, 64, 64])
# Create a single conv2d layer
conv_layer = nn.Conv2d(in_channels=3,
                       out_channels=10,
                       kernel_size=(3, 3),
                       stride=1,
                       padding=1)

# Pass the data through the convolutional layer
conv_output = conv_layer(test_image.unsqueeze(0))
conv_output.shape
torch.Size([1, 10, 64, 64])
7.2 Stepping through nn.MaxPool2d()
test_image.shape
torch.Size([3, 64, 64])
# Print out the original image shape without unsqueeze dimension
print(f"Test image original shape: {test_image.shape}")
print(f"Test image with unsqueezed dimension: {test_image.unsqueeze(0).shape}")
# Create a sample nn.MaxPool2d layer
max_pool_layer = nn.MaxPool2d(kernel_size=2)

# Pass data through just the conv layer
test_image_through_conv = conv_layer(test_image.unsqueeze(dim=0))
print(f"Shape after going through conv layer: {test_image_through_conv.shape}")

# Pass data through the max pool layer
test_image_through_conv_and_max_pool = max_pool_layer(test_image_through_conv)
print(f"Shape after going through conv layer and max pool layer: {test_image_through_conv_and_max_pool.shape}")
Test image original shape: torch.Size([3, 64, 64])
Test image with unsqueezed dimension: torch.Size([1, 3, 64, 64])
Shape after going through conv layer: torch.Size([1, 10, 64, 64])
Shape after going through conv layer and max pool layer: torch.Size([1, 10, 32, 32])
torch.manual_seed(42)
# Create a random tensor with a similar number of dimensions to our image
random_tensor = torch.randn(size=(1, 1, 2, 2))
print(f"Original Tensor:\n {random_tensor}")
print(f"Original Tensor shape: {random_tensor.shape}")

# Create max pool layer
max_pool_layer = nn.MaxPool2d(kernel_size=2)

# Pass the random tensor through the max pool layer
random_tensor_through_max_pool = max_pool_layer(random_tensor)
print(f"Max pool tensor:\n {random_tensor_through_max_pool}")
print(f"Max pool tensor shape: {random_tensor_through_max_pool.shape}")
Original Tensor:
tensor([[[[0.3367, 0.1288],
[0.2345, 0.2303]]]])
Original Tensor shape: torch.Size([1, 1, 2, 2])
Max pool tensor:
tensor([[[[0.3367]]]])
Max pool tensor shape: torch.Size([1, 1, 1, 1])
7.3 Setup a loss function and optimizer for model_2
# Setup loss function/eval metrics/optimizer
from helper_functions import accuracy_fn
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model_2.parameters(), lr=0.1)
7.4 Training and testing model_2 using our training and test functions
torch.manual_seed(42)
torch.cuda.manual_seed(42)

# Measure time
from timeit import default_timer as timer
train_time_start_model_2 = timer()

# Train and test model
epochs = 3
for epoch in tqdm(range(epochs)):
    print(f"Epoch: {epoch}\n-------")
    train_step(model=model_2,
               data_loader=train_dataloader,
               loss_fn=loss_fn,
               optimizer=optimizer,
               accuracy_fn=accuracy_fn,
               device=device)
    test_step(model=model_2,
              data_loader=test_dataloader,
              loss_fn=loss_fn,
              accuracy_fn=accuracy_fn,
              device=device)

train_time_end_model_2 = timer()
total_train_time_model_2 = print_train_time(
    start=train_time_start_model_2,
    end=train_time_end_model_2,
    device=device
)
{"model_id":"3c88bb35c73844d68719b022110748a9","version_major":2,"version_minor":0}
Epoch: 0
-------
Train loss: 0.588 | Train accuracy: 78.63%
Test loss: 0.405 | Test accuracy: 85.59%
Epoch: 1
Train loss: 0.364 | Train accuracy: 86.76%
Test loss: 0.351 | Test accuracy: 87.11%
Epoch: 2
Train loss: 0.326 | Train accuracy: 88.16%
Test loss: 0.321 | Test accuracy: 88.63%
Train time on cuda: 36.717 seconds
# Get model_2 results
model_2_results = eval_model(model=model_2,
                             data_loader=test_dataloader,
                             loss_fn=loss_fn,
                             accuracy_fn=accuracy_fn,
                             device=device)
model_2_results
{"model_id":"fc046fe4e0174802a7e03b85066f0a3d","version_major":2,"version_minor":0}
{'model_name': 'FashionMNISTModelV2',
'model_loss': 0.32091188430786133,
'model_acc': 88.62819488817891}
8. Compare model results and training time
import pandas as pd
compare_results = pd.DataFrame([model_0_results,
                                model_1_results,
                                model_2_results])
compare_results
|   | model_name | model_loss | model_acc |
|---|---|---|---|
| 0 | FashionMNISTModelV0 | 0.476639 | 83.426518 |
| 1 | FashionMNISTModelV1 | 0.419452 | 84.494808 |
| 2 | FashionMNISTModelV2 | 0.320912 | 88.628195 |
# Add training time to results comparison
compare_results["training_time"] = [total_train_time_model_0,
                                    total_train_time_model_1,
                                    total_train_time_model_2]
compare_results
|   | model_name | model_loss | model_acc | training_time |
|---|---|---|---|---|
| 0 | FashionMNISTModelV0 | 0.476639 | 83.426518 | 29.722932 |
| 1 | FashionMNISTModelV1 | 0.419452 | 84.494808 | 31.799669 |
| 2 | FashionMNISTModelV2 | 0.320912 | 88.628195 | 36.716990 |
# Visualize our model results
compare_results.set_index("model_name")["model_acc"].plot(kind="barh")
plt.xlabel("accuracy %")
plt.ylabel("model")
Text(0, 0.5, 'model')
9. Make and evaluate random predictions with best model
def make_predictions(model: torch.nn.Module,
                     data: list,
                     device: torch.device = device):
    pred_probs = []
    model.to(device)
    model.eval()
    with torch.inference_mode():
        for sample in data:
            # Prepare the sample (add a batch dimension and pass to target device)
            sample = torch.unsqueeze(sample, dim=1).to(device)

            # Forward pass (model outputs raw logits)
            pred_logits = model(sample)

            # Get prediction probability (logit -> prediction probabilities)
            pred_prob = torch.softmax(pred_logits.squeeze(), dim=0)

            # Get pred_prob off the GPU for further calculations
            pred_probs.append(pred_prob.cpu())

    # Stack the pred probs to turn list into tensor
    return torch.stack(pred_probs)
import random
#random.seed(42)
test_samples = []
test_labels = []
for sample, label in random.sample(list(test_data), k=9):
    test_samples.append(sample)
    test_labels.append(label)
# View the first sample shape
test_samples[0].shape
torch.Size([1, 28, 28])
plt.imshow(test_samples[0].squeeze(), cmap="gray")
plt.title(class_names[test_labels[0]])
Text(0.5, 1.0, 'Coat')
# Make predictions
pred_probs = make_predictions(model=model_2,
                              data=test_samples)

# View first two prediction probabilities
pred_probs[:2]
tensor([[2.1653e-02, 5.4951e-04, 5.8419e-02, 4.8616e-02, 4.7800e-01, 6.3986e-05,
3.7636e-01, 2.1623e-05, 1.6141e-02, 1.7873e-04],
[4.1643e-01, 8.5945e-05, 4.0930e-03, 1.5850e-02, 5.2843e-05, 5.5267e-07,
5.6329e-01, 2.4754e-07, 2.0591e-04, 7.7119e-07]])
# Convert prediction probabilities to labels
pred_classes = pred_probs.argmax(dim=1)
pred_classes
tensor([4, 6, 1, 9, 9, 0, 0, 4, 9])
test_labels
[4, 0, 1, 9, 9, 0, 0, 4, 9]
# Plot predictions
plt.figure(figsize=(9, 9))
nrows = 3
ncols = 3
for i, sample in enumerate(test_samples):
    # Create subplot
    plt.subplot(nrows, ncols, i+1)

    # Plot the target image
    plt.imshow(sample.squeeze(), cmap="gray")

    # Find the prediction (in text form e.g. "Sandal")
    pred_label = class_names[pred_classes[i]]

    # Get the truth label (in text form)
    truth_label = class_names[test_labels[i]]

    # Create a title for the plot
    title_text = f"Pred: {pred_label} | Truth: {truth_label}"

    # Check for equality between pred and truth and change color of title text
    if pred_label == truth_label:
        plt.title(title_text, fontsize=10, c="g")
    else:
        plt.title(title_text, fontsize=10, c="r")
    plt.axis(False)
10. Making a confusion matrix for further prediction evaluation
A confusion matrix is a fantastic way of evaluating your classification models visually.
- Make predictions with our trained model on the test dataset
- Make a confusion matrix with torchmetrics.ConfusionMatrix
- Plot the confusion matrix using mlxtend.plotting.plot_confusion_matrix()
from tqdm.auto import tqdm
# 1. Make predictions with trained model
y_preds = []
model_2.eval()
with torch.inference_mode():
    for X, y in tqdm(test_dataloader, desc="Making predictions..."):
        # Send the data and targets to target device
        X, y = X.to(device), y.to(device)

        # Do the forward pass
        y_logit = model_2(X)

        # Turn predictions from logits -> prediction probabilities -> prediction labels
        # (softmax across the class dimension, dim=1 for a [batch, classes] tensor)
        y_pred = torch.softmax(y_logit, dim=1).argmax(dim=1)

        # Put predictions on cpu for evaluations
        y_preds.append(y_pred.cpu())

# Concatenate list of predictions into tensor
print(y_preds[0].shape)
y_pred_tensor = torch.cat(y_preds)
print(y_pred_tensor.shape)
y_pred_tensor[:10]
{"model_id":"86b057e1b4044e849ebebcdf4418e22a","version_major":2,"version_minor":0}
torch.Size([32])
torch.Size([10000])
tensor([9, 2, 1, 1, 6, 1, 4, 6, 5, 7])
import torchmetrics, mlxtend
print(f"mlxtend version: {mlxtend.__version__}")
assert int(mlxtend.__version__.split(".")[1]) >= 19, "mlxtend version should be 0.19.0 or higher"
mlxtend version: 0.23.1
from torchmetrics import ConfusionMatrix
from mlxtend.plotting import plot_confusion_matrix
# 2. Setup confusion instance and compare predictions to targets
confmat = ConfusionMatrix(task='multiclass', num_classes=len(class_names))
confmat_tensor = confmat(preds=y_pred_tensor,
                         target=test_data.targets)

# 3. Plot the confusion matrix
fig, ax = plot_confusion_matrix(
    conf_mat=confmat_tensor.numpy(), # matplotlib likes working with numpy
    class_names=class_names,
    figsize=(10, 7)
)
confmat_tensor
tensor([[638, 6, 36, 66, 14, 0, 228, 0, 12, 0],
[ 7, 905, 4, 63, 11, 0, 5, 1, 2, 2],
[ 12, 2, 783, 15, 104, 0, 82, 0, 2, 0],
[ 28, 12, 24, 863, 30, 0, 37, 0, 4, 2],
[ 5, 7, 110, 52, 712, 0, 109, 0, 5, 0],
[ 7, 1, 10, 3, 1, 839, 5, 45, 65, 24],
[ 74, 8, 110, 62, 82, 0, 652, 0, 12, 0],
[ 2, 3, 0, 0, 0, 35, 0, 887, 13, 60],
[ 7, 3, 16, 13, 8, 10, 35, 8, 899, 1],
[ 0, 1, 2, 2, 0, 25, 3, 81, 9, 877]])
11. Save and load best performing model
from pathlib import Path
# Create model directory path
MODEL_PATH = Path("models")
MODEL_PATH.mkdir(parents=True,
                 exist_ok=True)

# Create model save path
MODEL_NAME = "03_pytorch_computer_vision_model_2.pth"
MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME
MODEL_SAVE_PATH
PosixPath('models/03_pytorch_computer_vision_model_2.pth')
# Save the model state dict
torch.save(obj=model_2.state_dict(), f=MODEL_SAVE_PATH)
# Create new instance
torch.manual_seed(42)

loaded_model_2 = FashionMNISTModelV2(input_shape=1,
                                     hidden_units=10,
                                     output_shape=len(class_names))

# Load in the saved state_dict()
loaded_model_2.load_state_dict(torch.load(f=MODEL_SAVE_PATH))
loaded_model_2 = loaded_model_2.to(device)
model_2_results
{'model_name': 'FashionMNISTModelV2',
'model_loss': 0.32091188430786133,
'model_acc': 88.62819488817891}
# Evaluate loaded model
torch.manual_seed(42)

loaded_model_2_results = eval_model(
    model=loaded_model_2,
    data_loader=test_dataloader,
    loss_fn=loss_fn,
    accuracy_fn=accuracy_fn,
    device=device
)
loaded_model_2_results
{"model_id":"74e519b540274321a81c0886ea5fc673","version_major":2,"version_minor":0}
{'model_name': 'FashionMNISTModelV2',
'model_loss': 0.32091188430786133,
'model_acc': 88.62819488817891}
# Check if model results are close to each other
torch.isclose(torch.tensor(model_2_results["model_loss"]),
              torch.tensor(loaded_model_2_results["model_loss"]),
              atol=1e-02) # tolerance
tensor(True)
- Q: What are 3 areas in industry where computer vision is currently being used?
A: Computer vision is making significant contributions across a wide range of industries. Here are several key areas where it is currently being utilized:
- Healthcare:
- Medical Imaging and Diagnostics: Enhancing the analysis of X-rays, MRIs, and CT scans for early disease detection.
- Surgical Assistance: Providing real-time imaging and augmented reality overlays during surgeries.
- Telemedicine: Enabling remote patient monitoring through analysis of images and videos.
- Retail:
- Automated Checkout Systems: Implementing cashier-less shopping experiences with real-time product tracking.
- Inventory Management: Monitoring stock levels and managing inventory through shelf image analysis.
- Customer Behavior Analysis: Tracking customer movements and interactions to optimize store layouts and product placements.
- Manufacturing:
- Quality Control: Inspecting products for defects and ensuring high-quality standards on production lines.
- Predictive Maintenance: Analyzing machinery images to predict maintenance needs and prevent breakdowns.
- Automation and Robotics: Enhancing the capabilities of industrial robots through vision-based guidance and inspection.
- Automotive:
- Autonomous Vehicles: Enabling self-driving cars to navigate and understand their surroundings.
- Driver Assistance Systems: Providing features like lane departure warnings, pedestrian detection, and adaptive cruise control.
- Vehicle Inspection: Automating the inspection process for manufacturing and maintenance.
- Security and Surveillance:
- Facial Recognition: Identifying individuals for security and authentication purposes.
- Behavior Analysis: Monitoring and analyzing behavior patterns to detect suspicious activities.
- Access Control: Managing entry to secure areas through visual identification.
- Agriculture:
- Crop Monitoring: Using drones and cameras to monitor crop health, identify diseases, and optimize irrigation.
- Livestock Management: Tracking the health and movement of animals to improve farming practices.
- Yield Prediction: Analyzing images to predict crop yields and optimize harvest timing.
- Finance and Banking:
- Fraud Detection: Using visual data to detect fraudulent activities at ATMs and during transactions.
- Customer Service: Implementing facial recognition for secure access to banking services.
- Document Processing: Automating the processing of checks and other financial documents through image analysis.
- Entertainment and Media:
- Content Creation and Editing: Enhancing video production with special effects, automated editing, and scene recognition.
- Personalization: Tailoring content recommendations based on visual analysis of user preferences.
- Interactive Experiences: Creating augmented and virtual reality experiences.
- Live information: as you walk, your device spots QR codes and gathers information about what it sees; when you ask a question, it replies with the information it has collected.
- Search "what is overfitting in machine learning" and write down a sentence about what you find.
A: Training data always contains noise. Overfitting is when a model starts to focus too much on the noise and misses the real pattern. The training loss then keeps going down, but the test loss goes up. However, with a lot of data, at some point the model may work out what is noise, and the loss will start going down on the test set as well; this is called grokking.
- Search "ways to prevent overfitting in machine learning", write down 3 of the things you find and a sentence about each. Note: there are lots of these, so don't worry too much about all of them, just pick 3 and start with those.
- Cross-Validation: Cross-validation is a technique where the training data is split into multiple subsets (folds). The model is trained and validated multiple times, each time using a different subset for validation and the remaining subsets for training. This helps in ensuring that the model’s performance is consistent across different subsets of data and not just tailored to a particular subset, thus reducing overfitting. Common methods include k-fold cross-validation and leave-one-out cross-validation.
- Early Stopping: During the training process, the model's performance is monitored on a validation dataset after each iteration. Early stopping involves halting the training process when the model's performance on the validation set stops improving, even if it continues to improve on the training set. This prevents the model from learning the noise and irrelevant patterns in the training data, which can lead to overfitting (a minimal sketch of this logic follows after this list).
- Data Augmentation: Data augmentation artificially increases the size of the training dataset by generating new data points from the existing data. This can be done through techniques like rotating, flipping, and cropping images in image datasets or using methods like back-translation and noise addition in text data. Data augmentation enhances the diversity of the training data, helping the model learn more robust and generalizable features, thus reducing overfitting.
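To make the early-stopping idea concrete, here is a minimal sketch of the stopping logic (my own illustration; the val_losses values are made-up stand-ins for per-epoch validation losses):
# Stop when validation loss hasn't improved for `patience` epochs
val_losses = [0.90, 0.70, 0.60, 0.55, 0.56, 0.58, 0.60] # made-up numbers
best_loss, patience, bad_epochs = float("inf"), 2, 0
for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_loss:
        best_loss, bad_epochs = val_loss, 0 # improvement: reset the counter
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"Early stopping at epoch {epoch}, best val loss {best_loss}")
            break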
- Load the torchvision.datasets.MNIST() train and test datasets.
import torchvision
from torchvision import datasets
from torchvision.transforms import ToTensor
mnist_train = datasets.MNIST(
    root="data_mnist",
    train=True,
    download=True,
    transform=ToTensor(),
    target_transform=None
)

mnist_test = datasets.MNIST(
    root="data_mnist",
    train=False,
    download=True,
    transform=ToTensor(),
    target_transform=None
)

len(mnist_train), len(mnist_test)
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to data_mnist/MNIST/raw/train-images-idx3-ubyte.gz
100%|██████████| 9912422/9912422 [00:00<00:00, 35317727.75it/s]
Extracting data_mnist/MNIST/raw/train-images-idx3-ubyte.gz to data_mnist/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to data_mnist/MNIST/raw/train-labels-idx1-ubyte.gz
100%|██████████| 28881/28881 [00:00<00:00, 1031793.85it/s]
Extracting data_mnist/MNIST/raw/train-labels-idx1-ubyte.gz to data_mnist/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to data_mnist/MNIST/raw/t10k-images-idx3-ubyte.gz
100%|██████████| 1648877/1648877 [00:00<00:00, 10376944.99it/s]
Extracting data_mnist/MNIST/raw/t10k-images-idx3-ubyte.gz to data_mnist/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to data_mnist/MNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████| 4542/4542 [00:00<00:00, 2462899.65it/s]
Extracting data_mnist/MNIST/raw/t10k-labels-idx1-ubyte.gz to data_mnist/MNIST/raw
(60000, 10000)
- Visualize at least 5 different samples of the MNIST training dataset.
import matplotlib.pyplot as plt
import torch
n_row = 1
n_col = 5
_, axs = plt.subplots(n_row, n_col, figsize=(12, 12))
axs = axs.flatten()
for i, ax in zip(range(5), axs):
    random_train_number = torch.randint(low=0, high=len(mnist_train), size=()).item()
    image, label = mnist_train[random_train_number]
    ax.imshow(image.squeeze(), cmap="gray")
    ax.set_title(f"Number {label}")
plt.show()
- Turn the MNIST train and test datasets into dataloaders using torch.utils.data.DataLoader, set the batch_size=32.
from torch.utils.data import DataLoader
train_dataloader = DataLoader(mnist_train,
                              batch_size=32,
                              shuffle=True)

test_dataloader = DataLoader(mnist_test,
                             batch_size=32,
                             shuffle=False)

len(train_dataloader), len(test_dataloader), mnist_train.classes
(1875,
313,
['0 - zero',
'1 - one',
'2 - two',
'3 - three',
'4 - four',
'5 - five',
'6 - six',
'7 - seven',
'8 - eight',
'9 - nine'])
- Recreate model_2 used in this notebook (the same model from the CNN Explainer website, also known as TinyVGG) capable of fitting on the MNIST dataset.
from torch import nn
class MNISTModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Sequential(
            nn.Conv2d(in_channels=1,
                      out_channels=10,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=10,
                      out_channels=10,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.layer_2 = nn.Sequential(
            nn.Conv2d(in_channels=10,
                      out_channels=10,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=10,
                      out_channels=10,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=10*7*7,
                      out_features=len(mnist_train.classes))
        )

    def forward(self, x):
        x = self.layer_1(x)
        x = self.layer_2(x)
        x = self.classifier(x)
        return x

model_e = MNISTModel().to("cuda")
model_e
MNISTModel(
(layer_1): Sequential(
(0): Conv2d(1, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU()
(2): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU()
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(layer_2): Sequential(
(0): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU()
(2): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU()
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(classifier): Sequential(
(0): Flatten(start_dim=1, end_dim=-1)
(1): Linear(in_features=490, out_features=10, bias=True)
)
)
- Train the model you built in exercise 8. on CPU and GPU and see how long it takes on each.
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model_e.parameters(), lr=0.1)
from tqdm.auto import tqdm
from timeit import default_timer as timer
def train_and_test(model, device, loss_fn, optimizer):
    start_time = timer()
    torch.manual_seed(42)
    torch.cuda.manual_seed(42)
    epochs = 3

    for epoch in tqdm(range(epochs), "Training model"):
        loss_epoch, accuracy_epoch = 0, 0
        test_loss_epoch, test_accuracy_epoch = 0, 0

        model.train()
        for batch, (X, y) in enumerate(train_dataloader):
            X, y = X.to(device), y.to(device)
            y_logits = model(X)

            loss = loss_fn(y_logits, y)
            accuracy = accuracy_fn(y_pred=y_logits.argmax(dim=1),
                                   y_true=y)
            loss_epoch += loss
            accuracy_epoch += accuracy

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        loss_epoch /= len(train_dataloader)
        accuracy_epoch /= len(train_dataloader)

        model.eval()
        with torch.inference_mode():
            for batch, (X, y) in enumerate(test_dataloader):
                X, y = X.to(device), y.to(device)
                y_logits = model(X)

                loss_test = loss_fn(y_logits, y)
                accuracy_test = accuracy_fn(y_pred=y_logits.argmax(dim=1), y_true=y)
                test_loss_epoch += loss_test
                test_accuracy_epoch += accuracy_test

            test_loss_epoch /= len(test_dataloader)
            test_accuracy_epoch /= len(test_dataloader)

        print(f"Loss: {loss_epoch:.2f}. Accuracy: {accuracy_epoch:.2f}. Test Loss: {test_loss_epoch:.2f}. Test Accuracy: {test_accuracy_epoch:.2f}")

    end_time = timer()
    print(f"Time spent on training on {device} is {end_time - start_time} seconds")

train_and_test(model_e, "cuda", loss_fn, optimizer)
{"model_id":"668f4474cb834bd9974eceb411a59efc","version_major":2,"version_minor":0}
Loss: 0.33. Accuracy: 89.15. Test Loss: 0.08. Test Accuracy: 97.48
Loss: 0.08. Accuracy: 97.52. Test Loss: 0.06. Test Accuracy: 98.04
Loss: 0.06. Accuracy: 98.11. Test Loss: 0.05. Test Accuracy: 98.31
Time spent on training on cuda is 36.71122506400002 seconds
= MNISTModel().to("cpu")
model_e_cpu = nn.CrossEntropyLoss()
loss_fn = torch.optim.SGD(params=model_e_cpu.parameters(), lr=0.1)
optimizer "cpu", loss_fn, optimizer) train_and_test(model_e_cpu,
{"model_id":"429e836cf8214fb1857fea09e457fbb2","version_major":2,"version_minor":0}
Loss: 0.34. Accuracy: 88.09. Test Loss: 0.08. Test Accuracy: 97.74
Loss: 0.08. Accuracy: 97.42. Test Loss: 0.06. Test Accuracy: 98.12
Loss: 0.06. Accuracy: 97.98. Test Loss: 0.06. Test Accuracy: 97.99
Time spent on training on cpu is 83.42796110799998 seconds
- Make predictions using your trained model and visualize at least 5 of them, comparing the prediction to the target label.
n_row = 1
n_col = 5
model_e = model_e.to("cpu")
_, axs = plt.subplots(n_row, n_col, figsize=(12, 12))
axs = axs.flatten()
for i, ax in zip(range(5), axs):
    # Sample a random test image and title it "truth/prediction"
    random_test_number = torch.randint(low=0, high=len(mnist_test), size=()).item()
    image, label = mnist_test[random_test_number]
    y_logit = model_e(image.unsqueeze(dim=0))
    ax.imshow(image.squeeze(), cmap="gray")
    ax.set_title(f"{label}/{y_logit.argmax(dim=1).item()}")
plt.show()
- Plot a confusion matrix comparing your model's predictions to the truth labels.
y_preds_numbers = []
model_e = model_e.to(device)
model_e.eval()
with torch.inference_mode():
    for X, y in tqdm(test_dataloader, desc="Making predictions..."):
        # Send the data and targets to target device
        X, y = X.to(device), y.to(device)

        # Do the forward pass
        y_logit = model_e(X)

        # Turn predictions from logits -> prediction probabilities -> prediction labels
        # (softmax across the class dimension)
        y_pred = torch.softmax(y_logit, dim=1).argmax(dim=1)

        # Put predictions on cpu for evaluations
        y_preds_numbers.append(y_pred.cpu())

# Concatenate list of predictions into tensor
print(y_preds_numbers[0].shape)
y_pred_tensor_numbers = torch.cat(y_preds_numbers)
print(y_pred_tensor_numbers.shape)
y_pred_tensor_numbers[:10]
{"model_id":"daf25c1feec24b0a8cd7a3b387378a0a","version_major":2,"version_minor":0}
torch.Size([32])
torch.Size([10000])
tensor([7, 2, 1, 0, 4, 1, 8, 8, 8, 9])
from torchmetrics import ConfusionMatrix
from mlxtend.plotting import plot_confusion_matrix
# 2. Setup confusion instance and compare predictions to targets
confmat = ConfusionMatrix(task='multiclass', num_classes=len(mnist_test.classes))
confmat_tensor = confmat(preds=y_pred_tensor_numbers,
                         target=mnist_test.targets)

# 3. Plot the confusion matrix
fig, ax = plot_confusion_matrix(
    conf_mat=confmat_tensor.numpy(), # matplotlib likes working with numpy
    class_names=mnist_test.classes,
    figsize=(10, 7)
)
- Create a random tensor of shape [1, 3, 64, 64] and pass it through a nn.Conv2d() layer with various hyperparameter settings (these can be any settings you choose), what do you notice if the kernel_size parameter goes up and down?
with torch.inference_mode():
    random_tensor = torch.rand(size=(1, 3, 64, 64))
    conv_layer = nn.Conv2d(in_channels=3,
                           out_channels=2,
                           kernel_size=64,
                           padding=0,
                           stride=1)

    result = conv_layer(random_tensor)
result.shape, result
(torch.Size([1, 2, 1, 1]),
tensor([[[[-0.3473]],
[[ 0.2667]]]]))
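A short answer sketch (my own addition, not course text): the output side length of a conv layer follows floor((W - K + 2P) / S) + 1, so increasing kernel_size shrinks the output feature map and decreasing it grows the output back toward the input size. A quick sweep makes this visible:
import torch
from torch import nn

x = torch.rand(size=(1, 3, 64, 64))
with torch.inference_mode():
    for k in (1, 3, 5, 9, 33, 64):
        layer = nn.Conv2d(in_channels=3, out_channels=2,
                          kernel_size=k, stride=1, padding=0)
        # Output side length is (64 - k + 2*0) / 1 + 1 = 65 - k
        print(f"kernel_size={k}: {tuple(layer(x).shape)}")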
Started the fourth part.