In this post, we will work with a cats and dogs image dataset. We will build a neural network step by step in PyTorch, then train the model and use it to predict the class of an image.
PyTorch is an open source machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing, primarily developed by Facebook's AI Research lab (FAIR).
PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.
There are many reasons why deep learning has become so popular, but prime among them has to be the surge in graphics processing unit (GPU) performance and the increasing affordability of GPUs. Designed originally for gaming, GPUs need to perform countless millions of matrix operations per second.
PyTorch defines a class called Tensor (torch.Tensor) to store and operate on homogeneous multidimensional rectangular arrays of numbers. PyTorch Tensors are similar to NumPy Arrays, but can also be operated on a CUDA-capable Nvidia GPU.
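As a small illustration of the similarity with NumPy arrays, here is a minimal sketch that builds a tensor from an array and moves it to a GPU only if one is available. The variable names (`demo_array`, `demo_tensor`, `demo_device`) are made up for this example.

```python
import numpy as np
import torch

# A 2 x 3 NumPy array and the equivalent PyTorch tensor
demo_array = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dtype=np.float32)
demo_tensor = torch.from_numpy(demo_array)

# Element-wise operations look the same as in NumPy
doubled = demo_tensor * 2

# Move the tensor to the GPU only when CUDA is available
demo_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
doubled = doubled.to(demo_device)

print(doubled.shape)   # torch.Size([2, 3])
print(doubled.dtype)   # torch.float32
```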
import torch
import torchvision
from torchvision import transforms
import os
We have to edit the file, and for that we need root access. We encountered an error while running some PyTorch code, which was probably using the PIL library. To fix the error, we need to add the following lines:
from PIL import Image, ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
use_cuda = torch.cuda.is_available()
use_cuda
Our data is organized in the following directory structure:
base_dir = '/home/jupyter-thakur/xv-shared-folders/training/cats_and_dogs_small/'
train_dir = os.path.join(base_dir, 'train/')
validation_dir = os.path.join(base_dir, 'validation/')
test_dir = os.path.join(base_dir, 'test/')
print(f"""
train_dir = {train_dir}
validation_dir = {validation_dir}
test_dir = {test_dir}
""")
train_cats_dir = os.path.join(train_dir, 'cats')
train_dogs_dir = os.path.join(train_dir, 'dogs')
validation_cats_dir = os.path.join(validation_dir, 'cats')
validation_dogs_dir = os.path.join(validation_dir, 'dogs')
test_cats_dir = os.path.join(test_dir, 'cats')
test_dogs_dir = os.path.join(test_dir, 'dogs')
print(f"""
train_cats_dir = {train_cats_dir}
train_dogs_dir = {train_dogs_dir}
validation_cats_dir = {validation_cats_dir}
validation_dogs_dir = {validation_dogs_dir}
test_cats_dir = {test_cats_dir}
test_dogs_dir = {test_dogs_dir}
""")
print('total training cat images:', len(os.listdir(train_cats_dir)))
print('total training dog images:', len(os.listdir(train_dogs_dir)))
print('total validation cat images:', len(os.listdir(validation_cats_dir)))
print('total validation dog images:', len(os.listdir(validation_dogs_dir)))
print('total test cat images:', len(os.listdir(test_cats_dir)))
print('total test dog images:', len(os.listdir(test_dogs_dir)))
def checkImage(path):
    # Return True only if PIL can open the file as an image
    try:
        with Image.open(path):
            return True
    except Exception:
        return False
class torchvision.transforms.Normalize(mean, std, inplace=False)
Normalize a tensor image with mean and standard deviation. This transform does not support PIL Image. Given mean: (mean[1],...,mean[n]) and std: (std[1],..,std[n]) for n channels, this transform will normalize each channel of the input torch.*Tensor i.e., output[channel] = (input[channel] - mean[channel]) / std[channel]
We resize every image to the same resolution of 64 × 64, then convert it to a tensor, and finally normalize the tensor with a specific set of mean and standard deviation values.
We can see that both parameters are "Sequences for each channel". Color images have three channels (red, green, blue), therefore you need three parameters to normalize each channel. The first list [0.485, 0.456, 0.406] is the mean for all three channels and the second [0.229, 0.224, 0.225] is the standard deviation for all three channels.
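Normalizing is important because a lot of multiplication happens as the input passes through the layers of the neural network. ToTensor() first scales the pixel values to between 0 and 1, and Normalize then shifts and rescales each channel using the given mean and standard deviation.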
img_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
torchvision.datasets.ImageFolder class
torchvision.datasets.ImageFolder(root: str, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, loader: Callable[[str], Any] = <function default_loader>, is_valid_file: Optional[Callable[[str], bool]] = None)
A generic data loader where the images are arranged in this way by default:
root/dog/xxx.png
root/dog/xxy.png
root/dog/[...]/xxz.png
root/cat/123.png
root/cat/nsdf3.png
root/cat/[...]/asd932_.png
train_data = torchvision.datasets.ImageFolder(root = train_dir, transform = img_transforms, is_valid_file = checkImage)
val_data = torchvision.datasets.ImageFolder(root = validation_dir, transform = img_transforms, is_valid_file = checkImage)
test_data = torchvision.datasets.ImageFolder(root = test_dir, transform = img_transforms, is_valid_file = checkImage)
#By default, PyTorch’s data loaders are set to a batch_size of 1.
BATCH_SIZE = 64
# Shuffle the training set: ImageFolder lists all cats before all dogs
train_data_loader = torch.utils.data.DataLoader(train_data, batch_size = BATCH_SIZE, shuffle = True)
val_data_loader = torch.utils.data.DataLoader(val_data, batch_size = BATCH_SIZE)
test_data_loader = torch.utils.data.DataLoader(test_data, batch_size = BATCH_SIZE)
sample = next(iter(train_data_loader))
imgs, lbls = sample
lbls
imgs[0]
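The shape of what a data loader yields can be checked without the real images. This sketch uses a TensorDataset of ten blank tensors (an assumption standing in for the cat and dog pictures) and a batch size of 4.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 10 fake images of shape 3 x 64 x 64 with alternating 0/1 labels
fake_images = torch.zeros(10, 3, 64, 64)
fake_labels = torch.tensor([0, 1] * 5)
fake_dataset = TensorDataset(fake_images, fake_labels)

fake_loader = DataLoader(fake_dataset, batch_size=4)

# Collect the shape of the image tensor in every batch
batch_shapes = [tuple(batch_images.shape) for batch_images, _ in fake_loader]
print(batch_shapes)
```

The last batch is smaller than the others because 10 is not a multiple of 4; the same happens with the real loaders whenever the dataset size is not a multiple of BATCH_SIZE.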
import torch.nn as nn
import torch.nn.functional as F
Neural networks can be constructed using the torch.nn package.
The nn package depends on autograd to define models and differentiate them. An nn.Module contains layers, and a method forward(input) that returns the output.
We do any setup required in __init__(), in this case calling our superclass constructor and defining the three fully connected layers (called Linear in PyTorch, as opposed to Dense in Keras). The forward() method describes how data flows through the network both in training and when making predictions (inference).
First, we have to convert the 3D tensor of an image (remember: x and y plus three-channel color information, red, green, and blue) into a 1D tensor so that it can be fed into the first Linear layer, and we do that using view(). From there, we apply the layers and the activation functions in order, finally returning the raw output of the last layer. We do not apply softmax inside the model because CrossEntropyLoss does that internally; softmax is applied explicitly at prediction time.
If you want to create a recurrent network, simply use the same Linear layer multiple times, without having to think about sharing weights. The input size will be 64 × 64 × 3 = 12288.
class MyNeuralNetwork(nn.Module):
    def __init__(self, input_size = 12288):
        super(MyNeuralNetwork, self).__init__()
        self.input_size = input_size
        self.fc1 = nn.Linear(input_size, 84)
        self.fc2 = nn.Linear(84, 50)
        self.fc3 = nn.Linear(50, 2)

    def forward(self, x):
        # Flatten each image into a 1D vector of length input_size
        x = x.view(-1, self.input_size)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        # Return raw scores; CrossEntropyLoss applies softmax internally
        x = self.fc3(x)
        return x
model = MyNeuralNetwork()
print(model)
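The flattening step in forward() can be checked on its own. The sketch below uses a made-up batch of four blank images to show what view() does to the shape.

```python
import torch

# A fake batch of 4 RGB images, each 64 x 64
fake_batch = torch.zeros(4, 3, 64, 64)

# -1 lets PyTorch infer the batch dimension from the remaining size
flat_batch = fake_batch.view(-1, 3 * 64 * 64)

print(flat_batch.shape)   # torch.Size([4, 12288])
```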
loss_function = torch.nn.CrossEntropyLoss()
loss_function
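CrossEntropyLoss expects raw scores (logits) and class indices, and combines a log-softmax with a negative log-likelihood loss. The following sketch checks that equivalence on two invented score rows; the numbers are arbitrary.

```python
import torch
import torch.nn.functional as F

# Two fake predictions (raw scores for [cat, dog]) and their true classes
fake_scores = torch.tensor([[2.0, 0.5], [0.1, 1.5]])
fake_targets = torch.tensor([0, 1])

# Built-in cross entropy on raw scores
builtin_loss = torch.nn.CrossEntropyLoss()(fake_scores, fake_targets)

# The same loss assembled by hand: log-softmax followed by NLL loss
manual_loss = F.nll_loss(F.log_softmax(fake_scores, dim=1), fake_targets)

print(builtin_loss.item(), manual_loss.item())
```

This is why the model does not need a softmax layer of its own during training.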
The weights are updated by an optimization function, usually called an optimizer.
torch.optim is a package implementing various optimization algorithms. To use torch.optim we have to construct an optimizer object that will hold the current state and update the parameters based on the computed gradients.
To construct an Optimizer you have to give it an iterable containing the parameters to optimize. Then, you can specify optimizer-specific options such as the learning rate, weight decay, etc.
import torch.optim as optim
optimizer = optim.Adam(model.parameters(), lr=0.001)
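To show what a single optimizer step does, here is a sketch on one made-up parameter. It uses plain SGD rather than Adam, purely because the SGD update is easy to verify by hand; the names `toy_w` and `toy_optimizer` are hypothetical.

```python
import torch
import torch.optim as optim

# One trainable parameter, starting at 1.0
toy_w = torch.tensor([1.0], requires_grad=True)
toy_optimizer = optim.SGD([toy_w], lr=0.1)

toy_optimizer.zero_grad()        # clear any old gradients
toy_loss = (toy_w ** 2).sum()    # loss = w^2
toy_loss.backward()              # gradient = 2 * w = 2.0
toy_optimizer.step()             # w <- 1.0 - 0.1 * 2.0

print(toy_w.item())
```

Adam follows the same zero_grad, backward, step pattern, but scales each update using running statistics of past gradients.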
if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")
model.to(device)
def train(start_epochs, n_epochs, model):
    for epoch in range(start_epochs, n_epochs + 1):
        print(f"epoch = {epoch}")
    # return trained model
    return model
train(0, 2, model)
Next, we initialize the training and validation losses to zero at the start of every epoch and put the model in training mode.
def train(start_epochs, n_epochs, model):
    for epoch in range(start_epochs, n_epochs + 1):
        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
        # set the model in training mode
        model.train()
        print(f"epoch = {epoch}")
    # return trained model
    return model
train(0, 2, model)
def train(start_epochs, n_epochs, model, train_loader):
    for epoch in range(start_epochs, n_epochs + 1):
        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
        # set the model in training mode
        model.train()
        print("batch started: ")
        for batch_idx, (data, target) in enumerate(train_loader):
            # print progress every 50 batches
            if batch_idx % 50 == 0:
                print(f"{batch_idx}, ", end = "")
        print(f"epoch = {epoch}")
    # return trained model
    return model
train(0, 2, model, train_data_loader)
Create a new function called train_process_batches that runs the training step for every batch of the training data and computes the average training loss.
def train_process_batches(model, train_loader, optimizer, loss_function, verbose = True):
    train_loss = 0.0
    model.train()
    if verbose:
        print("Training data batch process: ", end = "")
    for batch_idx, (data, target) in enumerate(train_loader):
        # move to GPU
        if use_cuda:
            data, target = data.cuda(), target.cuda()
        # set the gradients to zero before backpropagation,
        # because PyTorch accumulates the gradients on subsequent backward passes
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the batch loss
        loss = loss_function(output, target)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update the running average of the training loss
        train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.item() - train_loss))
        if batch_idx % 50 == 0:
            if verbose:
                print(f"\t{batch_idx}, {train_loss}", end = "\n")
            else:
                print(f"\t{batch_idx}, ", end = "")
    return train_loss
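The loss update inside the batch loop is an incremental mean: after each batch it holds the average of all batch losses seen so far. A short sketch with invented loss values shows that it matches the ordinary mean.

```python
# Made-up per-batch losses, for illustration only
fake_batch_losses = [0.9, 0.7, 0.5, 0.3]

running_mean = 0.0
for batch_idx, batch_loss in enumerate(fake_batch_losses):
    # Same update rule as in the training loop
    running_mean = running_mean + (1 / (batch_idx + 1)) * (batch_loss - running_mean)

# Ordinary mean computed in one go
plain_mean = sum(fake_batch_losses) / len(fake_batch_losses)

print(running_mean, plain_mean)
```

The incremental form avoids storing every batch loss, which is convenient when an epoch has many batches.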
def train(start_epochs, n_epochs, model, train_loader):
    for epoch in range(start_epochs, n_epochs + 1):
        print(f"Epoch: {epoch}, ", end = "\n")
        # initialize variables to monitor training and validation loss
        valid_loss = 0.0
        # train model
        train_loss = train_process_batches(model, train_loader, optimizer, loss_function)
        print(f"\ntrain_loss = {train_loss}")
    # return trained model
    return model
train(0, 1, model, train_data_loader)
def eval_process_batches(model, val_loader, optimizer, loss_function, verbose = True):
    valid_loss = 0.0
    model.eval()
    if verbose:
        print("Validation data batch process: ", end = "")
    # gradients are not needed for evaluation
    with torch.no_grad():
        for batch_idx, (data, target) in enumerate(val_loader):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the batch loss
            loss = loss_function(output, target)
            # update the running average of the validation loss
            valid_loss = valid_loss + ((1 / (batch_idx + 1)) * (loss.item() - valid_loss))
            if batch_idx % 20 == 0:
                if verbose:
                    print(f"\t{batch_idx}, {valid_loss}", end = "\n")
                else:
                    print(f"\t{batch_idx}, ", end = "")
    return valid_loss
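Evaluation only needs forward passes, so one optional refinement is to run it inside torch.no_grad(), which stops PyTorch from recording operations for backpropagation. The sketch below uses a tiny stand-in model (`toy_model`, hypothetical) to show the difference.

```python
import torch
import torch.nn as nn

# A tiny stand-in model, not the network from this post
toy_model = nn.Linear(4, 2)
toy_model.eval()
toy_input = torch.zeros(1, 4)

# Inside no_grad, the output is detached from the autograd graph
with torch.no_grad():
    out_no_grad = toy_model(toy_input)

# Outside no_grad, the output still tracks gradients
out_with_grad = toy_model(toy_input)

print(out_no_grad.requires_grad, out_with_grad.requires_grad)   # False True
```

Note that model.eval() and torch.no_grad() do different jobs: the first switches layers such as dropout to inference behavior, the second disables gradient tracking.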
def train(start_epochs, n_epochs, model, train_loader, val_loader):
    for epoch in range(start_epochs, n_epochs + 1):
        print(f"Epoch: {epoch}, ", end = "\n")
        # initialize variables to monitor training and validation loss
        valid_loss = 0.0
        # train model, then evaluate it on the validation data
        train_loss = train_process_batches(model, train_loader, optimizer, loss_function, verbose = False)
        valid_loss = eval_process_batches(model, val_loader, optimizer, loss_function, verbose = True)
        print(f"\ntrain_loss = {train_loss}")
        print(f"\nvalid_loss = {valid_loss}")
    # return trained model
    return model
train(0, 5, model, train_data_loader, val_data_loader)
img = Image.open(test_dir + "dogs/dog.1500.jpg")
torch.unsqueeze(input, dim) → Tensor
Returns a new tensor with a dimension of size one inserted at the specified position.
Example:
x = torch.tensor([1, 2, 3, 4])
torch.unsqueeze(x, 0)
Output: tensor([[ 1, 2, 3, 4]])
torch.unsqueeze(x, 1)
Output:
tensor([[ 1],
        [ 2],
        [ 3],
        [ 4]])
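For prediction, unsqueeze matters because the network expects a batch of images, while a single transformed image has no batch dimension. A minimal sketch with a blank stand-in image:

```python
import torch

# A single fake image: channels x height x width
single_image = torch.zeros(3, 64, 64)

# Insert a batch dimension of size 1 at position 0
batched_image = torch.unsqueeze(single_image, 0)

print(single_image.shape)    # torch.Size([3, 64, 64])
print(batched_image.shape)   # torch.Size([1, 3, 64, 64])
```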
img = img_transforms(img).to(device)
img = torch.unsqueeze(img, 0)
model.eval()
prediction = F.softmax(model(img), dim = 1)
prediction
PyTorch provides the argmax() function, which returns the index of the highest value of the tensor.
prediction = prediction.argmax()
prediction
labels = ['cats','dogs']
print(labels[prediction])
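The last two steps, softmax followed by argmax, can be tried on invented scores without loading the model. In this sketch the values in `fake_logits` are arbitrary, and the order of `class_names` assumes the alphabetical folder order that ImageFolder uses (cats before dogs).

```python
import torch
import torch.nn.functional as F

# Made-up raw scores for one image: [cat score, dog score]
fake_logits = torch.tensor([[0.2, 1.7]])

# Softmax turns the scores into probabilities that sum to 1
probabilities = F.softmax(fake_logits, dim=1)

# argmax picks the index of the most likely class; item() makes it a plain int
predicted_index = probabilities.argmax(dim=1).item()

class_names = ['cats', 'dogs']
print(probabilities)
print(class_names[predicted_index])   # dogs
```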