# Adversarial Attacks on Model for CIFAR-10

CIFAR-10 dataset contains in total of 60,000 colour images, divided into ten classes. This project aims to identify an image belonging to one of these ten categories. Further, the adversarial attacks are performed to check the robustness of such a trained model. The first part of the project is implemented via classes. The next part is designed with help of a decorator with arguments.

## Data Preparation¶

In the first stage, I loaded the relevant libraries dataset CIFAR-10 and pre-processed. Next, I performed data augmentation. For the chosen residual architecture of the neural network, the optimal batch size is 400. If you have more memory on your graphic card, increasing the batch size in tandem with stronger data augmentation is recommended. The whole project works on GPU with device "cuda:0".

In [1]:
%matplotlib inline
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from torchvision import datasets, transforms, models
############################ To depict your neural network ################################################
#!pip install torchviz
from torchviz import make_dot
from graphviz import Digraph
########################### If you want to use TensorBoard ###############################################
#from torch.utils.tensorboard import SummaryWriter
########################## To plot #######################################################################
import numpy as np
import matplotlib.pyplot as plt
################### to dispaly confusing matrix properly install 'jupyterthemes' ########################
#!conda install -c conda-forge jupyterthemes # - for confusion matrix
from jupyterthemes import jtplot
#########################################################################################################
print("CUDA available: ", torch.cuda.is_available())
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("PyTorch version: ", torch.__version__ )

CUDA available:  True
PyTorch version:  1.11.0


Let's find the mean and standard deviation needed for normalisation during transformations of images. You have to create needed folders if it is necessary.

In [5]:
#! mkdir Data_Sets
transform_cifar10 = transforms.Compose([transforms.Resize((32,32)),
transforms.ToTensor()
])

imgs = torch.stack([img_t for img_t, _ in tensor_cifar10], dim=3)
print(imgs.shape)
imgs.view(3, -1).mean(dim=1)

Files already downloaded and verified
torch.Size([3, 32, 32, 50000])

Out[5]:
tensor([0.4914, 0.4822, 0.4465])
In [6]:
imgs.view(3, -1).std(dim=1)

Out[6]:
tensor([0.2470, 0.2435, 0.2616])

Obtained above values will be used for normalisation below. I also apply min_max() class to squeeze tensor values between 0 and 1. It is necessary, especially for projected attacks since it works with norms on $[0, 1]$ .

In [2]:
#! mkdir Data_Sets
torch.cuda.empty_cache()

class min_max():
def __call__(self, tensor):
tensor = tensor + torch.ones_like(tensor)
tensor = tensor - tensor.min()
tensor = tensor/(tensor.max()-tensor.min())
return tensor

transform_train = transforms.Compose([transforms.Resize((32,32)),
transforms.RandomHorizontalFlip(),
transforms.RandomRotation(10),
transforms.RandomVerticalFlip(p=0.1),
transforms.RandomAffine(0, shear=10, scale=(0.8,1.2)),
#transforms.RandomResizedCrop(256, scale=(0.5,0.9), ratio=(1, 1)),
#transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1),
transforms.RandomApply([transforms.Grayscale(num_output_channels=3)], p=0.1),
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
min_max()
])

transform_test = transforms.Compose([transforms.Resize((32,32)),
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
min_max()
])
#I extract dataset from pytorch repository. Change folder if you wish.
training_dataset = datasets.CIFAR10(root='./Data_Sets/Cifar10',
transform=transform_train)
test_dataset = datasets.CIFAR10(root='./Data_Sets/Cifar10',
transform=transform_test)


Files already downloaded and verified


Let's check some sample image

In [98]:
dataiter = iter(train_loader)
print(len(dataiter))
img, _ = dataiter.next()
img.shape
print(torch.min(img))
print(torch.max(img))
plt.imshow(img[1].permute(1, 2, 0))
plt.show()

125
tensor(0.)
tensor(1.)


In my mind, there is a dog above.

## Optimisation Parameters of Residual Neural Network¶

CIFAR-10 comprises a training set of 50,000 images with ten classes to classify. The test set contains 10,000 images. To achieve over 90 per cent test accuracy, I used a residual convolutional neural network (NN) architecture [1] with residual blocks. The idea is that after a series of convolution-ReLU-convolution operations, the signal is passed to the output. Some blocks can be bypassed. Thus the final signal is still strong enough even if a network is very deep. In this way, we mitigate the problem of vanishing gradients and can build more deep networks in which we do not rely only on general mappings from input to output but also on so-called residual mappings. In other words, we apply "skip connections". Images in CIFAR10 were scaled to 32x32 sizes, so it is convenient to use automatic convolution with a kernel size of 3 and padding of 1 to avoid losing any pixels. To capture the growing complexity of features, I use an increased number of filters from 64 to 1024. The detailed design of NN is depicted here. For better resolution, create your own diagram using 'torchviz'.
Such construction of NN will result in over 93% accuracy, which is pretty enough to demonstrate the idea of adversarial attacks.

In [5]:
###################################### functions ##################################################
def init_weights(m):
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight.data,mode='fan_in',nonlinearity='relu')
if m.bias is not None:
nn.init.constant_(m.bias.data, 0)
elif isinstance(m, nn.BatchNorm2d):
nn.init.constant_(m.weight.data, 0.5)
nn.init.constant_(m.bias.data, 0)
elif isinstance(m, nn.Linear):
nn.init.kaiming_uniform_(m.weight.data)
nn.init.constant_(m.bias.data, 0)

def accuracy(outputs, labels):
'''Computes accuracy'''
pred = 0.0
_, val_preds = torch.max(outputs, 1)
pred += torch.sum(val_preds == labels.data)
return pred

class bcolors:
OK = '\033[92m' #GREEN
WARNING = '\033[93m' #YELLOW
FAIL = '\033[91m' #RED
RESET = '\033[0m' #RESET COLOR

class EarlyStopping:
"""Early stops the training if validation loss doesn't improve after a given patience.
https://github.com/Bjarten/early-stopping-pytorch"""

def __init__(self, patience=7, verbose=False, delta=0, path='Models/checkpoint.pt', trace_func=print):
"""
Args:
patience (int): How long to wait after last time validation loss improved.
Default: 7
verbose (bool): If True, prints a message for each validation loss improvement.
Default: False
delta (float): Minimum change in the monitored quantity to qualify as an improvement.
Default: 0
path (str): Path for the checkpoint to be saved to.
Default: 'checkpoint.pt'
trace_func (function): trace print function.
Default: print
"""
self.patience = patience
self.verbose = verbose
self.counter = 0
self.best_score = None
self.early_stop = False
self.val_loss_min = np.Inf
self.delta = delta
self.path = path
self.trace_func = trace_func
def __call__(self, val_loss, model):

score = -val_loss

if self.best_score is None:
self.best_score = score
self.save_checkpoint(val_loss, model)
elif score < self.best_score + self.delta:
self.counter += 1
self.trace_func(f'{bcolors.FAIL}EarlyStopping counter: {self.counter} out of {self.patience}{bcolors.RESET}')
if self.counter >= self.patience:
self.early_stop = True
else:
self.best_score = score
self.save_checkpoint(val_loss, model)
self.counter = 0

def save_checkpoint(self, val_loss, model):
'''Saves model when validation loss decrease.'''
if self.verbose:
self.trace_func(f'{bcolors.OK}Validation loss decreased ({self.val_loss_min:.6f} --> {val_loss:.6f}).  Saving model ...{bcolors.RESET}')
torch.save(model.state_dict(), self.path)
self.val_loss_min = val_loss

##### Neural Network's Residual Blocks and Neural Network Model¶
In [6]:
def conv_block(in_channels, out_channels, pool=False):
layers = [nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=(3, 3), padding=1, bias=False),
nn.BatchNorm2d(out_channels),
if pool: layers.append(nn.MaxPool2d(2))
return nn.Sequential(*layers)

class ResNet(nn.Module):
def __init__(self, in_channels, num_classes):
super().__init__()
in_channels = in_channels
num_classes = num_classes

self.conv1 = conv_block(in_channels, 64)
self.conv2 = conv_block(64, 128, pool=True)
self.res1 = nn.Sequential(conv_block(128, 128), conv_block(128, 128))
self.conv3 = conv_block(128, 256, pool=True)
self.conv4 = conv_block(256, 256, pool=False)
self.conv5 = conv_block(256, 512, pool=True)
self.res2 = nn.Sequential(conv_block(512, 512), conv_block(512, 512))
self.conv6 = conv_block(512, 1024, pool=False)
self.classifier = nn.Sequential(nn.MaxPool2d(4),
nn.Flatten(),
nn.Linear(1024, num_classes))

def forward(self, x):
out = self.conv1(x)
out = self.conv2(out)
out = self.res1(out) + out
out = self.conv3(out)
out = self.conv4(out)
out = self.conv5(out)
out = self.res2(out) + out
out = self.conv6(out)
out = self.classifier(out)
return out

##### Class for training, validation and plotting¶
In [9]:
class Fab():
def __init__(self, model):
self.trained_model = model

'''Trainer'''
# initialize the early_stopping object
early_stopping = EarlyStopping(patience=patience, verbose=True)
criterion = nn.CrossEntropyLoss()# loss function
optimizer = torch.optim.Adam(self.trained_model.parameters(), lr = learn_rate, weight_decay=1e-4)#, eps=1e-8)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=learn_rate, epochs=epochs,\

self.run_train_loss = []
self.run_train_acc = []
self.run_test_loss = []
self.run_test_acc = []
self.batches_train = []
self.batches_test = []
self.lrs = []
torch.cuda.empty_cache()
for k in range(epochs):
train_loss = 0.0
train_acc = 0.0
test_loss = 0.0
test_acc= 0.0

self.trained_model.train() #train mode
inputs = inputs.to(device)
labels = labels.to(device)
optimizer.zero_grad()# reset values of derivativesfrom graphs
outputs = self.trained_model(inputs)# model processes
loss = criterion(outputs, labels)# loss is computed
self.batches_train.append(loss.item())# needed for plots
loss.backward() # BACKPROPAGATION
optimizer.step() # parameter update based on current gradient stored in .grad attribute of a parameter
scheduler.step()
rate = optimizer.param_groups[0]['lr']
self.lrs.append(rate)

train_loss += loss.item() # accumaulate loss (should go down )
train_acc += accuracy(outputs, labels) # accuracy
else:
self.trained_model.eval()# eval mode to turn off dropouts or batchnormalisation
tes_inputs = tes_inputs.to(device)
tes_labels = tes_labels.to(device)
tes_outputs = self.trained_model(tes_inputs)
tes_loss = criterion(tes_outputs, tes_labels)
self.batches_test.append(tes_loss.item())
test_loss += tes_loss.item()
test_acc += accuracy(tes_outputs, tes_labels.data)

epoch_acc = train_acc.float()/ len(train_loader.dataset) #divide by the longth of entire dataset
self.run_train_loss.append(epoch_loss)
self.run_train_acc.append(epoch_acc.detach().cpu().numpy())
self.run_test_loss.append(tes_epoch_loss)
self.run_test_acc.append(tes_epoch_acc.detach().cpu().numpy())

if (k + 1) % 1 == 0:
print('Epoch [{}/{}] with rate: {:.6f}, Loss: {:.4f} / {:.4f}, Accuracy: {:.4f}  / {:.4f} '\
.format(k + 1, epochs, rate, self.run_train_loss[k], self.run_test_loss[k],\
self.run_train_acc[k], self.run_test_acc[k]))

early_stopping(tes_epoch_loss, self.trained_model)
if early_stopping.early_stop:
print(f"{bcolors.FAIL}Early stopping{bcolors.RESET}")
print(' ')
break
# load the last checkpoint with the best model; you will have to create folder 'Models'

def plotti(self):
fig, ax = plt.subplots(2, 2, figsize=(18, 11))
ax[0,0].plot(range(1, len(self.run_train_loss)+1), self.run_train_loss,\
c='indigo',linewidth=1.8, label='Train loss')
ax[0,0].plot(range(1, len(self.run_test_loss)+1), self.run_test_loss,\
c='deeppink',linewidth=1.8, label='Test loss')
opt_poss = self.run_test_loss.index(min(self.run_test_loss))+1
ax[0,0].axvline(opt_poss, linestyle='--', color='b',label='Obtained Model')
ax[1,0].plot(self.batches_train, c='grey', alpha=0.5, label='Train minibatch cost')
ax[1,0].plot(np.convolve(self.batches_train, np.ones(10,)/10, mode='valid'),\
c='indigo',linewidth=1 , label='Running average')
ax[1,1].plot(self.batches_test, c='grey', alpha=0.5, label='Test minibatch cost')
ax[1,1].plot(np.convolve(self.batches_test, np.ones(10,)/10, mode='valid'),\
c='deeppink',linewidth=1 , label='Running average')
ax[0,1].plot(range(1, len(self.run_test_acc)+1), self.run_test_acc,\
c='deeppink',linewidth=1.8, label='Test accuracy')
ax[0,1].plot(range(1, len(self.run_train_acc)+1), self.run_train_acc,\
c='indigo',linewidth=1.8, label='Train accuracy')
ax[0,1].axvline(opt_poss, linestyle='--', color='b',label='Obtained Model')
ax[0,0].set(xlabel="Epochs",ylabel="Loss")
ax[0,1].set(xlabel="Epochs",ylabel="Accuracy")
ax[0,1].set(ylabel="Accuracy")
ax[1,0].set_ylim([0, 1.7])
ax[1,0].set(xlabel="Number of iterations", ylabel="Train CrossEntropy")
ax[1,1].set(xlabel="Number of iterations", ylabel="Test CrossEntropy")
ax[0,0].legend(), ax[0,1].legend(), ax[1,0].legend(), ax[1,1].legend()
plt.tight_layout()
plt.show()


Regarding hyperparameters, I use 400 batch-size. It seems to be a good trade-off between speed of optimisation and desired accuracy. However, it slightly overfits our model at the last stage of the training (train longer for better and model the training cycle via a scheduler). Thus I applied EarlyStoping class to prevent too excessive overfitting. Anyway, slight overfitting is advisable in our case because it increases the robustness of the trained model. Besides, data augmentation and batch normalisations help much here. As an optimiser, I employed Adam since it uses past gradients to compute present gradients and momentum. The latter is using the moving averages instead of the gradient itself. The learning rate was set up according to the one-cycle scheduler, with the peak at 0.0004. It is low but typical for Adam. I optimise with cross-entropy as a loss function to measure the difference between distributions of classified objects. Cross-entropy plays well with activation functions as the derivative depends only on neurons' outputs, targets, and inputs. In other words, cross-entropy deals well with the potential problem of saturated neurons. Gradient clipping prevents losing the right path and seems indispensable for cosine schedulers.
The reader can visualise the structure of the neural network by using the script below. You have to install torchviz through 'pip install torchviz' to use the snippet below.

In [10]:
model2 = ResNet(3, 10)
x = torch.randn(1, 3, 32, 32).requires_grad_(True)
y = model2(x)
make_dot(y, params=dict(list(model2.named_parameters()))).render("Residual_net", format="png")

Out[10]:
'Residual_net.png'
##### Now we can train our model¶

If necessary create folder 'Models' for storing, i.e., Fab() will store trained model in the 'Models' for further usage.

In [57]:
!mkdir Models
trainer = Fab(ResNet(3, 10).apply(init_weights).to(device))

Epoch [1/36] with rate: 0.000263, Loss: 1.5204 / 1.3121, Accuracy: 0.4570  / 0.5378
Validation loss decreased (inf --> 1.312075).  Saving model ...
Epoch [2/36] with rate: 0.000590, Loss: 1.1853 / 1.0342, Accuracy: 0.5792  / 0.6365
Validation loss decreased (1.312075 --> 1.034181).  Saving model ...
Epoch [3/36] with rate: 0.001096, Loss: 1.0251 / 1.2177, Accuracy: 0.6375  / 0.6021
EarlyStopping counter: 1 out of 7
Epoch [4/36] with rate: 0.001730, Loss: 0.9039 / 0.9791, Accuracy: 0.6833  / 0.6678
Validation loss decreased (1.034181 --> 0.979051).  Saving model ...
Epoch [5/36] with rate: 0.002426, Loss: 0.8238 / 0.7967, Accuracy: 0.7102  / 0.7346
Validation loss decreased (0.979051 --> 0.796745).  Saving model ...
Epoch [6/36] with rate: 0.003112, Loss: 0.7677 / 0.6977, Accuracy: 0.7300  / 0.7594
Validation loss decreased (0.796745 --> 0.697677).  Saving model ...
Epoch [7/36] with rate: 0.003715, Loss: 0.7158 / 0.6057, Accuracy: 0.7498  / 0.7878
Validation loss decreased (0.697677 --> 0.605686).  Saving model ...
Epoch [8/36] with rate: 0.004175, Loss: 0.6920 / 0.7524, Accuracy: 0.7576  / 0.7493
EarlyStopping counter: 1 out of 7
Epoch [9/36] with rate: 0.004443, Loss: 0.6363 / 0.6794, Accuracy: 0.7792  / 0.7701
EarlyStopping counter: 2 out of 7
Epoch [10/36] with rate: 0.004499, Loss: 0.6106 / 0.5718, Accuracy: 0.7889  / 0.7998
Validation loss decreased (0.605686 --> 0.571798).  Saving model ...
Epoch [11/36] with rate: 0.004473, Loss: 0.5782 / 0.5947, Accuracy: 0.7994  / 0.8012
EarlyStopping counter: 1 out of 7
Epoch [12/36] with rate: 0.004416, Loss: 0.5589 / 0.6923, Accuracy: 0.8066  / 0.7669
EarlyStopping counter: 2 out of 7
Epoch [13/36] with rate: 0.004328, Loss: 0.5355 / 0.5389, Accuracy: 0.8134  / 0.8183
Validation loss decreased (0.571798 --> 0.538858).  Saving model ...
Epoch [14/36] with rate: 0.004211, Loss: 0.5136 / 0.5361, Accuracy: 0.8232  / 0.8267
Validation loss decreased (0.538858 --> 0.536084).  Saving model ...
Epoch [15/36] with rate: 0.004065, Loss: 0.4984 / 0.5165, Accuracy: 0.8288  / 0.8254
Validation loss decreased (0.536084 --> 0.516477).  Saving model ...
Epoch [16/36] with rate: 0.003894, Loss: 0.4863 / 0.4289, Accuracy: 0.8293  / 0.8574
Validation loss decreased (0.516477 --> 0.428908).  Saving model ...
Epoch [17/36] with rate: 0.003699, Loss: 0.4720 / 0.5616, Accuracy: 0.8370  / 0.8090
EarlyStopping counter: 1 out of 7
Epoch [18/36] with rate: 0.003483, Loss: 0.4551 / 0.4447, Accuracy: 0.8418  / 0.8457
EarlyStopping counter: 2 out of 7
Epoch [19/36] with rate: 0.003250, Loss: 0.4427 / 0.4408, Accuracy: 0.8450  / 0.8535
EarlyStopping counter: 3 out of 7
Epoch [20/36] with rate: 0.003002, Loss: 0.4302 / 0.3836, Accuracy: 0.8527  / 0.8707
Validation loss decreased (0.428908 --> 0.383550).  Saving model ...
Epoch [21/36] with rate: 0.002744, Loss: 0.4095 / 0.4849, Accuracy: 0.8591  / 0.8331
EarlyStopping counter: 1 out of 7
Epoch [22/36] with rate: 0.002479, Loss: 0.3862 / 0.3747, Accuracy: 0.8645  / 0.8789
Validation loss decreased (0.383550 --> 0.374691).  Saving model ...
Epoch [23/36] with rate: 0.002210, Loss: 0.3703 / 0.3659, Accuracy: 0.8718  / 0.8748
Validation loss decreased (0.374691 --> 0.365947).  Saving model ...
Epoch [24/36] with rate: 0.001942, Loss: 0.3554 / 0.3522, Accuracy: 0.8760  / 0.8844
Validation loss decreased (0.365947 --> 0.352176).  Saving model ...
Epoch [25/36] with rate: 0.001679, Loss: 0.3316 / 0.3522, Accuracy: 0.8855  / 0.8828
EarlyStopping counter: 1 out of 7
Epoch [26/36] with rate: 0.001423, Loss: 0.3083 / 0.3642, Accuracy: 0.8931  / 0.8816
EarlyStopping counter: 2 out of 7
Epoch [27/36] with rate: 0.001179, Loss: 0.2763 / 0.2921, Accuracy: 0.9049  / 0.9058
Validation loss decreased (0.352176 --> 0.292118).  Saving model ...
Epoch [28/36] with rate: 0.000951, Loss: 0.2583 / 0.3062, Accuracy: 0.9099  / 0.8975
EarlyStopping counter: 1 out of 7
Epoch [29/36] with rate: 0.000741, Loss: 0.2291 / 0.2594, Accuracy: 0.9207  / 0.9161
Validation loss decreased (0.292118 --> 0.259370).  Saving model ...
Epoch [30/36] with rate: 0.000553, Loss: 0.2069 / 0.2425, Accuracy: 0.9296  / 0.9210
Validation loss decreased (0.259370 --> 0.242514).  Saving model ...
Epoch [31/36] with rate: 0.000389, Loss: 0.1844 / 0.2293, Accuracy: 0.9378  / 0.9260
Validation loss decreased (0.242514 --> 0.229345).  Saving model ...
Epoch [32/36] with rate: 0.000251, Loss: 0.1594 / 0.2202, Accuracy: 0.9467  / 0.9292
Validation loss decreased (0.229345 --> 0.220199).  Saving model ...
Epoch [33/36] with rate: 0.000142, Loss: 0.1512 / 0.2164, Accuracy: 0.9494  / 0.9307
Validation loss decreased (0.220199 --> 0.216385).  Saving model ...
Epoch [34/36] with rate: 0.000064, Loss: 0.1333 / 0.2130, Accuracy: 0.9562  / 0.9326
Validation loss decreased (0.216385 --> 0.212994).  Saving model ...
Epoch [35/36] with rate: 0.000016, Loss: 0.1278 / 0.2117, Accuracy: 0.9593  / 0.9332
Validation loss decreased (0.212994 --> 0.211715).  Saving model ...
Epoch [36/36] with rate: 0.000000, Loss: 0.1258 / 0.2120, Accuracy: 0.9599  / 0.9336
EarlyStopping counter: 1 out of 7


On the plots, you can observe the training process. There are mini-batch costs and their moving averages at the bottom. The trained model is slightly overfitted, which is in favour of adversarial robustness later [7]. To achieve around 90% training accuracy, you can shorten the training process to 12 epochs only.

In [58]:
trainer.plotti()


To display the scheduler profile

In [59]:
plt.plot(trainer.lrs, c='magenta')
plt.xlabel('Number of iterations')
plt.ylabel('Learning rate')
plt.show()

##### Load trained model from the checkpoint.pt¶
In [60]:
my_model = trainer.trained_model
#Model is loaded from the checkpoint

Out[60]:
<All keys matched successfully>

If you want to load the previousely trained model, activate all cells up to the Fab() class and do not touch these two lines:

'''

trainer = Fab(ResNet(3, 10).apply(init_weights).to(device))

'''

Next as follows

In [11]:
# trainer = Fab(ResNet(3, 10).apply(init_weights).to(device))
# my_model = trainer.trained_model
# #Model is loaded from the checkpoint

Out[11]:
<All keys matched successfully>
##### Now we can display random predictions¶
In [14]:
################ Load and denormalise images ###############################################
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

def im_convert(tensor):
'''Denormalisation'''
image = tensor.cpu().clone().detach().numpy()
image = image.transpose(1, 2, 0)
image = image * np.array((0.4914, 0.4822, 0.4465)) + np.array((0.2470, 0.2435, 0.2616))
image = image.clip(0, 1)
return image

############## Computes soft-max probability for randomnly chosen image #####################
images, labels = dataiter.next()
images = images.to(device)
labels = labels.to(device)
output = my_model(images)
_, preds = torch.max(output, 1)
output_probs = F.softmax(output, dim=1)
output_probs = output_probs.cpu()
x_pred_prob =  np.round((torch.max(output_probs.data, 1)[0]) * 100,4)

fig = plt.figure(figsize=(16, 10))
fig.suptitle('Example of predictions made by model with probailities', fontsize=26)
for idx in np.arange(32):
sm = nn.Softmax(dim=1)
sm_outputs = sm(output)
ax = fig.add_subplot(4, 8, idx+1, xticks=[], yticks=[])
plt.imshow(im_convert(images[idx]))
ax.set_title("{} ({}) \n {}".format(str(classes[preds[idx].item()]),\
str(classes[labels[idx].item()]),\
str(x_pred_prob.numpy()[idx])), \
color=("dodgerblue" if preds[idx]==labels[idx] else "deeppink"))


These pictures are a part of the classification results in the testing set. The numbers over the images are the probabilities (confidences) about classification correctness, which comes from the softmax function according to the categories that the pictures are classified. You can observe that our model works pretty well. Later we will check its robustness.

##### The confusion matrix is very convenient for depicting the overall accuracy of every category¶
In [36]:
#################### Computes accuracies for each of ten categories  ######################

total_correct = 0
total_images = 0
conf_mat = np.zeros([10,10], int)
img = img.to(device)
labels = labels.to(device)
outputs = model(img)
_, predicted = torch.max(outputs.data, 1)
total_images += labels.size(0)
total_correct += (predicted == labels).sum().item()
for i, k in enumerate(labels):
conf_mat[k.item(), predicted[i].item()] += 1

model_accuracy = total_correct / total_images * 100

if matrix:
###################### Confusion matrix #############################################
fig, ax = plt.subplots(1,1,figsize=(12,10))
ax.matshow(conf_mat, aspect='auto', vmin=0, vmax=1000, cmap=plt.get_cmap('icefire'))
for i, j in zip(*conf_mat.nonzero()):
ax.text(j, i, conf_mat[i, j], color='black', ha='center', va='center')
ax.set_ylabel('Actual Category', fontsize=22)
plt.yticks(range(10), classes, fontsize=17)
ax.set_xlabel('Predicted Category', fontsize=22)
plt.xticks(range(10), classes, fontsize=17)
plt.show()
##################### Produce the table #################################################
print(' ')
print('{0:10s} - {1}'.format('Category','Accuracy'))
for i, r in enumerate(conf_mat):
print('{0:10s} - {1:.1f}'.format(classes[i], r[i]/np.sum(r)*100))



Category   - Accuracy
plane      - 93.7
car        - 97.1
bird       - 91.0
cat        - 84.5
deer       - 94.0
dog        - 90.8
frog       - 95.6
horse      - 95.0
ship       - 96.1
truck      - 95.3


The worst accuracy is for the cats. From the confusing matrix, it is evident that the model classifies too many dogs as cats and vice versa, i.e. false positives, namely II type error. Statistically, the model accepts hypothesis, $H_{0}$ when it is false. Note that FGSM adversarially lowers the number of true positives favouring false negatives.

To check the robustness of our trained model, I applied three types of adversarial attacks. The idea is that if our model correctly classified some input data, we impose perturbation $\delta$ to that input data $x$ to escape from the minimum of the objective function. We set weights (trained model parameters) and compute gradients w.r.t input data in an adversarial algorithm. As a result, the algorithm goes in the direction of the gradient and hence there is a gradient ascent. Note that in gradient descent, we impose a negative sign to go in the opposite direction of the gradient. In general, the aim of the adversarial attack is to create imperceptible data inputs such that the model misclassifies these inputs. For perturbated input ${x'}=x+\delta\,\,\in[0,1]^{n}$ and output $y$ we obtain not true labels and we can write $$f_{\Theta}({x'})\neq y_{true}$$ Also, a perturbation should be fixed on each pixel by size $\epsilon$ (max perturbation), and we obtain bounded adversarial pixel within $\epsilon$-neigbourhood for $L_{p}$ norm $$||{x'}-x||_{p}<\epsilon$$ , which in general can be written as $$\displaystyle \max_{x}Loss(f(x+\delta),y), \,\,\,\delta\in\mathcal{B}_{\epsilon}(x)$$ or if we use a negative loss $$\displaystyle \min_{x}-Loss(f(x+\delta),y),\,\,\,\delta\in\mathcal{B}_{\epsilon}(x)$$ where ${B}_{\epsilon}(x)$ is a ball. Thus, we compute the gradient w.r.t input, $x$, and our aim is to find a $x$ s.t. the loss is maximized that leads to the misclassification.

In the case of metric spaces, the distance can be measured by means of standard $L_{p}$ norms.

There are ten classes in Cifar-10, so we deal with the maulti-class attack where every point gets assigned to input class, $C_{i}$ for $i=1,...,k=10$. Therefore, there are $k=10$ discriminant classes, $g_{i}(x)$. In case of correct classification, point $x$ is classified as $C_{i}$ if discriminat $g_{i}(x)\geq g_{j}(x),\,\,\forall i\neq j$ for any other region $j$ of nine remaining regions (in case of the CIFAR-10). Namely, our discriminant $g_{i}(x)$ is higher than the value of any other discrimant applied on $x$ but mapped from other region $j$, thus $$g_{1} - g_{i}\leq 0$$ $$\vdots$$ $$g_{k} - g_{i}\leq 0$$

Equivalently, we want the worst discriminator to be non-positive $$\displaystyle \max_{x_{j\neq i}}g_{j}(x)-g_{i}(x)\leq 0$$ However, as a result of attack, our point, $x$, is shifted to class $C_{t}$ (targeted), that is why we would like the constrained inequality $$\displaystyle \max_{x_{j\neq t}}g_{j}(x)-g_{t}(x)\leq 0$$ where $g_{t}(x)$ is the discriminant function of the target space, and $g_{j}(x)$ are the discriminants functions of the all other classes st $i\neq j$. In the last eqn., we have to be sure that if we pass $x$ into $g_{t}(x)$ then it would give us the max value in comparision to the $x$ value under discriminat in any other space, $g_{j}(x)$.

If we think of an untargeted attack, we consider $$g_{i}(x)-\displaystyle \max_{x_{j\neq i}}g_{j}(x)\leq 0$$ where the current(input) class $g_{i}(x)$ must be less than any other class, even the weakest one, among the classes we are considering. It means that for a successful attack $x$ can be shifted to any class $j$ and the algorithm considers all classes $j$. If you need a more deep explanation google Stanley Chan. It is also worth mentioning that the set in question is not necessarily convex.

In this notebook, I decided to demonstrate the idea using proposed by Goodfellow et. al [3]: Fast Gradient Sign Method (FGSM) and Iterative Fast Gradient Sign Method (I-FSGM) together with Projected Gradient Descent Method (PGD-attack). The distance function is $L_{\infty}$, i.e., we measure max distance between any pixel $x'$ and $x$ with the possible perturbation directions at $+/- 45^{\circ}$ or $+/- 135^{\circ}$, which are corners of the square of $L_{\infty}$ norm. Therefore, FGSM under $L_{\infty}$ cannot be the optimal attack since it excludes the more optimal directions of perturbation, which are possible for $L_{2}$ norm. Basic FGSM is an untargeted method that perturbs input by $\epsilon$ in one step only. "sign" because the algorithm goes along the gradient, whose sign is detected during PyTorch operations on graphs during backpropagation. The algorithm is based on the following optimization problem with $\delta$ as perturbation

$$\displaystyle \max_{\delta}\nabla_{x}Loss(x; w)^{T}\delta+Loss(x; w)\,\,\, subject\,\, to\,\,\, ||\delta||_{\infty}\leq \epsilon$$

The above results in the solution where $x'$ is an adversarial example

$$x'=x+\epsilon\cdot sgn\,(\nabla_{x}J(x, y_{true}))$$

FGSM is the basic example of Maximum Loss Attack where the constraint is $||x'-x||_{p}\leq \epsilon$ and the objective is to maximize $g_{t}(x)-\displaystyle \max_{x_{j\neq i}}g_{j}(x)$, where we substruct from target everything what is NOT a target.

The development of FSGM is I-FGSM which imposes multiple perturbations along the gradient on a graph. Naturally, this method is superior to the first because the move is deeper (it accumulates values of gradients from $n$ iterations of a loop). We also limit the magnitude of each perturbation so that the distortion is still in $\epsilon$-neigbourhood. We took $\alpha=\frac{\epsilon}{n}$ and write

$$x_{0}' = x;\,\,\,\, x_{n+1}'=Clip_{x,\epsilon} \left \{ x_{n}' +\,\,\alpha\cdot sgn\,(\nabla_{x_{n}'}J(x_{n}', y_{true})) \right \}$$

These algorithms shift examples adversarially beyond the set in which the model correctly classified them. In other words, true positives become false negatives. Goodfellow claims that adversarialness stems from linear behaviours of subspaces in highly nonlinear spaces of neural networks [2], whereas the true target is still the same. However, the newer approach claims that when the dimensionality of the set of functions increases, we have to deal with different spaces; that is, the norms of all vectors change and hence target moves [4].

Madry et al. proposed the projection of perturbation onto a ball ${B}_{\epsilon}(x)$ around an example $x$. Before gradient calculation, the uniform noise is added. $$x_0'= x +\mathrm{U}(-\epsilon, \epsilon)$$ $$x_{n+1}'=\Pi_{{B}_{\epsilon}(x)}({x_{n}'}+\alpha \cdot sgn(\nabla_{x_{n}'}Loss(f(x_{n}',y)))$$ where $\Pi_{{B}_{\epsilon}(x)}$ is the projection onto ${B}_{\epsilon}(x)$.
As you see the last two algorithms are very similar.

## Implementation of FSGM, I-FSGM and PGD attacks¶

For attacks, I applied the decorator with arguments. Decorated functions also admit arguments if it is necessary.

In [53]:
############################# PLOTTER of IMAGES ##################################################
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

def im_convert_2(tensor):
'''Denormalisation'''
image = tensor.transpose(1, 2, 0)
#image = image * np.array((0.4914, 0.4822, 0.4465)) + np.array((0.2470, 0.2435, 0.2616))
image = image.clip(0, 1)
return image

def plotter(imgs, probas, classes, epsilons):
'''To plot samples of perturbed images'''
cnt = 0
plt.figure(figsize=(13,17))
#plt.suptitle('Examples of predictions of model with probailities after FGSM attack', fontsize=28)
for i in range(len(epsilons)):
for j in range(len(imgs[i])):
cnt += 1
plt.subplot(len(epsilons),len(imgs[0]),cnt)
plt.xticks([], [])
plt.yticks([], [])
if j == 0:
plt.ylabel("Eps: {}".format(epsilons[i]), fontsize=16)
str(classes[orig]),\
str(probas[i][j]) ), \
color=("dodgerblue" if classes[orig]==classes[adv] else "deeppink"), fontsize=18)
plt.imshow(im_convert_2(ex[:,:]))
plt.tight_layout()
plt.show()

#################################### ATTACK DECORATOR ####################################################
def main_attack(model, device, test_loader, epsilons, matrix=False):
def decorator(func):
def wrapper(*args, **kwargs):
'''Skeleton for FGSM computations'''
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
accuracies = []
examples = []
probs = []
for eps in epsilons:
conf_mat = np.zeros([10,10], int)
correct = 0
x_probs = []

img, target = img.clone().detach().to(device), target.clone().detach().to(device)

model.to(device)
model.eval()
output = model(img)
# get the index of the max log-probability
init_pred = output.max(1, keepdim=True)[1]
if eps == 0.0:
perturbed_img = img
else:
perturbed_img = func(model, img, target, eps, *args, **kwargs)
#to get perturbed img
output = model(perturbed_img)
#softmax for probabilities
output_probs = F.softmax(output, dim=1)#standardise for probability
output_probs = output_probs.clone().detach().cpu()
#get probability for given image from tensor
x_pred_prob =  (np.round((torch.max(output_probs.data, 1)[0]) * 100,4)).numpy()
# Check for success
# get the index of the max log-probability
final_pred = output.max(1, keepdim=True)[1]
for i, k in enumerate(target):
conf_mat[k.item(), final_pred[i].item()] += 1
if final_pred.item() == target.item():
correct += 1
# Save some adv examples for visualization later; here 7 columns
x_probs.append(x_pred_prob)
print("Epsilon: {}\tTest Accuracy = {} / {} = {:.4f}".format(eps,\

accuracies.append(acc)
probs.append(x_probs)
print(' ')

if matrix:
###################### Confusion matrix #############################################
fig, ax = plt.subplots(1,1,figsize=(8,6))
ax.matshow(conf_mat, aspect='auto', vmin=0, vmax=1000, cmap=plt.get_cmap('icefire'))
for i, j in zip(*conf_mat.nonzero()):
ax.text(j, i, conf_mat[i, j], color='black', ha='center', va='center')
ax.set_ylabel('Actual Category', fontsize=12)
plt.yticks(range(10), classes, fontsize=13)
ax.set_xlabel('Predicted Category', fontsize=12)
plt.xticks(range(10), classes, fontsize=13)
plt.show()
return accuracies, examples, probs
return wrapper
return decorator

In [56]:
test2_loader = torch.utils.data.DataLoader(test_dataset, batch_size=1, shuffle=False, num_workers=1)


### FGSM¶

$$x'=x+\epsilon\cdot sgn\,(\nabla_{x}J(x, y_{true}))$$

The distance measure is $L_{\infty}$ . 'torch.autograd.grad' is used to compute derivative of loss w.r.t inputs. Next the perturbation is added. Technically, there is a gradient ascent in the attack's case.

In [57]:
epsilons = [0.0, 0.0025, 0.005, 0.0075, 0.01, 0.02, 0.03, 0.04]
def fgsm(model, img, target, eps):
'''FGSM algorithm'''
loss = F.nll_loss(output, target)#computes loss
#We do not want to perform further operations on gradients
#We do not need the graph of derivatives to be constructed
#By analogy we have in_place chain of derivatives like in python generator

Epsilon: 0.0	Test Accuracy = 9331 / 10000 = 0.9331

Epsilon: 0.0025	Test Accuracy = 8181 / 10000 = 0.8181