Project 1: Adversarial machine learning


Tutor: Clémentine Maurice clementine.maurice AT irisa.fr

Context

An adversarial attack consists of subtly modifying an original image in such a way that the changes are almost undetectable to the human eye. The modified image, called an adversarial image, is misclassified when submitted to a classifier, while the original image is correctly classified.

Figure: Adversarial machine learning example
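
For a concrete picture of how such a perturbation can be computed, below is a minimal sketch of the classic untargeted approach, the Fast Gradient Sign Method (FGSM). It is not the code from the Colab notebook linked in the next section: the PyTorch framework, the function name, and the epsilon value are assumptions for illustration only.

    # Hypothetical sketch (not the notebook's code): untargeted FGSM in PyTorch.
    import torch
    import torch.nn.functional as F

    def fgsm_untargeted(model, image, true_label, epsilon=0.03):
        """Perturb `image` slightly so the classifier stops predicting `true_label`."""
        image = image.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(image), true_label)
        loss.backward()
        # Step *up* the loss gradient: a small perturbation, nearly invisible
        # to the eye, that increases the loss on the true class.
        adv_image = image + epsilon * image.grad.sign()
        return adv_image.clamp(0, 1).detach()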

Aim

Given a trained classifier and example code that generates an adversarial example misclassified by that classifier, create a targeted attack where the image gets classified as a specific target class. Here is the code that you can run inside Google Colab: https://colab.research.google.com/drive/1lZMw4QejR1mhrCyM3sR_YWRQm3okwLB7

Project progression

  1. Run the given code with the trained classifier and the adversarial example generation on a few inputs.
  2. Play with the added noise and different inputs to see how the classifier behaves (a short sweep over the noise magnitude is sketched after this list).
  3. Transform the misclassification attack into a targeted attack, where you control which class the adversarial example is assigned to (one possible approach is sketched further below).
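
As a hint for step 2, here is a hedged sketch of how you might sweep the noise magnitude and watch when the prediction flips. It reuses the hypothetical fgsm_untargeted helper sketched above; model, image, and label stand for whatever the notebook provides and are not names taken from it.

    # Hypothetical usage sketch: vary epsilon and observe the predicted class.
    import torch

    def epsilon_sweep(model, image, label, epsilons=(0.0, 0.01, 0.03, 0.1, 0.3)):
        for eps in epsilons:
            adv = fgsm_untargeted(model, image, label, epsilon=eps)
            with torch.no_grad():
                pred = model(adv).argmax(dim=1).item()
            print(f"epsilon={eps}: predicted class {pred}")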

The code is there for you to modify and play with. Save the original version and start playing!
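
For step 3, one possible route (again a sketch under the same assumptions, not the notebook's code) is to descend the loss of a chosen target class instead of ascending the loss of the true class, typically over several small steps while keeping the total perturbation small:

    # Hypothetical sketch: iterative targeted attack in PyTorch.
    import torch
    import torch.nn.functional as F

    def targeted_attack(model, image, target_label, epsilon=0.03, alpha=0.005, steps=10):
        """Nudge `image` toward being classified as `target_label`."""
        adv = image.clone().detach()
        for _ in range(steps):
            adv.requires_grad_(True)
            loss = F.cross_entropy(model(adv), target_label)
            model.zero_grad()
            loss.backward()
            # Minus sign: step *down* the target-class loss so the target class
            # becomes more likely, then keep the total perturbation inside an
            # epsilon ball around the original image.
            adv = adv - alpha * adv.grad.sign()
            adv = image + (adv - image).clamp(-epsilon, epsilon)
            adv = adv.clamp(0, 1).detach()
        return adv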

We’re bored

You have a few options:

Bibliography

During the final presentation, you should summarize this paper:

Plus one of the following: