Jascha Sohl-Dickstein - Adversarial examples transfer from machines to humans
Transcript
I'm Jascha. I'm gonna tell you that adversarial examples transfer from
image models to the human brain. And due to time, let's just move on.
One of my biggest personal fears about AI in the medium-term future is that it will allow targeted
manipulation of people. And I think as the capabilities of AI increase,
so will the power of the manipulation that can be achieved.
Sometimes I find that I've spent two hours straight
scrolling on Twitter with one finger, and I'm left thinking: what did I just do?
And Twitter is a dead-stupid ML algorithm rearranging pre-existing content.
I think if you were able to generate, live, the video, audio, and text targeted at my particular brain from my history of online interactions,
I wouldn't stand a chance. I would be as addicted, or as outraged, or buy whatever
brand of soda you told me to buy... So I'm really worried about this. I think adversarial examples
provide a motivating example that this kind of extraordinarily targeted control
of a neural system is possible. Here, for instance, there's a small perturbation you can add to an image that convinces a machine vision
classifier of absolutely anything you want it to believe; in this case, that a bear is a truck.
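As a rough illustration of how such a perturbation is typically found (this is not the speaker's actual code; the model, target class, epsilon, and step sizes below are placeholder assumptions), a targeted projected-gradient attack in PyTorch looks roughly like this:

```python
# Minimal sketch of a targeted L-infinity PGD attack in PyTorch.
# The model, target class, epsilon, and step size are illustrative assumptions,
# not the settings used in the work described here.
import torch
import torch.nn.functional as F
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
for p in model.parameters():
    p.requires_grad_(False)  # only the perturbation needs gradients

def targeted_pgd(image, target_class, eps=2 / 255, step=0.5 / 255, n_steps=40):
    """Find a small perturbation that pushes `model` toward predicting `target_class`."""
    delta = torch.zeros_like(image, requires_grad=True)
    target = torch.tensor([target_class])
    for _ in range(n_steps):
        logits = model(image + delta)
        # Targeted attack: minimize the loss with respect to the chosen target class.
        loss = F.cross_entropy(logits, target)
        loss.backward()
        with torch.no_grad():
            delta -= step * delta.grad.sign()   # step toward the target class
            delta.clamp_(-eps, eps)             # stay inside the epsilon ball
            delta.grad.zero_()
    return (image + delta).clamp(0, 1).detach()
```

The `eps=2/255` here assumes the "epsilon equals two" mentioned later in the talk is measured in 0-255 pixel units; that reading is an interpretation, not something stated explicitly.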
But humans are not artificial neural networks. So does this actually apply to us?
Let's run an experiment. We actually ran a whole suite of different experimental conditions, but I'm just going to describe one in the talk.
We had subjects look at a screen, showed them two images, and asked:
which of these two images looks more like a cat to you?
Of course, neither of them is actually a cat, but you still have to choose one of the two images.
We can try this ourselves. Raise your hand if you think the image on the left looks more like a cat.
OK. And now raise your hand if you think the image on the right looks more like a cat.
So that's actually a surprisingly effective demonstration, because I would say there were about 50% more
hands raised for the image on the right.
And in fact, the image on the right has been adversarially perturbed to make computer
vision algorithms believe that it is a cat, while
the image on the left has been adversarially perturbed to make computer vision algorithms think that
it is a truck. I'm actually doing better on time than expected, so I'm going to leave these up for about five seconds.
You can try to spot the differences. This is an epsilon-equals-two perturbation.
Let me just show you our results. The result is that, in fact, subtle adversarial manipulations that work on an ensemble
of computer vision algorithms, after additional geometric augmentation, do transfer to humans. In this plot,
the x-axis is the perturbation magnitude of the adversarial example. The effect gets stronger
the larger the perturbation you allow; the image you saw had perturbation magnitude two.
The dashed line is chance performance, and the y-axis is how much we are able to bias human perception.
You can see that even at epsilon equals two, there's about a 2 to 3% bias in human perception.
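To make that recipe concrete, one way to attack an ensemble under geometric augmentation is to average the targeted loss over several pretrained models and random transforms before each gradient step. The sketch below is an assumption about the general idea, not the authors' released code; the specific models, transforms, and step sizes are placeholders.

```python
# Sketch of one attack step against an ensemble, averaged over random geometric
# augmentations so the perturbation doesn't overfit to a single network or to
# exact pixel alignment. The models and transforms here are placeholder choices.
import torch
import torch.nn.functional as F
import torchvision.transforms as T
from torchvision.models import resnet50, vgg16, densenet121

models = [m(weights="DEFAULT").eval() for m in (resnet50, vgg16, densenet121)]
for m in models:
    for p in m.parameters():
        p.requires_grad_(False)  # only the perturbation needs gradients
augment = T.RandomAffine(degrees=2, translate=(0.05, 0.05))  # small geometric jitter

def ensemble_attack_step(image, delta, target_class, eps=2 / 255, step=0.5 / 255, n_aug=8):
    """One targeted step against the ensemble, averaged over random augmentations."""
    target = torch.tensor([target_class])
    loss = 0.0
    for _ in range(n_aug):
        x = augment(image + delta)  # jittered view of the perturbed image
        loss = loss + sum(F.cross_entropy(m(x), target) for m in models) / len(models)
    (loss / n_aug).backward()
    with torch.no_grad():
        delta -= step * delta.grad.sign()   # move toward the target class
        delta.clamp_(-eps, eps)             # keep the perturbation subtle
        delta.grad.zero_()
    return delta
```

As in the earlier sketch, `delta` would start as a zero tensor with `requires_grad=True`, and this step would be repeated many times until the perturbation stops improving.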
Scientifically, this is maybe super cool, because it suggests that there are
even closer and more surprising correspondences between the subtle
behaviors of artificial neural networks and the human brain. It's also maybe quite worrying,
because it suggests that some of the more sci-fi, strongly targeted manipulations we are able to do to make artificial
neural networks behave in bad ways also transfer to some degree to the
human brain, and that the human brain may be susceptible to similar things.
So we should worry more about manipulative superstimuli targeted at us.