Zac Hatfield-Dodds - Formal Verification is Overrated

Transcript

I want to be clear. I am not against formal verification. I quite like it. And I think a portfolio approach is really valuable in such a new field. But I've also seen a lot of papers that I think substantially over-claim. And so I want to make specific arguments against a couple of different Theories of Change that I've seen. 


I also want to emphasize that these are my own personal views. I'm not speaking for my employer, university, community, pet turtle. I don't even have a pet turtle. This is me. And I'm basically going to go through four things. One is just a bit of why I'm giving this talk, and then three different reasons that people promote formal verification, and why I don't think any of them is a sufficient reason.


So first thing: Why me? Anyone familiar with this dragonfly logo? No one. One person. Okay. I maintain an open source project called Hypothesis, which is a property-based testing library. It does randomized search, and it can also do SMT-backed verification of Python programs. It's used by something like a million developers worldwide, which makes it, I think, more popular than the Haskell programming language, or than any other property-based testing framework.


And so I've spent a lot of time, and much of a PhD, thinking about the relationship between the different levels of abstraction involved in formal verification. At the highest level of abstraction you have “properties”: what are the things which should always be true, or never be true?


We heard about some formal languages for specification, which let you accurately describe what is meant to be true. And importantly, a property is something that a specification might ensure. Your specification describes what your system does. Your implementation actually does it. 


And then the thing we care about is the base level reality. And the gap between each of these is a substantial difficulty for all kinds of formal methods. “Hypothesis”, my library, basically goes directly between properties and implementation. We just skip the specification entirely and let people write something that looks more like a traditional unit test. 
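To make that concrete, here is a minimal sketch of what a Hypothesis property test looks like; the sorted-list property is just an illustration, not an example from the talk.

```python
from hypothesis import given, strategies as st

# The property is stated directly against the implementation: no separate
# specification language, just an ordinary-looking unit test that Hypothesis
# runs against many randomly generated inputs.
@given(st.lists(st.integers()))
def test_sorted_output_is_ordered_and_same_length(xs):
    result = sorted(xs)
    assert len(result) == len(xs)                            # nothing lost or added
    assert all(a <= b for a, b in zip(result, result[1:]))   # output is in order
```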


Okay, reason number one that I think formal methods are overrated for AI: model weights are just completely intractable to reason about. Even a very small model of only a few billion floating-point parameters is wildly beyond what formal verification tools can usefully verify.


If you make the simplifying assumption that floating-point numbers are basically real numbers, your conclusions will be completely irrelevant, because they are not, in fact, real. Has anyone here ever trained or used an ML model where they had a numerical bug?


Yeah, a few people? Okay. It does happen. The second problem is that reality is not just complicated; it's also partially unknown. Sometimes we get this perspective: the thing we want to verify is not something exact. It's not “what does the model do” or “what are the bounds of the model's output” (token probabilities are probably between 0 and 1, for example), but rather some kind of higher-level proposal.
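As a toy illustration of both points, here's a numerically stable softmax (my own example, not something from the talk): a one-line numerical bug, plus a bound property that is easy to state mechanically but is not the kind of claim we actually care about.

```python
import numpy as np

def naive_softmax(logits):
    exps = np.exp(logits)                   # overflows to inf for large logits
    return exps / exps.sum()                # inf / inf gives nan: a classic numerical bug

def stable_softmax(logits):
    exps = np.exp(logits - logits.max())    # subtracting the max keeps exp() finite
    return exps / exps.sum()

logits = np.array([1000.0, 0.0])
print(naive_softmax(logits))                # [nan  0.] plus an overflow warning
print(stable_softmax(logits))               # [1. 0.]

# A low-level bound we *can* check mechanically; the higher-level claims we
# actually care about have no comparable statement over the outputs.
probs = stable_softmax(logits)
assert ((probs >= 0.0) & (probs <= 1.0)).all()
```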


The paper Towards Guaranteed Safe AI describes this as: you have a world model; you have a safety spec; and then you have a verifier that can prove that your model is safe according to your safety spec and your world model. 
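Very roughly, and with names and signatures that are mine rather than the paper's, that structure looks something like this:

```python
# A caricature of the Guaranteed Safe AI pipeline; every identifier here is
# hypothetical, invented only to show the shape of the argument.
def certify(policy, world_model, safety_spec, verifier):
    claim = safety_spec.applied_to(policy, world_model)   # "policy is safe in this modelled world"
    proof = verifier.prove(claim)                          # a machine-checkable certificate
    return proof
```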


But my problem there is that you haven't actually solved the problem, you've just moved it around a bit, right? You now have to convince me that the world model sufficiently accurately matches the real world, and that your verification is sound.


And I think at that point, I would rather you spend your time doing threat modelling directly, and say, “What are the things you're concerned about? Why do we think they would or wouldn't be a problem?” 


Also on this theme, there's a paper by Max Tegmark and Steve Omohundro proposing that DNA synthesis machines – which some people are concerned about for bio risk – should be required to formally verify that the DNA they're synthesizing is safe for humans. That is something like a thousand years of progress in microbiology being asked for here.


We don't even know what all the DNA in humans is; something like 40 percent of the RNA is just unknown. And recently we discovered that some of it self-assembles into rectangles in the gut, and we don't really have any idea what it's doing. Also, to do this kind of verification you would have to do quantum chemistry. It is just so wildly past the state of the art that I have trouble describing it.


And the final category of proposals that I see under this sort of formal verification approach is that we should use AI to develop tools and software which are formally verified. And I am broadly a fan of this, but it's not actually a substitute for working on AI safety. If you say we should do this instead of having generally capable AI, then I think there are two problems.


One is just that it's uncompetitive. I just don't think that developing explicit software will outperform machine learning at the things machine learning is best at. Where it does: awesome, let's use the software. But I think there will be a residual, and we have to work out how we deal with the risks there. 


And on the other side, I think Tool AI is basically an unstable proposal. The moment you release a tool, somebody else will ask the tool “Hey, Tool AI, what action would be best to take? Express that as a sequence of API calls for this robot that I've set up for you”. And so the addition of a trivial for-loop is the only difference between a tool and an agent, in a general sense. 
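As a deliberately crude sketch of that wrapper, with every name hypothetical: `tool_ai` is any sufficiently capable question-answering tool, and `execute` dispatches an API call and reports what happened.

```python
def agent(goal, tool_ai, execute):
    history = ""
    while True:
        # Ask the tool what to do next, then do it. This loop is essentially
        # the only thing separating a "tool" from an "agent".
        action = tool_ai(
            f"Goal: {goal}\n{history}\n"
            "What action would be best to take next? Express it as a single API call."
        )
        observation = execute(action)
        history += f"\nCalled: {action}\nResult: {observation}"
```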


And basically, for the reasons I've just given, verification alone is not going to solve these problems for us. I think it's an important part of our portfolio, but I think it's important to avoid over-claiming about how much of our problems it could solve.