The zero-days are numbered
Comments
staticassertion
So, my perspective on this is that the blog post doesn't motivate me very much.
First of all, the constant framing around Mythos as being capable of tackling "hardened" targets is invalid to me. Or it heavily depends on what "hardened" means. Firefox is an old codebase that has rapidly adopted new features over decades, much of its security has followed after implementation (JIT -> JIT hardening, single process -> multiprocess, etc), it's written in C++, it's extremely oriented towards performance and benchmarking, it has massive attack surface, etc. They have an incredible team working on it - genuinely best in class people are working to make Firefox safer. But does that mean Firefox is a hard target? That's not really my view of it, it's just more expensive than, say, phishing. I would maybe contrast this against Firecracker, which I think is a hard target despite having a tiny fraction of the investment of Firefox.
Similarly, Linux is a codebase with plenty of security investment - lots of research goes into Linux security. It is also an extraordinarily soft target in my opinion. Firefox is far better than Linux, I think, but the point is that Mythos has picked a lot of very specific targets here, called them "hardened", and I think it's misleading from Anthropic's marketing - they frame things like "heap spray" as advanced techniques and that's just nonsense.
Second, this article seems to be trying to convey a few things and then it has a really intense conclusion. I want to separate these out.
The article wants to convey that one of the attackers' advantages is that their human-driven attack exploration has been more effective than the hard-to-scale, automation-based exploration employed by defenders. The argument is that new AI capabilities change this, because AI-driven automated exploration will close that gap.
One conclusion is then that because this gap will close, defenders can "win", which presumably means that attacks become too expensive (and therefore attackers move to other paths).
The final conclusion is seemingly that Firefox will have 0 exploitable vulnerabilities.
I want to separate that final conclusion out and focus on the first part.
1. The argument seems to rely on the advantage attackers have being singular. Attackers have a lot of advantages, so I'm not sure that removing this one will be sufficient.
2. It seems incorrect to me that AI will be so radically effective at finding vulnerabilities that attackers won't be able to find enough to build a chain. In fact, I think this is almost certainly false for a codebase like Firefox. It would be one thing if Firefox froze its codebase today and spent years hardening what's there; maybe I'd buy that... but no, I don't buy it at all for Firefox as it exists in reality.
3. This all relies on Firefox reducing its bug density such that viable chains cannot be built. There is, I think, some threshold at which a codebase's bug density is so low that full attack chains are not viable. I do not think Firefox will ever reach a threshold that low. The claim here seems to be either that bugs can be found so quickly that the threshold can be met, or it rejects my threshold idea entirely.
None of this makes sense to me, personally. I don't think we've ever seen a moving codebase made significantly safer via any "bug squashing" technique. The value of squashing bugs is to track where bugs crop up so that you're informed about which mitigations should be built; you see a lot of vulnerabilities that leverage JIT RWX? Time to harden how JIT pages are emitted. Any single bug isn't the interesting part; it's the mitigations and layered defenses that help. Fixing 100 XSS vulns on a website will never be as good as deploying a CSP, or updating your CSP, etc. This has always been the case, and I don't think AI is so radically different that it will change this.
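To make the CSP point concrete: a single policy header neutralizes a whole class of injection, regardless of how many individual XSS bugs remain unfixed. This is a minimal illustrative sketch (the function name and policy string are my own, not from the thread), using only the stdlib:

```python
# Illustrative sketch: one layered mitigation (a CSP header) versus
# fixing XSS bugs one at a time. Names here are hypothetical.
from wsgiref.headers import Headers

def add_csp(headers: Headers) -> Headers:
    # One policy line blocks inline/injected script execution across the
    # whole site, even for XSS bugs nobody has found or patched yet.
    headers.add_header(
        "Content-Security-Policy",
        "default-src 'self'; script-src 'self'; object-src 'none'",
    )
    return headers

h = add_csp(Headers([]))
print(h["Content-Security-Policy"])
```

The design point is exactly the one above: the mitigation's value doesn't depend on knowing where the individual bugs are.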
So anyway, I sort of don't buy anything the blog post says so far. And then it ends on this note:
> The defects are finite, and we are entering a world where we can finally find them all.
As far as I'm concerned, the defects in Firefox should be considered ~roughly infinite. But even if we say "no, there's some finite number of them", the idea that we'll drive to zero is just... not something I'm going to take seriously.
Earlier in the article it's stated that "Nevertheless, we’ve all long quietly acknowledged that bringing exploits to zero was an unrealistic goal.", so it does seem to me that whoever wrote this believes a zero-vulnerability Firefox is now achievable. I don't think it's even achievable to reach a threshold low enough to break entire exploit chains, so obviously I don't think it's correct to say zero is achievable.
I think I can probably justify this to some degree.
1. We've never seen any bug squashing technology, including insanely highly leveraged ones like fuzzing, meaningfully reduce exploitability in similar projects. I doubt anyone thinks that 0 days are rare ITW because of fuzzing when it's very obviously because of sandboxing and mitigations. Is that contentious? Feel free to push back.
2. Rice's Theorem makes it seem highly implausible that we can reach a true "zero point" computationally through formal means. To me this implies that AI would have to be so effective at exploring insanely massive state spaces, for a moving target, across so many different properties, that it just doesn't sound realistic.
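The fuzzing point in (1) can be illustrated with a toy: a coverage-blind random fuzzer finds shallow crashes almost immediately, but that says nothing about deeper logic bugs or exploitability. Everything here (the parser, the planted bug, the loop) is a hypothetical sketch of mine, not anything from the article:

```python
# Toy illustration: random fuzzing against a deliberately buggy parser.
# The bug is shallow, so a dumb fuzzer finds it fast; real targets like
# Firefox hide bugs far deeper than this.
import random
from typing import Optional

def parse(data: bytes) -> int:
    # Planted bug: the leading length byte is trusted without validation.
    if len(data) < 2:
        return 0
    n = data[0]
    return sum(data[1:1 + n]) // n  # ZeroDivisionError when n == 0

def fuzz(iterations: int = 10_000, seed: int = 0) -> Optional[bytes]:
    # Coverage-blind fuzzing: throw random byte strings at the parser
    # and report the first input that crashes it.
    rng = random.Random(seed)
    for _ in range(iterations):
        sample = bytes(rng.randrange(256) for _ in range(rng.randrange(2, 8)))
        try:
            parse(sample)
        except ZeroDivisionError:
            return sample  # crashing input found
    return None

crasher = fuzz()
print(crasher is not None)
```

Note what the fuzzer gives you: a crashing input, not a judgment about whether the crash is reachable or exploitable in a real deployment. That gap is exactly why mitigations, not bug counts, have driven real-world exploit scarcity.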
I am extremely skeptical of a lot of the statements made in this post. I do not think Mythos will help defenders by squashing bugs at all in a codebase like Firefox, I do not think Firefox is safer just because they patched 500 vulnerabilities, I do not think Firefox will meaningfully reduce vulnerability counts long term with Mythos, and I do not think that this should change their strategy of using fundamentally safer technologies / implementing mitigation techniques.
I will weakly predict that AI usage for bug squashing will look quite a lot like fuzzing. A win in the short term, just another automated tool in the long term, and the highest impact will be watching it for trends. I think AI will likely fare worse than fuzzing though.
moyix
On hardened targets and Firecracker specifically, here's a recent vulnerability found by "Anthropic": https://aws.amazon.com/security/security-bulletins/2026-015-...
Unfortunately it's unclear whether it was Mythos, an earlier model, or even an eagle-eyed employee.
I tend to agree that bug squashing your way to perfectly secure software is unlikely, but there are plenty of projects that managed to fuzz/test/audit their way to making it much harder to find serious vulnerabilities. If we can do the same again with LLMs in a way that leaves the remaining vulnerabilities out of reach of anyone except extremely skilled humans (perhaps with LLM assistance) then that's still an OK outcome that buys us time to build stronger foundations.
staticassertion
> On hardened targets and Firecracker specifically, here's a recent vulnerability found by "Anthropic": https://aws.amazon.com/security/security-bulletins/2026-015-...
Yep. It's notable that they failed to exploit it.
> but there are plenty of projects that managed to fuzz/test/audit their way to making it much harder to find serious vulnerabilities
Agreed! But I think those projects have certain things in common, like being tightly scoped, slowly developed, and built with safety in mind from day 1.
I don't think that any of the projects that have managed to meaningfully improve safety through fuzzing have the same qualities as projects like Firefox, Linux, etc.
Analemma_
“Opus 4.6 found 22 security bugs, Mythos found 271 on an initial evaluation” sure seems to refute the grumbling I’ve seen from a couple OAI people on Twitter that Mythos isn’t actually anything special and everything it finds could be found by earlier models too.
jruohonen
They also put this in the end in boldfaced:
"Encouragingly, we also haven’t seen any bugs that couldn’t have been found by an elite human researcher."
But overall, I think it was a well-written, positive take (instead of the fear-mongering party line).
This is just a footnote in the article, but is incredibly important, IMO:
”There’s a risk that codebases begin to surpass human comprehension as a result of more AI in the development process, scaling bug complexity along with (or perhaps faster than) discovery capability. Human-comprehensibility is an essential property to maintain, especially in critical software like browsers and operating systems.”
This aligns with my own experience, and I believe with the experience of most practitioners in the field: writing a piece of code is just the beginning of a very long journey.
We should be careful about optimizing that first step at the expense of the journey.