- cross-posted to:
- [email protected]
- [email protected]
- [email protected]
- linux
From the researcher’s blog post: (https://sean.heelan.io/2025/05/22/how-i-used-o3-to-find-cve-2025-37899-a-remote-zeroday-vulnerability-in-the-linux-kernels-smb-implementation/)
My experiment harness executes this N times (N=100 for this particular experiment) and saves the results. […]
o3 finds the kerberos authentication vulnerability in the benchmark in 8 of the 100 runs. In another 66 of the runs o3 concludes there is no bug present in the code (false negatives), and the remaining 28 reports are false positives.
…
Combining the code for all of the handlers with the connection setup and teardown code, as well as the command handler dispatch routines, ends up at about 12k LoC (~100k input tokens), and as before I ran the experiment 100 times. o3 finds the kerberos authentication vulnerability in 1 out of 100 runs with this larger number of input tokens, so a clear drop in performance, but it does still find it. More interestingly however, in the output from the other runs I found a report for a similar, but novel, vulnerability that I did not previously know about.
A practical demonstration of “even a stopped clock is right twice a day.”
“even a stopped clock is right twice a day.”
Code analysis is a bit more complex than a clock.
Initially embarking on a manual audit of ksmbd to benchmark o3’s potential, Heelan quickly realized that the model was able to autonomously identify a complex use-after-free vulnerability in the handler for the SMB ‘logoff’ command—an issue Heelan himself had not previously detected.
Uh oh, that means AI will be used to find countless zero-days for hacking purposes.
If by countless you mean 8 valid identifications of the same single issue in 100 runs, with a false positive rate of almost 30%, then sure.
I’m far more worried about the false positive rate drowning out things.
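To put rough numbers on the "drowning out" worry, here is a minimal sketch that works from the counts quoted from the blog post above. The variable names are my own, and the counts are used exactly as quoted (they add up to 102 rather than 100, so treat the percentages as approximate).

```python
# Counts as quoted from the blog post for the 100-run benchmark.
# Note: 8 + 66 + 28 = 102, not 100; the figures are used here as given.
runs = 100
true_positives = 8     # runs that found the kerberos authentication bug
false_negatives = 66   # runs that concluded there was no bug
false_positives = 28   # runs that reported something that was not the bug

# Share of all runs that produced a false positive (the "almost 30%").
false_positive_share = false_positives / runs

# Of the runs that reported a bug at all, how many were the real one?
reports = true_positives + false_positives
precision = true_positives / reports

# How many bogus reports a reviewer reads per valid one.
false_per_valid = false_positives / true_positives

print(f"false positives per run:    {false_positive_share:.0%}")
print(f"valid share of bug reports: {precision:.1%}")
print(f"bogus reports per valid one: {false_per_valid:.1f}")
```

On these numbers a reviewer would wade through about 3.5 bogus reports for each valid one, which is the triage cost being pointed at here.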