Google Engineers Launch "Sashiko" for Agentic AI Code Review of the Linux Kernel

105 points

3 days ago

by speckx

Comments

rwmj

Better to link to the site itself, or one of the reviews?

For an example of a review (picked pretty much at random) see: https://sashiko.dev/#/patchset/20260318151256.2590375-1-andr...

The original patch series corresponding to that is: https://lkml.org/lkml/2026/3/18/1600

Edit: Here's a simpler and better example of a review: https://sashiko.dev/#/patchset/20260318110848.2779003-1-liju...

I'm very glad they're not spamming the mailing list.

3 days ago

jeffbee

That is both really useful and a great example of why they should have stopped writing code in C decades ago. So many kernel bugs have arisen from people adding early returns without thinking about the cleanup functions, a problem that many other language platforms handle automatically on scope exit.
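
A minimal sketch of what "handled automatically on scope exit" looks like, in Python purely for illustration. The table, the function name, and the device names are all made up and have nothing to do with the kernel or with Sashiko:

```python
import threading

lock = threading.Lock()
table = {"eth0": 1500}  # hypothetical device table, illustration only


def set_mtu(name, mtu):
    """Update an entry under a lock, with an early return on bad input."""
    with lock:                  # lock acquired on entry to the block
        if name not in table:
            return False        # early return: the lock is still released
        table[name] = mtu
        return True             # normal return: released the same way


missing = set_mtu("wlan0", 9000)   # takes the early-return path
updated = set_mtu("eth0", 9000)    # takes the normal path
still_locked = lock.locked()       # both exits released the lock
```

The point is only that the release is attached to the scope rather than to each return statement, so adding another early return later cannot skip it.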

3 days ago

KurSix

You don't even need an LLM for this stuff. GCC has the cleanup variable attribute (__attribute__((__cleanup__(fn)))), and kernel static analyzers like Smatch have been catching missing unlocks for a decade now. People just ignore linter warnings when submitting patches, so the language itself isn't really the issue. The LLM is basically just acting as a talking linter that can explain the error in plain English.

2 days ago

jeffbee

Linux doesn't have any of: sufficient testing, sufficient static analysis, or sufficient pre-commit code review. Under those conditions, which I take as a given because it's their project and we can't just swap out the leaders with more tasteful leaders, adding this type of third-party review feedback strikes me as valuable. Perhaps, to your point, it would also be possible to simply run static analyzers on new proposed commits.

2 days ago

overfeed

Must we do this on every thread about the Linux kernel?

3 days ago

RobRivera

The beatings will continue until morale improves

2 days ago

vpShane

yeah but Linux is love, linux is life. if you really want to get the beatings going:

Rust > C and GNU/Linux should be Rust.

2 days ago

Ferret7446

Ironically C is safer than Rust (if you compile it with Fil-C)

2 days ago

ugh123

also vim > emacs

2 days ago

nurettin

> stopped writing code in C decades ago.

And what were they supposed to use in 2006? Free Pascal? Ada?

2 days ago

greenavocado

Someone suggested C++ and you should see the response from Linus

https://harmful.cat-v.org/software/c++/linus

2 days ago

nurettin

Of course I specifically avoided invoking that language's name within the context of kernel programming in fear of summoning a Linus.

And he's so right. I didn't think like that back then, but new/delete (which have to be overloaded for kernel) behind allocators behind containers, vtables, =0, uninitialized members, unhandled ctor errors, template magic, "sometimes rvo", compiler hints, "sometimes reinterpret cast", 3rd party libraries, it would have been a disaster 20 years ago. Now he's being nice to Rust partially to spite that lang I love some more.

2 days ago

tigen

This ought to help with that. https://thephd.dev/c2y-the-defer-technical-specification-its...

3 days ago

TacticalCoder

Looks like a great new tool to help ship fewer bugs!

Nitpicking on this though:

> "In my measurement, Sashiko was able to find 53% of bugs based on a completely unfiltered set of 1000 recent upstream issues based on "Fixes:" tags (using Gemini 3.1 Pro). Some might say that 53% is not that impressive, but 100% of these issues were missed by human reviewers."

That'd assume 100% of the issues that were fixed and used for training were not fixed following a human review. I don't buy it: it's extremely common to have a dev notice a bug in the code, without a user having ever reported the bug.

I think the wording meant to say: "... but 100% of these issues were first missed by humans".

My point being: the original code review by a human ain't the only code review by a human. Or put it this way: it's not as if we were writing code, shipping it, then never ever looking at that line of code again unless a bug report were to come out. It's not how development works.

2 days ago

withinrafael

Looks cool, but this site is a bit difficult for me to grok.

I think the table might be slightly inside-out? The Status column appears to show internal pipeline states ("Pending", "In Review") that really only matter to the system, while Findings are buried in the column on the far right. For example, one reviewed patchset with a critical and a high finding is just casually hanging out below the fold. I couldn't immediately find a way to filter or search for severe findings.

It might help to separate unreviewed patches from reviewed ones, and somehow wire the findings into the visual hierarchy better. Or perhaps I'm just off base and this is targeting a very specific Linux kernel community workflow/mindset.

Just my 1c.

3 days ago

tonfa

I think it's just a dashboard, not meant to be used as is.

Reviewers are more likely to instead subscribe to get the review inline, and then potentially incorporate that with their feedback.

3 days ago

kleiba

> Sashiko was able to find around 53% of bugs

That's cool. Another interesting metric, however, would be the false positive ratio: like, I could just build a bogus system that simply marks everything as a bug and then claim "my system found 100% of all bugs!"

In practice, not just the recall of a bug finding system is important but also its precision: if human reviewers get spammed with piles of alleged bug reports by something like Sashiko, most of which turn out not to be bugs at all, that noise binds resources and could undermine trust in the usefulness of the system.
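
To make the "marks everything as a bug" point concrete, here is a small sketch with invented numbers (100 patches, 10 of them buggy). None of these figures come from Sashiko; they are assumptions for illustration:

```python
def precision_recall(flagged, real_bugs):
    """Return (precision, recall) for a set of flagged items."""
    true_pos = len(flagged & real_bugs)
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / len(real_bugs) if real_bugs else 0.0
    return precision, recall


patches = set(range(100))     # 100 hypothetical patches
real_bugs = set(range(10))    # assume patches 0..9 are the buggy ones

# Bogus reviewer: flags every patch.
bogus = precision_recall(patches, real_bugs)

# Pickier reviewer: flags six patches, five of which are real bugs.
picky = precision_recall({0, 1, 2, 3, 4, 50}, real_bugs)
```

With these made-up inputs the bogus reviewer gets perfect recall but only 10% precision, while the pickier one finds half the bugs with five of its six reports being real. A recall number on its own cannot tell the two apart.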

3 days ago

i_cannot_hack

They mention false positives as well on GitHub: "The rate of false positives is harder to measure, but based on limited manual reviews it's well within 20% range and the majority of it is a gray zone."

2 days ago

riteshkew1001

That 20% figure is actually better than it sounds. Coverity on kernel-scale C codebases typically lands in the 40-60% false positive range... "not wrong but not the bug you'd prioritize" is different from a true false positive.

2 days ago

kleiba

Hard to measure, how? Either something is a bug or not - otherwise how would you be able to count anything at all?

2 days ago

lstodd

Assign each line of code a bugginess factor then count those exceeding an arbitrary threshold obviously.

2 days ago

ChrisArchitect

https://github.com/sashiko-dev/sashiko (https://news.ycombinator.com/item?id=47427996)

3 days ago

monksy

I think this is a great and interesting project. However, I hope that they're not doing this to submit patches to the kernel. It would be much better to layer in additional tests to exploit bugs and defects for verification of existence/fixes.

(Also tests can be focused per defect.. which prevents overload)

From some of the changes I'm seeing: This looks like it's doing style and structure changes, which for a codebase this size is going to add drag to existing development. (I'm supportive of cleanups.. but done on an automated basis is a bad idea)

I.e. https://sashiko.dev/#/message/20260318170604.10254-1-erdemhu...

3 days ago

rwmj

No, it's reviewing patches posted on LKML and offering suggestions. The original patch posted corresponding to your link was this, which was (presumably!) written by a human:

https://lkml.org/lkml/2026/3/9/1631

3 days ago

bjackman

Style and structure is not the goal here, the reason people are interested in it is to find bugs.

Having said that, if it can save maintainers time it could be useful. It's worth slowing contribution down if it lets maintainers get more reviews done, since the kernel is bottlenecked much more on maintainer time than on contributor energy.

My experience with using the prototype is that it very rarely comments with "opinions"; it only identifies functional issues. So when you get false positives it's usually of the form "the model doesn't understand the code" or "the model doesn't understand the context" rather than "I'm getting spammed with pointless advice about C programming preferences". This may be a subsystem-specific thing, as different areas of the codebase have different prompts. (It may also be that my coding style happens to align with its "preferences".)

3 days ago

throwa356262

I find it interesting that this is written in Rust (not golang) and co-authored with Claude (not gemini)

2 days ago

dgacmu

It's written in rust, but why do you believe it was co-authored with Claude? The README in github specifically says:

> This project was built using Gemini CLI

https://github.com/sashiko-dev/sashiko

2 days ago

adampunk

Claude snitches on you in your commits. You can just look at the history.

2 days ago

gcommer

Only two commits have 'Co-Authored-By: Claude' and they're both PR contributions from a non-google email.

2 days ago

Havoc

How do the kernel devs feel about this? Cause that seems to be the sticking point for external AI “help” - the open source devs hate it

Seems to be a well funded effort though so maybe it’s better?

2 days ago

simianwords

> Roman reports that Sashiko was able to find around 53% of bugs based on an unfiltered set of 1,000 recent upstream Linux kernel issues with "Fixes: " tag

What does this mean?

2 days ago

spiderfarmer

47% of recent bugs would go unnoticed if we relied solely on this tool. But we might find more, and faster.
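
For anyone unfamiliar with the tag: kernel bug-fix commits conventionally carry a "Fixes:" line naming the commit that introduced the bug, in the form of at least 12 hex characters of the hash followed by the subject in parentheses and quotes. My reading of the quoted measurement is that those earlier commits are the "issues", but that is an assumption. A rough sketch of pulling the tag out of a commit message, where the message, hash, and names are all invented:

```python
import re

# Convention: Fixes: <12+ hex chars of the commit hash> ("<subject>")
FIXES_RE = re.compile(
    r'^Fixes:\s+([0-9a-f]{12,40})\s+\("(.+)"\)\s*$', re.MULTILINE
)

# Hypothetical commit message; hash, subjects and author are made up.
message = """foo: release lock on error path

The early return added earlier skipped the unlock.

Fixes: 0123456789ab ("foo: add fast path for empty queue")
Signed-off-by: A Developer <dev@example.org>
"""

fixes = FIXES_RE.findall(message)   # list of (hash, subject) pairs

# The arithmetic behind the headline figure: 53% of 1000 issues.
total = 1000
found = total * 53 // 100
missed = total - found
```

So on that reading, roughly 530 of the 1000 bug-introducing changes were flagged and about 470 were not.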

2 days ago

mika-el

The separation between who writes and who reviews is the whole thing. I do the same at a smaller scale: one model writes code, a different model reviews it. Self-review misses things, same reason you don't review your own PRs.

2 days ago

4fterd4rk

oh god can we not

3 days ago

smlacy

What's your concern?

3 days ago

htx80nerd

Have you ever programmed with AI? It needs a lot of hand holding for even simple things sometimes. Forgets basic input, does all kinds of brain dead stuff it should know not to do.

>"good catch - thanks for pointing that out"

3 days ago

lame-robot-hoax

Can you clarify how, at all, that’s relevant to the article?

3 days ago

ablob

Both the curl and the SQLite projects have been overburdened by AI bug reports. Unless the Google engineers take great care to review each potential bug for validity, the same fate might apply here. There has been a lot of news about open source projects being stuffed to the brim with low-effort, high-cost merge requests or issues. You just don't see all the work that is caused unless you have to deal with the fallout...

3 days ago

tonfa

This project has nothing to do with bug reports... it's an opt-in tool for reviewing proposed changes that kernel developers can decide to use (if they find it useful).

3 days ago

jamesnorden

Well, if it doesn't find anything it's just a waste of time at best.

3 days ago

danielbln

Prevention paradox.

2 days ago

asadm

i think it's a skill.

3 days ago

__tidu

well tbf code review is probably the most useful part of "AI coding": if it catches even a single bug you missed it's worth it, plus false positives would waste dev time but not pollute the kernel

3 days ago

qainsights

They would have completely redesigned Google Gerrit.

3 days ago

KurSix

Written in Rust, tests a C kernel, using the Google Gemini API... classic 2026. I'd bet 90% of the actual useful work this agent does is just dumb pattern-matching for typical vulnerabilities (use-after-free, uninitialized vars) that the model memorized straight out of the CVE database

2 days ago

nasretdinov

Arguably that's still very useful :)

2 days ago

shevy-java

Now they want to kill the Linux kernel. :(

We've already seen how bug bounty projects were closed by AI spam; I think it was curl? Or some other project I don't remember right now.

I think AI tools should be required, by law, to verify that what they report is actually a true bug rather than some hypothetical, hallucinated context-dependent not-quite-a-real-bug bug.

3 days ago

tonfa

It's not forced upon anyone, it's a tool that patch authors or reviewers can use if they want to.

3 days ago

KurSix

Those incidents with curl and sqlite were caused by a mob of script kiddies dumping source code into ChatGPT and spamming bug bounties for quick cash. This tool is built by actual Google devs who are active on the kernel mailing list. They know perfectly well that if they start spamming LKML with hallucinations, they'll get patch acceptance blocked for their entire corporation

2 days ago