Re: [PATCH v2 0/9] mm/huge_memory: refactor zap_huge_pmd()
From: Roman Gushchin
Date: Mon Mar 23 2026 - 21:08:48 EST
"Lorenzo Stoakes (Oracle)" <ljs@xxxxxxxxxx> writes:
> On Sat, Mar 21, 2026 at 05:15:30PM -0700, Andrew Morton wrote:
>> On Fri, 20 Mar 2026 20:33:11 -0700 Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> > A lot of patchsets are "failed to apply". What is Sashiko trying to
>> > apply MM patches to? It would take some smarts to apply the v2
>> > patchset when v1 is presently in mm.git?
>>
>> ?
>>
>> The way things are going at present, I'm just not going to apply a
>
> 50% noise vs. signal?... maybe wait until we're in the 9x'%s?
>
>> series which Sashiko "failed to apply". And that's cool, I'll just
>> wait for a version which Sashiko was able to apply. And then not
>> apply unless all Sashiko questions are resolved or convincingly refuted.
>
> Andrew, for crying out loud. Please don't do this.
>
> 2 of the 3 series I respan on Friday, working a 13 hour day to do so, don't
> apply to Sashiko, but do apply to the mm tree.
I'll look into that.
> I haven't the _faintest clue_ how we are supposed to factor a 3rd party
> experimental website applying or not applying series into our work??
>
> And 'not apply unless all Sashiko questions are resolved or convincingly
> refuted.' is seriously concerning.
>
> The workload is already insane, now you're expecting us to answer every bit
> of nonsense Sashiko hallucinates or misunderstands also?
>
> I say that with no disrespect to Roman or his efforts, but as discussed at
> length, it is not ready for prime time yet.
>
> It's clear that Sashiko is not correctly handling applies, and produces a
> lot of noise. Predicating taking series on this is absurd.
Not trying to pretend that Sashiko is perfect in any way, but I think a good
mental exercise is to write down our expectations of how the "perfect"
system would work. The more I work on it, the more I realize that it's far
from a binary correct/incorrect question. In fact, the same applies to
humans: I'm sure every one of us has at some point felt that someone is too
picky and is just annoying us by finding small nits. At the same time, some
of these people are extremely useful to the community for finding and
fixing a lot of issues. In the end, we argue all the time about
questions/issues raised by human reviewers too.
For example: do we prefer a system which finds more real bugs at the cost
of being noisier, or one which misses more bugs but whose reports are
almost certainly real? I'm sure you're tempted to prefer the latter, but
imagine a hypothetical system which finds _all_ bugs but has some false
positive rate, e.g. 20%. I think that's pretty attractive.
Also, a lot of the raised issues are real but subjectively not worth our
time. And this is extremely subjective! It depends on one's personal level
of perfectionism, the amount of time available, the prior state of the
code, further plans, etc. For example, syzkaller usually has on the order
of hundreds of open bugs, which are 100% real but not always
high-priority work.
I think that asking to address 100% of the issues raised by any LLM is not
reasonable (especially because its output might differ each time you run
it with the same input), but I also think it's reasonable to address
critical and high-severity concerns. And I'm happy to tweak Sashiko to be
more conservative here, but I think that should be based on specific
examples or data, not purely subjective impressions.
tl;dr: I increasingly realize the importance of social context for
providing good reviews, and it can't be easily derived from the code.
What is acceptable in one subsystem is considered bad practice in
another. I guess the only way to get a system we all find acceptable
(and we still might not like it; who likes having their bugs pointed out?)
is to collectively codify our expectations in prompts on a per-subsystem
basis.
Thanks!