Re: [RFC] mm, page_alloc: reintroduce page allocation stall warning

From: David Rientjes

Date: Mon Mar 23 2026 - 21:14:31 EST


On Mon, 23 Mar 2026, Vlastimil Babka (SUSE) wrote:

> On 3/22/26 4:03 AM, David Rientjes wrote:
> > Previously, we had warnings when a single page allocation took longer
> > than reasonably expected. This was introduced in commit 63f53dea0c98
> > ("mm: warn about allocations which stall for too long").
> >
> > The warning was subsequently reverted in commit 400e22499dd9 ("mm: don't
> > warn about allocations which stall for too long") but for reasons
> > unrelated to the warning itself.
> >
> > Page allocation stalls in excess of 10 seconds are always useful to debug
> > because they can result in severe userspace unresponsiveness. Adding
> > this artifact can be used to correlate with userspace going out to lunch
> > and to understand the state of memory at the time.
> >
> > There should be a reasonable expectation that this warning will never
> > trigger given it is very passive, it starts with a 10 second floor to
> > begin with. If it does trigger, this reveals an issue that should be
> > fixed: a single page allocation should never loop for more than 10
> > seconds without oom killing to make memory available.
> >
> > Unlike the original implementation, this implementation only reports
> > stalls that are at least a second longer than the longest stall reported
> > thus far.
> >
> > Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>
>
> I think, why not, if it's useful and we can reintroduce it without the
> issues it had.
> Maybe instead of requiring the stall time to increase by a second, we
> could just limit the stall reports to once per 10 second. If there are
> multiple ones in progress, one of them will win that report slot
> randomly. This would also cover a stall that's so long it reports itself
> multiple times (as in the original commit).
>

I like that a lot, thanks. Part of the motivation is to correlate
userspace unresponsiveness with page allocation stalls in the kernel. With
my approach we increasingly lose that visibility: if a single page
allocation stalled for 60 seconds a month ago, for example, a new stall
has to exceed that threshold before anything is reported again.

The original patch ended up at line 4839 here:

4833) 		}
4834) 	}
4835)
4836) 	/* Caller is not willing to reclaim, we can't balance anything */
4837) 	if (!can_direct_reclaim)
4838) 		goto nopage;
4839) 					<===== HERE
4840) 	/* Avoid recursion of direct reclaim */
4841) 	if (current->flags & PF_MEMALLOC)
4842) 		goto nopage;
4843)
4844) 	/* Try direct reclaim and then allocating */

Which looks like the right place to put it, but probably after the
PF_MEMALLOC check.

If we set a minimum reporting threshold of 10 seconds and report at most
once system wide every 10 seconds, I think this will work very well. And, as
you mention, this also reports stalls for allocations that never actually
return.
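
Roughly, the reporting decision could look like the sketch below. This is
plain userspace C11 for illustration only, not the patch: the names
stall_should_report(), next_report_ms, STALL_MIN_MS and STALL_INTERVAL_MS
are made up here, and it assumes a monotonic millisecond clock that is
passed in by the caller.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants, not taken from any patch. */
#define STALL_MIN_MS		10000ULL /* floor before a stall is reportable */
#define STALL_INTERVAL_MS	10000ULL /* at most one report per interval */

/* Earliest time (ms) the next report is allowed; shared system wide. */
static _Atomic uint64_t next_report_ms;

/*
 * Return true if an allocation that entered the slowpath at start_ms
 * should emit a stall report at now_ms. Assumes now_ms >= start_ms.
 */
static bool stall_should_report(uint64_t start_ms, uint64_t now_ms)
{
	uint64_t next;

	/* Below the 10 second floor: never report. */
	if (now_ms - start_ms < STALL_MIN_MS)
		return false;

	/* Somebody already took the report slot for this interval. */
	next = atomic_load(&next_report_ms);
	if (now_ms < next)
		return false;

	/*
	 * Race for the slot. Only the caller whose compare-and-exchange
	 * succeeds reports; concurrent stalled allocations lose and stay
	 * quiet until the next interval.
	 */
	return atomic_compare_exchange_strong(&next_report_ms, &next,
					      now_ms + STALL_INTERVAL_MS);
}
```

With this, an allocation that has been stalled since t=0 reports at 10s and
again at 20s if it is still looping, while another allocation that crosses
the floor in between stays quiet. In the kernel this would presumably be
based on jiffies and time_after() rather than a millisecond counter.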

I'll implement this and send out a formal patch for it.