Re: [PATCH v2 3/3] memcg: no irq disable for memcg stock lock

From: Vlastimil Babka
Date: Mon May 05 2025 - 06:28:56 EST


On 5/3/25 01:03, Shakeel Butt wrote:
>> > index cd81c70d144b..f8b9c7aa6771 100644
>> > --- a/mm/memcontrol.c
>> > +++ b/mm/memcontrol.c
>> > @@ -1858,7 +1858,6 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
>> > {
>> > struct memcg_stock_pcp *stock;
>> > uint8_t stock_pages;
>> > - unsigned long flags;
>> > bool ret = false;
>> > int i;
>> >
>> > @@ -1866,8 +1865,8 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
>> > return ret;
>> >
>> > if (gfpflags_allow_spinning(gfp_mask))
>> > - local_lock_irqsave(&memcg_stock.lock, flags);
>> > - else if (!local_trylock_irqsave(&memcg_stock.lock, flags))
>> > + local_lock(&memcg_stock.lock);
>> > + else if (!local_trylock(&memcg_stock.lock))
>> > return ret;
>>
>> I don't think it works.
>> When there is a normal irq and something doing a regular GFP_NOWAIT
>> allocation, gfpflags_allow_spinning() will be true and
>> local_lock() will reenter and complain that lock->acquired is
>> already set... but only with lockdep on.
>
> Yes indeed. I dropped the first patch and didn't fix this one
> accordingly. I think the fix can be as simple as checking for
> in_task() here instead of gfp_mask. That should work for both RT and
> non-RT kernels.
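
For concreteness, I read the suggestion as something like this (my
untested sketch, same spot as in the hunk quoted above):

	/*
	 * Hypothetical replacement for the gfp_mask check: decide based
	 * on execution context instead, since only task context is
	 * guaranteed not to reenter the lock from irq/nmi.
	 */
	if (in_task())
		local_lock(&memcg_stock.lock);
	else if (!local_trylock(&memcg_stock.lock))
		return ret;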

These in_task() checks seem hacky to me. I think patch 1 in v1 was the
correct way to use local_trylock() to avoid them.
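IIRC that was essentially (a sketch from memory, not the exact v1 hunk):

	/*
	 * No context or gfp checks at all: reentrancy from irq/nmi and
	 * ordinary contention both just fail the trylock, and the
	 * caller falls back to the slowpath.
	 */
	if (!local_trylock(&memcg_stock.lock))
		return ret;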

As for the RT concerns, AFAIK RT isn't about being fast but about being
preemptible, and the v1 approach didn't violate that; taking the slowpath
more often shouldn't be an issue.

Let me quote Shakeel's scenario from the v1 thread:

> I didn't really think too much about PREEMPT_RT kernels as I assume
> performance is not a top priority there, but I think I get your point. Let me

Agreed.

> explain and correct me if I am wrong. On a PREEMPT_RT kernel, the local
> lock is a spinlock, which is actually a mutex but with priority
> inheritance. A task holding the local lock can still get context switched

Let's say (seems implied already) this is a low prio task.

> (but will remain on the same CPU run queue) and the newer task can try to

And this is a high prio task.

> acquire the memcg stock local lock. If we just do trylock, it will
> always go to the slow path, but if we do local_lock() then it will sleep
> and possibly give its priority to the task owning the lock, possibly
> making that task get the CPU. Later the task that slept on the memcg
> stock lock will wake up and go through the fast path.

I think from an RT latency perspective it could very well be better for the
high prio task to just skip the fast path and take the slowpath, instead of
going to sleep while boosting the low prio task so that the high prio task
can use the fast path later. It's not really a fast path at that point, I'd say.
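
To illustrate what taking the slowpath means for the caller (heavily
simplified from try_charge_memcg(), which also does retries and reclaim):

	if (consume_stock(memcg, nr_pages, gfp_mask))
		return 0;	/* fast path: consume the per-cpu stock */
	/*
	 * Slow path: charge the page_counter hierarchy directly; that is
	 * atomics only, no stock lock, so nothing for a high prio task
	 * to sleep on or boost on PREEMPT_RT.
	 */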