Re: [PATCH v6 3/4] mm: huge_memory: refactor enabled_store() with change_enabled()

From: Breno Leitao

Date: Tue Mar 17 2026 - 07:24:28 EST


On Tue, Mar 17, 2026 at 09:37:39AM +0000, Lorenzo Stoakes (Oracle) wrote:
> On Mon, Mar 16, 2026 at 04:26:30PM -0700, Sohil Mehta wrote:
> > On 3/16/2026 3:12 AM, Breno Leitao wrote:
> >
> > >> I am not a mm expert and typically do not follow the mm list. Is there
> > >> an issue with the usage of non-atomic variants here? The commit message
> > >> says this uses the same pattern as set_anon_enabled_mode().
> > >>
> > >> However, set_anon_enabled_mode() has a spinlock=>huge_anon_orders_lock
> > >> protecting the access. But, transparent_hugepage_flags seems to be
> > >> unprotected in that regard.
> > >
> > > I don't think that the atomic vs non-atomic will not help much, given
> > > this is a compoud operation. Independently if this is atomic or not, it
> > > is racy with anyone changing these fields (transparent_hugepage_flags).
> > > In other words, Atomic ops make each individual bit flip safe, but
> > > set_global_enabled_mode() and defrag_store() need to flip multiple bits
> > > as a group. With atomic ops, two concurrent writers can still interleave
> > > and leave the flags in an invalid state.
> >
> > You are right it is a compound operation. So, there is an existing issue
> > with two concurrent writers which can leave the flags in an invalid state.
> >
> > But, I was wondering if there is a slightly different issue now due to
> > the non-atomic part. Some updates could be lost depending on the timing
> > of the operation.
> >
> > For example,
> >
> > CPU1 does: CPU2 does:
> > set_global_enabled_mode("always") defrag_store("always")
> >
> > ___test_and_set_bit():
> > // Trying to set bit 1
> >
> > old = *p
> > // reads flags, sees defrag bit=0
> >
> > set_bit(DEFRAG_DIRECT_FLAG)
> > // atomic: sets bit 3
> >
> >
> > *p = old | TRANSPARENT_HUGEPAGE_FLAG;
>
> > // writes back old value with bit 1 set
> > // DEFRAG_DIRECT_FLAG is lost (reverted to 0)
> >
> > IIUC, this issue didn't exist before. I think it might be safer to use
> > the atomic test_and_set_bit() to be compatible with the older code.
> > Though, I'll leave it up to you as I don't have expertise here.
>
> No, it's up to the maintainers :)
>
> Given the above I think we should switch back to the atomic accessors.
>
> We can address the broader issues with this horrible code in a separate
> series.

Ack. let me respin then.

> > Overall, as you mentioned below, protecting transparent_hugepage_flags
> > with a spinlock seems like a better, long-term solution to me as well.
>
> Yeah, let's look at doing a follow up that cleans this up in general and
> address that then.

Sure. I am planning to improve defrag_store() as the next work, and then
come up with this additional spinlock for transparent_hugepage_flags.