Re: [PATCH v6 3/4] mm: huge_memory: refactor enabled_store() with change_enabled()

From: Lorenzo Stoakes (Oracle)

Date: Tue Mar 17 2026 - 05:42:15 EST


On Mon, Mar 16, 2026 at 04:26:30PM -0700, Sohil Mehta wrote:
> On 3/16/2026 3:12 AM, Breno Leitao wrote:
>
> >> I am not a mm expert and typically do not follow the mm list. Is there
> >> an issue with the usage of non-atomic variants here? The commit message
> >> says this uses the same pattern as set_anon_enabled_mode().
> >>
> >> However, set_anon_enabled_mode() has a spinlock=>huge_anon_orders_lock
> >> protecting the access. But, transparent_hugepage_flags seems to be
> >> unprotected in that regard.
> >
> > I don't think that the atomic vs non-atomic will not help much, given
> > this is a compoud operation. Independently if this is atomic or not, it
> > is racy with anyone changing these fields (transparent_hugepage_flags).
> > In other words, Atomic ops make each individual bit flip safe, but
> > set_global_enabled_mode() and defrag_store() need to flip multiple bits
> > as a group. With atomic ops, two concurrent writers can still interleave
> > and leave the flags in an invalid state.
>
> You are right it is a compound operation. So, there is an existing issue
> with two concurrent writers which can leave the flags in an invalid state.
>
> But, I was wondering if there is a slightly different issue now due to
> the non-atomic part. Some updates could be lost depending on the timing
> of the operation.
>
> For example,
>
> CPU1 does: CPU2 does:
> set_global_enabled_mode("always") defrag_store("always")
>
> ___test_and_set_bit():
> // Trying to set bit 1
>
> old = *p
> // reads flags, sees defrag bit=0
>
> set_bit(DEFRAG_DIRECT_FLAG)
> // atomic: sets bit 3
>
>
> *p = old | TRANSPARENT_HUGEPAGE_FLAG;

> // writes back old value with bit 1 set
> // DEFRAG_DIRECT_FLAG is lost (reverted to 0)
>
> IIUC, this issue didn't exist before. I think it might be safer to use
> the atomic test_and_set_bit() to be compatible with the older code.
> Though, I'll leave it up to you as I don't have expertise here.

No, it's up to the maintainers :)

Given the above I think we should switch back to the atomic accessors.

We can address the broader issues with this horrible code in a separate
series.

>
> Overall, as you mentioned below, protecting transparent_hugepage_flags
> with a spinlock seems like a better, long-term solution to me as well.

Yeah, let's look at doing a follow up that cleans this up in general and
address that then.

>
> >
> > That said, Although I don't think this patch is making it worse, I think
> > the is a racy issue here that we can make better.
> >
> > My suggestion is to move the rest of the helpers (defrag_store()) to use
> > sysfs_match_string(), and then create a thp_flags_lock spinlock to
> > protect operations against transparent_hugepage_flags. Any concern about
> > this approach?
> >
>
>

Cheers, Lorenzo