Re: [PATCH v4] mm/mglru: fix cgroup OOM during MGLRU state switching

From: Leno Hou

Date: Wed Mar 18 2026 - 04:26:36 EST


On 3/18/26 3:16 PM, Barry Song wrote:
On Wed, Mar 18, 2026 at 11:29 AM Leno Hou via B4 Relay
<devnull+lenohou.gmail.com@xxxxxxxxxx> wrote:

From: Leno Hou <lenohou@xxxxxxxxx>

[...]


diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index ad50688d89db..1f6b19bf365b 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -102,6 +102,12 @@ static __always_inline enum lru_list folio_lru_list(const struct folio *folio)

#ifdef CONFIG_LRU_GEN

+static inline bool lru_gen_draining(void)
+{
+ DECLARE_STATIC_KEY_FALSE(lru_drain_core);
+
+ return static_branch_unlikely(&lru_drain_core);
+}

Can we name it lru_gen_switch() or lru_switch?
Since “drain” implies disabling MGLRU, the operation
could just as well be enabling it. Also, can we drop
the _core suffix?

OK. The next v5 patch will be:

+static inline bool lru_gen_switching(void)
+{
+ DECLARE_STATIC_KEY_FALSE(lru_switch);
+
+ return static_branch_unlikely(&lru_switch);
+}
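For context, the matching definition and toggle points would presumably live in mm/vmscan.c. A non-runnable sketch, assuming the key is flipped inside lru_gen_change_state() (the exact hook point is an assumption, not something shown in this patch):

```c
/* mm/vmscan.c -- sketch only; hook points are assumptions */
DEFINE_STATIC_KEY_FALSE(lru_switch);

static void lru_gen_change_state(bool enabled)
{
	/* Readers now fall back to the classic LRU logic. */
	static_branch_enable(&lru_switch);

	/* ... drain/convert folios between MGLRU and active/inactive lists ... */

	/* Transition finished; MGLRU fast paths are valid again. */
	static_branch_disable(&lru_switch);
}
```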



#ifdef CONFIG_LRU_GEN_ENABLED
static inline bool lru_gen_enabled(void)
{
@@ -316,6 +322,11 @@ static inline bool lru_gen_enabled(void)
return false;
}

+static inline bool lru_gen_draining(void)

lru_gen_switching()?
+{
+ return false;
+}
+
static inline bool lru_gen_in_fault(void)
{
return false;
diff --git a/mm/rmap.c b/mm/rmap.c
index 6398d7eef393..0b5f663f3062 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -966,7 +966,7 @@ static bool folio_referenced_one(struct folio *folio,
nr = folio_pte_batch(folio, pvmw.pte, pteval, max_nr);
}

OK. I'll add the documentation you suggested:

/*
 * When LRU is switching, we don't know where the surrounding folios
 * are: they could be on the active/inactive lists or on MGLRU. So the
 * simplest approach is to disable this look-around optimization.
 */
- if (lru_gen_enabled() && pvmw.pte) {
+ if (lru_gen_enabled() && !lru_gen_draining() && pvmw.pte) {

Ack. When LRU is switching, we don’t know where the
surrounding folios are—they could be on active/inactive
lists or on MGLRU. So the simplest approach is to
disable this look-around optimization.
But please add a comment here explaining it.


if (lru_gen_look_around(&pvmw, nr))
referenced++;
} else if (pvmw.pte) {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 33287ba4a500..88b9db06e331 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -886,7 +886,7 @@ static enum folio_references folio_check_references(struct folio *folio,
if (referenced_ptes == -1)
return FOLIOREF_KEEP;

- if (lru_gen_enabled()) {

I'll add documentation as follows:

/*
* During the MGLRU state transition (lru_gen_switching), we force
* folios to follow the traditional active/inactive reference checking.
*
* While MGLRU is switching, the generational state of folios is in flux.
* Falling back to the traditional logic (which relies on PG_referenced/
* PG_active flags that are consistent across both mechanisms) provides
* a stable, safe behavior for the folio until it is fully migrated back
* to the traditional LRU lists. This avoids relying on potentially
* inconsistent MGLRU generational metadata during the transition.
*/

+ if (lru_gen_enabled() && !lru_gen_draining()) {

I’m curious what prompted you to do this.

This feels a bit odd. I assume this effectively makes
folios on MGLRU, as well as those on active/inactive
lists, always follow the active/inactive logic.

It might be fine, but it needs thorough documentation here.

another approach would be:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 33287ba4a500..91b60664b652 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -122,6 +122,9 @@ struct scan_control {
/* Proactive reclaim invoked by userspace */
unsigned int proactive:1;

+ /* Are we reclaiming from MGLRU */
+ unsigned int lru_gen:1;
+
/*
* Cgroup memory below memory.low is protected as long as we
* don't threaten to OOM. If any cgroup is reclaimed at
@@ -886,7 +889,7 @@ static enum folio_references
folio_check_references(struct folio *folio,
if (referenced_ptes == -1)
return FOLIOREF_KEEP;

- if (lru_gen_enabled()) {
+ if (sc->lru_gen) {
if (!referenced_ptes)
return FOLIOREF_RECLAIM;

This makes the logic perfectly correct (you know exactly
where your folios come from), but I’m not sure it’s worth it.

Anyway, I’d like to understand why you always need to
use the active/inactive logic even for folios from MGLRU.
To me, it seems to work only by coincidence, which isn’t good.

Thanks
Barry

Hi Barry,

I agree that using !lru_gen_draining() feels a bit like a fallback path. However, after considering your suggestion of sc->lru_gen, I'm concerned about the broad impact of modifying struct scan_control. Since lru_drain_core marks a very transient state, I prefer a localized fix that doesn't propagate architectural changes throughout the entire reclaim stack.

You mentioned that using the active/inactive logic feels like it works by 'coincidence'. To clarify, this is an intentional fallback: because the generational metadata in MGLRU becomes unreliable during draining, we intentionally downgrade these folios to the traditional logic. Since the PG_referenced and PG_active bits are maintained by the core VM and are consistent regardless of whether MGLRU is active, this fallback is technically sound and robust.

I have added detailed documentation to the code to explain this design choice, clarifying that it's a deliberate transition strategy rather than a coincidence.



Best Regards,
Leno Hou