Re: [PATCH] mm: add memory.compact_unevictable_allowed cgroup attribute

From: Daniil Tatianin

Date: Wed Mar 18 2026 - 05:46:30 EST

On 3/18/26 12:20 PM, Michal Hocko wrote:

On Wed 18-03-26 12:04:10, Daniil Tatianin wrote:

On 3/18/26 11:25 AM, Michal Hocko wrote:

On Tue 17-03-26 23:17:28, Daniil Tatianin wrote:

On 3/17/26 10:17 PM, Andrew Morton wrote:

On Tue, 17 Mar 2026 13:00:58 +0300 Daniil Tatianin <d-tatianin@xxxxxxxxxxxxxx> wrote:

The current global sysctl compact_unevictable_allowed is too coarse.
In environments with mixed workloads, we may want to protect specific
important cgroups from compaction to ensure their stability and
responsiveness, while allowing compaction for others.

This patch introduces a per-memcg compact_unevictable_allowed attribute.
This allows granular control over whether unevictable pages in a specific
cgroup can be compacted. The global sysctl still takes precedence if set
to disallow compaction, but this new setting allows opting out specific
cgroups.
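A minimal user-space model of the precedence rule described above, assuming the per-memcg setting is a simple boolean. `struct memcg_model`, `sysctl_allowed` and `unevictable_compaction_allowed()` are hypothetical names for illustration, not the patch's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg; only the proposed knob is modelled. */
struct memcg_model {
	bool compact_unevictable_allowed; /* per-memcg attribute from the patch */
};

/* Models the global vm.compact_unevictable_allowed sysctl (true == 1). */
static bool sysctl_allowed = true;

/*
 * Returns whether unevictable folios charged to @memcg may be compacted.
 * A global "disallow" always wins; otherwise the memcg decides.
 * Assumption for this sketch: a folio with no memcg follows the sysctl.
 */
static bool unevictable_compaction_allowed(const struct memcg_model *memcg)
{
	if (!sysctl_allowed)
		return false;
	if (memcg == NULL)
		return true;
	return memcg->compact_unevictable_allowed;
}
```

The asymmetry is the point: the memcg can only opt out further, never re-enable what the global sysctl forbids.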

This also adds a new ISOLATE_UNEVICTABLE_CHECK_MEMCG flag to
isolate_migratepages_block to preserve the old behavior for the
ISOLATE_UNEVICTABLE flag unconditionally used by
isolate_migratepages_range.
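A hedged sketch of how the two flags could interact, assuming the compaction path passes both bits while isolate_migratepages_range() passes ISOLATE_UNEVICTABLE alone. The `MODEL_*` constants, their values and `may_isolate_unevictable()` are hypothetical and do not reflect the kernel's definitions:

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int isolate_mode_model_t;

/* Illustrative bit values only; not the kernel's isolate_mode_t values. */
#define MODEL_ISOLATE_UNEVICTABLE             (1u << 0)
#define MODEL_ISOLATE_UNEVICTABLE_CHECK_MEMCG (1u << 1)

/*
 * Decides whether an unevictable folio may be isolated for migration.
 * ISOLATE_UNEVICTABLE alone keeps the old unconditional behaviour;
 * adding CHECK_MEMCG makes the folio's memcg setting decide.
 */
static bool may_isolate_unevictable(isolate_mode_model_t mode, bool memcg_allows)
{
	if (!(mode & MODEL_ISOLATE_UNEVICTABLE))
		return false;
	if (mode & MODEL_ISOLATE_UNEVICTABLE_CHECK_MEMCG)
		return memcg_allows;
	return true;
}
```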

The AI review raised some questions:
https://sashiko.dev/#/patchset/20260317100058.2316997-1-d-tatianin@xxxxxxxxxxxxxx
Should this dynamically walk up the ancestor chain during evaluation to
ensure it returns false if any ancestor has disallowed compaction?

I think it is ultimately up to the cgroup maintainers whether the code
should do that, but as far as I understand, the whole point of cgroups is
that a child can override the settings of its parent. Moreover, this property doesn't
have CFTYPE_NS_DELEGATABLE set, so a child cgroup cannot just toggle it at
will.

In general, any attribute should have proper hierarchical semantics. I
am not sure what those should be in this case. What is desirable for one
child cgroup can become fragmentation pressure on others.

I think it is really important to explain those mixed-workload use cases
more thoroughly.

I think there are many examples of a system where one process is more
important than others. For example, any sort of healthcheck or even the
ssh daemon may become unresponsive during heavy compaction due to
thousands of TLB invalidation IPIs or page faults on pages that are
being compacted. Another example is a VM that is responsible for routing
the traffic of all other VMs or even the entire cluster: you really want
to prioritize its responsiveness, while still allowing memory compaction
for the rest of the system, such as less important VMs or services.

Shouldn't those use mlock?

Absolutely, mlock is required to mark a folio as unevictable. Note,
however, that unevictable folios are still perfectly eligible for
compaction. This new property lets a cgroup say whether its unevictable
pages may be compacted, just as the global compact_unevictable_allowed
sysctl does for the whole system.
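A small user-space sketch of the existing mechanism, using the real mlock(2) call and the real /proc/sys/vm/compact_unevictable_allowed file; the helper names `alloc_locked_page()` and `read_compact_unevictable_allowed()` are hypothetical. Locking makes the page unevictable, but on its own it does not stop compaction from migrating it:

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict C modes */
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Maps one anonymous page and mlock()s it, making its folio unevictable.
 * The physical page is NOT pinned: while the sysctl below reads 1,
 * compaction may still migrate it. Returns NULL on failure.
 */
static void *alloc_locked_page(size_t *len_out)
{
	long page = sysconf(_SC_PAGESIZE);
	assert(page > 0);
	size_t len = (size_t)page;

	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;
	if (mlock(p, len) != 0) { /* e.g. RLIMIT_MEMLOCK too small */
		munmap(p, len);
		return NULL;
	}
	*len_out = len;
	return p;
}

/* Reads the global sysctl; returns -1 if the file is absent or unreadable. */
static int read_compact_unevictable_allowed(void)
{
	FILE *f = fopen("/proc/sys/vm/compact_unevictable_allowed", "r");
	int v = -1;

	if (f == NULL)
		return -1;
	if (fscanf(f, "%d", &v) != 1)
		v = -1;
	fclose(f);
	return v;
}
```

mlock() can fail with ENOMEM or EPERM when RLIMIT_MEMLOCK is too low, which is why the helper returns NULL rather than aborting.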

Is the memcg even a suitable level of
abstraction for this tunable?

In my opinion it is, since it is relatively common to put all related tasks
into one cgroup with preset memory limits etc.
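As a sketch of how such a per-cgroup knob might be driven from user space, assuming cgroup v2 is mounted at /sys/fs/cgroup and that the proposed memory.compact_unevictable_allowed file accepts "0" or "1" (the file exists only with this patch, and the value format is an assumption). `write_cgroup_knob()` is a hypothetical helper:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Writes @value to the control file @knob inside @cgroup_dir, e.g.
 * "/sys/fs/cgroup/router" + "memory.compact_unevictable_allowed".
 * Returns 0 on success, -1 on failure.
 */
static int write_cgroup_knob(const char *cgroup_dir, const char *knob,
			     const char *value)
{
	char path[4096];
	int n = snprintf(path, sizeof(path), "%s/%s", cgroup_dir, knob);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	FILE *f = fopen(path, "w");
	if (f == NULL)
		return -1;
	int ok = fputs(value, f) >= 0;
	/* stdio buffers the write, so kernel-side errors surface at fclose() */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

The same helper can place the related tasks in the group by writing a PID to the standard cgroup.procs file, e.g. `write_cgroup_knob("/sys/fs/cgroup/router", "cgroup.procs", "1234")`, where the group name and PID are illustrative.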

Doesn't this belong to tasks if anything?

I think it would be very difficult to implement this properly as a
per-task attribute, since compaction works at the folio level. While
folios have a pointer to the memcg that owns them, they may be mapped by
multiple processes in the case of shared memory. We would have to find
all the address spaces mapping the folio and then check the property on
every one of them, and those may be set to different values. Doing this
for every physical page may be problematic performance-wise, and it also
introduces unclear semantics when different address spaces mapping the
same page have different opinions.
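A toy model of the per-task ambiguity described above; `struct mm_model`, `struct folio_model` and both resolution functions are hypothetical. Each check is linear in the number of mappers (comparable to an rmap walk per scanned folio), and the two plausible rules disagree for a folio shared by a permissive and a protective address space:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-address-space setting; no such attribute exists today. */
struct mm_model {
	bool compact_unevictable_allowed;
};

/* Toy folio: the address spaces that currently map it. */
struct folio_model {
	const struct mm_model **mappers;
	size_t nr_mappers;
};

/* Rule 1: allowed only if every mapper allows (one protective mapper wins). */
static bool folio_allowed_all(const struct folio_model *folio)
{
	assert(folio != NULL);
	for (size_t i = 0; i < folio->nr_mappers; i++)
		if (!folio->mappers[i]->compact_unevictable_allowed)
			return false;
	return true;
}

/* Rule 2: allowed if any mapper allows; an unmapped folio is allowed. */
static bool folio_allowed_any(const struct folio_model *folio)
{
	assert(folio != NULL);
	for (size_t i = 0; i < folio->nr_mappers; i++)
		if (folio->mappers[i]->compact_unevictable_allowed)
			return true;
	return folio->nr_mappers == 0;
}
```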

Yes, it would need to be something like an implicit mlock. I haven't
claimed that would be a _simpler_ solution. But as this has obvious
userspace API implications, the much more important question is what a
future-proof solution looks like. We also need an answer to whether
this is really needed or too niche to justify an interface that has to
be maintained forever.