Re: [PATCH v4 0/2] fix MADV_COLLAPSE issue if THP settings are disabled

From: David Hildenbrand
Date: Wed Jun 25 2025 - 04:40:42 EST


On 25.06.25 10:22, Lorenzo Stoakes wrote:
On Wed, Jun 25, 2025 at 10:16:46AM +0200, David Hildenbrand wrote:
On 25.06.25 09:49, David Hildenbrand wrote:
I think the whole use case of using MADV_COLLAPSE to completely control
THP allocation in a system is otherwise pretty hard to achieve, if there
is no other way to tame THP allocation through page faults+khugepaged.

Just want to add: for an app itself, it's doable in "madvise" mode perfectly
fine.

If your app does a MADV_HUGEPAGE, it can get a THP during page-fault +
khugepaged.

If your app does not do a MADV_HUGEPAGE, it can get a THP through
MADV_COLLAPSE.

So the "madvise" mode actually works.
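
A minimal C sketch of the two per-app paths described above, assuming THP is in "madvise" mode, a 2 MiB PMD-sized THP, and a kernel that knows MADV_COLLAPSE (6.1 or later). The helper names are hypothetical, and the fallback #define only mirrors the uapi value for older libc headers. Whether a THP is actually obtained depends on the running system.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* uapi value; older libc headers may lack it */
#endif

#define HPAGE_SIZE (2UL << 20) /* assumption: 2 MiB PMD size (e.g. x86-64) */

/* Hypothetical helper: map anonymous memory, return a HPAGE_SIZE-aligned
 * pointer into it, or NULL on failure. The mapping is never unmapped here. */
static void *map_aligned(size_t len)
{
    assert(len > 0 && len % HPAGE_SIZE == 0);
    /* Over-allocate by one huge page so the start can be rounded up. */
    char *raw = mmap(NULL, len + HPAGE_SIZE, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;
    uintptr_t aligned = ((uintptr_t)raw + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1);
    return (void *)aligned;
}

/* Path 1: opt in with MADV_HUGEPAGE; a THP may then show up at page
 * fault time or later via khugepaged. Returns 0 or a positive errno. */
static int request_thp_lazily(void *addr, size_t len)
{
    return madvise(addr, len, MADV_HUGEPAGE) == 0 ? 0 : errno;
}

/* Path 2: no MADV_HUGEPAGE; touch the range so there is memory to
 * collapse, then ask for a synchronous collapse. Returns 0 or errno. */
static int request_thp_now(void *addr, size_t len)
{
    memset(addr, 1, len);
    return madvise(addr, len, MADV_COLLAPSE) == 0 ? 0 : errno;
}
```

An app would call map_aligned() once and then use exactly one of the two request helpers on the range, depending on whether it wants THPs "eventually" or "now".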

Right, but for me MADV_COLLAPSE is more about 'I want THPs _now_ (if available),
not when khugepaged decides to give me some'.

So we have multiple semantics at work here, unfortunately.


The problem appears as soon as we want to control other processes that might
be setting MADV_HUGEPAGE, and we actually want to control the behavior using
process_madvise(MADV_COLLAPSE), to say "well, the MADV_HUGEPAGE should be
ignored".

This is a _very_ specialist use.

I'd argue for a 'manual' mode to be added to sysfs to cover this case, with
'never' having the 'actually means never' semantics.

You might argue that could confuse things, but it'd retain the 'de facto'
understanding nearly everybody has about what these flags mean, but give
whatever user is out there that needs this the ability to continue doing what
they want.
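
For reference, a small C sketch of how a userspace agent might read the currently selected mode from the existing sysfs knob. It assumes the usual "always [madvise] never" layout of /sys/kernel/mm/transparent_hugepage/enabled, with the active mode in brackets. The helper names are hypothetical, and nothing here implements the proposed 'manual' mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the bracketed token of a line such as
 * "always [madvise] never\n" into out. Returns 0 on success, -1 if no
 * bracketed token is found or out is too small. */
static int thp_active_mode(const char *line, char *out, size_t outlen)
{
    assert(line != NULL && out != NULL);
    const char *lb = strchr(line, '[');
    const char *rb = lb ? strchr(lb, ']') : NULL;
    if (!rb)
        return -1;
    size_t n = (size_t)(rb - lb - 1);
    if (n + 1 > outlen)
        return -1;
    memcpy(out, lb + 1, n);
    out[n] = '\0';
    return 0;
}

/* Hypothetical helper: read the system-wide THP mode from sysfs.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
static int thp_read_mode(char *out, size_t outlen)
{
    char line[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (!f)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok ? thp_active_mode(line, out, outlen) : -1;
}
```

An agent could use thp_read_mode() to check whether the system is in "never" or "madvise" before deciding how to drive collapses.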

And we get into philosophy about not 'breaking' userland, not sure we have a
TLB/page fault/folio allocation efficiency contract with userland :)

No program will break with this patch applied. Just potentially get performance
degradation in a very, very specialist case.


Then, you configure "never" system-wide and use
process_madvise(MADV_COLLAPSE) to drive it all manually.
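
A rough C sketch of what such a manual driver could look like, using raw syscalls so it does not depend on a libc wrapper. The helper name is hypothetical, the fallback #defines only mirror the uapi values, and the call is subject to the kernel's usual permission checks on the target process. Whether the collapse succeeds under "never" is exactly what this thread is debating, so the sketch only reports the error code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* uapi value; older libc headers may lack it */
#endif
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_process_madvise
#define SYS_process_madvise 440
#endif

/* Hypothetical helper: ask the kernel to collapse [addr, addr + len) in
 * the address space of process pid. addr is an address in the target
 * process, not in the caller. Returns 0 on success or a positive errno.
 * A short (partial) result is treated as success in this sketch. */
static int collapse_in_process(pid_t pid, void *addr, size_t len)
{
    assert(len > 0);

    int pidfd = (int)syscall(SYS_pidfd_open, pid, 0U);
    if (pidfd < 0)
        return errno;

    struct iovec iov = { .iov_base = addr, .iov_len = len };
    long done = syscall(SYS_process_madvise, pidfd, &iov, 1UL,
                        MADV_COLLAPSE, 0U);
    /* Save errno before close() can overwrite it. */
    int err = (done < 0) ? errno : 0;

    close(pidfd);
    return err;
}
```

An agent would call collapse_in_process() for each range it has decided is most deserving of THP backing.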

Curious to learn if there is such a user out there.

Oh me too :)

I just looked at the original use cases [1]; such a use case is not mentioned.

But the series did add process_madvise(MADV_COLLAPSE) in 876b4a1896646cc85ec6b1fc1c9270928b7e0831, where we document:

"
This is useful for the development of userspace agents that seek to
optimize THP utilization system-wide by using userspace signals to
prioritize what memory is most deserving of being THP-backed.
"

The "prioritize" might indicate that this is used in combination with "madvise", not with "never".


So yeah, it all boils down to

(1) If there is no such use case, "never can mean never". Because there
is nothing to break, really.

(2) If there is such a use case, we might be breaking it.

[1] https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@xxxxxxxxxx/

--
Cheers,

David / dhildenb