Re: [PATCH v5 0/5] KSM: performance optimizations for rmap_walk_ksm

From: xu.xin16

Date: Wed May 20 2026 - 02:26:59 EST

> > Two key highlights:
> >
> > 1. Lock hold time drops from >500ms to <2ms
> > - In our benchmark (20,000 VMAs sharing an anon_vma), worst-case
> > anon_vma lock hold time during KSM rmap walk went from 705ms
> > down to 1.67ms (max) and 1.44ms (avg).
>
> How real-worldish is that benchmark?
>
> How much effect are our users likely to see from this patchset in their
> real-world workloads?

Hi Andrew,

Thank you for your thoughtful question.

The benchmark intentionally simulates a scenario where many VMAs share the
same anon_vma without any fork() involved. This happens in real systems
when applications repeatedly split existing VMAs via mprotect(2) or
madvise(2) (e.g., MADV_DONTNEED, MADV_FREE) on sub-ranges of a large
anonymous mapping.
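
For illustration only, here is a minimal userspace sketch of the splitting pattern described above. It is not the benchmark from the patchset. It assumes Linux, a readable /proc/self/maps, and a libc reachable through ctypes; the helper names count_vmas and split_anon_mapping are hypothetical. Userspace cannot observe anon_vma sharing directly, so the sketch only shows that a single private anonymous mapping turns into many VMAs without any fork().

```python
import ctypes
import mmap

PAGE = mmap.PAGESIZE

# Assumption: Linux; mprotect(2) is reached through ctypes because
# Python's mmap module does not wrap it.
libc = ctypes.CDLL(None, use_errno=True)
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
libc.mprotect.restype = ctypes.c_int


def count_vmas(start, length):
    """Count /proc/self/maps entries that overlap [start, start + length)."""
    end = start + length
    count = 0
    with open("/proc/self/maps") as maps:
        for line in maps:
            lo, hi = (int(x, 16) for x in line.split()[0].split("-"))
            if lo < end and hi > start:
                count += 1
    return count


def split_anon_mapping(npages):
    """Split one private anonymous mapping into npages VMAs, without fork()."""
    length = npages * PAGE
    buf = mmap.mmap(-1, length,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE)
    # Fault every page in first so the mapping is populated before the splits.
    for i in range(npages):
        buf[i * PAGE] = 1
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    before = count_vmas(addr, length)
    # Make every other page read-only; neighbours now differ in protection,
    # so the kernel cannot merge them back into one VMA.
    for i in range(1, npages, 2):
        if libc.mprotect(addr + i * PAGE, PAGE, mmap.PROT_READ) != 0:
            raise OSError(ctypes.get_errno(), "mprotect failed")
    after = count_vmas(addr, length)
    buf.close()
    return before, after


if __name__ == "__main__":
    # Prints the VMA count covering the mapping before and after the splits.
    print(split_anon_mapping(64))
```

Scaling npages up approaches the VMA counts discussed here, subject to the vm.max_map_count limit on the test machine.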

Real-world examples:

- JVM / Go runtime: These use mmap for heap regions and later call
mprotect(PROT_NONE) for garbage collection barriers or guard pages,
splitting the original VMA into thousands of small pieces over time.

- Database engines (MySQL, PostgreSQL): Large shared memory buffers
or anonymous mappings are managed with madvise(MADV_DONTNEED) to release
specific pages, which also splits VMAs.

Why the benchmark numbers are realistic: we observed ~20,000 VMAs sharing
one anon_vma on a production system running a Java application with KSM
enabled. The lock hold time before the patch was measured at 228 ms (max)
during rmap walks triggered by memory compaction and page migration.
The benchmark reproduces that VMA count and lock-hold behavior in a
controlled environment.

For systems that do not have thousands of VMAs per anon_vma, the
patch adds negligible overhead (a single pgoff comparison). For systems
that do suffer from this issue, the improvement is dramatic:
1) Worst-case anon_vma lock hold time drops from hundreds of milliseconds
   to under 2 ms.
2) This directly reduces blocking of parallel operations that need the
   same lock: page faults, reclaim, migration, compaction, mlock, and
   exit_mmap.

End users will see lower tail latency (fewer application stalls),
higher throughput under memory pressure, and no more spurious
lockup warnings or container timeouts caused by excessive lock hold
times.

In short: workloads that do not hit this pathological pattern are
unaffected; those that do will see a 100x to 500x reduction in lock
hold times, which translates directly into a more responsive system.
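
As a quick sanity check on that range, the ratios can be recomputed from the figures quoted in this thread. This is only a sketch: the 2.0 ms value is an assumed upper bound standing in for "under 2 ms", since no post-patch production measurement is quoted here, and the variable names are illustrative.

```python
# Figures quoted in this thread, in milliseconds.
bench_before_max = 705.0   # benchmark, worst case before the patch
bench_after_max = 1.67     # benchmark, worst case after the patch
prod_before_max = 228.0    # production Java system, before the patch

# Assumption: "under 2 ms" is treated as an upper bound of 2.0 ms, so the
# production ratio below is a lower bound rather than a measurement.
after_upper_bound = 2.0


def reduction(before_ms, after_ms):
    """Return how many times shorter the lock hold time becomes."""
    return before_ms / after_ms


ratios = {
    "benchmark (max vs max)": reduction(bench_before_max, bench_after_max),
    "production (lower bound)": reduction(prod_before_max, after_upper_bound),
}

if __name__ == "__main__":
    for label, ratio in ratios.items():
        print(f"{label}: about {ratio:.0f}x")
```

Both ratios, roughly 422x for the benchmark and at least 114x for the production case, fall inside the 100x to 500x range stated above.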

I hope this clarifies the real-world relevance. Thank you for pushing
us to make the changelog clearer.

Best regards,
xu xi