[PATCH v3 15/16] userfaultfd.2: Add read-write protect mode

From: Kiryl Shutsemau

Date: Fri May 22 2026 - 10:08:01 EST


From: "Kiryl Shutsemau (Meta)" <kas@xxxxxxxxxx>

Read-write protect mode (UFFDIO_REGISTER_MODE_RWP) is supported starting
from Linux 7.2. It traps every access -- read or write -- to a present
page within a registered range. The matching UAPI consists of:

- UFFDIO_REGISTER_MODE_RWP   registration-mode bit
- UFFD_FEATURE_RWP           capability bit
- UFFD_FEATURE_RWP_ASYNC     async (in-kernel) fault resolution
- UFFDIO_RWPROTECT           install / remove RWP on a range
- UFFDIO_SET_MODE            runtime sync/async toggle
- UFFD_PAGEFAULT_FLAG_RWP    new pagefault.flags bit

Document the new registration-mode entry, the "Userfaultfd read-write
protect mode" section, the new pagefault flag, and a VERSIONS line.

Signed-off-by: Kiryl Shutsemau <kas@xxxxxxxxxx>
---
man2/userfaultfd.2 | 147 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 146 insertions(+), 1 deletion(-)

diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
index cee7c01d2512..0e702f2f4969 100644
--- a/man2/userfaultfd.2
+++ b/man2/userfaultfd.2
@@ -24,7 +24,7 @@
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
-.TH USERFAULTFD 2 2021-03-22 "Linux" "Linux Programmer's Manual"
+.TH USERFAULTFD 2 2026-05-22 "Linux" "Linux Programmer's Manual"
.SH NAME
userfaultfd \- create a file descriptor for handling page faults in user space
.SH SYNOPSIS
@@ -105,6 +105,28 @@ The faulted thread will be stopped from execution
until user-space write-unprotects the page using an
.B UFFDIO_WRITEPROTECT
ioctl.
+.TP
+.BR UFFDIO_REGISTER_MODE_RWP " (since Linux 7.2)"
+When registered with
+.B UFFDIO_REGISTER_MODE_RWP
+mode, user-space will receive a page-fault notification
+on any access \(em read or write \(em to a present page within the range.
+By default the faulted thread will be stopped from execution until
+user-space removes the protection using a
+.B UFFDIO_RWPROTECT
+ioctl;
+if
+.B UFFD_FEATURE_RWP_ASYNC
+was negotiated, the kernel restores access in place and the faulted
+thread continues without blocking.
+.IP
+.B UFFDIO_REGISTER_MODE_RWP
+and
+.B UFFDIO_REGISTER_MODE_WP
+cannot be combined on the same range; attempting to register with both
+bits set fails with the error
+.BR EINVAL .
+See the "Userfaultfd read-write protect mode" section below.
.PP
Multiple modes can be enabled at the same time for the same memory range.
.PP
@@ -186,6 +208,21 @@ The user needs to resolve the page fault by unprotecting the faulted page and
kicking the faulted thread to continue.
For more information,
please refer to the "Userfaultfd write-protect mode" section.
+.PP
+Since Linux 7.2, userfaultfd can do read-write protection tracking, which
+traps every access (read or write) to a present page within a registered
+range.
+One should check against the feature bit
+.B UFFD_FEATURE_RWP
+before using this feature, and optionally negotiate
+.B UFFD_FEATURE_RWP_ASYNC
+to have the kernel auto-restore page permissions on fault without
+delivering a notification.
+This mode is intended for working-set tracking by VM memory managers and
+similar callers; cold pages can then be evicted using independent kernel
+interfaces.
+For more information,
+please refer to the "Userfaultfd read-write protect mode" section.
.\"
.SS Userfaultfd operation
After the userfaultfd object is created with
@@ -322,6 +359,98 @@ should have the flag
cleared upon the faulted page or range.
.PP
Write-protect mode supports only private anonymous memory.
+.SS Userfaultfd read-write protect mode (since Linux 7.2)
+Since Linux 7.2, userfaultfd supports read-write protect mode.
+Unlike write-protect mode, every access \(em read or write \(em to a
+protected present page generates a userfaultfd notification.
+It works on anonymous, shmem, and hugetlbfs mappings.
+.PP
+The user needs to first check availability of this feature using the
+.B UFFDIO_API
+ioctl against the feature bit
+.B UFFD_FEATURE_RWP
+before using this mode.
+On kernels or architectures that cannot support read-write protection,
+the bit is masked out from
+.I uffdio_api.features
+on return from
+.BR UFFDIO_API ;
+callers should inspect the returned features and fall back to another
+tracking mechanism when the bit is absent.
+.PP
+To register with userfaultfd read-write protect mode, the user needs to
+initiate the
+.B UFFDIO_REGISTER
+ioctl with mode
+.B UFFDIO_REGISTER_MODE_RWP
+set.
+.B UFFDIO_REGISTER_MODE_RWP
+cannot be combined with
+.BR UFFDIO_REGISTER_MODE_WP ;
+however, it can be combined with
+.B UFFDIO_REGISTER_MODE_MISSING
+when the caller also wants notifications for accesses to unpopulated pages.
+.PP
+After registration, the user can read-write-protect any existing memory
+within the range using the
+.B UFFDIO_RWPROTECT
+ioctl where
+.I uffdio_rwprotect.mode
+is set to
+.BR UFFDIO_RWPROTECT_MODE_RWP .
+Read-write protection only affects pages that are currently populated
+in the range; unpopulated addresses remain unpopulated and fall through
+to the normal missing-page path on first access.
+.PP
+Protection is preserved across page reclaim and migration; it is
+.I not
+preserved across operations that drop the underlying page
+.RB ( MADV_DONTNEED
+on anonymous memory, hole punching on shmem, or truncation of a file mapping).
+Callers must re-arm the range with
+.B UFFDIO_RWPROTECT
+after any such operation.
+.PP
+When an access fault happens against a protected page, user-space will
+receive a page-fault notification whose
+.I uffd_msg.pagefault.flags
+field has the
+.B UFFD_PAGEFAULT_FLAG_RWP
+bit set.
+.PP
+To resolve a read-write-protect page fault, the user initiates another
+.B UFFDIO_RWPROTECT
+ioctl whose
+.I uffdio_rwprotect.mode
+has the
+.B UFFDIO_RWPROTECT_MODE_RWP
+flag cleared.
+This restores the original VMA permissions on the affected pages and
+wakes any blocked threads (unless
+.B UFFDIO_RWPROTECT_MODE_DONTWAKE
+is set).
+.PP
+If
+.B UFFD_FEATURE_RWP_ASYNC
+was negotiated alongside
+.BR UFFD_FEATURE_RWP ,
+the kernel resolves access faults in place without delivering a
+notification: page permissions are restored automatically and the
+faulting thread continues.
+Callers can later reconstruct which pages were touched by inspecting the
+.B PAGE_IS_ACCESSED
+bit returned by the
+.B PAGEMAP_SCAN
+ioctl described in
+.BR ioctl_userfaultfd (2)
+and
+.I Documentation/admin\-guide/mm/pagemap.rst
+in the Linux kernel source.
+.PP
+The async mode can be toggled at runtime using the
+.B UFFDIO_SET_MODE
+ioctl, which lets a single userfaultfd switch between async detection
+and synchronous eviction without re-registering the range.
.SS Reading from the userfaultfd structure
Each
.BR read (2)
@@ -473,6 +602,12 @@ If the address is in a range that was registered with the
.B UFFDIO_REGISTER_MODE_WP
flag, when this bit is set, it means it is a write-protect fault.
Otherwise it is a page-missing fault.
+.TP
+.BR UFFD_PAGEFAULT_FLAG_RWP " (since Linux 7.2)"
+If the address is in a range that was registered with the
+.B UFFDIO_REGISTER_MODE_RWP
+flag, this bit indicates that the fault was triggered by an access to a
+read-write-protected page (either a read or a write).
.RE
.TP
.I pagefault.feat.pid
@@ -574,6 +709,16 @@ system call first appeared in Linux 4.3.
.PP
The support for hugetlbfs and shared memory areas and
non-page-fault events was added in Linux 4.11
+.PP
+Read-write protect mode
+.RB ( UFFDIO_REGISTER_MODE_RWP ", " UFFD_FEATURE_RWP ", "
+.BR UFFDIO_RWPROTECT )
+was added in Linux 7.2,
+together with
+.B UFFD_FEATURE_RWP_ASYNC
+and the
+.B UFFDIO_SET_MODE
+runtime mode toggle.
.SH CONFORMING TO
.BR userfaultfd ()
is Linux-specific and should not be used in programs intended to be
--
2.51.2