[PATCH v6 00/14] Remove CONFIG_READ_ONLY_THP_FOR_FS and enable file THP for writable files

From: Zi Yan

Date: Sun May 17 2026 - 09:57:55 EST


Hi all,

This patchset removes READ_ONLY_THP_FOR_FS Kconfig and enables creating
file-backed THPs for FSes with large folio support (the supported orders
need to include PMD_ORDER) by default, including for writable files. It
is an in-place replacement of V5 in mm-new. It affects Mike Rapoport's
"make MM selftests more CI friendly", since "selftests/mm: khugepaged:
use kselftest framework" needs to be updated. I updated it and put it at
the end of this cover letter.

Before the patchset, the status of creating read-only THPs is below:

                          | PF | MADV_COLLAPSE | khugepaged |
                          |----|---------------|------------|
large folio FSes only     | ✓  | x             | x          |
READ_ONLY_THP_FOR_FS only | x  | ✓             | ✓          |
both                      | ✓  | ✓             | ✓          |

where "READ_ONLY_THP_FOR_FS only" means FSes without large folio support.


Now without READ_ONLY_THP_FOR_FS:

                                 | PF | MADV_COLLAPSE | khugepaged |
                                 |----|---------------|------------|
large folio FSes (read-only fd)  | ✓  | ✓             | ✓          |
large folio FSes (read-write fd) | ✓  | ✓             | ✓*         |
no large folio FSes              | x  | x             | x          |

* khugepaged only collapses clean folios from writable files. Userspace
must flush dirty folios explicitly before khugepaged can collapse them.
MADV_COLLAPSE handles the flush automatically via its writeback-and-retry
path. Collapsing writable MAP_PRIVATE pagecache folios is still not
supported, since PMD THP CoW only faults in at PTE level to avoid long
CoW latency, and file_backed_vma_is_retractable() prevents it.
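
For a writable file, the userspace side of the footnote above could look
roughly like the sketch below: write back dirty data, map the file
MAP_SHARED, and request a collapse. This is only an illustration, not part
of the series. flush_and_collapse() is a made-up helper name, the fallback
value 25 for MADV_COLLAPSE is the asm-generic value and is assumed only when
the libc headers do not define it, and the call is expected to fail (e.g.
with EINVAL) on kernels or FSes without the needed support. The explicit
fsync() matters for the khugepaged case; MADV_COLLAPSE is described above as
handling the flush itself.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* assumption: asm-generic value, older libc headers */
#endif

/*
 * Hypothetical helper: flush a writable file, then ask for a synchronous
 * collapse of its first `len` bytes. Returns 0 on success or -errno.
 */
static int flush_and_collapse(const char *path, size_t len)
{
	void *addr;
	int ret = 0;
	int fd;

	assert(path != NULL && len > 0);

	fd = open(path, O_RDWR);
	if (fd < 0)
		return -errno;

	/* Write back dirty folios so that only clean folios remain. */
	if (fsync(fd) < 0) {
		ret = -errno;
		goto out_close;
	}

	/* MAP_SHARED: writable MAP_PRIVATE mappings are not collapsed. */
	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED) {
		ret = -errno;
		goto out_close;
	}

	/* Best-effort request; the error is reported to the caller. */
	if (madvise(addr, len, MADV_COLLAPSE) < 0)
		ret = -errno;

	munmap(addr, len);
out_close:
	close(fd);
	return ret;
}
```

A caller would pass a length that is a multiple of the PMD size (2 MiB on
x86-64 with 4 KiB pages) and treat a negative return value as "no THP
created" rather than as a fatal error.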

This means no-large-folio FSes need to add large folio support (the
supported orders need to include PMD_ORDER), so that they can leverage
file THP creation.

To prevent breaking file THP support for large folio FSes,
1. patches 1-4 enable the support, so that without READ_ONLY_THP_FOR_FS,
file THP still works for large folio FSes,
2. patch 5 removes READ_ONLY_THP_FOR_FS Kconfig,
3. patches 6-12 remove code related to READ_ONLY_THP_FOR_FS,
4. patches 13-14 enable clean pagecache folio collapse for writable files.


NOTE: collapsing writable MAP_PRIVATE pagecache folios is not supported,
since:
1. PMD THP CoW only faults in at PTE level to avoid long CoW latency,
2. the first check, due to 1, in file_backed_vma_is_retractable() prevents it.


Overview
===

1. collapse_file() checks the to-be-collapsed folios for dirtiness after
they are locked and unmapped, to make sure no new write happens. Before,
mapping->nr_thps and inode->i_writecount were used to trigger read-only
THP truncation before a fd becomes writable.

2. hugepage_enabled() is true for anon, shmem, and file-backed cases
if the global khugepaged control is on. Otherwise, khugepaged for the
file-backed case is turned off, and anon and shmem depend on per-size
control knobs.

3. collapse_file() from mm/khugepaged.c, instead of checking
CONFIG_READ_ONLY_THP_FOR_FS, makes sure the mapping_max_folio_order()
of struct address_space of the file is at least PMD_ORDER.

4. file_thp_enabled() checks mapping_max_folio_order() instead of
CONFIG_READ_ONLY_THP_FOR_FS and no longer checks if the file is opened
read-only. The dirty folio check after try_to_unmap() (Change 1)
handles writable files correctly.

5. truncate_inode_partial_folio() calls folio_split() directly instead
of the removed try_folio_split_to_order(), since large folios can
only show up on a FS with large folio support.

6. nr_thps is removed from struct address_space, since it is no longer
needed to drop all read-only THPs from a FS without large folio
support when the fd becomes writable. Its related filemap_nr_thps*()
are removed too.

7. folio_check_splittable() no longer checks READ_ONLY_THP_FOR_FS.

8. collapse_file() only calls filemap_flush() for read-only files.
Blindly flushing dirty folios from writable files would cause
undesirable system-wide writeback; userspace is expected to flush
explicitly, or use MADV_COLLAPSE which handles it via its retry path.

9. Updated comments and selftests in various places.
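
To make items 1, 3 and 8 above a bit more concrete, here is a small
userspace model of the decisions involved. It is not kernel code: struct
mapping_model, model_pmd_folio_support(), model_collapse() and
model_should_flush() are names invented for this sketch, with min_order and
max_order standing in for what mapping_min_folio_order() and
mapping_max_folio_order() return. The real functions handle more cases
(shmem, alignment, retries) than this model does.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model only; none of these names are kernel APIs. */
struct mapping_model {
	unsigned int min_order; /* stands in for mapping_min_folio_order() */
	unsigned int max_order; /* stands in for mapping_max_folio_order() */
};

enum collapse_result {
	COLLAPSE_OK,
	COLLAPSE_NO_PMD_SUPPORT,
	COLLAPSE_DIRTY_FOLIO,
};

/* PMD-sized folios are usable only if pmd_order lies in [min, max]. */
static bool model_pmd_folio_support(const struct mapping_model *m,
				    unsigned int pmd_order)
{
	assert(m->min_order <= m->max_order);
	return m->min_order <= pmd_order && pmd_order <= m->max_order;
}

/*
 * Outcome for one candidate range. `any_dirty` is the folio state seen
 * after the folios were locked and unmapped, so no new write can race.
 */
static enum collapse_result model_collapse(const struct mapping_model *m,
					   unsigned int pmd_order,
					   bool any_dirty)
{
	if (!model_pmd_folio_support(m, pmd_order))
		return COLLAPSE_NO_PMD_SUPPORT;
	if (any_dirty)
		return COLLAPSE_DIRTY_FOLIO;
	return COLLAPSE_OK;
}

/* Item 8: only read-only files get a flush issued on their behalf. */
static bool model_should_flush(bool file_is_writable)
{
	return !file_is_writable;
}
```

With a PMD order of 9 (x86-64 with 4 KiB pages), a mapping with orders
[0, 10] passes the first check, while mappings with orders [0, 0] or
[10, 11] do not.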


Changelog
===
From V5[6]:
1. added mapping_min_folio_order(mapping) <= PMD_ORDER check to
mapping_pmd_folio_support() in Patch 1 to correctly handle
filesystems whose minimum folio order exceeds PMD_ORDER. Also
improved the kernel-doc comment per David's suggestions.

2. cleaned up Patch 11 per David's review: use const for open_opt and
mmap_prot, remove mmap_opt (use MAP_SHARED for both read-only and
read-write mappings), inline file_fault_common() into separate
file_fault_read() and file_fault_write() functions, fix "read only"
typo to "read-only", update usage message to "with PMD-sized large
folio support". Also fixed run_vmtests.sh to use elif test_selected
thp for the SKIP case to avoid spurious [SKIP] output per Nico's
report.

3. revised stale comment in Patch 13: removed "There won't be new dirty
pages" and updated "khugepaged only works on read-only fd" to reflect
that writable files are now supported; merged the comment blocks per
David's suggestion.

From V4[5]:
1. fixed Patch 1's compilation error in !CONFIG_TRANSPARENT_HUGEPAGE

2. changed Patch 3 to no longer enable collapse for read-write fd but only
allow read-only fd.

3. added two new patches to enable clean pagecache folio collapse for
writable files:
- Patch 13: remove inode_is_open_for_write() from file_thp_enabled()
so that khugepaged and MADV_COLLAPSE can process writable files.
filemap_flush() in collapse_file() is now conditionalized on the file
being read-only, to avoid repeatedly writing back dirty folios from
writable files.
- Patch 14: add read_write_file_read_ops and read_write_file_write_ops
to the khugepaged selftest to cover the new writable-file collapse paths.
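
One way for a userspace test to observe whether a file range ended up
PMD-mapped is the FilePmdMapped field in /proc/<pid>/smaps. The sketch
below is not taken from the khugepaged selftest; file_pmd_mapped_kb() is a
hypothetical helper that only parses smaps-formatted text already read into
a buffer, which keeps it easy to test in isolation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sum every "FilePmdMapped:" value (in kB) found in smaps-formatted text.
 * Returns -1 if the field never appears.
 */
static long file_pmd_mapped_kb(const char *smaps_text)
{
	static const char key[] = "FilePmdMapped:";
	const char *p = smaps_text;
	long total = -1;

	assert(smaps_text != NULL);

	while ((p = strstr(p, key)) != NULL) {
		long kb;

		/* Skip the key itself, then parse the number after it. */
		p += sizeof(key) - 1;
		if (sscanf(p, "%ld kB", &kb) == 1)
			total = (total < 0 ? 0 : total) + kb;
	}
	return total;
}
```

A test could read /proc/self/smaps before and after a collapse attempt and
compare the two sums; restricting the check to one VMA would need extra
parsing of the address-range lines, which is left out here.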

From V3[4]:
1. added a TODO comment in patch 1 noting that the is_shmem exception in
the VM_WARN_ON_ONCE() check can be removed once shmem always calls
mapping_set_large_folios() on its mapping. Used VM_WARN_ON_ONCE() in
mapping_pmd_thp_support() instead.

2. fixed the dirty folio bail-out path in patch 2: add xas_unlock_irq()
and folio_putback_lru() before the goto, which were missing and would
have left the XA lock held and the LRU isolation ref leaked.

3. renamed hugepage_pmd_enabled() to hugepage_enabled() to reflect it
controls khugepaged for all transparent hugepage types.

4. reverted the comment in hugepage_enabled() in patch 4 to the original;
only removed the phrase "when configured in," which referred to
CONFIG_READ_ONLY_THP_FOR_FS.

5. fixed commit message in patch 6: the dirty folio check is added after
try_to_unmap() in collapse_file(), not after try_to_unmap_flush().

From V2[3]:
1. removed unnecessary check in collapse_scan_file().

2. removed inode_is_open_for_write() check in file_thp_enabled().

3. changed hugepage_enabled() to return true, instead of false, if the
khugepaged global control is on. Cleaned up anon and shmem code in the
function.

4. moved folio dirtiness check after try_to_unmap() but before
try_to_unmap_flush(), since that is sufficient to prevent new writes.

5. reordered patch 4 and 5, so that khugepaged behavior does not change
after READ_ONLY_THP_FOR_FS is removed.

6. added read-write file test in khugepaged selftest.

7. removed the read-only file restriction from guard-region selftest.

From V1[2]:
1. removed inode_is_open_for_write() check in collapse_file(), since the
added folio dirtiness check after try_to_unmap_flush() should be
sufficient to prevent writes to candidate folios.

2. removed READ_ONLY_THP_FOR_FS check in hugepage_enabled(); please
see Patch 5 and item 2 in the overview for more details.

3. moved the patch removing READ_ONLY_THP_FOR_FS Kconfig after enabling
khugepaged and MADV_COLLAPSE to create read-only THPs.

4. added mapping_pmd_thp_support() helper function.

5. used VM_WARN_ON_ONCE() in collapse_file() for mapping eligibility check
and address alignment check instead of if + return error code. Always
allow shmem, since MADV_COLLAPSE ignores shmem huge config.

6. added mapping eligibility check in collapse_scan_file().

7. removed trailing ; for folio_split() in the !CONFIG_TRANSPARENT_HUGEPAGE
case.

8. simplified code in folio_check_splittable() after removing
READ_ONLY_THP_FOR_FS code.

9. clarified that read-only THP works for FSes with PMD THP support by
default.

From RFC[1]:
1. instead of removing the READ_ONLY_THP_FOR_FS function entirely, turn it
on by default for all FSes with large folio support whose supported
orders include PMD_ORDER.

Suggestions and comments are welcome.

Link: https://lore.kernel.org/all/20260323190644.1714379-1-ziy@xxxxxxxxxx/ [1]
Link: https://lore.kernel.org/all/20260327014255.2058916-1-ziy@xxxxxxxxxx/ [2]
Link: https://lore.kernel.org/all/20260413192030.3275825-1-ziy@xxxxxxxxxx/ [3]
Link: https://lore.kernel.org/all/20260418024429.4055056-1-ziy@xxxxxxxxxx/ [4]
Link: https://lore.kernel.org/all/20260424024915.28758-1-ziy@xxxxxxxxxx/ [5]
Link: https://lore.kernel.org/all/20260429152924.727124-1-ziy@xxxxxxxxxx/ [6]

For Andrew to update "selftests/mm: khugepaged: use kselftest framework"
from Mike Rapoport's "make MM selftests more CI friendly" series.
===