On Tue, May 27, 2025 at 09:52:47PM +0530, Dev Jain wrote:
On 27/05/25 4:15 pm, Lorenzo Stoakes wrote:
On Tue, May 27, 2025 at 01:20:49PM +0530, Dev Jain wrote:
Use folio_pte_batch() to optimize move_ptes(). On arm64, if the PTEs
are painted with the contig bit, then ptep_get() will iterate through all 16
entries to collect a/d bits. Hence this optimization will result in a 16x
reduction in the number of ptep_get() calls. Next, ptep_get_and_clear()
will eventually call contpte_try_unfold() on every contig block, thus
flushing the TLB for the complete large folio range. Instead, use
get_and_clear_full_ptes() so as to elide TLBIs on each contig block, and only
do them on the starting and ending contig block.
But you're also making this applicable to non-contpte cases?

See below, but the commit message should clearly point out this is general
for page table split large folios (unless I've missed something of course! :)
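To illustrate the non-contpte win (a rough sketch, assuming a 2M THP whose
page table was split via split_huge_pmd(), i.e. 512 PTEs mapping one large
folio; numbers purely illustrative):

	old loop: 512 iterations, each doing its own ptep_get_and_clear()
		  and set_pte_at()
	new loop: 1 iteration: folio_pte_batch() -> 512, then one
		  get_and_clear_full_ptes(..., 512, 0) and one
		  set_ptes(..., 512)

So the batching helps any PTE-mapped large folio, not just arm64 contpte
mappings.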
Signed-off-by: Dev Jain <dev.jain@xxxxxxx>
---
mm/mremap.c | 40 +++++++++++++++++++++++++++++++++-------
1 file changed, 33 insertions(+), 7 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c
index 0163e02e5aa8..580b41f8d169 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -170,6 +170,24 @@ static pte_t move_soft_dirty_pte(pte_t pte)
return pte;
}
+/* mremap a batch of PTEs mapping the same large folio */
I think this comment is fairly useless, it basically spells out the function
name.

I'd prefer something like 'determine if a PTE contains physically contiguous
entries which map the same large folio'.

I'd rather prefer dropping the comment altogether, the function is fairly
obvious : )

Sure fine.
+static int mremap_folio_pte_batch(struct vm_area_struct *vma, unsigned long addr,
+ pte_t *ptep, pte_t pte, int max_nr)
+{
+ const fpb_t flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
+ struct folio *folio;
+
+ if (max_nr == 1)
+ return 1;
+
+ folio = vm_normal_folio(vma, addr, pte);
+ if (!folio || !folio_test_large(folio))
+ return 1;
+
+ return folio_pte_batch(folio, addr, ptep, pte, max_nr, flags, NULL,
+ NULL, NULL);
+}
+

The code is much better however! :)
static int move_ptes(struct pagetable_move_control *pmc,
unsigned long extent, pmd_t *old_pmd, pmd_t *new_pmd)
{
@@ -177,7 +195,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
bool need_clear_uffd_wp = vma_has_uffd_without_event_remap(vma);
struct mm_struct *mm = vma->vm_mm;
pte_t *old_ptep, *new_ptep;
- pte_t pte;
+ pte_t old_pte, pte;
pmd_t dummy_pmdval;
spinlock_t *old_ptl, *new_ptl;
bool force_flush = false;
@@ -185,6 +203,8 @@ static int move_ptes(struct pagetable_move_control *pmc,
unsigned long new_addr = pmc->new_addr;
unsigned long old_end = old_addr + extent;
unsigned long len = old_end - old_addr;
+ int max_nr_ptes;
+ int nr_ptes;
int err = 0;
/*
@@ -236,12 +256,14 @@ static int move_ptes(struct pagetable_move_control *pmc,
flush_tlb_batched_pending(vma->vm_mm);
arch_enter_lazy_mmu_mode();
- for (; old_addr < old_end; old_ptep++, old_addr += PAGE_SIZE,
- new_ptep++, new_addr += PAGE_SIZE) {
- if (pte_none(ptep_get(old_ptep)))
+ for (; old_addr < old_end; old_ptep += nr_ptes, old_addr += nr_ptes * PAGE_SIZE,
+ new_ptep += nr_ptes, new_addr += nr_ptes * PAGE_SIZE) {
+ nr_ptes = 1;
+ max_nr_ptes = (old_end - old_addr) >> PAGE_SHIFT;
+ old_pte = ptep_get(old_ptep);
+ if (pte_none(old_pte))
continue;
- pte = ptep_get_and_clear(mm, old_addr, old_ptep);
/*
* If we are remapping a valid PTE, make sure
* to flush TLB before we drop the PTL for the
@@ -253,8 +275,12 @@ static int move_ptes(struct pagetable_move_control *pmc,
* the TLB entry for the old mapping has been
* flushed.
*/
- if (pte_present(pte))
+ if (pte_present(old_pte)) {
+ nr_ptes = mremap_folio_pte_batch(vma, old_addr, old_ptep,
+ old_pte, max_nr_ptes);
force_flush = true;
+ }
+ pte = get_and_clear_full_ptes(mm, old_addr, old_ptep, nr_ptes, 0);
Just to clarify, in the previous revision you said:
"Split THPs won't be batched; you can use pte_batch() (from David's refactoring)
and figure the split THP batch out, but then get_and_clear_full_ptes() will be
gathering a/d bits and smearing them across the batch, which will be incorrect."
But... this will be triggered for page table split large folios, no?
So is there something wrong here or not?
Since I am using folio_pte_batch() (and not the hypothetical pte_batch() I was
saying in the other email), the batch must belong to the same folio. Since a
split THP means a small folio, nr_ptes will be 1.
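To spell out my reasoning against the helper in this patch (just an
illustration of the claim, not new code):

	/*
	 * In mremap_folio_pte_batch() above:
	 *
	 *	folio = vm_normal_folio(vma, addr, pte);
	 *	if (!folio || !folio_test_large(folio))
	 *		return 1;
	 *
	 * That early return is what I am relying on: my assumption is that a
	 * split THP means small folios, so folio_pte_batch() is only reached
	 * for a large folio, and a batch can never span two folios.
	 */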
I'm not sure I follow - keep in mind there are two kinds of splitting - folio
splitting and page table splitting.
If I invoke split_huge_pmd(), I end up with a bunch of PTEs mapping the same
large folio. The folio itself is not split, so nr_ptes surely will be
something >1 here, right?
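To make that concrete, the sequence I have in mind (rough sketch, assuming a
2M anon THP with 4K pages, so 512 PTEs; numbers purely illustrative):

	/*
	 * 2M THP, PMD-mapped:
	 *	folio_test_large(folio) == true, folio_nr_pages(folio) == 512
	 *
	 * split_huge_pmd(vma, pmd, addr);
	 *	-> the PMD is replaced by a PTE table: 512 consecutive PTEs,
	 *	   all still mapping the same, unsplit, large folio.
	 *
	 * mremap() of that range:
	 *	-> mremap_folio_pte_batch() sees folio_test_large() == true,
	 *	   so folio_pte_batch() can return up to 512, i.e. nr_ptes > 1.
	 */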
I hit this in my MREMAP_RELOCATE_ANON work - where I had to take special care to
differentiate between these cases.
And the comment for folio_pte_batch() states 'Detect a PTE batch: consecutive
(present) PTEs that map consecutive pages of the same large folio.' - so I don't
see why this would not hit this case?
I may be missing something however!
pte = move_pte(pte, old_addr, new_addr);
pte = move_soft_dirty_pte(pte);
@@ -267,7 +293,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
else if (is_swap_pte(pte))
pte = pte_swp_clear_uffd_wp(pte);
}
- set_pte_at(mm, new_addr, new_ptep, pte);
+ set_ptes(mm, new_addr, new_ptep, pte, nr_ptes);
}
}

The code looks much better here after refactoring, however!
--
2.30.2