Re: [PATCH v4 3/5] ksm: add vm_pgoff into ksm_rmap_item

From: xu.xin16

Date: Sat May 16 2026 - 05:15:41 EST


> > diff --git a/mm/ksm.c b/mm/ksm.c
> > index 7d5b76478f0b..0299a53ba7c9 100644
> > --- a/mm/ksm.c
> > +++ b/mm/ksm.c
> > @@ -195,22 +195,28 @@ struct ksm_stable_node {
> > * @node: rb node of this rmap_item in the unstable tree
> > * @head: pointer to stable_node heading this list in the stable tree
> > * @hlist: link into hlist of rmap_items hanging off that stable_node
> > - * @age: number of scan iterations since creation
> > - * @remaining_skips: how many scans to skip
> > + * @age: number of scan iterations since creation (unstable node)
> > + * @remaining_skips: how many scans to skip (unstable node)
> > + * @vm_pgoff: vm_pgoff into the original VMA where the page is mapped (stable node)
> > */
> > struct ksm_rmap_item {
> > struct ksm_rmap_item *rmap_list;
> > union {
> > - struct anon_vma *anon_vma; /* when stable */
> > + struct anon_vma *anon_vma; /* for reverse mapping, when stable */
> > #ifdef CONFIG_NUMA
> > int nid; /* when node of unstable tree */
> > #endif
> > };
> > struct mm_struct *mm;
> > unsigned long address; /* + low bits used for flags below */
> > - unsigned int oldchecksum; /* when unstable */
> > - rmap_age_t age;
> > - rmap_age_t remaining_skips;
> > + union {
> > + struct {
> > + unsigned int oldchecksum;
> > + rmap_age_t age;
> > + rmap_age_t remaining_skips;
> > + }; /* when unstable */
> > + unsigned long vm_pgoff; /* for reverse mapping, when stable */
> > + };
> > union {
> > struct rb_node node; /* when node of unstable tree */
> > struct { /* when listed from stable tree */
> > @@ -776,6 +782,10 @@ static struct vm_area_struct *find_mergeable_vma(struct mm_struct *mm,
> > return vma;
> > }
> >
> > +/*
> > + * break_cow: actively break the write-protect of the VMA. This is calld when
>
> s/called/called/

Thanks.

>
> But why are we changing the documentation as part of this patch?

I want to make it more clear and readable bacause from its function name we can't tell that
implicit assumption: "it's only called when rmap_item has not yet become stable, but page has
been merged."

Based on that assumption, the following confusion was clarified.

>
> > + * rmap_item has not yet become stable, but page has been merged.
> > + */
> > static void break_cow(struct ksm_rmap_item *rmap_item)
> > {
> > struct mm_struct *mm = rmap_item->mm;
> > @@ -787,6 +797,8 @@ static void break_cow(struct ksm_rmap_item *rmap_item)
> > * to undo, we also need to drop a reference to the anon_vma.
> > */
> > put_anon_vma(rmap_item->anon_vma);
> > + /* Reset pgoff that overlays age-related information. (still unstable) */
> > + rmap_item->vm_pgoff = 0;
> >
> > mmap_read_lock(mm);
> > vma = find_mergeable_vma(mm, addr);
> > @@ -899,6 +911,8 @@ static void remove_node_from_stable_tree(struct ksm_stable_node *stable_node)
> > VM_BUG_ON(stable_node->rmap_hlist_len <= 0);
> > stable_node->rmap_hlist_len--;
> > put_anon_vma(rmap_item->anon_vma);
> > + /* Reset pgoff that overlays age-related information. */
>
> AI review says that this might not be the case on 32bit. So I guess the comment
> should be
>
> "might overlay"

ok, thanks.

>
> > + rmap_item->vm_pgoff = 0;
> > rmap_item->address &= PAGE_MASK;
> > cond_resched();
> > }
> > @@ -1052,6 +1066,8 @@ static void remove_rmap_item_from_tree(struct ksm_rmap_item *rmap_item)
> > stable_node->rmap_hlist_len--;
> >
> > put_anon_vma(rmap_item->anon_vma);
> > + /* Reset pgoff that overlays age-related information. */
>
> Same here.
>
> > + rmap_item->vm_pgoff = 0;
> > rmap_item->head = NULL;
> > rmap_item->address &= PAGE_MASK;
> >
> > @@ -1598,8 +1614,15 @@ static int try_to_merge_with_ksm_page(struct ksm_rmap_item *rmap_item,
> > /* Unstable nid is in union with stable anon_vma: remove first */
> > remove_rmap_item_from_tree(rmap_item);
> >
> > - /* Must get reference to anon_vma while still holding mmap_lock */
> > + /*
> > + * Must get reference to anon_vma while still holding mmap_lock,
> > + * We set these two members of stable node here instead of
> > + * stable_tree_append(), maybe because we don't want to hold
> > + * mmap_read_lock again? Here mmap_read_lock is already held to
> > + * find_mergeable_vma before merging.
> > + */
> > rmap_item->anon_vma = vma->anon_vma;
> > + rmap_item->vm_pgoff = vma->vm_pgoff;
>
> I suggested to use the actual linear page index instead at [1]. Storing the
> vm_pgoff is wrong I think.

Yes, you are right, I have comment the reason in last email. will correct it as you said.

>
> [1] https://lore.kernel.org/all/5401c1d2-5f42-4288-9dad-2b9768b579c7@xxxxxxxxxx/
>
>
> In another comment I said:
>
> But it's all confusing. Because we might temporarily have rmap_item->anon_vma
> set on an rmap_entry that does not yet have the STABLE_FLAG flag set through
> stable_tree_append().
>
> And then we magically call break_cow() which does a magical
>
> put_anon_vma(rmap_item->anon_vma);
>
> (this doesn't look correct in once case) ... anyhow.
>
> So we might want to reset the pgoff there as well, OR only store
> the pgoff in stable_tree_append() where we actually set STABLE_FLAG.
>
>
> So I guess we have to figure that out as well.
>

I think it should be correct in general, but that code was indeed a bit confusing.
So I've added some comments to make it as clear as possible.

The underlying idea should be this: when a page is merged (try_to_merge_with_ksm_page)
and becomes a KSM page, we should promptly store rmap_item->anon_vma
(so that rmap_walk_ksm can succeed). However, there are various subsequent reasons why
the page might fail to acquire the STABLE_FLAG—for example, a second attempt in
try_to_merge_with_ksm_page fails (see the implementation of try_to_merge_two_pages), or
stable_tree_insert fails (in cmp_and_merge_page). In such cases, we need to revert the
page back to an anonymous page, so we must explicitly call break_cow() and
put_anon_vma(rmap_item->anon_vma).

So generally, I keep the same way to store the pgoff as anon_vma.