Re: [PATCH v4 3/5] ksm: add vm_pgoff into ksm_rmap_item
From: David Hildenbrand (Arm)
Date: Mon May 18 2026 - 04:21:57 EST
On 5/16/26 11:15, xu.xin16@xxxxxxxxxx wrote:
>>> diff --git a/mm/ksm.c b/mm/ksm.c
>>> index 7d5b76478f0b..0299a53ba7c9 100644
>>> --- a/mm/ksm.c
>>> +++ b/mm/ksm.c
>>> @@ -195,22 +195,28 @@ struct ksm_stable_node {
>>> * @node: rb node of this rmap_item in the unstable tree
>>> * @head: pointer to stable_node heading this list in the stable tree
>>> * @hlist: link into hlist of rmap_items hanging off that stable_node
>>> - * @age: number of scan iterations since creation
>>> - * @remaining_skips: how many scans to skip
>>> + * @age: number of scan iterations since creation (unstable node)
>>> + * @remaining_skips: how many scans to skip (unstable node)
>>> + * @vm_pgoff: vm_pgoff into the original VMA where the page is mapped (stable node)
>>> */
>>> struct ksm_rmap_item {
>>> struct ksm_rmap_item *rmap_list;
>>> union {
>>> - struct anon_vma *anon_vma; /* when stable */
>>> + struct anon_vma *anon_vma; /* for reverse mapping, when stable */
>>> #ifdef CONFIG_NUMA
>>> int nid; /* when node of unstable tree */
>>> #endif
>>> };
>>> struct mm_struct *mm;
>>> unsigned long address; /* + low bits used for flags below */
>>> - unsigned int oldchecksum; /* when unstable */
>>> - rmap_age_t age;
>>> - rmap_age_t remaining_skips;
>>> + union {
>>> + struct {
>>> + unsigned int oldchecksum;
>>> + rmap_age_t age;
>>> + rmap_age_t remaining_skips;
>>> + }; /* when unstable */
>>> + unsigned long vm_pgoff; /* for reverse mapping, when stable */
>>> + };
>>> union {
>>> struct rb_node node; /* when node of unstable tree */
>>> struct { /* when listed from stable tree */
>>> @@ -776,6 +782,10 @@ static struct vm_area_struct *find_mergeable_vma(struct mm_struct *mm,
>>> return vma;
>>> }
>>>
>>> +/*
>>> + * break_cow: actively break the write-protect of the VMA. This is calld when
>>
>> s/called/called/
>
> Thanks.
>
>>
>> But why are we changing the documentation as part of this patch?
>
> I want to make it more clear and readable bacause from its function name we can't tell that
> implicit assumption: "it's only called when rmap_item has not yet become stable, but page has
> been merged."
>
> Based on that assumption, the following confusion was clarified.
>
>>
>>> + * rmap_item has not yet become stable, but page has been merged.
>>> + */
>>> static void break_cow(struct ksm_rmap_item *rmap_item)
>>> {
>>> struct mm_struct *mm = rmap_item->mm;
>>> @@ -787,6 +797,8 @@ static void break_cow(struct ksm_rmap_item *rmap_item)
>>> * to undo, we also need to drop a reference to the anon_vma.
>>> */
>>> put_anon_vma(rmap_item->anon_vma);
>>> + /* Reset pgoff that overlays age-related information. (still unstable) */
>>> + rmap_item->vm_pgoff = 0;
>>>
>>> mmap_read_lock(mm);
>>> vma = find_mergeable_vma(mm, addr);
>>> @@ -899,6 +911,8 @@ static void remove_node_from_stable_tree(struct ksm_stable_node *stable_node)
>>> VM_BUG_ON(stable_node->rmap_hlist_len <= 0);
>>> stable_node->rmap_hlist_len--;
>>> put_anon_vma(rmap_item->anon_vma);
>>> + /* Reset pgoff that overlays age-related information. */
>>
>> AI review says that this might not be the case on 32bit. So I guess the comment
>> should be
>>
>> "might overlay"
>
> ok, thanks.
>
>>
>>> + rmap_item->vm_pgoff = 0;
>>> rmap_item->address &= PAGE_MASK;
>>> cond_resched();
>>> }
>>> @@ -1052,6 +1066,8 @@ static void remove_rmap_item_from_tree(struct ksm_rmap_item *rmap_item)
>>> stable_node->rmap_hlist_len--;
>>>
>>> put_anon_vma(rmap_item->anon_vma);
>>> + /* Reset pgoff that overlays age-related information. */
>>
>> Same here.
>>
>>> + rmap_item->vm_pgoff = 0;
>>> rmap_item->head = NULL;
>>> rmap_item->address &= PAGE_MASK;
>>>
>>> @@ -1598,8 +1614,15 @@ static int try_to_merge_with_ksm_page(struct ksm_rmap_item *rmap_item,
>>> /* Unstable nid is in union with stable anon_vma: remove first */
>>> remove_rmap_item_from_tree(rmap_item);
>>>
>>> - /* Must get reference to anon_vma while still holding mmap_lock */
>>> + /*
>>> + * Must get reference to anon_vma while still holding mmap_lock,
>>> + * We set these two members of stable node here instead of
>>> + * stable_tree_append(), maybe because we don't want to hold
>>> + * mmap_read_lock again? Here mmap_read_lock is already held to
>>> + * find_mergeable_vma before merging.
>>> + */
>>> rmap_item->anon_vma = vma->anon_vma;
>>> + rmap_item->vm_pgoff = vma->vm_pgoff;
>>
>> I suggested to use the actual linear page index instead at [1]. Storing the
>> vm_pgoff is wrong I think.
>
> Yes, you are right, I have comment the reason in last email. will correct it as you said.
>
>>
>> [1] https://lore.kernel.org/all/5401c1d2-5f42-4288-9dad-2b9768b579c7@xxxxxxxxxx/
>>
>>
>> In another comment I said:
>>
>> But it's all confusing. Because we might temporarily have rmap_item->anon_vma
>> set on an rmap_entry that does not yet have the STABLE_FLAG flag set through
>> stable_tree_append().
>>
>> And then we magically call break_cow() which does a magical
>>
>> put_anon_vma(rmap_item->anon_vma);
>>
>> (this doesn't look correct in once case) ... anyhow.
>>
>> So we might want to reset the pgoff there as well, OR only store
>> the pgoff in stable_tree_append() where we actually set STABLE_FLAG.
>>
>>
>> So I guess we have to figure that out as well.
>>
>
> I think it should be correct in general, but that code was indeed a bit confusing.
> So I've added some comments to make it as clear as possible.
>
> The underlying idea should be this: when a page is merged (try_to_merge_with_ksm_page)
> and becomes a KSM page, we should promptly store rmap_item->anon_vma
> (so that rmap_walk_ksm can succeed). However, there are various subsequent reasons why
> the page might fail to acquire the STABLE_FLAG—for example, a second attempt in
> try_to_merge_with_ksm_page fails (see the implementation of try_to_merge_two_pages), or
> stable_tree_insert fails (in cmp_and_merge_page). In such cases, we need to revert the
> page back to an anonymous page, so we must explicitly call break_cow() and
> put_anon_vma(rmap_item->anon_vma).
>
> So generally, I keep the same way to store the pgoff as anon_vma.
Please document that carefully in the patch description :)
--
Cheers,
David