Re: [mm/contpte v3 1/1] mm/contpte: Optimize loop to reduce redundant operations

From: Xavier
Date: Wed Apr 16 2025 - 12:16:49 EST

At 2025-04-16 16:57:06, "David Laight" <david.laight.linux@xxxxxxxxx> wrote:
>On Tue, 15 Apr 2025 16:22:05 +0800
>Xavier <xavier_qy@xxxxxxx> wrote:
>
>> This commit optimizes the contpte_ptep_get function by adding early
>> termination logic. It checks if the dirty and young bits of orig_pte
>> are already set and skips redundant bit-setting operations during
>> the loop. This reduces unnecessary iterations and improves performance.
>
>Benchmarks?
>
>As has been pointed out before CONT_PTES is small and IIRC dirty+young
>is unusual.

I haven't found any suitable benchmarks yet. I will write some more
general test scenarios and post the results in a follow-up email.
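
To make the discussion concrete, here is a minimal userspace sketch of the
early-termination idea. It is not the patch itself: the bit positions, the
block size of 16 and every sketch_* name are assumptions made up for
illustration, and a plain array stands in for the page table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's dirty/young PTE bits. */
#define SKETCH_DIRTY     (UINT64_C(1) << 0)
#define SKETCH_YOUNG     (UINT64_C(1) << 1)
/* Assumed block size, for illustration only. */
#define SKETCH_CONT_PTES 16

/*
 * Fold the dirty/young bits of a contiguous block into 'orig'.
 * The loop stops as soon as both bits are set, because reading the
 * remaining entries could not change the result any further.
 * '*visited' reports how many entries were actually read.
 */
static uint64_t sketch_gather(uint64_t orig, const uint64_t *ptes,
                              size_t *visited)
{
    const uint64_t both = SKETCH_DIRTY | SKETCH_YOUNG;
    size_t i;

    assert(ptes != NULL && visited != NULL);

    for (i = 0; i < SKETCH_CONT_PTES && (orig & both) != both; i++)
        orig |= ptes[i] & both;

    *visited = i;
    return orig;
}
```

The 'visited' counter exists only so a test can observe how early the loop
exits; it also shows that the saving is bounded by the block size, which is
the concern raised above.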

>
>>
>> Signed-off-by: Xavier <xavier_qy@xxxxxxx>
>> ---
>> arch/arm64/mm/contpte.c | 20 ++++++++++++++++++--
>> 1 file changed, 18 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
>> index bcac4f55f9c1..0acfee604947 100644
>> --- a/arch/arm64/mm/contpte.c
>> +++ b/arch/arm64/mm/contpte.c
>> @@ -152,6 +152,16 @@ void __contpte_try_unfold(struct mm_struct *mm, unsigned long addr,
>> }
>> EXPORT_SYMBOL_GPL(__contpte_try_unfold);
>>
>> +/* Note: in order to improve efficiency, using this macro will modify the
>> + * passed-in parameters.*/
>
>... this macro modifies ...
>
>But you can make it obvious by passing by reference.
>The compiler will generate the same code.
>

This part can be refined further as well.

--
Thanks,
Xavier