Re: [PATCH v3 0/4] mm/folio_zero_user: add multi-page clearing

From: Ingo Molnar
Date: Mon Apr 14 2025 - 02:36:30 EST



* Ankur Arora <ankur.a.arora@xxxxxxxxxx> wrote:

> We also see a performance improvement for cases where this optimization
> is unavailable (pg-sz=2MB on AMD, and pg-sz=2MB|1GB on Intel) because
> REP; STOS is typically microcoded, so its cost can now be amortized
> over larger regions, and the hint allows the hardware prefetcher to do
> a better job.
>
> Milan (EPYC 7J13, boost=0, preempt=full|lazy):
>
>                mm/folio_zero_user    x86/folio_zero_user    change
>                (GB/s +- stddev)      (GB/s +- stddev)
>
>   pg-sz=1GB    16.51 +- 0.54%        42.80 +- 3.48%         + 159.2%
>   pg-sz=2MB    11.89 +- 0.78%        16.12 +- 0.12%         + 35.5%
>
> Icelakex (Platinum 8358, no_turbo=1, preempt=full|lazy):
>
>                mm/folio_zero_user    x86/folio_zero_user    change
>                (GB/s +- stddev)      (GB/s +- stddev)
>
>   pg-sz=1GB     8.01 +- 0.24%        11.26 +- 0.48%         + 40.57%
>   pg-sz=2MB     7.95 +- 0.30%        10.90 +- 0.26%         + 37.10%

How was this measured? Could you integrate this measurement as a new
tools/perf/bench/ subcommand, so that people can try it on different
systems, etc.? There's already a 'perf bench mem' subcommand space
where this could be added.
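
To illustrate the kind of measurement meant here, a rough userspace
sketch follows. It is not necessarily how the numbers above were
obtained, and it is not an existing perf bench subcommand. It times the
first touch of fresh anonymous memory, which is where the kernel zeroes
pages on fault. The function name first_touch_bandwidth() and the
use_thp parameter are made up for this example, and the result includes
page-fault and interpreter overhead, so treat it only as a relative
indicator:

```python
import mmap
import time


def first_touch_bandwidth(size_bytes, use_thp=False):
    """Return GB/s achieved while first-touching fresh anonymous memory.

    Hypothetical helper: every page of a new private anonymous mapping
    is written once, so the kernel has to fault in and zero each page.
    """
    buf = mmap.mmap(-1, size_bytes,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE)
    try:
        if use_thp and hasattr(mmap, "MADV_HUGEPAGE"):
            try:
                # Request transparent huge pages; this is only a hint
                # and may be rejected depending on the system config.
                buf.madvise(mmap.MADV_HUGEPAGE)
            except OSError:
                pass
        start = time.perf_counter()
        # One write per base page is enough to trigger the fault.
        for off in range(0, size_bytes, mmap.PAGESIZE):
            buf[off] = 1
        elapsed = time.perf_counter() - start
    finally:
        buf.close()
    return size_bytes / elapsed / 1e9


if __name__ == "__main__":
    size = 256 * 1024 * 1024
    print("base pages: %.2f GB/s" % first_touch_bandwidth(size))
    print("THP hint:   %.2f GB/s" % first_touch_bandwidth(size, use_thp=True))
```

A proper 'perf bench mem' variant would be written in C and would
control for CPU frequency, NUMA placement and preemption model, which
this sketch ignores.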

Thanks,

Ingo