Re: [RFC PATCH 00/14] Virtual Swap Space

From: Usama Arif
Date: Tue Apr 08 2025 - 09:21:10 EST




On 08/04/2025 00:42, Nhat Pham wrote:
>
> V. Benchmarking
>
> As a proof of concept, I run the prototype through some simple
> benchmarks:
>
> 1. usemem: 16 threads, 2G each, memory.max = 16G
>
> I benchmarked the following usemem commands:
>
> time usemem --init-time -w -O -s 10 -n 16 2g
>
> Baseline:
> real: 33.96s
> user: 25.31s
> sys: 341.09s
> average throughput: 111295.45 KB/s
> average free time: 2079258.68 usecs
>
> New Design:
> real: 35.87s
> user: 25.15s
> sys: 373.01s
> average throughput: 106965.46 KB/s
> average free time: 3192465.62 usecs
>
> To root cause this regression, I ran perf on the usemem program, as
> well as on the following stress-ng program:
>
> perf record -ag -e cycles -G perf_cg -- ./stress-ng/stress-ng --pageswap $(nproc) --pageswap-ops 100000
>
> and observed the (predicted) increase in lock contention on swap cache
> accesses. This regression is alleviated if I put together the
> following hack: limit the virtual swap space to a sufficient size for
> the benchmark, range partition the swap-related data structures (swap
> cache, zswap tree, etc.) based on the limit, and distribute the
> allocation of virtual swap slots among these partitions (on a per-CPU
> basis):
>
> real: 34.94s
> user: 25.28s
> sys: 360.25s
> average throughput: 108181.15 KB/s
> average free time: 2680890.24 usecs
>
> As mentioned above, I will implement proper dynamic swap range
> partitioning in a follow up work.
>
> 2. Kernel building: zswap enabled, 52 workers (one per processor),
> memory.max = 3G.
>
> Baseline:
> real: 183.55s
> user: 5119.01s
> sys: 655.16s
>
> New Design:
> real: mean: 184.5s
> user: mean: 5117.4s
> sys: mean: 695.23s
>
> New Design (Static Partition)
> real: 183.95s
> user: 5119.29s
> sys: 664.24s
>

Hi Nhat,

Thanks for the patches! I have only glanced over a couple of them so far, but this was the main question that came to mind.

Just wanted to check whether you looked at the memory overhead during these benchmarks?

Also, what is sizeof(swp_desc)? Maybe we can estimate the memory overhead as sizeof(swp_desc) * (swap size / PAGE_SIZE)?

For a 64G swap that is filled with private anon pages, that is 64G / 4K = 16M slots, so the extra memory might be roughly sizeof(swp_desc) * 16M bytes, minus the ~16 MB swap map (1 byte per slot) and ~2 MB zeromap (1 bit per slot) that would no longer be needed?
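
To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in C. The 48-byte descriptor size is purely a made-up placeholder (I have not checked what swp_desc actually contains in this series); only the slot count and the per-slot cost of the existing swap map (1 byte) and zeromap (1 bit) are taken from the current kernel:

/*
 * Hypothetical estimate only -- desc_size is an assumption, not the
 * real sizeof(swp_desc) from the series.
 */
#include <stdio.h>

int main(void)
{
	unsigned long desc_size = 48;               /* assumed sizeof(swp_desc)  */
	unsigned long swap_size = 64UL << 30;       /* 64G of swap               */
	unsigned long nr_slots  = swap_size / 4096; /* 16M slots at 4K pages     */

	unsigned long desc_bytes     = desc_size * nr_slots; /* new descriptors  */
	unsigned long swap_map_bytes = nr_slots;              /* 1 byte per slot */
	unsigned long zeromap_bytes  = nr_slots / 8;           /* 1 bit per slot  */

	printf("net overhead: ~%lu MB\n",
	       (desc_bytes - swap_map_bytes - zeromap_bytes) >> 20);
	return 0;
}

With that made-up 48-byte descriptor the net comes out to roughly 750 MB for a fully used 64G swap, which is why the actual sizeof(swp_desc) matters a lot here.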

This looks like a sizeable memory regression?

Thanks,
Usama