Re: [PATCH v5 0/2] sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems
From: K Prateek Nayak
Date: Thu Apr 24 2025 - 00:42:59 EST
Hello Libo,
On 4/24/2025 8:15 AM, Libo Chen wrote:
> v1->v2:
> 1. add perf improvement numbers in commit log. Yet to find perf diff on
> will-it-scale, so not included here. Plan to run more workloads.
> 2. add tracepoint.
> 3. To peterz's comment, this will make it impossible to attract tasks to
> that memory, just like the other VMA skipping cases. This is the current
> implementation; I think we can improve that in the future, but at the
> moment it's probably better to keep it consistent.
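The condition named in the subject line (skip scanning when cpuset.mems leaves a single allowed node) can be illustrated with a small userspace sketch. This is not the code from the patch: nodemask_sketch_t, nodes_weight_sketch() and should_skip_numa_scan() are hypothetical stand-ins, and the real check in kernel/sched/fair.c may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's nodemask_t: one bit per NUMA node. */
typedef uint64_t nodemask_sketch_t;

/* Count the allowed nodes (Kernighan's bit-clearing popcount). */
static int nodes_weight_sketch(nodemask_sketch_t mask)
{
	int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * With exactly one allowed memory node there is nowhere to migrate
 * pages to, so scanning VMAs for NUMA hinting faults is wasted work.
 */
static bool should_skip_numa_scan(nodemask_sketch_t mems_allowed)
{
	return nodes_weight_sketch(mems_allowed) == 1;
}
```

Under these assumptions, should_skip_numa_scan(0x1) (only node 0 allowed) is true, while should_skip_numa_scan(0x3) (nodes 0 and 1 allowed) is false.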
I tested the series with hackbench running on a dual socket system with
memory pinned to one node and I could see the skip_cpuset_numa traces
being logged:
sched-messaging-9430 ...: sched_skip_cpuset_numa: comm=sched-messaging pid=9430 tgid=9007 ngid=0 mem_nodes_allowed=0
sched-messaging-9640 ...: sched_skip_cpuset_numa: comm=sched-messaging pid=9640 tgid=9007 ngid=0 mem_nodes_allowed=0
sched-messaging-9645 ...: sched_skip_cpuset_numa: comm=sched-messaging pid=9645 tgid=9007 ngid=0 mem_nodes_allowed=0
sched-messaging-9637 ...: sched_skip_cpuset_numa: comm=sched-messaging pid=9637 tgid=9007 ngid=0 mem_nodes_allowed=0
sched-messaging-9629 ...: sched_skip_cpuset_numa: comm=sched-messaging pid=9629 tgid=9007 ngid=0 mem_nodes_allowed=0
sched-messaging-9639 ...: sched_skip_cpuset_numa: comm=sched-messaging pid=9639 tgid=9007 ngid=0 mem_nodes_allowed=0
sched-messaging-9630 ...: sched_skip_cpuset_numa: comm=sched-messaging pid=9630 tgid=9007 ngid=0 mem_nodes_allowed=0
sched-messaging-9487 ...: sched_skip_cpuset_numa: comm=sched-messaging pid=9487 tgid=9007 ngid=0 mem_nodes_allowed=0
sched-messaging-9635 ...: sched_skip_cpuset_numa: comm=sched-messaging pid=9635 tgid=9007 ngid=0 mem_nodes_allowed=0
sched-messaging-9647 ...: sched_skip_cpuset_numa: comm=sched-messaging pid=9647 tgid=9007 ngid=0 mem_nodes_allowed=0
...
Feel free to add:
Tested-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>
--
Thanks and Regards,
Prateek
> v2->v3:
> 1. add enable_cpuset() based on Mel's suggestion but again I think it's
> redundant.
> 2. print out nodemask with %*p.. format in the tracepoint.
> v3->v4:
> 1. fix an unsafe dereference of a pointer to content not on ring buffer,
> namely mem_allowed_ptr in the tracepoint.
> v4->v5:
> 1. add BUILD_BUG_ON() in TP_fast_assign() to guard against future
> changes (particularly in size) in nodemask_t.
> Libo Chen (2):
> sched/numa: Skip VMA scanning on memory pinned to one NUMA node via
> cpuset.mems
> sched/numa: Add tracepoint that tracks the skipping of numa balancing
> due to cpuset memory pinning
> include/trace/events/sched.h | 33 +++++++++++++++++++++++++++++++++
> kernel/sched/fair.c          |  9 +++++++++
> 2 files changed, 42 insertions(+)
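The v3->v4 and v4->v5 notes describe two ideas: store the nodemask in the trace record by value rather than keeping a pointer to memory outside the ring buffer, and fail the build if the mask type ever changes size. A userspace analogue, using C11 _Static_assert in place of BUILD_BUG_ON(), might look like the sketch below. All names here (SKETCH_MAX_NODES, nodemask_sketch_t, trace_entry_sketch, fill_entry) are hypothetical and are not taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for nodemask_t: a bitmap covering SKETCH_MAX_NODES nodes. */
#define SKETCH_MAX_NODES 64
#define SKETCH_MASK_LONGS (SKETCH_MAX_NODES / (8 * sizeof(unsigned long)))

typedef struct {
	unsigned long bits[SKETCH_MASK_LONGS];
} nodemask_sketch_t;

/* Hypothetical trace record: the mask is copied in, so nothing points outside the record. */
struct trace_entry_sketch {
	int pid;
	unsigned long mem_allowed[SKETCH_MASK_LONGS];
};

/* Compile-time guard: break the build if the two sizes ever drift apart. */
_Static_assert(sizeof(nodemask_sketch_t) ==
	       sizeof(((struct trace_entry_sketch *)0)->mem_allowed),
	       "nodemask size changed; update the trace record");

/* Copy the mask by value into the record, as a fast-assign step would. */
static void fill_entry(struct trace_entry_sketch *e, int pid,
		       const nodemask_sketch_t *mask)
{
	e->pid = pid;
	memcpy(e->mem_allowed, mask->bits, sizeof(e->mem_allowed));
}
```

Because the check runs at compile time, a later change to the mask type that is not mirrored in the record is caught before any mismatched copy can run.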