Re: [PATCH v5 2/2] sched/numa: add statistics of numa balance task

From: Shakeel Butt
Date: Tue May 27 2025 - 13:48:54 EST


On Sun, May 25, 2025 at 08:35:24PM +0800, Chen, Yu C wrote:
> On 5/25/2025 1:32 AM, Shakeel Butt wrote:
[...]
> > can you please give an end-to-end flow/story of all these events
> > happening on a timeline.
> >
>
> Yes, sure, let me have a try.
>
> The goal of NUMA balancing is to co-locate a task and its
> memory pages on the same NUMA node. There are two strategies:
> migrate the pages to the task's node, or migrate the task to
> the node where its pages reside.
>
> Suppose a task p1 is running on Node 0, but its pages are
> located on Node 1. NUMA page fault statistics for p1 reveal
> its "page footprint" across nodes. If NUMA balancing detects
> that most of p1's pages are on Node 1:
>
> 1. Page Migration Attempt:
> NUMA balancing first tries to migrate p1's pages to Node 0.
> The numa_page_migrate counter increments.
>
> 2. Task Migration Strategies:
> After the page migration finishes, NUMA balancing checks every
> 1 second to see if p1 can be migrated to Node 1.
>
> Case 2.1: Idle CPU Available
> If Node 1 has an idle CPU, p1 is directly scheduled there. This event is
> logged as numa_task_migrated.
> Case 2.2: No Idle CPU (Task Swap)
> If all CPUs on Node 1 are busy, direct migration could cause CPU contention
> or load imbalance. Instead:
> NUMA balancing selects a candidate task p2 on Node 1 that prefers
> Node 0 (e.g., due to its own page footprint).
> p1 and p2 are swapped. This cross-node swap is recorded as
> numa_task_swapped.
>

Thanks for the explanation, this is really helpful and I would like this
to be included in the commit message.
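
To make the two task-migration cases concrete, here is a rough sketch of
the decision as described above. It is only an illustration: the enum,
struct, and function names are hypothetical and are not taken from the
patch. Only the counter names numa_task_migrated and numa_task_swapped
come from the explanation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event names mirroring the counters named in this thread. */
enum numa_event {
	NUMA_EV_NONE,          /* neither case applies on this periodic check */
	NUMA_EV_TASK_MIGRATED, /* case 2.1: idle CPU on the target node */
	NUMA_EV_TASK_SWAPPED,  /* case 2.2: swap with a task preferring our node */
};

/* Hypothetical per-group counters; not the kernel's data structure. */
struct numa_counters {
	unsigned long numa_task_migrated;
	unsigned long numa_task_swapped;
};

/*
 * Decide how p1 could move to the target node: an idle CPU is preferred,
 * otherwise fall back to swapping with a suitable candidate p2.
 */
enum numa_event pick_task_move(bool target_has_idle_cpu,
			       bool swap_candidate_exists)
{
	if (target_has_idle_cpu)
		return NUMA_EV_TASK_MIGRATED;
	if (swap_candidate_exists)
		return NUMA_EV_TASK_SWAPPED;
	return NUMA_EV_NONE;
}

/* Bump the counter that matches the event; NUMA_EV_NONE counts nothing. */
void account_task_move(struct numa_counters *c, enum numa_event ev)
{
	assert(c);
	switch (ev) {
	case NUMA_EV_TASK_MIGRATED:
		c->numa_task_migrated++;
		break;
	case NUMA_EV_TASK_SWAPPED:
		c->numa_task_swapped++;
		break;
	case NUMA_EV_NONE:
		break;
	}
}
```

The sketch treats the two events as mutually exclusive per check, which
matches the description: a direct move when an idle CPU exists, a swap
otherwise.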

> > Beside that, do you think there might be some other scheduling events
> > (maybe unrelated to numa balancing) which might be suitable for
> > memory.stat? Basically I am trying to find out whether having sched
> > events in memory.stat is an exception for numa balancing or more general.
>
> If the criterion is a combination of task scheduling strategy and
> page-based operations, I cannot find any other existing scheduling
> events. For now, NUMA balancing seems to be the only case.

Mainly I was trying to see whether we will need to add more sched events
to the memory.stat file in the future.

Let me reply on the other email thread about what we should do next.