Re: [PATCH] sched/numa, mm: Skip page promotion if cpu pid is valid

From: Donet Tom

Date: Tue Mar 31 2026 - 06:05:53 EST

On 3/31/26 2:47 PM, Huang, Ying wrote:

Donet Tom <donettom@xxxxxxxxxxxxx> writes:

Hi

On 3/31/26 2:03 PM, Huang, Ying wrote:

Hi, Donet,

Donet Tom <donettom@xxxxxxxxxxxxx> writes:

On 3/26/26 3:59 PM, David Hildenbrand (Arm) wrote:

On 3/26/26 08:12, Donet Tom wrote:

If memory tiering is disabled, cpupid of slow memory pages may
contain a valid CPU and PID. If tiering is enabled at runtime,
there is a chance that in should_numa_migrate_memory(), this
valid CPU/PID is treated as a last access timestamp, leading
to unnecessary promotion.

Is that measurable? Should we at least have a Fixes: tag?

Prevent this by skipping promotion when cpupid is valid.

Signed-off-by: Donet Tom <donettom@xxxxxxxxxxxxx>
---
kernel/sched/fair.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4b43809a3fb1..f5830a5a94d5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2001,6 +2001,13 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
unsigned int latency, th, def_th;
long nr = folio_nr_pages(folio);

/*
* When ...

+ /* When tiering is enabled at runtime, last_cpupid may
+ * hold a valid cpupid instead of an access timestamp.
+ * If so, skip page promotion.
+ */
+ if (cpupid_valid(folio_last_cpupid(folio)))
+ return false;
+

IIUC, we use jiffies_to_msecs() for the timestamp. So, soon after bootup,
we would no longer get false positives from cpupid_valid().
I suppose overflows are not a problem, correct?

Thank you, David, for guiding me in the right direction.

I initially thought that overflows would not occur, and therefore
cpupid_valid() would not produce false positives. However,
after looking into it further, it appears that overflow can
happen when storing the access time.

The last_cpupid field is used to store the last access time.
From the code, it appears that 21 bits are used for this
(#define LAST_CPUPID_SHIFT (LAST__PID_SHIFT + LAST__CPU_SHIFT)).

With 21 bits, the maximum value that can be stored is
2097151 ms (about 35 minutes).

It can be fewer than 21 bits if CONFIG_NR_CPUS is small.

DEFINE(NR_CPUS_BITS, order_base_2(CONFIG_NR_CPUS));

If the access time exceeds this range, it overflows (wraps around),
which may lead to cpupid_valid() returning false positives.

I think we need a reliable way to determine cpupid_valid() that
does not produce false positives.

Yes. IMHO, false positives are unavoidable. So, the patch fixes a
temporary performance issue at the cost of a long-standing performance
issue. Right?

I was trying to fix a functional issue. When memory tiering is
enabled at runtime, treating last_cpupid as an access time is incorrect, right?

I don't think it's a functional issue. It only has a performance
impact. Did you find any functional bug?

Thank you for the confirmation. I thought this was a functional issue. In that case, we can drop this patch.

-Donet

---
Best Regards,
Huang, Ying