Re: [PATCH 2/2] sched/fair: get this cpu once in find_new_ilb

From: Peter Zijlstra

Date: Thu Mar 19 2026 - 09:40:30 EST


On Thu, Mar 19, 2026 at 06:33:19PM +0530, Shrikanth Hegde wrote:
> Hi Peter.
>
> On 3/19/26 2:50 PM, Peter Zijlstra wrote:
> > On Thu, Mar 19, 2026 at 12:23:14PM +0530, Shrikanth Hegde wrote:
> > > smp_processor_id() is pointless to fetch in the loop. Move it out.
> > > No functional change.
> >
> > Isn't this what compilers are for?
>
> This is what i see in pseries.
>
> c00000000028cc7c: 1d 70 80 48 bl c000000000a93c98 <_find_next_and_bit>
> c00000000028cc80: 00 00 00 60 nop
> c00000000028cc84: 00 00 bd 80 lwz r5,0(r29)
> c00000000028cc88: b4 07 7e 7c extsw r30,r3
> c00000000028cc8c: 78 1b 7f 7c mr r31,r3 << This is ilb_cpu
> c00000000028cc90: 78 1b 7a 7c mr r26,r3
> c00000000028cc94: 40 18 05 7c cmplw r5,r3
> c00000000028cc98: 78 f3 c3 7f mr r3,r30
> c00000000028cc9c: 5c 00 81 40 ble c00000000028ccf8 <kick_ilb+0x10c>
> c00000000028cca0: 08 00 2d a1 lhz r9,8(r13) << cpu comes from paca_index
> c00000000028cca4: 00 f8 09 7c cmpw r9,r31
> c00000000028cca8: 18 00 82 41 beq c00000000028ccc0 <kick_ilb+0xd4>
> c00000000028ccac: 8d 0c 04 48 bl c0000000002cd938 <idle_cpu+0x8>
> c00000000028ccb0: 00 00 00 60 nop
> c00000000028ccb4: 00 00 03 2c cmpwi r3,0
> c00000000028ccb8: 78 00 82 40 bne c00000000028cd30 <kick_ilb+0x144>
> c00000000028ccbc: 00 00 bd 80 lwz r5,0(r29)
> c00000000028ccc0: 01 00 df 38 addi r6,r31,1
> c00000000028ccc4: 20 00 a5 78 clrldi r5,r5,32
> c00000000028ccc8: 78 e3 84 7f mr r4,r28
> c00000000028cccc: 78 db 63 7f mr r3,r27
> c00000000028ccd0: b4 07 c6 7c extsw r6,r6
> c00000000028ccd4: c5 6f 80 48 bl c000000000a93c98 <_find_next_and_bit>
>
>
>
> So, yes it is done inside the loop. But accessing it should be cheap as it is r13 access
> which is the paca pointer.
>
> Since different archs implement it in their own ways
> (many would have it cheap, but for some may be costly)
> and there is CONFIG_DEBUG_PREEMPT too.
> So it would be good to hoist it out of the loop IMHO.

Ah, right. And I suppose it isn't generally sound to hoist
smp_processor_id() unless you have disabled preemption, and since we
can't tell the compiler this, it just never can.

Fair enough.