Re: [PATCH V4 3/4] perf/x86/intel/uncore: Fix die ID init and look up bugs
From: Chen, Zide
Date: Mon Mar 16 2026 - 12:59:11 EST
On 3/15/2026 11:53 PM, Mi, Dapeng wrote:
>
> On 3/14/2026 1:40 AM, Zide Chen wrote:
>> In snbep_pci2phy_map_init(), in the nr_node_ids > 8 path,
>> uncore_device_to_die() may return -1 when all CPUs associated
>> with the UBOX device are offline.
>>
>> Remove the WARN_ON_ONCE(die_id == -1) check for two reasons:
>>
>> - The current code breaks out of the loop. This is incorrect because
>> pci_get_device() does not guarantee iteration in domain or bus order,
>> so additional UBOX devices may be skipped during the scan.
>>
>> - Returning -EINVAL is incorrect, since marking offline buses with
>> die_id == -1 is expected and should not be treated as an error.
>>
>> Separately, when NUMA is disabled on a NUMA-capable platform,
>> pcibus_to_node() returns NUMA_NO_NODE, causing uncore_device_to_die()
>> to return -1 for all PCI devices. As a result,
>> spr_update_device_location(), used on Intel SPR and EMR, ignores the
>> corresponding PMON units and does not add them to the RB tree.
>>
>> Fix this by using uncore_pcibus_to_dieid(), which retrieves topology
>> from the UBOX GIDNIDMAP register and works regardless of whether NUMA
>> is enabled in Linux. This requires snbep_pci2phy_map_init() to be
>> added in spr_uncore_pci_init().
>>
>> Keep uncore_device_to_die() only for the nr_node_ids > 8 case, where
>> NUMA is expected to be enabled.
>>
>> Fixes: 9a7832ce3d92 ("perf/x86/intel/uncore: With > 8 nodes, get pci bus die id from NUMA info")
>> Fixes: 65248a9a9ee1 ("perf/x86/uncore: Add a quirk for UPI on SPR")
>> Tested-by: Steve Wahl <steve.wahl@xxxxxxx>
>> Signed-off-by: Zide Chen <zide.chen@xxxxxxxxx>
>> ---
>> V2:
>> - Fix the commit message to note that spr_update_device_location() is
>> used by EMR, not GNR.
>> - Rewrite the commit message for clarity.
>> - Add a Tested-by tag.
>>
>> V4: no changes.
>> ---
>> arch/x86/events/intel/uncore.c | 1 +
>> arch/x86/events/intel/uncore_snbep.c | 13 ++++++-------
>> 2 files changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
>> index 786bd51a0d89..e9cc1ba921c5 100644
>> --- a/arch/x86/events/intel/uncore.c
>> +++ b/arch/x86/events/intel/uncore.c
>> @@ -67,6 +67,7 @@ int uncore_die_to_segment(int die)
>> return bus ? pci_domain_nr(bus) : -EINVAL;
>> }
>>
>> +/* Note: This API can only be used when NUMA information is available. */
>> int uncore_device_to_die(struct pci_dev *dev)
>> {
>> int node = pcibus_to_node(dev->bus);
>> diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
>> index 9b51883fd6fd..421378f681d7 100644
>> --- a/arch/x86/events/intel/uncore_snbep.c
>> +++ b/arch/x86/events/intel/uncore_snbep.c
>> @@ -1459,13 +1459,7 @@ static int snbep_pci2phy_map_init(int devid, int nodeid_loc, int idmap_loc, bool
>> }
>>
>> map->pbus_to_dieid[bus] = die_id = uncore_device_to_die(ubox_dev);
>
> The "die_id" variable is no longer needed once the check below is
> removed; please remove it.
Good catch, thanks.
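
For reference, a minimal user-space sketch of how the loop is meant to
behave once both the check and the local variable are gone. This is not
the kernel code: struct sketch_ubox, SKETCH_NR_BUSES and the sketch_*
helpers are hypothetical stand-ins for the UBOX pci_dev, the
pbus_to_dieid[] map and uncore_device_to_die(). It only illustrates that
a -1 result is stored and the scan continues.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical map size for the sketch only. */
#define SKETCH_NR_BUSES 4

/* Hypothetical stand-in for a UBOX PCI device. */
struct sketch_ubox {
	int bus;	/* PCI bus number of the device */
	int die;	/* die the bus belongs to */
	int online;	/* 0 if every CPU behind this UBOX is offline */
};

/* Stand-in for uncore_device_to_die(): -1 when no online CPU is found. */
static int sketch_device_to_die(const struct sketch_ubox *dev)
{
	return dev->online ? dev->die : -1;
}

/*
 * Record a die ID for every device in the list and return the number of
 * devices visited. A -1 result marks the bus as offline; it is neither
 * an error nor a reason to stop scanning.
 */
static int sketch_map_init(const struct sketch_ubox *devs, size_t n,
			   int *pbus_to_dieid)
{
	int visited = 0;

	for (size_t i = 0; i < n; i++) {
		assert(devs[i].bus >= 0 && devs[i].bus < SKETCH_NR_BUSES);

		/* Store the result directly: no die_id local, no early break. */
		pbus_to_dieid[devs[i].bus] = sketch_device_to_die(&devs[i]);
		visited++;
	}

	return visited;
}
```

With devices listed out of bus order and one of them offline, every
device is still visited and only the offline bus ends up with -1.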
> The rest looks good to me. Thanks.
>
>
>> -
>> raw_spin_unlock(&pci2phy_map_lock);
>> -
>> - if (WARN_ON_ONCE(die_id == -1)) {
>> - err = -EINVAL;
>> - break;
>> - }
>> }
>> }
>>
>> @@ -6420,7 +6414,7 @@ static void spr_update_device_location(int type_id)
>>
>> while ((dev = pci_get_device(PCI_VENDOR_ID_INTEL, device, dev)) != NULL) {
>>
>> - die = uncore_device_to_die(dev);
>> + die = uncore_pcibus_to_dieid(dev->bus);
>> if (die < 0)
>> continue;
>>
>> @@ -6444,6 +6438,11 @@ static void spr_update_device_location(int type_id)
>>
>> int spr_uncore_pci_init(void)
>> {
>> + int ret = snbep_pci2phy_map_init(0x3250, SKX_CPUNODEID, SKX_GIDNIDMAP, true);
>> +
>> + if (ret)
>> + return ret;
>> +
>> /*
>> * The discovery table of UPI on some SPR variant is broken,
>> * which impacts the detection of both UPI and M3UPI uncore PMON.