Re: [LSF/MM/BPF TOPIC] [RFC PATCH 0/4] mm/mempolicy: introduce socket-aware weighted interleave
From: Dave Jiang
Date: Thu Mar 26 2026 - 17:45:04 EST
On 3/26/26 1:54 AM, Rakie Kim wrote:
> On Wed, 25 Mar 2026 12:33:50 +0000 Jonathan Cameron <jonathan.cameron@xxxxxxxxxx> wrote:
>> On Tue, 24 Mar 2026 14:35:45 +0900
>> Rakie Kim <rakie.kim@xxxxxx> wrote:
>>
>>> On Fri, 20 Mar 2026 16:56:05 +0000 Jonathan Cameron <jonathan.cameron@xxxxxxxxxx> wrote:
<--snip-->
> Hello Jonathan,
>
> Thank you for the deep insight into the HMAT parser code. As you
> mentioned, considering the current state where node 1 is still
> registered as the initiator in sysfs despite the flag being 0, it
> seems highly likely that the kernel parser logic is not handling
> this specific situation gracefully.
>
>>
>>> Because both HMAT and sysfs are exposing abnormal values, it was
>>> impossible for me to determine the true socket connections for CXL
>>> using this data.
>>>
>>>>>
>>>>> Even though the distance map shows node2 is physically closer to
>>>>> Socket 0 and node3 to Socket 1, the HMAT incorrectly defines the
>>>>> routing path strictly through Socket 1. Because the HMAT alone made it
>>>>> difficult to determine the exact physical socket connections on these
>>>>> systems, I ended up using the current CXL driver-based approach.
>>>>
>>>> Are the HMAT latencies and bandwidths all there? Or are some missing
>>>> and you have to use SLIT (which generally is garbage for historical
>>>> reasons of tuning SLIT to particular OS behaviour).
>>>>
>>>
>>> The HMAT latencies and bandwidths are present, but the values seem
>>> broken. Here is the latency table:
>>>
>>> Init->Target | node0 | node1 | node2  | node3
>>> node0        | 0x38B | 0x89F | 0x9C4  | 0x3AFC
>>> node1        | 0x89F | 0x38B | 0x3AFC | 0x4268
>>
>> Yeah. That would do it... Looks like that final value is garbage.

Hi Rakie,

So I talked to the Intel BIOS folks, and apparently for devices that are
not hot-plugged (with memory ranges provided in SRAT), those HMAT values
are end-to-end values and not just CPU to Gen Port. That's why they look
so much bigger. So there are a couple of things we'll have to consider:

1. Make sure that Intel, AMD, and ARM HMATs are all created the same way
   and that this is the agreed-upon way to do it. Hopefully someone from
   AMD and the ARM vendors can comment. We should all get on the same
   page for the CXL kernel code to work properly.
2. Add code in the CXL driver to detect whether the range is in SRAT,
   and skip the end-to-end perf calculation if that is the case.
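
To make item 2 concrete, here is a rough userspace sketch of the check,
not driver code. Every name in it (struct mem_range, struct access_perf,
range_in_srat(), effective_perf()) is hypothetical and made up for
illustration. It also assumes that the non-SRAT case combines the HMAT
entry with the downstream path by adding latencies and taking the lower
bandwidth; the real driver logic may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical physical address range, inclusive on both ends. */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical perf pair, in whatever units the HMAT entry uses. */
struct access_perf {
	unsigned int latency;
	unsigned int bandwidth;
};

/* Return true if r is fully covered by one of the SRAT memory ranges. */
static bool range_in_srat(const struct mem_range *srat, size_t nr,
			  const struct mem_range *r)
{
	for (size_t i = 0; i < nr; i++) {
		if (r->start >= srat[i].start && r->end <= srat[i].end)
			return true;
	}
	return false;
}

/*
 * If the range is described in SRAT, assume the HMAT value is already
 * end to end and use it as is. Otherwise treat it as CPU to Gen Port
 * only and fold in the downstream (link + device) numbers.
 */
static struct access_perf effective_perf(struct access_perf hmat,
					  struct access_perf downstream,
					  bool in_srat)
{
	struct access_perf out = hmat;

	if (in_srat)
		return out;

	/* Assumed combine rule: latencies add, bandwidth is the minimum. */
	out.latency = hmat.latency + downstream.latency;
	if (downstream.bandwidth < out.bandwidth)
		out.bandwidth = downstream.bandwidth;
	return out;
}
```

A caller would first run range_in_srat() on the region's address range
and pass the result to effective_perf(), so a boot-time range keeps its
HMAT numbers untouched while a hot-plugged one still gets the
downstream contribution added.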
DJ
<--snip-->