Re: [PATCH v2 0/3] x86/fred: enable FRED by default

From: Jens Axboe

Date: Thu Mar 26 2026 - 18:46:11 EST

On 3/26/26 4:11 PM, Andy Lutomirski wrote:
>
>
> On Wed, Mar 25, 2026, at 4:01 PM, H. Peter Anvin wrote:
>> From: H. Peter Anvin (Intel) <hpa@xxxxxxxxx>
>>
>> When FRED was added to the mainline kernel, it was set up as an
>> explicit opt-in due to the risk of regressions before hardware was
>> available publicly.
>>
>> Now, Panther Lake (Core Ultra 300 series) has been released, and
>> benchmarking by Phoronix has shown that it provides a significant
>> performance benefit on most workloads:
>>
>> https://www.phoronix.com/review/intel-fred-panther-lake
>
> Those performance increases with fio / io_uring are ridiculous.
> I wonder whether anyone can look at /proc/interrupts to try to
> figure out what's going on. I imagine we're bottlenecking on IO
> interrupts or something and FRED speeds that up by a factor of
> several, but this makes me wonder whether there's some tuning that
> should be done.

I did see those results and was wondering myself what on earth is going
on there. The IOPS rate isn't anywhere near high enough to saturate a
single core (I saw 300-500K IOPS). I probably would've configured some
polled nvme queues and run polled IO to compare. I'm almost wondering if
it's some power saving or whatever going on, and timing is enough to
mess with it and change the outcome. Because outside of that, I'm a bit
puzzled.

I don't have a Panther Lake or any system that supports FRED, so all I
can do is guess from here.
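
For anyone who does have the hardware, the /proc/interrupts check
suggested above could be sketched roughly like this in Python. The helper
names (parse_interrupts, interrupt_deltas, sample_deltas) are made up for
illustration, and the parser assumes the usual layout: a header row of
CPU columns, then one row per interrupt with per-CPU counts followed by
an optional description.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: (total_count, description)}.

    Assumes the first line is the CPU header and each later line looks like
    "<label>: <count per CPU>... <description>".
    """
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    rows = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an interrupt row
        fields = rest.split()
        total = 0
        used = 0
        # Rows such as ERR/MIS carry fewer counts than there are CPUs.
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            total += int(field)
            used += 1
        rows[label.strip()] = (total, " ".join(fields[used:]))
    return rows


def interrupt_deltas(before, after):
    """Return {label: (count_increase, description)} for rows that grew."""
    deltas = {}
    for label, (count, desc) in after.items():
        prev = before.get(label, (0, ""))[0]
        if count > prev:
            deltas[label] = (count - prev, desc)
    return deltas


def sample_deltas(path="/proc/interrupts", interval=1.0):
    """Snapshot the file twice, `interval` seconds apart, and diff them."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return interrupt_deltas(before, after)
```

Calling sample_deltas() while the fio job is running, once with FRED
enabled and once without, and sorting the result by count would show
whether the nvme queue interrupts (or IPIs) fire at very different rates
in the two configurations.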

> Jens, for background, FRED ought to speed up page faults and
> interrupts (both IPI and device) by a considerable amount -- maybe a
> reduction of 20k or more cycles per interrupt. But Phoronix is
> showing up to a factor of 2 (!) actual performance increase using
> io_uring with a job count of 8. Now maybe 8 is too small, but I'm
> wondering whether the io_uring or IO part is missing some optimization
> that it ought to have.

He's using fio, which should do the right thing. I'm assuming job count
means "8 threads doing IO", but I also don't know what the setting of
each job is. Presumably each job drives a queue depth of <something> as
well? With perhaps batching on the submission and completion side. Or
maybe 8 jobs is a single IO thread, driving QD=8?

Sorry, not much I can help with here, without having access to the
system and knowing exactly what was run. And I'd also want to poke at it
while it's running to get an idea of what is going on. An O_DIRECT 4k
random write going from 280K to 500K IOPS is quite the change, to say
the least, particularly when there's no way you're CPU bound at 280K to
begin with.

--
Jens Axboe