Re: [patch V6 00/16] Improve /proc/interrupts further

From: Shrikanth Hegde

Date: Tue May 19 2026 - 17:19:21 EST


Hi Thomas.

On 5/18/26 1:31 AM, Thomas Gleixner wrote:
> This is a follow up to v5 which can be found here:
>
>   https://lore.kernel.org/20260401195625.213446764@xxxxxxxxxx
>
> The v1 cover letter contains a full analysis, explanation and numbers:
>
>   https://lore.kernel.org/20260303150539.513068586@xxxxxxxxxx
>
> TLDR:
>
>   - The performance of reading of /proc/interrupts has been improved
>     piecewise over the years, but most of the low hanging fruit has been
>     left on the table.


Ran this on a PowerVM box with 240 CPUs.

Ran perf stat -r 1000 cat /proc/interrupts > tmp.txt
and observed a minimal improvement with the series.

Base:
Performance counter stats for 'cat /proc/interrupts' (1000 runs):

        0.32 msec task-clock:HG       # 0.617 CPUs utilized     ( +- 0.17% )
           0      context-switches:HG # 0.000 /sec
           0      cpu-migrations:HG   # 0.000 /sec
          44      page-faults:HG      # 136.122 K/sec           ( +- 0.03% )
   1,313,263      cycles:HG           # 4.063 GHz               ( +- 0.17% )
   2,172,511      instructions:HG     # 1.65 insn per cycle     ( +- 0.05% )
     371,171      branches:HG         # 1.148 G/sec             ( +- 0.05% )
       4,918      branch-misses:HG    # 1.32% of all branches   ( +- 0.35% )

0.000523661 +- 0.000000914 seconds time elapsed ( +- 0.17% )

v6 series:

Performance counter stats for 'cat /proc/interrupts' (1000 runs):

        0.30 msec task-clock:HG       # 0.591 CPUs utilized     ( +- 0.25% )
           0      context-switches:HG # 0.000 /sec
           0      cpu-migrations:HG   # 0.000 /sec
          44      page-faults:HG      # 145.802 K/sec           ( +- 0.03% )
   1,224,666      cycles:HG           # 4.058 GHz               ( +- 0.25% )
   1,667,435      instructions:HG     # 1.36 insn per cycle     ( +- 0.08% )
     277,534      branches:HG         # 919.660 M/sec           ( +- 0.09% )
       5,066      branch-misses:HG    # 1.83% of all branches   ( +- 0.45% )

0.00051099 +- 0.00000110 seconds time elapsed ( +- 0.21% ) << ~2.4% improvement over base

Looking at powerpc's arch_show_interrupts(), it could use a similar set
of optimizations:
- move to an array-based layout
- use irq_proc_emit_counts()
- some interrupts, such as machine check, are hardly ever set; set skip_vector for them.
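
To make the three points above concrete, here is a minimal userspace sketch of a table-driven layout. Everything in it is hypothetical (struct demo_irq_stat, struct demo_irq_row, demo_read(), demo_show()); it only models the idea and is neither the powerpc code nor the irq_proc_emit_counts() API from the series. The static 'skip' flag is an assumption about what setting skip_vector would achieve; the real mechanism may well be dynamic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for powerpc's irq_cpustat_t, reduced to three counters. */
struct demo_irq_stat {
	unsigned int timer_irqs_event;
	unsigned int spurious_irqs;
	unsigned int mce_exceptions;
};

/* One table entry per output row (all names here are illustrative). */
struct demo_irq_row {
	const char *label;
	const char *text;
	size_t offset;	/* offsetof() of the counter inside demo_irq_stat */
	bool skip;	/* models "set skip_vector": row is not printed */
};

#define DEMO_ROW(lbl, member, txt, skp) \
	{ lbl, txt, offsetof(struct demo_irq_stat, member), skp }

static const struct demo_irq_row demo_rows[] = {
	DEMO_ROW("LOC", timer_irqs_event, "Local timer interrupts for timer event device", false),
	DEMO_ROW("SPU", spurious_irqs, "Spurious interrupts", false),
	DEMO_ROW("MCE", mce_exceptions, "Machine check exceptions", true),
};

/* Read one counter of one CPU via its byte offset. */
static unsigned int demo_read(const struct demo_irq_stat *s, size_t offset)
{
	return *(const unsigned int *)((const char *)s + offset);
}

/* Print every non-skipped row; returns the number of rows printed. */
static int demo_show(FILE *out, const struct demo_irq_stat *stats, int ncpus, int prec)
{
	int printed = 0;

	assert(ncpus > 0);
	for (size_t i = 0; i < sizeof(demo_rows) / sizeof(demo_rows[0]); i++) {
		const struct demo_irq_row *row = &demo_rows[i];

		if (row->skip)
			continue;
		fprintf(out, "%*s:", prec, row->label);
		for (int cpu = 0; cpu < ncpus; cpu++)
			fprintf(out, " %10u", demo_read(&stats[cpu], row->offset));
		fprintf(out, "  %s\n", row->text);
		printed++;
	}
	return printed;
}
```

Calling demo_show(stdout, stats, 2, 3) on a two-element stats array prints the LOC and SPU rows and leaves out MCE. The point of the table is that one loop replaces the repeated seq_printf()/for_each_online_cpu() blocks, and skipping a row becomes a per-entry flag.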


Copilot suggested the diff below to quickly try the irq_proc_emit_counts()
integration. It showed a small gain compared to v6, so it may be worth fixing
that the right way (similar to the x86 changes you have done).

Performance counter stats for 'cat /proc/interrupts' (1000 runs):

        0.29 msec task-clock:HG       # 0.586 CPUs utilized     ( +- 0.22% )
           0      context-switches:HG # 0.000 /sec
           0      cpu-migrations:HG   # 0.000 /sec
          44      page-faults:HG      # 153.067 K/sec           ( +- 0.03% )
   1,166,567      cycles:HG           # 4.058 GHz               ( +- 0.22% )
   1,475,365      instructions:HG     # 1.26 insn per cycle     ( +- 0.09% )
     249,051      branches:HG         # 866.397 M/sec           ( +- 0.10% )
       5,104      branch-misses:HG    # 2.05% of all branches   ( +- 0.33% )

0.000490211 +- 0.000000992 seconds time elapsed ( +- 0.20% ) <<< ~4% improvement over v6
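
For reference, the relative changes between the three elapsed times can be worked out with a small userspace helper. This is only a sketch; improvement_pct() and print_improvements() are hypothetical names, and the inputs are the "seconds time elapsed" values from the three perf runs above.

```c
#include <assert.h>
#include <stdio.h>

/* Relative reduction of 'after' versus 'before', in percent. */
static double improvement_pct(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}

/* Print the pairwise comparisons for the elapsed times reported above. */
static void print_improvements(void)
{
	const double base = 0.000523661;	/* seconds, base kernel */
	const double v6   = 0.00051099;		/* seconds, v6 series */
	const double ppc  = 0.000490211;	/* seconds, v6 plus the powerpc diff */

	printf("v6 vs base:          %.1f%%\n", improvement_pct(base, v6));
	printf("v6+diff vs v6:       %.1f%%\n", improvement_pct(v6, ppc));
	printf("v6+diff vs base:     %.1f%%\n", improvement_pct(base, ppc));
}
```

With these inputs it reports about 2.4% for v6 over base, about 4.1% for the powerpc diff over v6, and about 6.4% for both combined over base.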


diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index a0e8b998c9b5..19c9f28c39d3 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -83,6 +83,18 @@ u32 tau_interrupts(unsigned long cpu);
#endif
#endif /* CONFIG_PPC32 */
+
+/*
+ * Return a percpu pointer to a given unsigned int member of irq_stat.
+ */
+static __always_inline unsigned int __percpu *ppc_irq_stat_member(size_t off)
+{
+ return (unsigned int __percpu *)((char __percpu *)&irq_stat + off);
+}
+
+#define PPC_IRQ_STAT_PCPU(member) \
+ ppc_irq_stat_member(offsetof(irq_cpustat_t, member))
+
int arch_show_interrupts(struct seq_file *p, int prec)
{
int j;
@@ -97,33 +109,27 @@ int arch_show_interrupts(struct seq_file *p, int prec)
#endif /* CONFIG_PPC32 && CONFIG_TAU_INT */
seq_printf(p, "%*s:", prec, "LOC");
- for_each_online_cpu(j)
- seq_put_decimal_ull_width(p, " ", per_cpu(irq_stat, j).timer_irqs_event, 10);
+ irq_proc_emit_counts(p, PPC_IRQ_STAT_PCPU(timer_irqs_event));
seq_printf(p, " Local timer interrupts for timer event device\n");
seq_printf(p, "%*s:", prec, "BCT");
- for_each_online_cpu(j)
- seq_put_decimal_ull_width(p, " ", per_cpu(irq_stat, j).broadcast_irqs_event, 10);
+ irq_proc_emit_counts(p, PPC_IRQ_STAT_PCPU(broadcast_irqs_event));
seq_printf(p, " Broadcast timer interrupts for timer event device\n");
seq_printf(p, "%*s:", prec, "LOC");
- for_each_online_cpu(j)
- seq_put_decimal_ull_width(p, " ", per_cpu(irq_stat, j).timer_irqs_others, 10);
+ irq_proc_emit_counts(p, PPC_IRQ_STAT_PCPU(timer_irqs_others));
seq_printf(p, " Local timer interrupts for others\n");
seq_printf(p, "%*s:", prec, "SPU");
- for_each_online_cpu(j)
- seq_put_decimal_ull_width(p, " ", per_cpu(irq_stat, j).spurious_irqs, 10);
+ irq_proc_emit_counts(p, PPC_IRQ_STAT_PCPU(spurious_irqs));
seq_printf(p, " Spurious interrupts\n");
seq_printf(p, "%*s:", prec, "PMI");
- for_each_online_cpu(j)
- seq_put_decimal_ull_width(p, " ", per_cpu(irq_stat, j).pmu_irqs, 10);
+ irq_proc_emit_counts(p, PPC_IRQ_STAT_PCPU(pmu_irqs));
seq_printf(p, " Performance monitoring interrupts\n");
seq_printf(p, "%*s:", prec, "MCE");
- for_each_online_cpu(j)
- seq_put_decimal_ull_width(p, " ", per_cpu(irq_stat, j).mce_exceptions, 10);
+ irq_proc_emit_counts(p, PPC_IRQ_STAT_PCPU(mce_exceptions));
seq_printf(p, " Machine check exceptions\n");
#ifdef CONFIG_PPC_BOOK3S_64
@@ -136,22 +142,19 @@ int arch_show_interrupts(struct seq_file *p, int prec)
#endif
seq_printf(p, "%*s:", prec, "NMI");
- for_each_online_cpu(j)
- seq_put_decimal_ull_width(p, " ", per_cpu(irq_stat, j).sreset_irqs, 10);
+ irq_proc_emit_counts(p, PPC_IRQ_STAT_PCPU(sreset_irqs));
seq_printf(p, " System Reset interrupts\n");
#ifdef CONFIG_PPC_WATCHDOG
seq_printf(p, "%*s:", prec, "WDG");
- for_each_online_cpu(j)
- seq_put_decimal_ull_width(p, " ", per_cpu(irq_stat, j).soft_nmi_irqs, 10);
+ irq_proc_emit_counts(p, PPC_IRQ_STAT_PCPU(soft_nmi_irqs));
seq_printf(p, " Watchdog soft-NMI interrupts\n");
#endif
#ifdef CONFIG_PPC_DOORBELL
if (cpu_has_feature(CPU_FTR_DBELL)) {
seq_printf(p, "%*s:", prec, "DBL");
- for_each_online_cpu(j)
- seq_put_decimal_ull_width(p, " ", per_cpu(irq_stat, j).doorbell_irqs, 10);
+ irq_proc_emit_counts(p, PPC_IRQ_STAT_PCPU(doorbell_irqs));
seq_printf(p, " Doorbell interrupts\n");
}
#endif