[PATCH v3 0/2] blk-mq: introduce tag starvation observability

From: Aaron Tomlin

Date: Thu Mar 19 2026 - 18:20:21 EST

Hi Jens, Steve, Masami,

In high-performance storage environments, particularly when utilising RAID
controllers with shared tag sets (BLK_MQ_F_TAG_HCTX_SHARED), severe latency
spikes can occur when fast devices are starved of available tags.
Currently, diagnosing this specific queue contention requires deploying
dynamic kprobes or inferring sleep states; there is no simple,
out-of-the-box diagnostic path.

This short series introduces dedicated, low-overhead observability for tag
exhaustion events in the block layer:

- Patch 1 introduces the "block_rq_tag_wait" tracepoint in the tag
  allocation slow-path to record each starvation event as it occurs.

- Patch 2 complements this by exposing "wait_on_hw_tag" and
  "wait_on_sched_tag" atomic counters via debugfs for quick,
  point-in-time polling of cumulative totals.
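
To illustrate the counting idea only, here is a minimal userspace sketch
of a tag pool whose slow path bumps a per-pool counter. It is not the code
in the patches: "struct tag_pool", "struct wait_stats", pool_try_get(),
pool_put() and pool_get() are hypothetical names invented for this example.
Only the counter names and the is_sched_tag distinction come from the
series. The model is single-threaded apart from the counters and assumes a
pool depth of at least 1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel data structures. */
struct tag_pool {
	unsigned int depth;	/* total tags in the pool (assumed >= 1) */
	unsigned int in_use;	/* tags currently handed out */
};

/* One cumulative counter per pool type, named after the debugfs entries. */
struct wait_stats {
	atomic_ulong wait_on_hw_tag;
	atomic_ulong wait_on_sched_tag;
};

/* Fast path: take a tag if one is free. */
bool pool_try_get(struct tag_pool *p)
{
	if (p->in_use >= p->depth)
		return false;
	p->in_use++;
	return true;
}

/* Return a tag to the pool. */
void pool_put(struct tag_pool *p)
{
	assert(p->in_use > 0);
	p->in_use--;
}

/*
 * Allocate a tag. If the fast path fails, count one starvation event
 * against the relevant pool before "waiting".
 */
void pool_get(struct tag_pool *p, struct wait_stats *s, bool is_sched_tag)
{
	if (pool_try_get(p))
		return;

	/* Slow path: relaxed ordering is enough for a statistics counter. */
	atomic_fetch_add_explicit(is_sched_tag ? &s->wait_on_sched_tag
					       : &s->wait_on_hw_tag,
				  1, memory_order_relaxed);

	/*
	 * A real allocator would sleep here until a tag is freed. The
	 * model simulates a single completion instead, then retries.
	 */
	pool_put(p);
	pool_try_get(p);
}
```

With a pool of depth 2, the first two pool_get() calls take the fast path
and leave both counters untouched; the third finds the pool exhausted and
increments exactly one of them, selected by is_sched_tag.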

Together, these provide storage engineers with zero-configuration
mechanisms to definitively identify shared-tag bottlenecks.
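
Because the counters are cumulative, a monitoring tool would typically
sample them twice and look at the difference. The sketch below shows one
way that could be done from userspace. It assumes each debugfs attribute
prints a single decimal value, which this cover letter does not specify,
and it takes the file path from the caller since the exact debugfs
location is not stated here. parse_counter(), read_counter() and
counter_delta() are hypothetical helper names.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse one unsigned decimal value followed only by optional whitespace.
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
int parse_counter(const char *text, unsigned long long *out)
{
	char *end;
	unsigned long long v;

	/* Reject signs and leading junk, which strtoull() would accept. */
	if (*text < '0' || *text > '9')
		return -1;

	errno = 0;
	v = strtoull(text, &end, 10);
	if (errno != 0)
		return -1;

	while (*end == '\n' || *end == ' ' || *end == '\t')
		end++;
	if (*end != '\0')
		return -1;

	*out = v;
	return 0;
}

/* Read one counter from a caller-supplied path (assumed format above). */
int read_counter(const char *path, unsigned long long *out)
{
	char buf[32];
	char *line;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	line = fgets(buf, sizeof(buf), f);
	fclose(f);

	return line ? parse_counter(buf, out) : -1;
}

/*
 * Events between two samples. Assumes the counter only grows between
 * the samples; unsigned subtraction keeps the result well defined.
 */
unsigned long long counter_delta(unsigned long long prev,
				 unsigned long long cur)
{
	return cur - prev;
}
```

A caller would invoke read_counter() on the same attribute at two points
in time and pass both values to counter_delta(); a non-zero result means
at least one allocation had to wait during the interval.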

Please let me know your thoughts.

Changes since v2 [1]:
- Added "Reviewed-by:" and "Tested-by:" tags for patch 1
- Evaluated is_sched_tag directly within TP_fast_assign (Steven Rostedt)
- Introduced atomic counters via debugfs

Changes since v1 [2]:
- Improved the description of the trace point (Damien Le Moal)
- Removed the redundant "active requests" (Laurence Oberman)
- Introduced pool-specific starvation tracking

[1]: https://lore.kernel.org/lkml/20260319015300.287653-1-atomlin@xxxxxxxxxxx/
[2]: https://lore.kernel.org/lkml/20260317182835.258183-1-atomlin@xxxxxxxxxxx/

Aaron Tomlin (2):
  blk-mq: add tracepoint block_rq_tag_wait
  blk-mq: expose tag starvation counts via debugfs

 block/blk-mq-debugfs.c       | 56 ++++++++++++++++++++++++++++++++++++
 block/blk-mq-debugfs.h       |  7 +++++
 block/blk-mq-tag.c           |  8 ++++++
 include/linux/blk-mq.h       | 10 +++++++
 include/trace/events/block.h | 43 +++++++++++++++++++++++++++
 5 files changed, 124 insertions(+)

--
2.51.0