[PATCH AUTOSEL 6.19-6.12] nvmet: move async event work off nvmet-wq

From: Sasha Levin

Date: Tue Mar 17 2026 - 07:33:48 EST


From: Chaitanya Kulkarni <kch@xxxxxxxxxx>

[ Upstream commit 2922e3507f6d5caa7f1d07f145e186fc6f317a4e ]

On the target side, nvmet_ctrl_free() flushes ctrl->async_event_work.
If nvmet_ctrl_free() runs on nvmet-wq, the flush re-enters the workqueue
completion lock already held by the same worker:

A. Async event work queued on nvmet-wq (prior to disconnect):
nvmet_execute_async_event()
queue_work(nvmet_wq, &ctrl->async_event_work)

nvmet_add_async_event()
queue_work(nvmet_wq, &ctrl->async_event_work)

B. Full pre-work chain (RDMA CM path):
nvmet_rdma_cm_handler()
nvmet_rdma_queue_disconnect()
__nvmet_rdma_queue_disconnect()
queue_work(nvmet_wq, &queue->release_work)
process_one_work()
lock((wq_completion)nvmet-wq) <--------- 1st
nvmet_rdma_release_queue_work()

C. Recursive path (same worker):
nvmet_rdma_release_queue_work()
nvmet_rdma_free_queue()
nvmet_sq_destroy()
nvmet_ctrl_put()
nvmet_ctrl_free()
flush_work(&ctrl->async_event_work)
__flush_work()
touch_wq_lockdep_map()
lock((wq_completion)nvmet-wq) <--------- 2nd

Lockdep splat:

============================================
WARNING: possible recursive locking detected
6.19.0-rc3nvme+ #14 Tainted: G N
--------------------------------------------
kworker/u192:42/44933 is trying to acquire lock:
ffff888118a00948 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x26/0x90

but task is already holding lock:
ffff888118a00948 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x53e/0x660

3 locks held by kworker/u192:42/44933:
#0: ffff888118a00948 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x53e/0x660
#1: ffffc9000e6cbe28 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_one_work+0x1c5/0x660
#2: ffffffff82d4db60 (rcu_read_lock){....}-{1:3}, at: __flush_work+0x62/0x530

Workqueue: nvmet-wq nvmet_rdma_release_queue_work [nvmet_rdma]
Call Trace:
__flush_work+0x268/0x530
nvmet_ctrl_free+0x140/0x310 [nvmet]
nvmet_cq_put+0x74/0x90 [nvmet]
nvmet_rdma_free_queue+0x23/0xe0 [nvmet_rdma]
nvmet_rdma_release_queue_work+0x19/0x50 [nvmet_rdma]
process_one_work+0x206/0x660
worker_thread+0x184/0x320
kthread+0x10c/0x240
ret_from_fork+0x319/0x390

Move async event work to a dedicated nvmet-aen-wq to avoid reentrant
flush on nvmet-wq.

Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Chaitanya Kulkarni <kch@xxxxxxxxxx>
Signed-off-by: Keith Busch <kbusch@xxxxxxxxxx>
Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
---

LLM Generated explanations, may be completely bogus:

## Analysis

### Problem Description
This commit fixes a **recursive workqueue locking deadlock** in the NVMe
target subsystem. When `nvmet_ctrl_free()` is called from a work item
running on `nvmet-wq` (e.g., via the RDMA CM disconnect path), it calls
`flush_work(&ctrl->async_event_work)`. Since `async_event_work` is also
queued on the same `nvmet-wq`, this causes a recursive acquisition of
the workqueue completion lock, triggering a lockdep warning and
potentially a real deadlock.

The call chain is clearly documented in the commit message:
1. RDMA CM handler queues `release_work` on `nvmet-wq`
2. `nvmet_rdma_release_queue_work()` → `nvmet_rdma_free_queue()` →
`nvmet_sq_destroy()` → `nvmet_ctrl_put()` → `nvmet_ctrl_free()`
3. `nvmet_ctrl_free()` calls `flush_work(&ctrl->async_event_work)` — but
`async_event_work` is on the same `nvmet-wq`
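The hazard is the generic "flush your own workqueue from inside it" pattern, not anything RDMA-specific. As a rough userspace analogy (not kernel code; the pool, function, and string names below are invented for illustration), a single-worker pool stands in for `nvmet-wq`: the running task occupies the only worker, so the item it tries to "flush" can never start, and the wait would block forever. A timeout is used here so the sketch terminates instead of hanging:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# Single-worker pool standing in for nvmet-wq.
pool = ThreadPoolExecutor(max_workers=1)

def release_work():
    # Stand-in for nvmet_rdma_release_queue_work() -> nvmet_ctrl_free():
    # it "flushes" (waits on) another item queued on the *same* pool.
    inner = pool.submit(lambda: "aen done")  # stand-in for async_event_work
    try:
        # The wait can never complete: the pool's only worker is busy
        # running this very function. In the kernel this is the point
        # where lockdep reports the recursive wq_completion acquisition.
        return inner.result(timeout=1.0)
    except TimeoutError:
        return "deadlock"

result = pool.submit(release_work).result()
print(result)
```

In the kernel there is no timeout: `flush_work()` waits unconditionally, which is why lockdep flags this as a real deadlock rather than a mere warning.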

### Fix Description
The fix creates a new dedicated workqueue `nvmet-aen-wq` and moves the
two `queue_work()` calls for `async_event_work` from `nvmet_wq` to
`nvmet_aen_wq`. It also adds `flush_workqueue(nvmet_aen_wq)` in
`nvmet_rdma_remove_one()` alongside the existing
`flush_workqueue(nvmet_wq)`.
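Continuing the same userspace analogy (names again invented for illustration), the fix corresponds to giving the flushed work its own pool: the waiting task still occupies `nvmet-wq`'s worker, but the async-event item now runs on a different worker, so the flush completes:

```python
from concurrent.futures import ThreadPoolExecutor

# Separate pools standing in for nvmet-wq and the new nvmet-aen-wq.
main_pool = ThreadPoolExecutor(max_workers=1)  # nvmet-wq analog
aen_pool = ThreadPoolExecutor(max_workers=1)   # nvmet-aen-wq analog

def release_work():
    # The "flush" now waits on an item queued to a *different* pool,
    # so it completes even while main_pool's only worker is busy here.
    inner = aen_pool.submit(lambda: "aen done")
    return inner.result(timeout=1.0)

result = main_pool.submit(release_work).result()
print(result)
```

This mirrors why the patch is low-risk: nothing about the work function changes, only which worker pool executes it.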

### Stable Kernel Criteria Assessment

1. **Fixes a real bug**: YES — This fixes a deadlock/recursive locking
issue with a concrete lockdep splat included in the commit message.
The RDMA disconnect path can trigger this in production.

2. **Obviously correct and tested**: YES — The fix is straightforward:
move work to a separate workqueue so flushing it from the original
workqueue doesn't deadlock. Reviewed by Christoph Hellwig (NVMe
subsystem expert). This follows the same pattern as prior fixes
(commit `710c69dbaccda` "nvmet-fc: avoid deadlock on delete
association path").

3. **Small and contained**: YES — Changes are minimal:
- Add a new workqueue variable declaration and initialization
- Change two `queue_work()` calls from `nvmet_wq` to `nvmet_aen_wq`
- Add one `flush_workqueue()` call in RDMA cleanup
- Proper init/cleanup in module init/exit

4. **No new features**: Correct — This only fixes a deadlock by
separating workqueues.

5. **Severity**: HIGH — Deadlocks can hang the system. NVMe target users
(storage servers, NVMe-oF deployments) would hit this during
disconnect/reconnect scenarios.

### Risk Assessment
- **Risk**: LOW — The change is purely mechanical: moving work items
from one workqueue to another. The work function itself is unchanged.
The new workqueue has the same flags minus `WQ_SYSFS`.
- **Dependencies**: The bug has existed since commit `8832cf922151`
(2022) which introduced `nvmet_wq`. This fix is self-contained and
should apply to any stable tree that has `nvmet_wq`.
- **Regression potential**: Very low — the async event work is isolated
and doesn't interact with other work items on `nvmet_wq` in any
ordering-dependent way.

### Verification
- Verified `flush_work(&ctrl->async_event_work)` exists at
`drivers/nvme/target/core.c:1746` in `nvmet_ctrl_free()`
- Verified `nvmet_wq` was introduced in commit `8832cf922151`
(2022-03-21), confirming the bug has existed for ~4 years
- Verified there are exactly two callers of `queue_work(nvmet_wq,
&ctrl->async_event_work)` — both changed by this patch
- Verified similar deadlock fixes have been merged before (commit
`710c69dbaccda` for nvmet-fc)
- Verified the commit is reviewed by Christoph Hellwig and signed off by
Keith Busch (NVMe maintainer)
- Verified the lockdep splat in the commit message matches the described
call chain
- Could not verify if this commit is already in any stable queue (not in
current 6.19.8 tree)

This is a clear, well-documented deadlock fix with low risk, small
scope, expert review, and high user impact for NVMe-oF deployments. It
meets all stable kernel criteria.

**YES**

drivers/nvme/target/admin-cmd.c | 2 +-
drivers/nvme/target/core.c | 14 ++++++++++++--
drivers/nvme/target/nvmet.h | 1 +
drivers/nvme/target/rdma.c | 1 +
4 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 3da31bb1183eb..100d1466ff841 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -1586,7 +1586,7 @@ void nvmet_execute_async_event(struct nvmet_req *req)
ctrl->async_event_cmds[ctrl->nr_async_event_cmds++] = req;
mutex_unlock(&ctrl->lock);

- queue_work(nvmet_wq, &ctrl->async_event_work);
+ queue_work(nvmet_aen_wq, &ctrl->async_event_work);
}

void nvmet_execute_keep_alive(struct nvmet_req *req)
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index cc88e5a28c8a9..5075f7123358a 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -26,6 +26,8 @@ static DEFINE_IDA(cntlid_ida);

struct workqueue_struct *nvmet_wq;
EXPORT_SYMBOL_GPL(nvmet_wq);
+struct workqueue_struct *nvmet_aen_wq;
+EXPORT_SYMBOL_GPL(nvmet_aen_wq);

/*
* This read/write semaphore is used to synchronize access to configuration
@@ -205,7 +207,7 @@ void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
list_add_tail(&aen->entry, &ctrl->async_events);
mutex_unlock(&ctrl->lock);

- queue_work(nvmet_wq, &ctrl->async_event_work);
+ queue_work(nvmet_aen_wq, &ctrl->async_event_work);
}

static void nvmet_add_to_changed_ns_log(struct nvmet_ctrl *ctrl, __le32 nsid)
@@ -1958,9 +1960,14 @@ static int __init nvmet_init(void)
if (!nvmet_wq)
goto out_free_buffered_work_queue;

+ nvmet_aen_wq = alloc_workqueue("nvmet-aen-wq",
+ WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
+ if (!nvmet_aen_wq)
+ goto out_free_nvmet_work_queue;
+
error = nvmet_init_debugfs();
if (error)
- goto out_free_nvmet_work_queue;
+ goto out_free_nvmet_aen_work_queue;

error = nvmet_init_discovery();
if (error)
@@ -1976,6 +1983,8 @@ static int __init nvmet_init(void)
nvmet_exit_discovery();
out_exit_debugfs:
nvmet_exit_debugfs();
+out_free_nvmet_aen_work_queue:
+ destroy_workqueue(nvmet_aen_wq);
out_free_nvmet_work_queue:
destroy_workqueue(nvmet_wq);
out_free_buffered_work_queue:
@@ -1993,6 +2002,7 @@ static void __exit nvmet_exit(void)
nvmet_exit_discovery();
nvmet_exit_debugfs();
ida_destroy(&cntlid_ida);
+ destroy_workqueue(nvmet_aen_wq);
destroy_workqueue(nvmet_wq);
destroy_workqueue(buffered_io_wq);
destroy_workqueue(zbd_wq);
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index b664b584fdc8e..319d6a5e9cf05 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -501,6 +501,7 @@ extern struct kmem_cache *nvmet_bvec_cache;
extern struct workqueue_struct *buffered_io_wq;
extern struct workqueue_struct *zbd_wq;
extern struct workqueue_struct *nvmet_wq;
+extern struct workqueue_struct *nvmet_aen_wq;

static inline void nvmet_set_result(struct nvmet_req *req, u32 result)
{
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 9c12b2361a6d7..0384323649671 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -2088,6 +2088,7 @@ static void nvmet_rdma_remove_one(struct ib_device *ib_device, void *client_data
mutex_unlock(&nvmet_rdma_queue_mutex);

flush_workqueue(nvmet_wq);
+ flush_workqueue(nvmet_aen_wq);
}

static struct ib_client nvmet_rdma_ib_client = {
--
2.51.0