Re: [PATCH v3 5.10/5.15 1/2] RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task"

From: yanjun.zhu

Date: Fri Jun 05 2026 - 19:25:42 EST


On 6/5/26 10:14 AM, Vladislav Nikolaev wrote:
From: Zhu Yanjun <yanjun.zhu@xxxxxxxxx>

commit b2b1ddc457458fecd1c6f385baa9fbda5f0c63ad upstream.

In the function rxe_create_qp(), rxe_qp_from_init() is called to
initialize the qp. Internally, the tasks set up by rxe_init_task() are
not initialized until rxe_qp_init_req() runs.

If an error occurs before this point, the unwind will call
rxe_cleanup() and eventually reach rxe_qp_do_cleanup()/rxe_cleanup_task(),
which will oops when trying to access the uninitialized spinlock.

If rxe_init_task() has not been executed for a task, rxe_cleanup_task()
will not be called for it.

Reported-by: syzbot+cfcc1a3c85be15a40cba@xxxxxxxxxxxxxxxxxxxxxxxxx
Link: https://syzkaller.appspot.com/bug?id=fd85757b74b3eb59f904138486f755f71e090df8
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Fixes: 2d4b21e0a291 ("IB/rxe: Prevent from completer to operate on non valid QP")
Signed-off-by: Zhu Yanjun <yanjun.zhu@xxxxxxxxx>
Link: https://lore.kernel.org/r/20230413101115.1366068-1-yanjun.zhu@xxxxxxxxx
Signed-off-by: Leon Romanovsky <leon@xxxxxxxxxx>
[ Vladislav: add the missing resp.task.func check and keep the cleanup
order used by upstream after 960ebe97e523 ("RDMA/rxe: Remove
__rxe_do_task()"). Moving rxe_cleanup_task(&qp->resp.task) after the RC
timer cleanup is independent of that commit: timer deletion does not
depend on the responder task cleanup, and placing all task cleanup after
the timers matches the final upstream ordering while keeping this fix
minimal for 5.10/5.15. ]
Signed-off-by: Vladislav Nikolaev <vlad102nikolaev@xxxxxxxxx>

Thanks a lot. I am fine with this.

Zhu Yanjun

---
drivers/infiniband/sw/rxe/rxe_qp.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 4c938d841f76..0532c446760d 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -760,15 +760,20 @@ void rxe_qp_destroy(struct rxe_qp *qp)
 {
 	qp->valid = 0;
 	qp->qp_timeout_jiffies = 0;
-	rxe_cleanup_task(&qp->resp.task);
 
 	if (qp_type(qp) == IB_QPT_RC) {
 		del_timer_sync(&qp->retrans_timer);
 		del_timer_sync(&qp->rnr_nak_timer);
 	}
 
-	rxe_cleanup_task(&qp->req.task);
-	rxe_cleanup_task(&qp->comp.task);
+	if (qp->resp.task.func)
+		rxe_cleanup_task(&qp->resp.task);
+
+	if (qp->req.task.func)
+		rxe_cleanup_task(&qp->req.task);
+
+	if (qp->comp.task.func)
+		rxe_cleanup_task(&qp->comp.task);
 
 	/* flush out any receive wr's or pending requests */
 	if (qp->req.task.func)