[PATCH for-next 1/3] RDMA/hns: Fix hung task when drain QP fails

From: hginjgerx

Date: Thu Jun 04 2026 - 07:47:14 EST


From: Chengchang Tang <tangchengchang@xxxxxxxxxx>

Flush CQE generation is asynchronous. If drain QP has already
triggered the flush CQE and a HW error then occurs, the driver is
unable to detect that the flush failed. In this case, the drain QP
thread blocks indefinitely in wait_for_completion(), which leads to
a hung task warning.

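Replace wait_for_completion() with wait_for_completion_timeout(),
bounded by a 30 second timeout (DRAIN_QP_TMO), to avoid waiting
indefinitely, and print a rate-limited error if the timeout expires.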

Fixes: 354e7a6d448b ("RDMA/hns: Support drain SQ and RQ")
Signed-off-by: Chengchang Tang <tangchengchang@xxxxxxxxxx>
Signed-off-by: Junxian Huang <huangjunxian6@xxxxxxxxxxxxx>
---
drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index 4afd7d6ae3ca..fe3c658d8c08 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -914,6 +914,7 @@ static void handle_drain_completion(struct ib_cq *ibcq,
struct hns_roce_drain_cqe *drain,
struct hns_roce_dev *hr_dev)
{
+#define DRAIN_QP_TMO (HZ * 30)
#define TIMEOUT (HZ / 10)
struct hns_roce_cq *hr_cq = to_hr_cq(ibcq);
unsigned long flags;
@@ -958,8 +959,10 @@ static void handle_drain_completion(struct ib_cq *ibcq,
ibcq->comp_handler(ibcq, ibcq->cq_context);

waiting_done:
- if (ibcq->comp_handler)
- wait_for_completion(&drain->done);
+ if (ibcq->comp_handler) {
+ if (!wait_for_completion_timeout(&drain->done, DRAIN_QP_TMO))
+ ibdev_err_ratelimited(&hr_dev->ib_dev, "Drain qp timeout!\n");
+ }
}

static void hns_roce_v2_drain_rq(struct ib_qp *ibqp)
--
2.33.0