Re: [PATCH] RDMA/rxe: Replace use of system_unbound_wq with system_dfl_wq

From: Yanjun.Zhu

Date: Tue Mar 17 2026 - 16:20:11 EST



On 3/17/26 12:31 PM, Yanjun.Zhu wrote:

On 3/17/26 12:03 PM, Leon Romanovsky wrote:
On Tue, Mar 17, 2026 at 10:24:11AM -0700, Yanjun.Zhu wrote:
On 3/17/26 7:38 AM, Zhu Yanjun wrote:
On 3/16/26 1:13 PM, Leon Romanovsky wrote:
On Fri, Mar 13, 2026 at 04:40:23PM +0100, Marco Crivellari wrote:
This patch continues the effort to refactor workqueue APIs, which began
with the changes introducing new workqueues and a new alloc_workqueue flag:

     commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
     commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")

The point of the refactoring is to eventually make workqueues unbound by
default so that their workload placement is optimized by the scheduler.

Before that can happen, workqueue users must be converted to the
better-named new workqueues, with no intended behavior change:

     system_wq -> system_percpu_wq
     system_unbound_wq -> system_dfl_wq

This way the old, obsolete workqueues (system_wq, system_unbound_wq) can
be removed in the future.
I recall earlier efforts to replace system workqueues with per-driver
queues, because unloading a driver forces a flush of the entire system
workqueue, which is undesirable for overall system behavior.

Wouldn't it be better to introduce a local workqueue here and use that
instead?

Thanks.

1. Initialization:

my_wq = alloc_workqueue("my_driver_queue", WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
if (!my_wq)
     return -ENOMEM;

2. Submission:

queue_work(my_wq, &my_work);

3. Destruction:

destroy_workqueue(my_wq);

Thanks,
Zhu Yanjun
Hi, Leon

The diff for a new work queue in rxe is as below. Please review it.
I'm not sure that you need a second workqueue. Also, destroy_workqueue()
already does flush_workqueue(), so there is no need to call it
explicitly. flush_workqueue() can be removed.

The second workqueue was introduced because rxe_wq is heavily utilized
by QP tasks.

The additional workqueue helps offload and distribute the workload,
preventing rxe_wq from becoming a bottleneck.

If you believe the workload on rxe_wq is not significant, I can simplify
the design by removing the second workqueue and using rxe_wq for all
work items instead.

Zhu Yanjun

Hi, Leon

The latest commit is as below:

diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index bc11b1ec59ac..98092dcc1870 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd,
         work->frags[i].mr = mr;
     }

-    queue_work(system_unbound_wq, &work->work);
+    rxe_queue_work(&work->work);

     return 0;

diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index f522820b950c..0131829b5641 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -6,11 +6,13 @@

 #include "rxe.h"

+/* work for rxe_task */
 static struct workqueue_struct *rxe_wq;

 int rxe_alloc_wq(void)
 {
-    rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
+    rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND | WQ_MEM_RECLAIM,
+                WQ_MAX_ACTIVE);
     if (!rxe_wq)
         return -ENOMEM;

@@ -254,6 +256,13 @@ void rxe_sched_task(struct rxe_task *task)
     spin_unlock_irqrestore(&task->lock, flags);
 }

+/* Helper to queue auxiliary tasks into rxe_wq.
+ */
+void rxe_queue_work(struct work_struct *work)
+{
+    queue_work(rxe_wq, work);
+}
+
 /* rxe_disable/enable_task are only called from
  * rxe_modify_qp in process context. Task is moved
  * to the drained state by do_task.
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index a8c9a77b6027..60c085cc11a7 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -36,6 +36,7 @@ int rxe_alloc_wq(void);

 void rxe_destroy_wq(void);

+void rxe_queue_work(struct work_struct *work);
 /*
  * init rxe_task structure
  *    qp  => parameter to pass to func



Thanks


diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index bc11b1ec59ac..03199fef47fb 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd,
          work->frags[i].mr = mr;
      }

-    queue_work(system_unbound_wq, &work->work);
+    rxe_queue_aux_work(&work->work);

      return 0;

diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index f522820b950c..a2da699b969e 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -6,19 +6,36 @@

  #include "rxe.h"

+/* work for rxe_task */
  static struct workqueue_struct *rxe_wq;

+/* work for other rxe jobs */
+static struct workqueue_struct *rxe_aux_wq;
+
  int rxe_alloc_wq(void)
  {
-    rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
+    rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND | WQ_MEM_RECLAIM,
+                WQ_MAX_ACTIVE);
      if (!rxe_wq)
          return -ENOMEM;

+    rxe_aux_wq = alloc_workqueue("rxe_aux_wq",
+                WQ_UNBOUND | WQ_MEM_RECLAIM, WQ_MAX_ACTIVE);
+    if (!rxe_aux_wq) {
+        destroy_workqueue(rxe_wq);
+        return -ENOMEM;
+
+    }
+
      return 0;
  }

  void rxe_destroy_wq(void)
  {
+    flush_workqueue(rxe_aux_wq);
+    destroy_workqueue(rxe_aux_wq);
+
+    flush_workqueue(rxe_wq);
      destroy_workqueue(rxe_wq);
  }

@@ -254,6 +271,14 @@ void rxe_sched_task(struct rxe_task *task)
      spin_unlock_irqrestore(&task->lock, flags);
  }

+/* rxe_wq for rxe tasks. rxe_aux_wq for other rxe jobs.
+ */
+void rxe_queue_aux_work(struct work_struct *work)
+{
+    WARN_ON_ONCE(!rxe_aux_wq);
+    queue_work(rxe_aux_wq, work);
+}
+
  /* rxe_disable/enable_task are only called from
   * rxe_modify_qp in process context. Task is moved
   * to the drained state by do_task.
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index a8c9a77b6027..e1c0a34808b4 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -36,6 +36,7 @@ int rxe_alloc_wq(void);

  void rxe_destroy_wq(void);

+void rxe_queue_aux_work(struct work_struct *work);
  /*
   * init rxe_task structure
   *    qp  => parameter to pass to func

Zhu Yanjun

Thanks

Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@xxxxxxxxxxxxx/
Suggested-by: Tejun Heo <tj@xxxxxxxxxx>
Signed-off-by: Marco Crivellari <marco.crivellari@xxxxxxxx>
---
   drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index bc11b1ec59ac..d440c8cbaea5 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd,
         work->frags[i].mr = mr;
     }

-    queue_work(system_unbound_wq, &work->work);
+    queue_work(system_dfl_wq, &work->work);

     return 0;
--
2.53.0