Re: [PATCH RFC 3/3] nvme: delay failover by command quiesce timeout

From: Sagi Grimberg
Date: Mon Apr 14 2025 - 18:28:26 EST




On 10/04/2025 11:51, Mohamed Khalfella wrote:
On 2025-03-24 13:07:58 +0100, Daniel Wagner wrote:
TP4129 mandates that failover be delayed by CQT. Thus, when
nvme_decide_disposition returns FAILOVER, do not immediately re-queue the
request at the namespace level; instead queue it on the ctrl's request_list
and move it later to the namespace's requeue_list.

Signed-off-by: Daniel Wagner <wagi@xxxxxxxxxx>
---
drivers/nvme/host/core.c | 19 ++++++++++++++++
drivers/nvme/host/fc.c | 4 ++++
drivers/nvme/host/multipath.c | 52 ++++++++++++++++++++++++++++++++++++++++---
drivers/nvme/host/nvme.h | 15 +++++++++++++
drivers/nvme/host/rdma.c | 2 ++
drivers/nvme/host/tcp.c | 1 +
6 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 135045528ea1c79eac0d6d47d5f7f05a7c98acc4..f3155c7735e75e06c4359c26db8931142c067e1d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -239,6 +239,7 @@ static void nvme_do_delete_ctrl(struct nvme_ctrl *ctrl)
flush_work(&ctrl->reset_work);
nvme_stop_ctrl(ctrl);
+ nvme_flush_failover(ctrl);
nvme_remove_namespaces(ctrl);
ctrl->ops->delete_ctrl(ctrl);
nvme_uninit_ctrl(ctrl);
@@ -1310,6 +1311,19 @@ static void nvme_queue_keep_alive_work(struct nvme_ctrl *ctrl)
queue_delayed_work(nvme_wq, &ctrl->ka_work, delay);
}
+void nvme_schedule_failover(struct nvme_ctrl *ctrl)
+{
+ unsigned long delay;
+
+ if (ctrl->cqt)
+ delay = msecs_to_jiffies(ctrl->cqt);
+ else
+ delay = ctrl->kato * HZ;
I thought that delay = m * ctrl->kato + ctrl->cqt
where m = ctrl->ctratt & NVME_CTRL_ATTR_TBKAS ? 3 : 2
no?

This was said before, but if we are going to always start waiting for kato for failover purposes,
we first need a patch that prevents kato from being arbitrarily long.

Let's cap kato to something like 10 seconds (2x the default, which apparently no one is touching).
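
For illustration only, a rough userspace sketch of how the two suggestions in this thread could combine: the m * kato + cqt formula from the comment above, with kato clamped first. It assumes kato is in seconds and cqt in milliseconds, as the quoted hunk implies. The helper names and the KATO_CAP_SECS constant are hypothetical and not existing kernel API, and the 10 second value is only the number floated above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cap, taken from the "something like 10 seconds" suggestion. */
#define KATO_CAP_SECS 10u

/* Clamp a keep-alive timeout (in seconds) to the hypothetical cap. */
static uint32_t cap_kato_secs(uint32_t kato_secs)
{
	return kato_secs > KATO_CAP_SECS ? KATO_CAP_SECS : kato_secs;
}

/*
 * Failover delay in milliseconds: m * kato + cqt, where m is 3 when the
 * controller reports TBKAS and 2 otherwise. kato is clamped before use.
 */
static uint64_t failover_delay_ms(uint32_t kato_secs, uint32_t cqt_ms,
				  bool tbkas)
{
	uint64_t m = tbkas ? 3 : 2;

	return m * cap_kato_secs(kato_secs) * 1000u + cqt_ms;
}
```

With the cap in place, a controller configured with kato = 120 seconds and cqt = 1000 ms would delay failover by 21 seconds without TBKAS instead of 241 seconds. In the driver, the result would still have to go through msecs_to_jiffies() as in the quoted hunk.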