Re: [PATCH] blk-mq: Fix blk_mq_tagset_busy_iter() for shared tags

From: John Garry
Date: Mon Oct 18 2021 - 05:31:42 EST


On 18/10/2021 10:07, Ming Lei wrote:
On Mon, Oct 18, 2021 at 09:08:57AM +0100, John Garry wrote:
On 13/10/2021 16:13, John Garry wrote:
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 72a2724a4eee..2a2ad6dfcc33 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -232,8 +232,9 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
      if (!rq)
          return true;
-    if (rq->q == hctx->queue && rq->mq_hctx == hctx)
-        ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
+    if (rq->q == hctx->queue && (rq->mq_hctx == hctx ||
+                blk_mq_is_shared_tags(hctx->flags)))
+        ret = iter_data->fn(rq->mq_hctx, rq, iter_data->data, reserved);
      blk_mq_put_rq_ref(rq);
      return ret;
  }
@@ -460,6 +461,9 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
          if (tags->nr_reserved_tags)
              bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
          bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
+
+        if (blk_mq_is_shared_tags(hctx->flags))
+            break;
      }
      blk_queue_exit(q);
  }

I suppose that is ok, and means that we iter once.
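
To make the "iter once" point concrete, here is a rough user-space model of the shared-tags case. It is only a sketch, not kernel code: struct model_rq, struct model_tagset, iter_old(), iter_patched() and count_cb() are hypothetical names, and the model assumes that with shared tags every hctx walks the same set of busy tags. The return value counts how many busy tags are visited, which stands in for the number of bt_iter() calls.

```c
#include <stdbool.h>

#define NR_HW_QUEUES 4
#define NR_TAGS      8

/* Hypothetical stand-in for struct request. */
struct model_rq {
	bool in_use;   /* models a set bit in the shared tag bitmap */
	int  hctx_idx; /* models rq->mq_hctx */
};

/* Hypothetical stand-in for a tag set whose tags are shared by all hctxs. */
struct model_tagset {
	struct model_rq rqs[NR_TAGS];
};

typedef void (*model_iter_fn)(int hctx_idx, struct model_rq *rq, void *data);

/* Example callback: counts how many requests were handed to the caller. */
void count_cb(int hctx_idx, struct model_rq *rq, void *data)
{
	(void)hctx_idx;
	(void)rq;
	(*(unsigned int *)data)++;
}

/*
 * Models the loop before the change: the shared tags are walked once per
 * hctx, and the callback only runs when the request belongs to that hctx.
 * Returns the number of busy-tag visits.
 */
unsigned int iter_old(struct model_tagset *set, model_iter_fn fn, void *data)
{
	unsigned int visits = 0;

	for (int h = 0; h < NR_HW_QUEUES; h++) {
		for (int t = 0; t < NR_TAGS; t++) {
			struct model_rq *rq = &set->rqs[t];

			if (!rq->in_use)
				continue;
			visits++;
			if (rq->hctx_idx == h)
				fn(h, rq, data);
		}
	}
	return visits;
}

/*
 * Models the loop after the change for shared tags: a single walk, with the
 * callback given the hctx that owns the request (rq->mq_hctx in the diff).
 */
unsigned int iter_patched(struct model_tagset *set, model_iter_fn fn, void *data)
{
	unsigned int visits = 0;

	for (int t = 0; t < NR_TAGS; t++) {
		struct model_rq *rq = &set->rqs[t];

		if (!rq->in_use)
			continue;
		visits++;
		fn(rq->hctx_idx, rq, data);
	}
	return visits;
}
```

In this model both variants hand every busy request to the callback exactly once, but iter_old() visits each busy tag NR_HW_QUEUES times while iter_patched() visits it once.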

However, I have to ask, where is the big user of
blk_mq_queue_tag_busy_iter() coming from? I saw this from Kashyap's
mail:

> 1.31%     1.31%  kworker/57:1H-k  [kernel.vmlinux]
>       native_queued_spin_lock_slowpath
>       ret_from_fork
>       kthread
>       worker_thread
>       process_one_work
>       blk_mq_timeout_work
>       blk_mq_queue_tag_busy_iter
>       bt_iter
>       blk_mq_find_and_get_req
>       _raw_spin_lock_irqsave
>       native_queued_spin_lock_slowpath

How or why blk_mq_timeout_work()?
Just some update: I tried hisi_sas with 10x SAS SSDs, megaraid sas with 1x
SATA HDD (that's all I have), and null blk with lots of devices, and I still
can't see high usage of blk_mq_queue_tag_busy_iter().
It should be triggered easily in case of heavy io accounting:

while true; do cat /proc/diskstats; done


Let me check that.


So how about we get this patch processed (to fix blk_mq_tagset_busy_iter()),
as it is independent of blk_mq_queue_tag_busy_iter()? And then wait for some
update or some more info from Kashyap regarding blk_mq_queue_tag_busy_iter().
Looks fine:

Reviewed-by: Ming Lei <ming.lei@xxxxxxxxxx>

Thanks, I'll just send a v2 with your tag for clarity, as there has been much discussion here.

John