Re: [patch] blk-mq: avoid excessive boot delays with large lun counts

From: Jeff Moyer
Date: Thu Oct 29 2015 - 11:18:47 EST


Ming Lei <tom.leiming@xxxxxxxxx> writes:

> Looks we should have cleared the TAG_SHARED flag during
> blk_mq_init_hctx() and just let blk_mq_update_tag_set_depth()
> deal with that, then the race can be avoided.

The whole point of the patch set is to propagate the flag up to the tag
set so that we can avoid iterating all hctxs in all queues.
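
To make that concrete, here is a rough, self-contained sketch (toy code
with made-up names, not the kernel interfaces) of the idea: once the
shared state lives on the tag set, only the one-to-two queue transition
has to touch every queue, instead of every registration re-walking all
hctxs of all queues in the set.

/* Toy model, illustrative only: "shared" lives on the set, so adding
 * a queue is O(1) except on the 1 -> 2 transition. */
#include <stdbool.h>
#include <stdio.h>

#define NR_QUEUES 4

struct toy_set {
	bool shared;			/* stand-in for BLK_MQ_F_TAG_SHARED in set->flags */
	int nr_queues;
	bool queue_shared[NR_QUEUES];	/* stand-in for the per-hctx flag bits */
};

static void toy_add_queue(struct toy_set *set, int q)
{
	int i;

	set->nr_queues++;
	if (set->nr_queues > 1 && !set->shared) {
		/* just transitioned to shared: walk the existing queues once */
		set->shared = true;
		for (i = 0; i <= q; i++)
			set->queue_shared[i] = true;
	} else {
		/* otherwise only the new queue needs its flag updated */
		set->queue_shared[q] = set->shared;
	}
}

int main(void)
{
	struct toy_set set = { 0 };
	int q;

	for (q = 0; q < NR_QUEUES; q++)
		toy_add_queue(&set, q);
	for (q = 0; q < NR_QUEUES; q++)
		printf("queue %d shared=%d\n", q, set.queue_shared[q]);
	return 0;
}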

>> At this point, neither queue's hctxs have the shared flag set. Next,
>> both will race to get the tag_list_lock for the tag_set inside of
>> blk_mq_add_queue_tag_set. Only one will win and mark the initial
>> queue's hctxs as shared (as well as its own). Then, when the second
>> queue gets the lock, it will find that the shared flag is already set,
>> and assume that it doesn't have to do anything. But, because its
>
> As I suggested, we can set it always in case that TAG_SHARED
> is set in set->flags because we know the queue isn't ready yet at that
> time.

I see. You are suggesting that I just get rid of the conditional. We
could do that, but you will get the exact same result as what I posted.
I'm not sure why you would prefer that over the explicit check. With
the patch I posted, we can avoid walking the list of hctxs a second
time.
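
To spell out where the second walk comes from, here is a rough
illustration (my reading of the no-conditional variant, using the
helpers from the patch below; it is not code I'm proposing): if the add
path marks the new queue's hctxs unconditionally whenever set->flags
says the tags are shared, then the queue that triggers the
unshared-to-shared transition has its hctxs walked twice, once by
blk_mq_update_tag_set_depth() (it is already on the tag_list) and once
by the unconditional call:

	mutex_lock(&set->tag_list_lock);
	list_add_tail(&q->tag_set_list, &set->tag_list);

	if (set->tag_list.next != set->tag_list.prev) {
		if (!(set->flags & BLK_MQ_F_TAG_SHARED)) {
			/* just transitioned to shared tags */
			set->flags |= BLK_MQ_F_TAG_SHARED;
			/* walks every queue on tag_list, including q */
			blk_mq_update_tag_set_depth(set, true);
		}
		/* unconditional: on the transition above, q's hctxs have
		 * already been walked once */
		queue_set_hctx_shared(q, true);
	}
	mutex_unlock(&set->tag_list_lock);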

Anyway, here's a patch that I think implements your suggestion. I
prefer the original, but this should achieve the same result.
Let me know if I've misunderstood.

Cheers,
Jeff

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 85f0143..7bf717a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1860,27 +1860,26 @@ static void blk_mq_map_swqueue(struct request_queue *q,
 	}
 }

-static void blk_mq_update_tag_set_depth(struct blk_mq_tag_set *set)
+static void queue_set_hctx_shared(struct request_queue *q, bool shared)
 {
 	struct blk_mq_hw_ctx *hctx;
-	struct request_queue *q;
-	bool shared;
 	int i;

-	if (set->tag_list.next == set->tag_list.prev)
-		shared = false;
-	else
-		shared = true;
+	queue_for_each_hw_ctx(q, hctx, i) {
+		if (shared)
+			hctx->flags |= BLK_MQ_F_TAG_SHARED;
+		else
+			hctx->flags &= ~BLK_MQ_F_TAG_SHARED;
+	}
+}
+
+static void blk_mq_update_tag_set_depth(struct blk_mq_tag_set *set, bool shared)
+{
+	struct request_queue *q;

 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
 		blk_mq_freeze_queue(q);
-
-		queue_for_each_hw_ctx(q, hctx, i) {
-			if (shared)
-				hctx->flags |= BLK_MQ_F_TAG_SHARED;
-			else
-				hctx->flags &= ~BLK_MQ_F_TAG_SHARED;
-		}
+		queue_set_hctx_shared(q, shared);
 		blk_mq_unfreeze_queue(q);
 	}
 }
@@ -1891,7 +1890,13 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)

 	mutex_lock(&set->tag_list_lock);
 	list_del_init(&q->tag_set_list);
-	blk_mq_update_tag_set_depth(set);
+
+	if (set->tag_list.next == set->tag_list.prev) {
+		/* just transitioned to unshared */
+		set->flags &= ~BLK_MQ_F_TAG_SHARED;
+		/* update existing queue */
+		blk_mq_update_tag_set_depth(set, false);
+	}
 	mutex_unlock(&set->tag_list_lock);
 }

@@ -1902,7 +1907,21 @@ static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,

 	mutex_lock(&set->tag_list_lock);
 	list_add_tail(&q->tag_set_list, &set->tag_list);
-	blk_mq_update_tag_set_depth(set);
+
+	if (set->tag_list.next != set->tag_list.prev) {
+		/*
+		 * Only update the tag set state if the state has
+		 * actually changed.
+		 */
+		if (!(set->flags & BLK_MQ_F_TAG_SHARED)) {
+			/* just transitioned to shared tags */
+			set->flags |= BLK_MQ_F_TAG_SHARED;
+			blk_mq_update_tag_set_depth(set, true);
+		} else {
+			/* ensure we didn't race with another addition */
+			queue_set_hctx_shared(q, true);
+		}
+	}
 	mutex_unlock(&set->tag_list_lock);
 }
