Re: [PATCH RFT] blk-mq: optimize queue tag busy iter for shared_tags

From: John Garry
Date: Tue Dec 21 2021 - 10:19:49 EST


Hi Kashyap,

This is for current/5.17. This patch is meaningfully only on top of [1].

[1] " blk-mq: Use shared tags for shared sbitmap support" Commit -
e155b0c238b20f0a866f4334d292656665836c8a


But your change seems effectively the same as in https://lore.kernel.org/all/1638794990-137490-4-git-send-email-john.garry@xxxxxxxxxx/, which is now merged in Jens' 5.17 queue:

https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=for-5.17/block&id=fea9f92f1748083cb82049ed503be30c3d3a9b69

While doing additional testing for [1], I noticed some performance issue.
Along with the performance issue, I noticed CPU lockup as well. Lockup
trace -

_raw_spin_lock_irqsave+0x42/0x50
blk_mq_find_and_get_req+0x20/0xa0
bt_iter+0x2d/0x80
blk_mq_queue_tag_busy_iter+0x1aa/0x2f0
? blk_mq_complete_request+0x30/0x30
? blk_mq_complete_request+0x30/0x30
? __schedule+0x360/0x850
blk_mq_timeout_work+0x5e/0x120
process_one_work+0x1a8/0x380
worker_thread+0x30/0x380
? wq_calc_node_cpumask.isra.30+0x100/0x100
kthread+0x167/0x190
? set_kthread_struct+0x40/0x40
ret_from_fork+0x22/0x30

It is a generic performance issue if driver use " shost->host_tagset = 1".
In fact, I found that [1] is useful to fix performance issue and provided
this additional patch.

I changed my setup to have 64 scsi_devices (earlier I just kept 16 or 24
drives, so did not noticed this issue). Performance/cpu lockup issue is not
due to [1].
More number of scsi device, hardware context per host and high queue depth
will increase the chances of lockup and performance drop.

Do you think, it is good to have changes in 5.16 + stable ?
I don't know if this patch will create any side effect. Can you review and
let me know your feedback. ?


Can you test my merged change again for this scenario?

I will also note that I mentioned previously that blk_mq_queue_tag_busy_iter() was not optimum for shared sbitmap, i.e. before shared tags, but no one said performance was bad for shared sbitmap.

Thanks,
John