Re: [RFC PATCH v2 2/2] blk-mq: Lockout tagset iter when freeing rqs

From: John Garry
Date: Mon Jan 04 2021 - 13:45:52 EST


On 04/01/2021 17:22, Bart Van Assche wrote:
On 1/4/21 7:33 AM, John Garry wrote:
On 23/12/2020 15:47, Bart Van Assche wrote:
I propose to change the order in which blk_mq_sched_free_requests(q) and
blk_mq_debugfs_unregister(q) are called. Today blk_mq_sched_free_requests(q)
is called by blk_cleanup_queue() before blk_put_queue() is called.
blk_put_queue() calls blk_release_queue() if the last reference is dropped.
blk_release_queue() calls blk_mq_debugfs_unregister(). I prefer removing the
debugfs attributes earlier over modifying the tag iteration functions
because I think removing the debugfs attributes earlier is less risky.
But don't we already have this following path to remove the per-hctx debugfs
dir earlier than blk_mq_sched_free_requests() or blk_release_queue():

blk_cleanup_queue() -> blk_mq_exit_queue() -> blk_mq_exit_hw_queues() ->
blk_mq_debugfs_unregister_hctx() ->
blk_mq_debugfs_unregister_hctx(hctx->debugfs_dir)

Having said that, I am not sure how this is related directly to the problem
I mentioned. In that problem, above, we trigger the
blk_mq_tagset_busy_iter() from the SCSI host sysfs file, and the
use-after-free comes about from disabling the elevator (and freeing the
sched requests) in parallel.
Hi John,

Hi Bart,


Right, what I proposed is unrelated to the use-after-free triggered by
disabling I/O scheduling.

Regarding the races triggered by disabling I/O scheduling: can these be
fixed by quiescing all request queues associated with a tag set before
changing the I/O scheduler instead of only the request queue for which the
I/O scheduler is changed? I think we already do this before updating the
number of hardware queues.

For changing the number of HW queues, I figure that we have no choice but to quiesce all request queues associated.

But maybe there is still some locking we could use to avoid that here.

Please let me consider it a bit more.

Thanks,
John