Re: [PATCH 5/7] blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq

From: jianchao.wang
Date: Thu Dec 21 2017 - 23:03:21 EST


Sorry for not describing this in enough detail.

On 12/21/2017 09:50 PM, Tejun Heo wrote:
> Hello,
>
> On Thu, Dec 21, 2017 at 11:56:49AM +0800, jianchao.wang wrote:
>> It's worrying that even though the blk_mark_rq_complete() here is intended to synchronize with
>> timeout path, but it indeed give the blk_mq_complete_request() the capability to exclude with
There could be a scenario where the driver itself stops a request with blk_mq_complete_request(), or
some other interface that invokes it, and this races with the normal completion path for the same request.
For example:
a reset could be triggered through sysfs on nvme-rdma.
The driver will then cancel all the reqs, including the in-flight ones.
nvme_rdma_reset_ctrl_work()
    nvme_rdma_shutdown_ctrl()
>>>>
        if (ctrl->ctrl.queue_count > 1) {
                nvme_stop_queues(&ctrl->ctrl); //quiesce the queue
                blk_mq_tagset_busy_iter(&ctrl->tag_set,
                                nvme_cancel_request, &ctrl->ctrl); //invoke blk_mq_complete_request()
                nvme_rdma_destroy_io_queues(ctrl, shutdown);
        }
>>>>

These operations could race with the normal completion path of the in-flight ones.
It should drain all the in-flight requests first here, but there may be
other places similar to this.
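
To make the exclusion I mean more concrete, here is a minimal userspace sketch, not kernel code. It assumes blk_mark_rq_complete() behaves like a test-and-set on a per-request "complete" bit, and models that with C11 atomic_exchange(). struct fake_request and all fake_* names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct request; not the kernel type. */
struct fake_request {
	atomic_bool completed;	/* models the REQ_ATOM_COMPLETE bit */
	int end_calls;		/* how many times the completion handler ran */
};

static void fake_request_init(struct fake_request *rq)
{
	atomic_init(&rq->completed, false);
	rq->end_calls = 0;
}

/* Models blk_mark_rq_complete(): returns true if the bit was already set. */
static bool fake_mark_rq_complete(struct fake_request *rq)
{
	return atomic_exchange(&rq->completed, true);
}

/* One completion attempt; returns true only for the caller that wins. */
static bool fake_complete_request(struct fake_request *rq)
{
	if (fake_mark_rq_complete(rq))
		return false;	/* someone else already completed it */
	rq->end_calls++;	/* only the winner runs the handler */
	return true;
}

/*
 * Two completions of the same request, issued back to back for
 * simplicity: think of the cancel path and the normal completion path.
 */
static int fake_demo_double_completion(void)
{
	struct fake_request rq;
	bool first, second;

	fake_request_init(&rq);
	first = fake_complete_request(&rq);
	second = fake_complete_request(&rq);
	assert(first && !second);
	return rq.end_calls;
}
```

Calling fake_demo_double_completion() runs the handler once even though two completions were attempted. The flag only covers a single lifetime of the request; once the request is reinitialized for reuse, a late completion would win again.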

>> itself. Maybe this capability should be reserved.
>
> Can you explain further how that'd help? The problem there is that if
> you have two competing completions, where one isn't a timeout, there's
> nothing synchronizing the reuse of the request. IOW, the losing one
> can easily complete the next recycle instance. The atomic bitops
> might feel like more protection but it's just feels.

In the above case, the request may enter the requeue path and the end path simultaneously.

Thanks
Jianchao