Re: [PATCH] scsi/sg: don't grab scsi host module reference

From: Marc Hartmayer
Date: Wed Jul 05 2023 - 02:14:01 EST


On Wed, Jul 05, 2023 at 10:28 AM +0800, Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
> Hi,
>
> 在 2023/07/05 9:43, Yu Kuai 写道:
>> Hi,
>>
>> 在 2023/07/05 1:04, Marc Hartmayer 写道:
>>> On Thu, Jun 22, 2023 at 12:01 AM +0800, Yu Kuai
>>> <yukuai1@xxxxxxxxxxxxxxx> wrote:
>>>> From: Yu Kuai <yukuai3@xxxxxxxxxx>
>>>>
>>>> In order to prevent request_queue to be freed before cleaning up
>>>> blktrace debugfs entries, commit db59133e9279 ("scsi: sg: fix blktrace
>>>> debugfs entries leakage") use scsi_device_get(), however,
>>>> scsi_device_get() will also grab scsi module reference and scsi module
>>>> can't be removed.
>>>>
>>>> It's reported that blktests can't unload scsi_debug after block/001:
>>>>
>>>> blktests (master) # ./check block
>>>> block/001 (stress device hotplugging) [failed]
>>>>       +++ /root/blktests/results/nodev/block/001.out.bad 2023-06-19
>>>>        Running block/001
>>>>        Stressing sd
>>>>       +modprobe: FATAL: Module scsi_debug is in use.
>>>>
>>>> Fix this problem by grabbing request_queue reference directly, so that
>>>> scsi host module can still be unloaded while request_queue will be
>>>> pinged by sg device.
>>>>
>>>> Reported-by: Chaitanya Kulkarni <chaitanyak@xxxxxxxxxx>
>>>> Link:
>>>> https://lore.kernel.org/all/1760da91-876d-fc9c-ab51-999a6f66ad50@xxxxxxxxxx/
>>>>
>>>> Fixes: db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage")
>>>> Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
>>>> ---
>>>>   drivers/scsi/sg.c | 6 +++---
>>>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
>>>> index 2433eeef042a..dcb73787c29d 100644
>>>> --- a/drivers/scsi/sg.c
>>>> +++ b/drivers/scsi/sg.c
>>>> @@ -1497,7 +1497,7 @@ sg_add_device(struct device *cl_dev)
>>>>       int error;
>>>>       unsigned long iflags;
>>>> -    error = scsi_device_get(scsidp);
>>>> +    error = blk_get_queue(scsidp->request_queue);
>>>>       if (error)
>>>>           return error;
>>>> @@ -1558,7 +1558,7 @@ sg_add_device(struct device *cl_dev)
>>>>   out:
>>>>       if (cdev)
>>>>           cdev_del(cdev);
>>>> -    scsi_device_put(scsidp);
>>>> +    blk_put_queue(scsidp->request_queue);
>>>>       return error;
>>>>   }
>>>> @@ -1575,7 +1575,7 @@ sg_device_destroy(struct kref *kref)
>>>>        */
>>>>       blk_trace_remove(q);
>>>> -    scsi_device_put(sdp->device);
>>>> +    blk_put_queue(q);
>>>>       write_lock_irqsave(&sg_index_lock, flags);
>>>>       idr_remove(&sg_index_idr, sdp->index);
>>>> --
>>>> 2.39.2
>>>
>>> Hi,
>>>
>>> This change (bisected) triggers a regression in our KVM on s390x CI. The
>>> symptom is that a “scsi_debug device” does not bind to the scsi_generic
>>> driver. On s390x you can reproduce the problem as follows (I have not
>>> tested on x86):
>>>
>>> With this patch applied:
>>>
>>> $ sudo modprobe scsi_debug
>>> $ # Get the 'scsi_host,channel,target_number,LUN' tuple for the
>>> scsi_debug device
>>> $ lsscsi |grep scsi_debug |awk '{ print $1 }'
>>> [0:0:0:0]
>>> $ sudo stat /sys/bus/scsi/devices/0:0:0:0/scsi_generic
>>> stat: cannot statx '/sys/bus/scsi/devices/0:0:0:0/scsi_generic': No
>>> such file or directory
>>>
>>>
>>> Patch reverted:
>>>
>>
>> I didn't figure out the root cause, howver, have you tried to reviert
>> this patch as well?
>>
>> db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage"
>
> Never mind this, root cause is that the checking of return value of
> blk_get_queue() is wrong.
>
> This shoud be fixed by following patch:
>
> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
> index 89fa046c7158..0d8afffd1683 100644
> --- a/drivers/scsi/sg.c
> +++ b/drivers/scsi/sg.c
> @@ -1497,9 +1497,10 @@ sg_add_device(struct device *cl_dev)
> int error;
> unsigned long iflags;
>
> - error = blk_get_queue(scsidp->request_queue);
> - if (error)
> - return error;
> + if (!blk_get_queue(scsidp->request_queue)) {
> + pr_warn("%s: get scsi_device queue failed\n", __func__);
> + return -ENODEV;
> + }

Hi Kuai,

I just tried your fix and it works - thanks.

Tested-by: Marc Hartmayer <mhartmay@xxxxxxxxxxxxx>

Marc

[…snip]