Re: [PATCH] scsi/sg: don't grab scsi host module reference

From: Yu Kuai
Date: Tue Jul 04 2023 - 22:29:00 EST


Hi,

On 2023/07/05 9:43, Yu Kuai wrote:
Hi,

On 2023/07/05 1:04, Marc Hartmayer wrote:
On Thu, Jun 22, 2023 at 12:01 AM +0800, Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
From: Yu Kuai <yukuai3@xxxxxxxxxx>

To prevent the request_queue from being freed before the blktrace
debugfs entries are cleaned up, commit db59133e9279 ("scsi: sg: fix
blktrace debugfs entries leakage") uses scsi_device_get(). However,
scsi_device_get() also grabs a reference to the scsi host module, so
that module can't be removed.

It's reported that blktests can't unload scsi_debug after block/001:

blktests (master) # ./check block
block/001 (stress device hotplugging) [failed]
      +++ /root/blktests/results/nodev/block/001.out.bad 2023-06-19
       Running block/001
       Stressing sd
      +modprobe: FATAL: Module scsi_debug is in use.

Fix this problem by grabbing a request_queue reference directly, so that
the scsi host module can still be unloaded while the request_queue stays
pinned by the sg device.

Reported-by: Chaitanya Kulkarni <chaitanyak@xxxxxxxxxx>
Link: https://lore.kernel.org/all/1760da91-876d-fc9c-ab51-999a6f66ad50@xxxxxxxxxx/
Fixes: db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage")
Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
---
  drivers/scsi/sg.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 2433eeef042a..dcb73787c29d 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1497,7 +1497,7 @@ sg_add_device(struct device *cl_dev)
      int error;
      unsigned long iflags;
-    error = scsi_device_get(scsidp);
+    error = blk_get_queue(scsidp->request_queue);
      if (error)
          return error;
@@ -1558,7 +1558,7 @@ sg_add_device(struct device *cl_dev)
  out:
      if (cdev)
          cdev_del(cdev);
-    scsi_device_put(scsidp);
+    blk_put_queue(scsidp->request_queue);
      return error;
  }
@@ -1575,7 +1575,7 @@ sg_device_destroy(struct kref *kref)
       */
      blk_trace_remove(q);
-    scsi_device_put(sdp->device);
+    blk_put_queue(q);
      write_lock_irqsave(&sg_index_lock, flags);
      idr_remove(&sg_index_idr, sdp->index);
--
2.39.2

Hi,

This change (bisected) triggers a regression in our KVM on s390x CI. The
symptom is that a “scsi_debug device” does not bind to the scsi_generic
driver. On s390x you can reproduce the problem as follows (I have not
tested on x86):

With this patch applied:

$ sudo modprobe scsi_debug
$ # Get the 'scsi_host,channel,target_number,LUN' tuple for the scsi_debug device
$ lsscsi |grep scsi_debug |awk '{ print $1 }'
[0:0:0:0]
$ sudo stat /sys/bus/scsi/devices/0:0:0:0/scsi_generic
stat: cannot statx '/sys/bus/scsi/devices/0:0:0:0/scsi_generic': No such file or directory


Patch reverted:


I haven't figured out the root cause yet. However, have you tried
reverting this patch as well?

db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage")

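Never mind this. The root cause is that the check of the return value of
blk_get_queue() is wrong: it returns true on success, so the
"if (error)" test treats a successful get as a failure and
sg_add_device() bails out.

This should be fixed by the following patch: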

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 89fa046c7158..0d8afffd1683 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1497,9 +1497,10 @@ sg_add_device(struct device *cl_dev)
     int error;
     unsigned long iflags;

-    error = blk_get_queue(scsidp->request_queue);
-    if (error)
-        return error;
+    if (!blk_get_queue(scsidp->request_queue)) {
+        pr_warn("%s: get scsi_device queue failed\n", __func__);
+        return -ENODEV;
+    }

     error = -ENOMEM;
     cdev = cdev_alloc();
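
To make the mix-up concrete, here is a minimal user-space sketch, not
kernel code. All names in it (fake_queue, fake_get_queue(),
add_device_buggy(), add_device_fixed()) are made up for illustration; the
only assumption is that the getter returns true on success, as the fix
above implies.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request queue; not the kernel structure. */
struct fake_queue {
	int refs;	/* reference count */
	bool dying;	/* set once the queue is being torn down */
};

/* bool convention: true means a reference was taken, false means failure. */
bool fake_get_queue(struct fake_queue *q)
{
	if (q->dying)
		return false;
	q->refs++;
	return true;
}

/*
 * Buggy caller: treats the bool as an errno-style code. Success (true,
 * i.e. 1) is returned as an "error" with the reference still held, and
 * a real failure (false, i.e. 0) is silently accepted.
 */
int add_device_buggy(struct fake_queue *q)
{
	int error = fake_get_queue(q);

	if (error)
		return error;
	return 0;
}

/* Fixed caller: failure is the false case, mapped to a negative errno. */
int add_device_fixed(struct fake_queue *q)
{
	if (!fake_get_queue(q))
		return -ENODEV;
	return 0;
}
```

With a live queue the buggy variant returns 1 and keeps the extra
reference, while the fixed variant returns 0. With a dying queue the
buggy variant returns 0 without holding a reference, while the fixed
variant returns -ENODEV.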


Thanks,
Kuai
$ sudo modprobe scsi_debug
$ lsscsi |grep scsi_debug |awk '{ print $1 }'
[0:0:0:0]
$ sudo stat /sys/bus/scsi/devices/0:0:0:0/scsi_generic
   File: /sys/bus/scsi/devices/0:0:0:0/scsi_generic
   Size: 0             Blocks: 0          IO Block: 4096   directory
Device: 0,20    Inode: 12155       Links: 3


Any ideas?

  Marc