Re: [PATCH v2] driver core: Fix bus_type.match() error handling

From: Bart Van Assche
Date: Fri Aug 19 2022 - 20:07:36 EST


On 8/19/22 15:08, Guenter Roeck wrote:
> On Fri, Aug 19, 2022 at 01:01:29PM -0700, Bart Van Assche wrote:
>> Since the issue has been observed in qemu, how about sharing the sysrq-t
>> output? I recommend collecting that output as follows:
>> * Send the serial console output to a file. This involves adding
>>   console=ttyS0,115200n8 to the kernel command line and using the proper qemu
>>   options to save the serial console output into a file.
>> * Reproduce the hang and send the sysrq-t key sequence to qemu, e.g. as
>>   follows: virsh send-key ${vm_name} KEY_LEFTALT KEY_SYSRQ KEY_T
>
> Unless I am missing something, this requires a virtio keyboard.
> So far I have been unable to get this to work with qemu arm emulations.

That's unfortunate. Is there another way to collect call traces after
the lockup has happened? Is it sufficient to enable the serial console
and monitor its output? Is CONFIG_SOFTLOCKUP_DETECTOR=y sufficient? If
not, how about converting the new wait_event() calls in the SCSI code
into wait_event_timeout() loops that dump the state of all tasks and
workqueues every 60 seconds while waiting, e.g. as shown in the (totally
untested) patch below?

Thanks,

Bart.


diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 6c63672971f1..edd238384f1d 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -35,6 +35,7 @@
 #include <linux/platform_device.h>
 #include <linux/pm_runtime.h>
 #include <linux/idr.h>
+#include <linux/sched/debug.h>
 #include <scsi/scsi_device.h>
 #include <scsi/scsi_host.h>
 #include <scsi/scsi_transport.h>
@@ -196,7 +197,11 @@ void scsi_remove_host(struct Scsi_Host *shost)
 	 * unloaded and/or the host resources can be released. Hence wait until
 	 * the dependent SCSI targets and devices are gone before returning.
 	 */
-	wait_event(shost->targets_wq, atomic_read(&shost->target_count) == 0);
+	while (wait_event_timeout(shost->targets_wq,
+				  atomic_read(&shost->target_count) == 0, 60 * HZ) <= 0) {
+		show_state();
+		show_all_workqueues();
+	}
 
 	scsi_mq_destroy_tags(shost);
 }
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 213ebc88f76a..1c17b6c53ab0 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -14,6 +14,7 @@
 #include <linux/device.h>
 #include <linux/pm_runtime.h>
 #include <linux/bsg.h>
+#include <linux/sched/debug.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_device.h>
@@ -1536,7 +1537,11 @@ static void __scsi_remove_target(struct scsi_target *starget)
 	 * devices associated with @starget have been removed to prevent that
 	 * a SCSI error handling callback function triggers a use-after-free.
 	 */
-	wait_event(starget->sdev_wq, atomic_read(&starget->sdev_count) == 0);
+	while (wait_event_timeout(starget->sdev_wq,
+				  atomic_read(&starget->sdev_count) == 0, 60 * HZ) <= 0) {
+		show_state();
+		show_all_workqueues();
+	}
 }
 
 /**