Re: [PATCH] nvme: hwmon: fix crash on device teardown

From: Hannes Reinecke
Date: Mon Jan 11 2021 - 11:01:08 EST


On 1/5/21 10:45 AM, Daniel Wagner wrote:
On Mon, Jan 04, 2021 at 06:06:10PM -0300, Enzo Matsumiya wrote:
@Daniel maybe try tweaking your tests to use a smaller controller
loss timeout (-l option)? I do this on my tests because the default
value kicks in about 30min after hot-removal -- i.e. you
have to actually wait for the timeout to expire to trigger the bug.

As far I can tell, the blktests test I am using will trigger the same
bug. The problem is that the lifetime of hwmon sysfs entry should be
aligned to the lifetime of the nvme sysfs entry. Currently, hwmon's
lifetime is bound to the lifetime of the ctl sysfs entry. When the nvme
entry goes away (and obviously also the underlying device), the hwmon
sysfs entry still references it.

Yeah, using the controller node for devm allocations is quite dodgy.
Does this one help?


diff --git a/drivers/nvme/host/hwmon.c b/drivers/nvme/host/hwmon.c
index 6fdd07fb3001..7260af028cf7 100644
--- a/drivers/nvme/host/hwmon.c
+++ b/drivers/nvme/host/hwmon.c
@@ -226,7 +226,7 @@ static const struct hwmon_chip_info

int nvme_hwmon_init(struct nvme_ctrl *ctrl)
{
- struct device *dev = ctrl->dev;
+ struct device *dev = ctrl->device;
struct nvme_hwmon_data *data;
struct device *hwmon;
int err;
@@ -240,8 +240,7 @@ int nvme_hwmon_init(struct nvme_ctrl *ctrl)

err = nvme_hwmon_get_smart_log(data);
if (err) {
- dev_warn(ctrl->device,
- "Failed to read smart log (error %d)\n", err);
+ dev_warn(dev, "Failed to read smart log (error %d)\n", err);
devm_kfree(dev, data);
return err;
}


Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer