Re: [PATCH] nvme: hwmon: fix crash on device teardown

From: Daniel Wagner
Date: Wed Dec 30 2020 - 08:24:47 EST


Hi Chaitanya,

On Thu, Dec 10, 2020 at 04:35:14AM +0000, Chaitanya Kulkarni wrote:
> Enzo,
> On 12/9/20 13:39, Enzo Matsumiya wrote:
> > Fix a possible NULL pointer dereference when trying to read
> > hwmon sysfs entries associated to NVMe-oF devices that were
> > hot-removed or disconnected.
> >
> > Unregister the NVMe hwmon device upon controller teardown
> > (nvme_stop_ctrl()).
> >
> > Signed-off-by: Enzo Matsumiya <ematsumiya@xxxxxxx>
> (Without looking into the code)
>
> Do you have blktests testcases for NVMeOFhot-removed and disconnected
> device scenario to produce the problem ?

Here is a rough blktests' test to reproduce the problem. There is
clearly a race as the crash not always happens at the same iteration. On
my system I was able to hit consistently within 50 iterations. Maybe you
need to tweak it on your system to reproduce it.

Thanks,
Daniel



#!/bin/bash
# SPDX-License-Identifier: GPL-2.0+
# Copyright (c) 2020 SUSE LLC
#
# Test NVMeOF read hwmon device info
. tests/nvme/rc

DESCRIPTION="read hwmon device info"

requires() {
_nvme_requires
_have_modules loop
_require_nvme_trtype_is_fabrics
}

_get_nvme_hwmon_devs() {
echo "/sys/devices/virtual/nvme-fabrics/ctl/hwmon*"
}

_read_nvme_hwmon_attr() {
hwmon_devs="$(_get_nvme_hwmon_devs)"
for hwmon in "${hwmon_devs}"; do
cat ${hwmon}"/name"
cat ${hwmon}"/temp1_label"
cat ${hwmon}"/temp1_alarm"
cat ${hwmon}"/temp1_input"
done
}

test() {
echo "Running ${TEST_NAME}"

_setup_nvmet

local port
local loop_dev
local file_path="$TMPDIR/img"
local subsys_name="blktests-subsystem-1"

truncate -s 1G "${file_path}"

loop_dev="$(losetup -f --show "${file_path}")"

_create_nvmet_subsystem "${subsys_name}" "${loop_dev}" \
"91fdba0d-f87b-4c25-b80f-db7be1418b9e"
port="$(_create_nvmet_port "${nvme_trtype}")"
_add_nvmet_subsys_to_port "${port}" "${subsys_name}"

for i in $(seq 1 50); do
_nvme_connect_subsys "${nvme_trtype}" "${subsys_name}"
_read_nvme_hwmon_attr
_nvme_disconnect_subsys "${subsys_name}"
_read_nvme_hwmon_attr
done

_remove_nvmet_subsystem_from_port "${port}" "${subsys_name}"
_remove_nvmet_subsystem "${subsys_name}"
_remove_nvmet_port "${port}"

losetup -d "${loop_dev}"

rm "${file_path}"

echo "Test complete"
}