Re: CONFIG_DEBUG_TEST_DRIVER_REMOVE causes unremovable drivers to bind devices twice

From: Rob Herring
Date: Tue Oct 11 2016 - 11:28:28 EST


+Bjorn

On Mon, Oct 10, 2016 at 8:33 AM, Rob Herring <robh@xxxxxxxxxx> wrote:
> On Mon, Oct 10, 2016 at 8:17 AM, Laszlo Ersek <lersek@xxxxxxxxxx> wrote:
>> Hi,
>>
>> Greg asked me to stick to email with this bug report, so I'm reposting
>> the original kernel bugzilla report to personal addresses, and lkml.
>>
>> Thanks,
>> Laszlo
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=177021
>>
>> Bug ID: 177021
>> Summary: [driver core] CONFIG_DEBUG_TEST_DRIVER_REMOVE causes
>> unremovable drivers to bind devices twice
>> Product: Drivers
>> Version: 2.5
>> Kernel Version: v4.8-2283-ga3443cd (4.9.0-0.rc0.git2.1.fc26.aarch64)
>> Hardware: All
>> OS: Linux
>> Tree: Mainline
>> Status: NEW
>> Severity: normal
>> Priority: P1
>> Component: Other
>> Assignee: drivers_other@xxxxxxxxxxxxxxxxxxxx
>> Reporter: lersek@xxxxxxxxxx
>> CC: arnd@xxxxxxxx, greg@xxxxxxxxx
>> Regression: No
>>
>> CONFIG_DEBUG_TEST_DRIVER_REMOVE was added in the following commit:
>>
>>> commit bea5b158ff0da9c7246ff391f754f5f38e34577a
>>> Author: Rob Herring <robh@xxxxxxxxxx>
>>> Date: Thu Aug 11 10:20:58 2016 -0500
>>>
>>> driver core: add test of driver remove calls during probe
>
> [...]
>
>> This is almost a regression because the kernel crashes with valid
>> drivers. It is not an error for a driver to not provide a remove()
>> callback, so in this instance CONFIG_DEBUG_TEST_DRIVER_REMOVE does not
>> expose a driver bug, it breaks with a valid driver. Not a regression for
>> the upstream kernel after all, because the Kconfig documentation
>> suggests N as default.
>>
>> Proposed solution: if none of the remove() methods exist, or the
>> remove() method that exists fails, then don't release any resources, and
>> don't re-probe the device.
>
> I was thinking no remove method meant the driver didn't need to do any
> explicit clean-up as all resources used devres, but I guess that's not
> going to cover things like subsystem de-registration. I'll prepare a
> fix.

Looking at this some more, I think this should just be keyed off of
suppress_bind_attrs. If bind/unbind is exposed to userspace for a
driver, then remove and re-probe should work even if the driver doesn't
have a remove function.

Either the generic PCI host driver needs to set suppress_bind_attrs, as
many of the ARM-based PCI host drivers already do, or its remove path
should be fixed to support this. Getting PCI hosts to be hot-pluggable
is a goal (or maybe it's supported now?).
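
To make the two options above concrete, here is a rough userspace sketch
of what keying the test off the bind attributes could look like. It is
not kernel code: struct mock_driver is a stand-in for the real struct
device_driver, and should_test_remove() and both example drivers are
hypothetical names made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Stand-in for the kernel's struct device_driver. Only the fields that
 * matter for this discussion are modelled (assumption: this is not the
 * real layout).
 */
struct mock_driver {
	const char *name;
	bool suppress_bind_attrs;	/* true: no bind/unbind files in sysfs */
	int (*remove)(void *dev);	/* may be NULL */
};

/*
 * Hypothetical helper: run the debug remove/re-probe test only when the
 * option is enabled and the driver has not suppressed bind/unbind.
 * Whether a remove() callback exists is deliberately not consulted.
 */
static bool should_test_remove(const struct mock_driver *drv,
			       bool debug_test_enabled)
{
	return debug_test_enabled && !drv->suppress_bind_attrs;
}

/* A driver that can be unbound from userspace but has no remove(). */
static const struct mock_driver bindable_drv = {
	.name = "bindable-example",
	.suppress_bind_attrs = false,
	.remove = NULL,
};

/* A host driver that opts out of unbinding, and so out of the test. */
static const struct mock_driver pinned_host = {
	.name = "pinned-host-example",
	.suppress_bind_attrs = true,
	.remove = NULL,
};
```

With this shape, should_test_remove(&bindable_drv, true) is true even
though remove is NULL, and should_test_remove(&pinned_host, true) is
false, so a driver that cannot be removed opts out by setting one flag.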

Rob