Re: [RFC PATCH v3] Use kernfs_break_active_protection() for device online store callbacks

From: Li Zhong
Date: Tue Apr 15 2014 - 21:42:02 EST


On Tue, 2014-04-15 at 10:50 -0400, Tejun Heo wrote:
> Hello,
>
> On Tue, Apr 15, 2014 at 10:44:37AM +0800, Li Zhong wrote:
> > /*
> > * This process might deadlock with another process trying to
> > * remove this device:
> > * This process holding the s_active of "online" attribute, and tries
> > * to online/offline the device with some locks protecting hotplug.
> > * Device removing process holding some locks protecting hotplug, and
> > * tries to remove the "online" attribute, waiting for the s_active to
> > * be released.
> > *
> > * The deadlock described above should be solved with
> > * lock_device_hotplug_sysfs(). We temporarily drop the active
> > * protection here to avoid some lockdep warnings.
> > *
> > * If the device removal path forgets to take device_hotplug_lock
> > * (possibly some very simple devices don't even need this lock?),
> > * @dev could go away any time after the active protection is dropped.
> > * So take a reference on it before dropping active protection.
> > * Though invoking device_{on|off}line() on a removed device seems
> > * unreasonable, it should be less disastrous than playing with a freed
> > * @dev. Also, we might be able to add some mechanism to abort
> > * device_{on|off}line() if @dev has already been removed.
> > */
>
> Hmmm... I'm not sure I fully understand the problem. Does the code
> ever try to remove "online" while holding cpu_add_remove_lock and,
> when written 0, online knob grabs cpu_add_remove_lock?

Yes.

In acpi_processor_remove(), cpu_maps_update_begin() is called to take
cpu_add_remove_lock, and then arch_unregister_cpu() calls
unregister_cpu(), which tries to remove the cpu1 directory, including
"online".

Meanwhile, when 0 is written to "online", cpu_down() also tries to take
cpu_add_remove_lock via cpu_maps_update_begin().
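
To make the ordering concrete, here is a minimal userspace model of the two
paths described above. It is only a sketch, not kernel code: struct model and
all the function names are hypothetical, and the two fields merely stand in
for the s_active reference on "online" and for cpu_add_remove_lock.

```c
#include <stdbool.h>

/* Hypothetical model state, not a kernel structure. */
struct model {
	int  active_refs;	/* stands in for s_active on "online" */
	bool add_remove_held;	/* stands in for cpu_add_remove_lock */
};

/* Writer, step 1: the sysfs write to "online" holds active protection. */
static void writer_enter(struct model *m)
{
	m->active_refs++;
}

/* Writer, step 2: cpu_down() needs the lock; true if it would block. */
static bool writer_would_block(const struct model *m)
{
	return m->add_remove_held;
}

/* Remover, step 1: the removal path takes the lock first. */
static void remover_enter(struct model *m)
{
	m->add_remove_held = true;
}

/* Remover, step 2: removing "online" waits for active refs to drain. */
static bool remover_would_block(const struct model *m)
{
	return m->active_refs > 0;
}

/* Both sides wait on what the other holds: neither can make progress. */
static bool deadlocked(const struct model *m)
{
	return writer_would_block(m) && remover_would_block(m);
}
```

If writer_enter() and remover_enter() both run before either side reaches its
second step, deadlocked() is true in this model.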

> If so, that is
> an actually possible deadlock, no?

Yes, but it seems to me that this is solved by commit 5e33bc41, which uses
lock_device_hotplug_sysfs() to return a restart-syscall error if it cannot
trylock device_hotplug_lock. That fix also requires the device removal
code path to take device_hotplug_lock.
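
The trylock-or-restart idea can be sketched in userspace roughly as below.
This is an illustration under assumptions, not the code from that commit: the
model_* names and MODEL_ERESTART are hypothetical, and a C11 atomic_flag
stands in for device_hotplug_lock.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's restart-syscall error value. */
#define MODEL_ERESTART 512

/* Stand-in for device_hotplug_lock: set means held. */
static atomic_flag model_hotplug_lock = ATOMIC_FLAG_INIT;

/* Try to take the lock; never sleep waiting for it. */
static int model_lock_hotplug_sysfs(void)
{
	if (!atomic_flag_test_and_set(&model_hotplug_lock))
		return 0;		/* lock acquired */
	return -MODEL_ERESTART;		/* busy: ask the caller to retry */
}

static void model_unlock_hotplug(void)
{
	atomic_flag_clear(&model_hotplug_lock);
}

/* Models the "online" store callback: back out instead of blocking. */
static int model_online_store(int *online, int val)
{
	int ret = model_lock_hotplug_sysfs();

	if (ret)
		return ret;	/* the write would be restarted from the top */

	*online = val;
	model_unlock_hotplug();
	return 0;
}
```

Because the store path backs out instead of waiting while it holds the active
reference, the removal path can finish draining it, and the write is retried
afterwards.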

Thanks, Zhong

>
> Thanks.
>

