Re: [PATCH v1] ACPI / scan: Acquire device_hotplug_lock in acpi_scan_init()

From: Michal Hocko
Date: Thu Jul 25 2019 - 15:19:49 EST


On Thu 25-07-19 16:35:07, David Hildenbrand wrote:
> On 25.07.19 15:57, Michal Hocko wrote:
> > On Thu 25-07-19 15:05:02, David Hildenbrand wrote:
> >> On 25.07.19 14:56, Michal Hocko wrote:
> >>> On Wed 24-07-19 16:30:17, David Hildenbrand wrote:
> >>>> We end up calling __add_memory() without the device hotplug lock held.
> >>>> (I used a local patch to assert in __add_memory() that the
> >>>> device_hotplug_lock is held - I might upstream that as well soon)
> >>>>
> >>>> [ 26.771684] create_memory_block_devices+0xa4/0x140
> >>>> [ 26.772952] add_memory_resource+0xde/0x200
> >>>> [ 26.773987] __add_memory+0x6e/0xa0
> >>>> [ 26.775161] acpi_memory_device_add+0x149/0x2b0
> >>>> [ 26.776263] acpi_bus_attach+0xf1/0x1f0
> >>>> [ 26.777247] acpi_bus_attach+0x66/0x1f0
> >>>> [ 26.778268] acpi_bus_attach+0x66/0x1f0
> >>>> [ 26.779073] acpi_bus_attach+0x66/0x1f0
> >>>> [ 26.780143] acpi_bus_scan+0x3e/0x90
> >>>> [ 26.780844] acpi_scan_init+0x109/0x257
> >>>> [ 26.781638] acpi_init+0x2ab/0x30d
> >>>> [ 26.782248] do_one_initcall+0x58/0x2cf
> >>>> [ 26.783181] kernel_init_freeable+0x1bd/0x247
> >>>> [ 26.784345] kernel_init+0x5/0xf1
> >>>> [ 26.785314] ret_from_fork+0x3a/0x50
> >>>>
> >>>> So perform the locking just like in acpi_device_hotplug().
> >>>
> >>> While playing with the device_hotplug_lock, can we actually document
> >>> what it is protecting please? I have a bad feeling that we are adding
> >>> this lock just because some other code path does rather than with a good
> >>> idea why it is needed. This patch just confirms that. What exactly does
> >>> the lock protect against here at such an early boot stage?
> >>
> >> We have plenty of documentation already
> >>
> >> mm/memory_hotplug.c
> >>
> >> git grep -C5 device_hotplug mm/memory_hotplug.c
> >>
> >> Also see
> >>
> >> Documentation/core-api/memory-hotplug.rst
> >
> > OK, fair enough. I was pointing more to documentation right there
> > where the lock is declared because that is the place where people
> > usually check for documentation. The core-api documentation looks quite
> > nice. And based on that doc it seems that this patch is actually not
> > needed because neither online/offline nor CPU hotplug should be
> > possible that early, unless I am missing something.
>
> I really prefer to stick to locking rules as outlined on the
> interfaces if it doesn't hurt. Why it is not needed is not clear.
>
> >
> >> Regarding the early stage: primarily lockdep as I mentioned.
> >
> > Could you add a lockdep splat that would be fixed by this patch to the
> > changelog for reference?
> >
>
> I have one where I enforce what's documented (but that's of course not
> upstream and therefore not "real" yet)

Then I suggest not adding locking for something that is not a problem.
Really, think about it. People will look at this code and follow the
lead without really knowing why the locking is needed.
device_hotplug_lock has its purpose and if the code in question doesn't
need synchronization for the documented scenarios then the locking
simply shouldn't be there. Adding the lock just because of a
non-existent, and IMHO dubious, lockdep splat is just wrong.

We need to rationalize the locking here, not to add more hacks.

--
Michal Hocko
SUSE Labs