Re: ACPI CPU hot removal: Problem with automatic scripting in response to hotplug events

From: Pavel Machek
Date: Mon Jan 28 2008 - 11:14:55 EST


On Fri 2008-01-18 10:13:01, Daniel Arai wrote:
> I have an x86_64 system that's capable of doing ACPI CPU
> hot removal.
> One issue that's worrying me right now is that there
> isn't enough
> information to automatically script CPU removal when
> it's requested by
> the hardware platform.
>
> I'm doing my testing with 2.6.24-rc8 from kernel.org. I
> have applied
> the patch series from
> http://bugzilla.kernel.org/show_bug.cgi?id=2884
> to avoid running into deadlocks on CPU removal. This
> issue is almost
> certainly independent of these patches.
>
> /sbin/hotplug does get invoked for CPU ejection requests
> that are sent
> from the platform. It only gets a pointer to the ACPI
> /sys node
> describing the ACPI processor object for which ejection
> has been
> requested. There doesn't seem to be a way to map from
> this device
> node to a CPU number so the processor can be taken
> offline before

Submit a patch, I guess you are first one with physical hotplug
capable system.

> 3. Provide a symbolic link from the ACPI device tree to
> the
> appropriate CPU device, and possibly the reverse link as
> well. This
> would enable /sbin/hotplug to find the appropriate CPU
> node to take
> the CPU offline by itself, and would also be usefil for
> a system
> administrator who needs to know the mapping from ACPI
> device to CPU
> number. What I don't know is how hard it is to
> construct these new
> nodes. It also gets slightly messy because ACPI
> processor objects can
> appear at different levels in the /sys node hierarchy
> depending on how
> they're represented in the ACPI tables, so constructing
> relative
> symlinks isn't a trivial matter.
>
> I think that overall the third approach seems the most
> appealing,
> because it provides additional information to the system
> administrator
> without adding special information just there for
> /sbin/hotplug. I
> think #1 is probably a nice thing to have just in
> general, though I
> don't understand the rationale for the code being the
> way it is now.
> Sadly, #2 is the only one I'm sure I know how to
> implement.

So do #2 and see what happens...

> So any suggestions on the best way to go about solving
> this problem?
>
> I've also noticed one other minor anomaly. CPUs get
> device nodes
> named ACPI0007:00, ACPI0007:01, ACPI0007:02, etc. If I
> remove a CPU
> from the system, then add another cpu to the system, the
> removed
> device node goes away, but its number doesn't get
> reused. So if I
> remove and then re-add the same CPU over and over, I end
> up getting
> nodes named ACPI0007:03, then that gets deleted, then I
> get
> ACPI0007:04, and so on. Is this supposed to happen?
> What happens if
> I remove and add the same CPU more than 99 times?

No idea, I'm surprised it works at all...
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/