Re: [PATCH 0/2 v3] cpu hotplug: Preserve topology directory after soft remove event

From: Prarit Bhargava
Date: Thu Sep 22 2016 - 07:59:37 EST




On 09/21/2016 10:01 AM, Borislav Petkov wrote:
> On Wed, Sep 21, 2016 at 09:32:47AM -0400, Prarit Bhargava wrote:
>> This is not the right thing to do [1]. The topology directory should exist as
>> long as the thread is present in the system. The thread (and its core) are
>> still physically there, it's just that the thread is not available to the
>> scheduler. The topology of the thread hasn't changed due to it being soft
>> offlined this way.
>
> So far so good.
>
>> turbostat was modified to deal with the missing topology directory, and in tree
>> utility cpupower prints out significantly less information when a thread is
>> offline.
>
> Why does it do that? Why does an offlined core change that info?
>
> Concrete details please.
>
>> ISTR a powertop bug due to hotplug too. This makes these monitoring
>> utilities a problem for users who want only one thread per core.
>
> one thread per core? What does that mean?

Systems (usually) boot with 2 threads/core. Some performance users want
one thread per core. Since there is no "noht" option anymore, users use /sys to
disable one thread on each core.
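
For illustration only, here is a rough Python sketch of what such a user
script might look like. It is not part of this patch set. The helper names
(parse_cpu_list, secondary_threads, offline_cpu) and the cpu_root parameter
are made up for the example, and keeping the lowest-numbered thread of each
core is an assumption, not a rule.

```python
import os
import re


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as '0,4-6' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file, e.g. no siblings reported
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def secondary_threads(cpu_root="/sys/devices/system/cpu"):
    """Return CPUs that are not the lowest-numbered thread of their core."""
    secondary = set()
    for name in os.listdir(cpu_root):
        if not re.fullmatch(r"cpu\d+", name):
            continue  # skip cpufreq, cpuidle, online, ...
        path = os.path.join(cpu_root, name, "topology", "thread_siblings_list")
        try:
            with open(path) as f:
                siblings = parse_cpu_list(f.read())
        except OSError:
            continue  # topology directory missing for this CPU
        # Assumption: keep the first sibling, offline the rest.
        secondary.update(siblings[1:])
    return sorted(secondary)


def offline_cpu(cpu, cpu_root="/sys/devices/system/cpu"):
    """Equivalent of: echo 0 > /sys/devices/system/cpu/cpuN/online"""
    with open(os.path.join(cpu_root, f"cpu{cpu}", "online"), "w") as f:
        f.write("0")
```

Calling offline_cpu() on every entry returned by secondary_threads() (as
root) would leave one thread per core online.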

>
>> This now means that
>>
>> echo 0 > /sys/devices/system/cpu/cpu29/online
>>
>> will result in the thread's topology directory staying around until the struct
>> device associated with it is destroyed upon a physical socket hotplug event.
>
> So your 2/2 says that on an offlined CPU, you have
>
> /sys/devices/system/cpu/cpu10/topology/core_id:3
> /sys/devices/system/cpu/cpu10/topology/core_siblings:0000
> /sys/devices/system/cpu/cpu10/topology/core_siblings_list:
> /sys/devices/system/cpu/cpu10/topology/physical_package_id:0
> /sys/devices/system/cpu/cpu10/topology/thread_siblings:0000
> /sys/devices/system/cpu/cpu10/topology/thread_siblings_list:
>
> and this information is bollocks. core_siblings is 0, thread_siblings
> is 0. You can just as well not have them there at all.

core_siblings and thread_siblings are the online thread's sibling cores and
threads that are available to the scheduler, and should be 0 when the thread is
offline. That comes directly from reading the code.

>
> So is this whole jumping around just so that you can have a
> /sys/devices/system/cpu/cpu10/topology directory and so that tools don't
> get confused by it missing?

Yes.

>
> So again, what exactly are those tools accessing and how do the
> offlined cores puzzle them?
>
> A concrete example please:
>

See commit 20102ac5bee3 ("cpupower: cpupower monitor reports uninitialized
values for offline cpus"). That patch papers over the bug of not being able to
find core_id and physical_package_id for an offline thread.
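
To make the failure mode concrete, here is a minimal Python sketch of the
kind of lookup a monitoring tool does. This is not cpupower's actual code;
read_topology and its cpu_root parameter are hypothetical names for the
example. When the topology directory is gone, the tool has nothing to read
and has to invent a placeholder value.

```python
import os


def read_topology(cpu, cpu_root="/sys/devices/system/cpu"):
    """Read core_id and physical_package_id for one CPU.

    Returns None for any value that cannot be read, which is what
    happens when the topology directory has been removed.
    """
    topo = os.path.join(cpu_root, f"cpu{cpu}", "topology")
    result = {}
    for key in ("core_id", "physical_package_id"):
        try:
            with open(os.path.join(topo, key)) as f:
                result[key] = int(f.read().strip())
        except (OSError, ValueError):
            result[key] = None  # missing or unparsable file
    return result
```

With the topology directory preserved, the same call on an offline thread
would still return its core_id and physical_package_id.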

P.