Re: [patch V2 1/8] x86/smp: Make stop_other_cpus() more robust

From: Thomas Gleixner
Date: Wed Jun 14 2023 - 18:40:20 EST


On Wed, Jun 14 2023 at 13:47, Ashok Raj wrote:
> On Wed, Jun 14, 2023 at 09:53:21PM +0200, Thomas Gleixner wrote:
>>
>> Now let me look into this NMI cruft.
>>
>
> Maybe if each CPU going down can set their mask, we can simply hit NMI to
> only the problematic ones?
>
> The simple count doesn't capture the CPUs in trouble.

Even a mask does not cut it. If CPUs did not react to the reboot
vector, then there is no guarantee that they will not do so
concurrently with the NMI IPI:

CPU0                                CPU1

IPI(BROADCAST, REBOOT);
wait()  // timeout
                                    stop_this_cpu()
if (!all_stopped()) {
  for_each_cpu(cpu, mask) {
                                      mark_stopped(); <- all_stopped() == true now
    IPI(cpu, NMI);
  }                                 --> NMI()

  // no wait() because all_stopped() == true

proceed_and_hope() ....

On bare metal this is likely to "work" by chance, but in a guest all
bets are off.
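
The interleaving above can be replayed deterministically in a few
lines of C. This is only a sketch of the ordering problem, not kernel
code: struct model, replay_race() and the two-CPU setup are made up
for illustration, and only the all_stopped() name is borrowed from the
diagram.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 2	/* CPU0 is the sender, CPU1 the straggler */

/* Hypothetical model state, not a kernel data structure. */
struct model {
	bool stopped[NR_CPUS];	/* set by a CPU once it parks itself */
	bool nmi_sent[NR_CPUS];	/* an NMI IPI was issued to this CPU */
};

/* True when every CPU other than 'self' has marked itself stopped. */
static bool all_stopped(const struct model *m, int self)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu != self && !m->stopped[cpu])
			return false;
	}
	return true;
}

/*
 * Replay the interleaving from the diagram in a fixed order.
 * Returns true if CPU0 ends up skipping the second wait although
 * it has just sent an NMI to CPU1.
 */
static bool replay_race(struct model *m)
{
	assert(m != NULL);

	/* CPU0: reboot IPI was broadcast and wait() returned. */
	if (all_stopped(m, 0))
		return false;	/* everyone stopped in time, no NMI needed */

	/* CPU1: handles the reboot vector late and marks itself stopped. */
	m->stopped[1] = true;

	/* CPU0: already committed to the NMI path, so the NMI goes out. */
	m->nmi_sent[1] = true;

	/* CPU0: all_stopped() is true now, so it would not wait again. */
	return all_stopped(m, 0) && m->nmi_sent[1];
}
```

Starting from a zeroed struct model, replay_race() returns true: the
sender sees all_stopped() and proceeds while its NMI is still on the
way to a CPU which is concurrently stopping itself. If CPU1 is already
marked stopped on entry, the function returns false and no NMI is
sent.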

I'm not surprised at all.

The approach of piling hardware and firmware legacy on top of hardware
and firmware legacy in the hope that we can "fix" that in software was
wrong from the very beginning.

What's surprising is that this worked for a really long time. But
with increasing complexity, the debris it produced is starting to
rear its ugly head.

I'm sure the marketing departments of _all_ x86 vendors will come up
with a brilliant slogan for that. Something like:

"We are committed to ensure that you are able to experience the
failures of the past forever with increasingly improved performance
and new exciting features which are fully backwards failure
compatible."

TBH, the (OS) software industry has perpetuated that by joining the
'features first' choir without much thought or pushback. See
arch/x86/kernel/cpu/* for prime examples.

Ranted enough. I'm going to sleep now and look at this mess tomorrow
morning with brain awake again. Though that will not change the
underlying problem, which is unfixable.

Thanks,

tglx