Re: [PATCH RFC] x86/cpu: fix intermittent lockup on poweroff

From: Tony Battersby
Date: Wed Apr 26 2023 - 10:57:20 EST


On 4/25/23 17:05, Thomas Gleixner wrote:
> On Tue, Apr 25 2023 at 13:03, Dave Hansen wrote:
>
>> On 4/25/23 12:26, Tony Battersby wrote:
>>> - if (cpuid_eax(0x8000001f) & BIT(0))
>>> + if (c->extended_cpuid_level >= 0x8000001f &&
>>> + (cpuid_eax(0x8000001f) & BIT(0)))
>>> native_wbinvd();
>> Oh, so the existing code is running into the
>>
>>> If a value entered for CPUID.EAX is higher than the maximum input
>>> value for basic or extended function for that processor then the data
>>> for the highest basic information leaf is returned
>> behavior. It's basically looking at BIT(0) of some random extended
>> leaf. Probably 0x80000008 based on your 'cpuid -r' output.
> Right, accessing that leaf without checking whether it exists is wrong,
> but that does not explain the hang itself.
>
> The only consequence of looking at bit 0 of some random other leaf is
> that all CPUs which run stop_this_cpu() issue WBINVD in parallel, which
> is slow but should not be a fatal issue.
>
> Tony observed this is a 50% chance to hang, which means this is a timing
> issue.
>
> Now there are two things to investigate:
>
> 1) Does the system go south if enough CPUs issue WBINVD concurrently?
>
> That should be trivial to analyze by enforcing concurreny on a
> WBINVD in an IPI via a global synchronization bit or such
>
> 2) The first thing stop_this_cpu() does is to clear its own bit in the
> CPU online mask.
>
> The CPU which controls shutdown/reboot waits for num_online_cpus()
> to drop down to one, which means it can proceed _before_ the other
> CPUs have actually reached HALT.
>
> That's not a new thing. It has been that way forever. Just the
> WBINVD might cause enough delay to create problems.
>
> That should be trivial to analyze too by just waiting on the
> control side for e.g 100ms after num_online_cpus() dropped down to
> one.
>
For test #1, I have never used IPI before, so I would have to look into
how to do that.  Or you could send me a patch to test if you still want
the test done.  But test #2 produced results, so maybe it is not necessary.

For test #2, I re-enabled native_wbinvd() by reverting the patch that I
sent, and then I applied the following patch:

diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 375b33ecafa2..1a9b225c85b6 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -212,6 +212,7 @@ static void native_stop_other_cpus(int wait)
udelay(1);
}

+ mdelay(100);
local_irq_save(flags);
disable_local_APIC();
mcheck_cpu_clear(this_cpu_ptr(&cpu_info));

With that I got a successful power-off 10 times in a row.

Let me know if there is anything else I can test.

I will resend my patch later with a different description.

Tony Battersby
Cybernetics