Re: [PATCH RFC] x86/cpu: fix intermittent lockup on poweroff

From: Tom Lendacky
Date: Wed Apr 26 2023 - 15:18:54 EST

On 4/26/23 13:15, Dave Hansen wrote:
> On 4/26/23 10:51, Tom Lendacky wrote:
>>> +    /*
>>> +     * native_stop_other_cpus() will write to @stop_cpus_count after
>>> +     * observing that it went down to zero, which will invalidate the
>>> +     * cacheline on this CPU.
>>> +     */
>>> +    atomic_dec(&stop_cpus_count);
>>
>> This is probably going to pull in a cache line and cause the problem the
>> native_wbinvd() is trying to avoid.
>
> Is one _more_ cacheline really the problem?

The answer is: it depends. If the cacheline ends up modified/dirty,
then it can be a problem.
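
To make the concern concrete, here is a minimal user-space sketch of the
countdown handshake that the quoted comment describes. It assumes C11
atomics as a stand-in for the kernel's atomic_t helpers. Only the name
stop_cpus_count comes from the patch; the helper functions are
hypothetical. The point is that the decrement is a read-modify-write, so
the CPU performing it has to take the line for writing, and the line is
then dirty in that CPU's cache.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* User-space stand-in for the counter named in the patch. */
static atomic_int stop_cpus_count;

/*
 * Hypothetical helper: the CPU requesting the stop records how many
 * other CPUs it is waiting for.
 */
static void begin_stop(int nr_other_cpus)
{
	assert(nr_other_cpus >= 0);
	atomic_store(&stop_cpus_count, nr_other_cpus);
}

/*
 * Hypothetical helper: what each stopping CPU does to acknowledge.
 * This is a read-modify-write, i.e. a store to the counter's line.
 */
static void ack_stop(void)
{
	atomic_fetch_sub(&stop_cpus_count, 1);
}

/* Hypothetical helper: the requesting CPU polls until the count is zero. */
static bool all_cpus_stopped(void)
{
	return atomic_load(&stop_cpus_count) == 0;
}
```

With three other CPUs, all_cpus_stopped() only becomes true after the
third ack_stop().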


> Or is having _any_ cacheline pulled in a problem? What about the text
> page containing the WBINVD? How about all the page table pages that are
> needed to resolve %RIP to a physical address?

It's been a while since I looked into all this, but text and page table
pages didn't present any problems because they weren't modified. Stack
memory was. A plain wbinvd() went through the paravirt support, and
stack data from that call ended up in some page structs in the kexec
kernel (applicable to Zen 1 and Zen 2). Using native_wbinvd() eliminated
the stack data changes after the WBINVD, and there was no corruption
following a kexec.


> What about the mds_idle_clear_cpu_buffers() code that snuck into
> native_halt()?

Luckily that is all inline and uses a static branch, which isn't enabled
for AMD, so it should just jmp to the hlt and leave no modified cache
lines.

Thanks,
Tom


ffffffff810ede4c:  0f 09                 wbinvd
ffffffff810ede4e:  8b 05 e4 3b a7 02     mov    0x2a73be4(%rip),%eax   # ffffffff83b61a38 <mds_idle_clear>
ffffffff810ede54:  85 c0                 test   %eax,%eax
ffffffff810ede56:  7e 07                 jle    ffffffff810ede5f <stop_this_cpu+0x9f>
ffffffff810ede58:  0f 00 2d b1 75 13 01  verw   0x11375b1(%rip)        # ffffffff82225410 <ds.6688>
ffffffff810ede5f:  f4                    hlt
ffffffff810ede60:  eb ec                 jmp    ffffffff810ede4e <stop_this_cpu+0x8e>
ffffffff810ede62:  e8 59 40 1a 00        callq  ffffffff81291ec0 <trace_hardirqs_off>
ffffffff810ede67:  eb 85                 jmp    ffffffff810eddee <stop_this_cpu+0x2e>
ffffffff810ede69:  0f 1f 80 00 00 00 00  nopl   0x0(%rax)
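
Read as C, the loop in the listing above is roughly the following. This
is a user-space model under stated assumptions, not kernel code: the
privileged instructions are replaced by counters, the loop is bounded so
that it terminates, and the struct and function names are hypothetical.
It only shows control flow. The mov/test/jle sequence reads
mds_idle_clear and skips the verw when the value is not positive, so
with the key disabled each iteration is just the hlt.

```c
#include <assert.h>

/* Hypothetical trace of how often each modelled instruction "ran". */
struct halt_trace {
	int wbinvd;
	int verw;
	int hlt;
};

/* Models the value loaded by the mov from <mds_idle_clear>. */
static int mds_idle_clear;

/*
 * Hypothetical model of the tail of stop_this_cpu(). 'wakeups' is how
 * many times hlt returns; the real loop never exits. The counter
 * updates exist only in this model, the real instructions do not
 * perform these stores.
 */
static void stop_this_cpu_model(struct halt_trace *t, int wakeups)
{
	assert(wakeups >= 0);

	t->wbinvd++;			/* wbinvd, once, before the loop */
	for (int i = 0; i <= wakeups; i++) {
		if (mds_idle_clear > 0)	/* mov + test + jle */
			t->verw++;	/* verw */
		t->hlt++;		/* hlt, then jmp back to the mov */
	}
}
```

With mds_idle_clear left at 0, the trace never records a verw no matter
how many times the modelled hlt returns.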