Re: [patch 0/6] Cure kexec() vs. mwait_play_dead() troubles

From: Ashok Raj
Date: Wed Jun 07 2023 - 12:23:02 EST


On Tue, Jun 06, 2023 at 12:41:43AM +0200, Thomas Gleixner wrote:
> On Mon, Jun 05 2023 at 10:41, Sean Christopherson wrote:
> > On Sat, Jun 03, 2023, Thomas Gleixner wrote:
> >> This is only half safe because HLT can resume execution due to NMI, SMI and
> >> MCE. Unfortunately there is no real safe mechanism to "park" a CPU reliably,
> >
> > On Intel. On AMD, enabling EFER.SVME and doing CLGI will block everything except
> > single-step #DB (lol) and RESET. #MC handling is implementation-dependent and
> > *might* cause shutdown, but at least there's a chance it will work. And presumably
> > modern CPUs do pend the #MC until GIF=1.
>
> Abusing SVME for that is definitely in the realm of creative bonus
> points, but not necessarily a general purpose solution.
>
> >> So parking them via INIT is not completely solving the problem, but it
> >> takes at least NMI and SMI out of the picture.
> >
> > Don't most SMM handlers rendezvous all CPUs? I.e. won't blocking SMIs indefinitely
> > potentially cause problems too?
>
> Not that I'm aware of. If so then this would be a hideous firmware bug
> as firmware must be aware of CPUs which hang around in INIT independent
> of this.

SMM does rendezvous all CPUs, but it also has a way to detect the ones that
are blocked in WFS (wait-for-SIPI), via a package-scoped ubox register, so it
knows to skip those. I can find this in internal sources, but the details
aren't available in the edk2 open reference code. They happen to be
documented only in the BWG, which isn't freely available.

I believe it's behind GetSmmDelayedBlockedDisabledCount() ->
SmmCpuFeaturesGetSmmRegister().
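
To illustrate the idea only: a rough sketch of a rendezvous that skips blocked
CPUs might look like the following. This is not the edk2 code or the real
register layout. The struct, its fields and both helper names are made up for
illustration, and the "blocked" flag simply stands in for whatever the
firmware reads back from the package-scoped register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU view; NOT the real ubox register layout. */
struct cpu_state {
	bool blocked;	/* parked in INIT/wait-for-SIPI, cannot take the SMI */
	bool arrived;	/* has checked in to the SMM rendezvous */
};

/* CPUs the rendezvous has to wait for: everything that is not blocked. */
static size_t expected_arrivals(const struct cpu_state *cpus, size_t n)
{
	size_t expected = 0;

	assert(cpus != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!cpus[i].blocked)
			expected++;
	}
	return expected;
}

/* True once every CPU that is not blocked has checked in. */
static bool rendezvous_complete(const struct cpu_state *cpus, size_t n)
{
	size_t arrived = 0;

	assert(cpus != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!cpus[i].blocked && cpus[i].arrived)
			arrived++;
	}
	return arrived == expected_arrivals(cpus, n);
}
```

The point is just that the wait condition is computed against the set of CPUs
that can actually respond, so a CPU parked in INIT does not stall the handler.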