Re: [patch 0/6] Cure kexec() vs. mwait_play_dead() troubles

From: Sean Christopherson
Date: Mon Jun 05 2023 - 19:09:05 EST

Next message: Sagi Grimberg: "Re: [RFC PATCH 0/4] nvme-tcp: fix hung issues for deleting"
Previous message: Dave Hansen: "Re: [PATCH v11 07/20] x86/virt/tdx: Add skeleton to enable TDX on demand"
In reply to: Thomas Gleixner: "Re: [patch 0/6] Cure kexec() vs. mwait_play_dead() troubles"
Next in thread: Thomas Gleixner: "Re: [patch 0/6] Cure kexec() vs. mwait_play_dead() troubles"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, Jun 06, 2023, Thomas Gleixner wrote:
> On Mon, Jun 05 2023 at 10:41, Sean Christopherson wrote:
> > On Sat, Jun 03, 2023, Thomas Gleixner wrote:
> >> This is only half safe because HLT can resume execution due to NMI, SMI and
> >> MCE. Unfortunately there is no real safe mechanism to "park" a CPU reliably,
> >
> > On Intel. On AMD, enabling EFER.SVME and doing CLGI will block everything except
> > single-step #DB (lol) and RESET. #MC handling is implementation-dependent and
> > *might* cause shutdown, but at least there's a chance it will work. And presumably
> > modern CPUs do pend the #MC until GIF=1.
>
> Abusing SVME for that is definitely in the realm of creative bonus
> points, but not necessarily a general purpose solution.

Heh, my follow-up ideas for Intel are to abuse XuCode or SEAM ;-)
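
For reference, a rough sketch of the SVME+CLGI parking sequence mentioned above.
This is an illustration only, not code from the series: the function name
park_cpu_gif_clear() and the helper efer_with_svme() are made up here. The
privileged part only makes sense in ring 0 on an AMD CPU whose firmware has not
disabled SVM; it is built only on x86-64 and is never called from user space.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_EFER   0xC0000080u   /* Extended Feature Enable Register */
#define EFER_SVME  (1ull << 12)  /* Secure Virtual Machine Enable */

static_assert(EFER_SVME == 0x1000ull, "SVME is EFER bit 12");

/* Pure helper: the EFER value with SVME set, all other bits preserved. */
static inline uint64_t efer_with_svme(uint64_t efer)
{
	return efer | EFER_SVME;
}

#if defined(__x86_64__) && defined(__GNUC__)
void park_cpu_gif_clear(void);

/*
 * Hypothetical parking routine (ring 0 only, assumption: SVM is usable).
 * Sets EFER.SVME so that CLGI is legal, clears the Global Interrupt Flag,
 * then halts forever. With GIF=0 interrupts, NMI and SMI are held pending,
 * so the HLT is not expected to wake up.
 */
void park_cpu_gif_clear(void)
{
	uint32_t lo, hi;
	uint64_t efer;

	__asm__ volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(MSR_EFER));
	efer = efer_with_svme(((uint64_t)hi << 32) | lo);
	__asm__ volatile("wrmsr" : : "c"(MSR_EFER), "a"((uint32_t)efer),
			 "d"((uint32_t)(efer >> 32)) : "memory");

	__asm__ volatile("clgi" : : : "memory");	/* GIF = 0 */
	for (;;)
		__asm__ volatile("hlt");
}
#endif
```

As noted in the quoted text, single-step #DB and RESET still get through, and
#MC behavior with GIF=0 is implementation-dependent.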

> >> So parking them via INIT is not completely solving the problem, but it
> >> takes at least NMI and SMI out of the picture.
> >
> > Don't most SMM handlers rendezvous all CPUs? I.e. won't blocking SMIs indefinitely
> > potentially cause problems too?
>
> Not that I'm aware of. If so then this would be a hideous firmware bug
> as firmware must be aware of CPUs which hang around in INIT independent
> of this.

I was thinking of the EDKII code in UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c, e.g.
SmmWaitForApArrival(). I've never dug deeply into how EDKII uses SMM, what its
timeouts are, etc., I just remember coming across that code when poking around
EDKII for other stuff.

> > Why not carve out a page that's hidden across kexec() to hold whatever code+data
> > is needed to safely execute a HLT loop indefinitely?
>
> See below.
>
> > E.g. doesn't the original kernel provide the e820 tables for the
> > post-kexec() kernel?
>
> Only for crash kernels if I'm not missing something.

Ah, drat.

> Making this work for regular kexec() including this:
>
> > To avoid OOM after many kexec(), reserving a page could be done iff
> > the current kernel wasn't itself kexec()'d.
>
> would be possible and I thought about it, but that needs a complete new
> design of "offline", "shutdown offline" and a non-trivial amount of
> backwards compatibility magic because you can't assume that the kexec()
> kernel version is greater or equal to the current one. kexec() is
> supposed to work both ways, downgrading and upgrading. IOW, that ship
> sailed long ago.

Right, but doesn't gaining "full" protection require ruling out unenlightened
downgrades? E.g. if someone downgrades to an old kernel, doesn't hide the "offline"
CPUs from the kexec() kernel, and boots the old kernel with nosmt or whatever,
then that old kernel will do the naive MWAIT or unprotected HLT and it's hosed again.

If we're relying on the admin to hide the offline CPUs, could we usurp an existing
kernel param to hide a small chunk of memory instead?
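
To make the "reserve a page iff the current kernel wasn't itself kexec()'d" rule
from earlier in the thread concrete, here is a small user-space model of it.
Everything in it is hypothetical (struct park_handoff, park_page_for_next(),
the page addresses); it only shows that the rule caps the cost at one page no
matter how many kexec() hops follow, and says nothing about how the page would
actually be hidden from the next kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state a kernel knows about itself at kexec() time. */
struct park_handoff {
	bool     booted_via_kexec;	/* was this kernel started by kexec()? */
	uint64_t park_page;		/* inherited hidden page, 0 if none */
};

/*
 * Pick the page the next kernel must keep hidden. A new page is reserved
 * only if the current kernel was not itself kexec()'d; otherwise whatever
 * was inherited (possibly nothing) is passed along unchanged.
 */
static uint64_t park_page_for_next(const struct park_handoff *cur,
				   uint64_t fresh_page, bool *reserved_new)
{
	assert(fresh_page != 0);

	if (cur->booted_via_kexec) {
		*reserved_new = false;
		return cur->park_page;
	}
	*reserved_new = true;
	return fresh_page;
}

/* Count the pages consumed over a chain of kexec_count kexec() hops. */
static unsigned int pages_reserved_over_chain(unsigned int kexec_count)
{
	struct park_handoff cur = { false, 0 };
	unsigned int reserved = 0;

	for (unsigned int i = 0; i < kexec_count; i++) {
		bool reserved_new;
		uint64_t page = park_page_for_next(&cur, 0x1000ull * (i + 1),
						   &reserved_new);
		if (reserved_new)
			reserved++;
		cur.booted_via_kexec = true;
		cur.park_page = page;
	}
	return reserved;
}
```

In this model pages_reserved_over_chain() returns 1 for any chain of one or more
hops. It also assumes every kernel in the chain understands the handoff, which is
exactly the assumption that breaks with an unenlightened downgrade.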
