Re: [PATCH v11 04/20] x86/cpu: Detect TDX partial write machine check erratum

From: Huang, Kai
Date: Tue Jun 20 2023 - 06:31:27 EST


On Mon, 2023-06-19 at 14:21 +0200, David Hildenbrand wrote:
> On 04.06.23 16:27, Kai Huang wrote:
> > TDX memory has integrity and confidentiality protections. Violations of
> > this integrity protection are supposed to only affect TDX operations and
> > are never supposed to affect the host kernel itself. In other words,
> > the host kernel should never, itself, see machine checks induced by the
> > TDX integrity hardware.
> >
> > Alas, the first few generations of TDX hardware have an erratum. A
> > "partial" write to a TDX private memory cacheline will silently "poison"
> > the line. Subsequent reads will consume the poison and generate a
> > machine check. According to the TDX hardware spec, neither of these
> > things should have happened.
> >
> > Virtually all kernel memory accesses operations happen in full
> > cachelines. In practice, writing a "byte" of memory usually reads a 64
> > byte cacheline of memory, modifies it, then writes the whole line back.
> > Those operations do not trigger this problem.
>
> So, ordinary writes to TD private memory are not a problem? 
>

Not a problem for the kernel: such a write won't poison the memory directly, so
if the kernel reads that memory back there won't be a #MC.

However, if a TDX guest reads that memory (which was previously written by the
kernel or userspace), the memory is marked as poisoned on the read and a #MC is
triggered.

> I thought
> one motivation for the unmapped-guest-memory discussion was to prevent
> host (userspace) writes to such memory because it would trigger a MC and
> eventually crash the host.

Yeah, that #MC will be triggered inside the TDX guest. I think in most cases
such a #MC won't crash the host kernel; only the victim TDX guest is killed.
But there might be some cases where we cannot handle the #MC gracefully, e.g.,
with some particular BIOS settings. One example: with LMCE disabled, any #MC
would be broadcast to all LPs, causing the TDX guests running on the other LPs
to be killed as well.

Also, quoting Chao, Peng, who has been working on the unmapped-guest-memory
series since its early days:

"
The problem is we may not always be able to handle #MC gracefully, in
some configurations (BIOS settings) the #MC can cause the whole system
reset, not just kill the TD. At least this is the original motivation
for Intel to start this series. I think the case is still true unless I
missed something. From KVM community, they have motivation to unmap the
private memory from userspace even the #MC is not fatal, just to prevent
possible unintended accesses from userspace (that's why they ask AMD to
use this series even their machine doesn't cause system reset when the
same happens).
"

>
> I recall that this would happen easily (not just in some weird "partial"
> case and that the spec would allow for it)

No. This partial-write #MC is different from the one triggered in the TDX
guest, as mentioned above.

>
> 1) Does that, in general, not happen anymore (was the hardware fixed?)?
>
> 2) Will new hardware prevent/"fix" that completely (was the spec updated?)?

Yes, this erratum will be fixed in later generations of TDX hardware. It only
appears on SPR and EMR (the first two generations of TDX hardware).
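
For illustration only, here is a minimal userspace sketch of the kind of model
check that "only SPR and EMR" implies. It assumes the erratum can be keyed on
family 6 plus the Sapphire Rapids (0x8F) and Emerald Rapids (0xCF) server model
numbers. The macro and function names are hypothetical, and this is not the
code from the patch.

```c
#include <stdbool.h>

/*
 * Family 6 model numbers for the two affected server parts.
 * Assumption for this sketch: these are the only affected models.
 */
#define MODEL_SAPPHIRERAPIDS_X	0x8F
#define MODEL_EMERALDRAPIDS_X	0xCF

/*
 * Hypothetical helper: report whether a CPU identified by
 * (family, model) is expected to have the partial write #MC erratum.
 */
static bool has_tdx_pw_mce_erratum(unsigned int family, unsigned int model)
{
	if (family != 6)
		return false;

	switch (model) {
	case MODEL_SAPPHIRERAPIDS_X:
	case MODEL_EMERALDRAPIDS_X:
		return true;
	default:
		return false;
	}
}
```

Keying on the model rather than on a feature bit is a choice made for this
sketch, on the assumption that the erratum is not enumerated any other way.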