RE: [PATCHv3 0/3] x86/tdx: Fix one more load_unaligned_zeropad() issue

From: Michael Kelley (LINUX)
Date: Tue Jul 25 2023 - 23:15:35 EST


From: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> Sent: Tuesday, July 25, 2023 4:13 PM
>
> On Tue, Jul 25, 2023 at 03:51:24PM +0000, Michael Kelley (LINUX) wrote:
> > From: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> Sent: Monday, July 24, 2023 4:19 PM
> > >
> > > On Thu, Jul 13, 2023 at 02:43:39PM +0000, Michael Kelley (LINUX) wrote:
> > > > From: Kirill A. Shutemov <kirill@xxxxxxxxxxxxx> Sent: Saturday, July 8, 2023 11:09 PM
> > > > >
> > > > > On Sat, Jul 08, 2023 at 11:53:08PM +0000, Michael Kelley (LINUX) wrote:
> > > > > > From: Kirill A. Shutemov <kirill@xxxxxxxxxxxxx> Sent: Friday, July 7, 2023 7:07 AM
> > > > > > >
> > > > > > > On Thu, Jul 06, 2023 at 04:48:32PM +0000, Michael Kelley (LINUX) wrote:
> > > > > > > > From: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> Sent: Tuesday, June 6, 2023 2:56 AM
> > > > > >
> > > > > > [snip]
> > > > > >
> > > > > > >
> > > > > > > It only addresses the problem that happens during the transition, but
> > > > > > > load_unaligned_zeropad() is still a problem for shared mappings in
> > > > > > > general, after the transition is complete. For example, if
> > > > > > > load_unaligned_zeropad() steps from a private mapping into a shared
> > > > > > > mapping and the shared mapping triggers a #VE, the kernel should be
> > > > > > > able to handle it.
> > > > > >
> > > > > > I'm showing my ignorance of TDX architectural details, but what's the
> > > > > > situation where shared mappings in general can trigger a #VE? How
> > > > > > do such situations get handled for references that aren't from
> > > > > > load_unaligned_zeropad()?
> > > > > >
> > > > >
> > > > > Shared mappings are under host/VMM control. The host can simply not map
> > > > > the page in the shared EPT, which triggers an EPT-violation #VE.
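
To make the mechanics concrete for anyone following the thread:
load_unaligned_zeropad() deliberately does a word-sized load that can run
off the end of a buffer into the next page, and relies on an exception-table
fixup to supply zeroes for the bytes it couldn't read. Very roughly -- this
is only a conceptual sketch, not the real x86 code, which is inline asm plus
an extable entry in arch/x86/include/asm/word-at-a-time.h:

/* Conceptual sketch only. */
static unsigned long load_unaligned_zeropad_sketch(const void *addr)
{
        /* This load may cross into the following page and fault there. */
        return *(const unsigned long *)addr;
}

/*
 * What the fixup does when the load faults on that following page:
 * redo the load from an aligned address (which stays entirely on the
 * first, accessible page) and shift, so the caller sees zeroes in place
 * of the bytes on the inaccessible page (little-endian).  If the
 * original load was already aligned, the fault is genuine and no fixup
 * is applied.
 */
static unsigned long zeropad_fixup_sketch(unsigned long addr)
{
        unsigned long aligned = addr & ~(sizeof(long) - 1);
        unsigned long offset = addr & (sizeof(long) - 1);
        unsigned long data = *(unsigned long *)aligned;

        return data >> (8 * offset);
}

So when the exception lands on that fixup-covered load, the handler can
resolve it and continue; any other access to the same page has no fixup and
goes down the die() path.
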
> > > >
> > > > I know you are out on vacation, but let me follow up now for further
> > > > discussion when you are back.
> > > >
> > > > Isn't the scenario you are describing a malfunctioning or malicious
> > > > host/VMM? Would what you are describing be done as part of normal
> > > > operation? Kernel code must have switched the page from private to
> > > > shared for some purpose. As soon as that code (which presumably
> > > > does not have any entry in the exception table) touches the page, it
> > > > would take the #VE and then enter the die path because there's no fixup.
> > > > So is there value in having load_unaligned_zeropad() handle the #VE and
> > > > succeed where a normal reference would fail?
> > >
> > > #VE on shared memory is legitimately used for MMIO. But the MMIO region
> > > is usually separate from real memory in the physical address space.
> > >
> > > But we also have DMA.
> > >
> > > DMA pages are allocated from the common pool of memory, and they can be
> > > next to the dentry cache that the kernel accesses with
> > > load_unaligned_zeropad(). DMA pages are shared, but they are usually
> > > backed by memory and do not cause a #VE. However, shared memory is under
> > > full control of the VMM, and the VMM can remove a page at any point,
> > > which would make the platform deliver a #VE to the guest. This is a
> > > pathological scenario, but I think it is still worth handling gracefully.
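
Interjecting with what I picture "handling gracefully" would look like, so
we're talking about the same thing: roughly the shape below in the #VE
EPT-violation path, where the exception table is consulted so that a
load_unaligned_zeropad() access gets its zero-pad fixup instead of killing
the guest. The function name and return convention are made up for
illustration; only fixup_exception() and the headers are real kernel
interfaces:

#include <linux/types.h>
#include <asm/extable.h>        /* fixup_exception() */
#include <asm/ptrace.h>         /* struct pt_regs */
#include <asm/tdx.h>            /* struct ve_info */
#include <asm/trapnr.h>         /* X86_TRAP_VE */

/* Sketch only; not the actual #VE handler. */
static bool sketch_handle_ept_violation(struct pt_regs *regs,
                                        struct ve_info *ve)
{
        /*
         * If the faulting instruction has an exception table entry --
         * as the load in load_unaligned_zeropad() does -- run the fixup.
         * For the zeropad case, that substitutes zeroes for the bytes on
         * the inaccessible page and execution continues.
         */
        if (fixup_exception(regs, X86_TRAP_VE, 0, ve->gla))
                return true;

        /* Otherwise handle it as MMIO, or fail, as today. */
        return false;
}

Whether that check belongs in the #VE path at all is of course what we're
debating.
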
> >
> > Yep, pages targeted by DMA have been transitioned to shared, and could
> > be scattered anywhere in the guest physical address space. Such a page
> > could be touched by load_unaligned_zeropad(). But could you elaborate
> > more on the "pathological scenario" where the guest physical page isn't
> > backed by memory?
> >
> > Sure, the VMM can remove the page at any point, but why would it do
> > so? Is doing so a legitimate part of the host/guest contract that the guest
> > must handle cleanly? More importantly, what is the guest expected to
> > do in such a case? I would expect that at some point such a DMA page
> > is accessed by a guest vCPU with an explicit reference that is not
> > load_unaligned_zeropad(). Then the guest would take a #VE that
> > goes down the #GP path and invokes die().
> >
> > I don't object to making the load_unaligned_zeropad() fixup path work
> > correctly in response to a #VE, as it seems like general goodness. I'm
> > just trying to make sure I understand the nuances of "why". :-)
>
> We actually triggered the issue during internal testing. I wrote the
> initial patch after that. But looking back at the report, I don't have an
> answer for why the page triggered a #VE. Maybe the VMM or the virtual BIOS
> screwed up and put MMIO next to normal memory, I don't know.
>
> > > > I'd still like to see the private <-> shared transition code mark the pages
> > > > as invalid during the transition, and avoid the possibility of #VE and
> > > > similar cases with SEV-SNP. Such an approach reduces (eliminates?)
> > > > entanglement between CoCo-specific exceptions and
> > > > load_unaligned_zeropad(). It also greatly simplifies TD Partitioning cases
> > > > and SEV-SNP cases where a paravisor is used.
> > >
> > > It doesn't eliminate the issue for TDX, as the scenario above is not
> > > transient. It can happen after memory is converted to shared.
> >
> > Notwithstanding, do you have any objections to the private <-> shared
> > transition code being changed so it won't be the cause of a #VE, and
> > similarly on SEV-SNP?
>
> I am not yet convinced it is needed. But let's see the code.
>

Fair enough. But similarly, I'm not convinced that we have
load_unaligned_zeropad() cases outside of private <-> shared transitions
that *must* be fixed up cleanly. :-)

Here's the RFC patch, and it includes a couple of open questions:

https://lore.kernel.org/lkml/1688661719-60329-1-git-send-email-mikelley@xxxxxxxxxxxxx/

In TD Partitioning configurations, I'm hoping we can eliminate the need for
the L2 guest to handle #VE EPT violations. The L1 guest will handle things
like MMIO. But the L1 guest can't easily detect that a #VE with an EPT
violation is due to load_unaligned_zeropad() in the L2. And trying to reflect
the #VE from the L1 guest to the L2 guest wades into a bunch of problems
that I want to avoid.

The same problems arise in SEV-SNP with a paravisor running at VMPL 0.
Changing the private <-> shared transition code prevents the problems
there as well.
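
For reference, the shape of what I'm proposing is roughly the following.
This is a paraphrased sketch, not the actual RFC patch: set_memory_np() is
the existing helper, while notify_and_change_attr() and set_memory_p() are
stand-ins for the hypervisor callback and the "make present again" step,
which is one of the open questions.

#include <linux/types.h>
#include <asm/set_memory.h>     /* set_memory_np() */

/*
 * Rough sketch of a private <-> shared conversion that keeps the pages
 * invalid for the duration of the transition.  Error unwinding elided.
 */
static int convert_pages_sketch(unsigned long addr, int numpages, bool shared)
{
        int ret;

        /*
         * 1. Mark the range not-present.  Any stray access during the
         *    transition -- including a page-crossing
         *    load_unaligned_zeropad() -- takes an ordinary #PF and the
         *    normal exception-table fixup, not a #VE or #VC.
         */
        ret = set_memory_np(addr, numpages);
        if (ret)
                return ret;

        /*
         * 2. Notify the hypervisor (or paravisor) and flip the
         *    shared/private attribute in the guest page tables.
         *    (Stand-in for the TDX- and SEV-SNP-specific steps.)
         */
        ret = notify_and_change_attr(addr, numpages, shared);
        if (ret)
                return ret;

        /* 3. Make the range present again with the new attribute. */
        return set_memory_p(addr, numpages);    /* hypothetical helper */
}

With the pages not-present across the window, neither a #VE nor a #VC can
originate from them, so the L1 or the paravisor never has to figure out
whether an exit was caused by load_unaligned_zeropad() in the L2.
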

Michael