Re: [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask

From: Huang, Kai
Date: Thu Jun 22 2023 - 18:31:25 EST


On Thu, 2023-06-22 at 10:32 -0500, Michael Roth wrote:
> On Thu, Jun 22, 2023 at 09:55:22AM +0000, Huang, Kai wrote:
> >
> > >
> > > So if we were to straight-forwardly implement that based on how TDX
> > > currently handles checking for the shared bit in GPA, paired with how
> > > SEV-SNP handles checking for private bit in fault flags, it would look
> > > something like:
> > >
> > > bool kvm_fault_is_private(kvm, gpa, err)
> > > {
> > >         /* SEV-SNP handling */
> > >         if (kvm->arch.mmu_private_fault_mask)
> > >                 return !!(err & kvm->arch.mmu_private_fault_mask);
> > >
> > >         /* TDX handling */
> > >         if (kvm->arch.gfn_shared_mask)
> > >                 return !!(gpa & kvm->arch.gfn_shared_mask);
> >
> > The logic of the two is identical. I think they need to be converged.
>
> I think they're just different enough that trying too hard to converge
> them might obfuscate things. If the determination didn't come from 2
> completely different fields (gpa vs. fault flags) maybe it could be
> simplified a bit more, but having a well-defined, open-coded handler that
> gets called once to set fault->is_private at initial fault time, so that
> that ugliness never needs to be looked at again by the KVM MMU, seems
> like a good way to keep things simple through the rest of the handling.

I actually agree with the second half, i.e. the part after "but ...", but IIUC
it doesn't conflict with converging arch.mmu_private_fault_mask and
arch.gfn_shared_mask into one. No?
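
For instance, something like below (just a rough sketch of what I mean by
"converging" -- the field names are made up): one single per-VM mask plus a
flag saying whether the mask applies to the error code or to the GPA:

	bool kvm_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err)
	{
		/* Pick the value the mask applies to on this platform. */
		u64 val = kvm->arch.private_fault_mask_is_err ? err : gpa;

		if (!kvm->arch.private_fault_mask)
			return false;

		return !!(val & kvm->arch.private_fault_mask);
	}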

>
> >
> > Either SEV-SNP should convert the error code private bit to the gfn_shared_mask,
> > or TDX's shared bit should be converted to some private error bit.
>
> struct kvm_page_fault seems to be the preferred way to pass additional
> params/metadata around, and .is_private field was introduced to track
> this private/shared state as part of UPM base series:
>
> https://lore.kernel.org/lkml/20221202061347.1070246-9-chao.p.peng@xxxxxxxxxxxxxxx/

Sure. Agreed.

>
> So it seems like unnecessary complexity to track/encode that state into
> other additional places rather than just encapsulating it all in
> fault->is_private (or some similar field), and synthesizing all this
> platform-specific handling into a well-defined value that's conveyed
> by something like fault->is_private in a way where KVM MMU doesn't need
> to worry as much about platform-specific stuff seems like a good thing,
> and in line with what the UPM base series was trying to do by adding the
> fault->is_private field.

Yes, we should just set fault->is_private and pass it to the rest of the common
KVM MMU code.

>
> So all I'm really proposing is that whatever SNP and TDX end up doing
> should center around setting that fault->is_private field and keeping
> everything contained there. 
>

I agree.

> If there are better ways to handle *how*
> that's done I don't have any complaints there, but moving/adding bits
> to GPA/error_flags after fault time just seems unnecessary to me when
> fault->is_private field can serve that purpose just as well.

Perhaps you missed my point. My point is that arch.mmu_private_fault_mask and
arch.gfn_shared_mask seem redundant because the logic around them is exactly
the same. I do believe we should have fault->is_private passed to the common
MMU code.

In fact, now I am wondering why we need to have "mmu_private_fault_mask" and
"gfn_shared_mask" in _common_ KVM MMU code at all. We already have enough
mechanisms in KVM common code:

1) fault->is_private
2) kvm_mmu_page_role.private
3) an Xarray to tell whether a GFN is private or shared

I am not convinced that we need to have "mmu_private_fault_mask" and
"gfn_shared_mask" in common KVM MMU code. Instead, they can be in AMD and
Intel's vendor code.
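
For example, with just 1) and 3) the common fault handler can already do the
private/shared mismatch check without any vendor mask. A rough sketch only (the
exit helper name below is made up, the real one may differ):

	/* Compare what the fault says with what the xarray says. */
	if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
		/* Made-up helper: fill vcpu->run for KVM_EXIT_MEMORY_FAULT. */
		kvm_prepare_memory_fault_exit(vcpu, fault);
		return -EFAULT;
	}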

Maybe it makes sense to have "gfn_shared_mask" in the KVM common code so that
the fault handler can just strip away the "shared bit" from the GPA at the very
beginning (at least for TDX), but for the rest of the handling I think we
already have enough infrastructure to handle private/shared mappings.
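
I.e. something like below at the start of the fault handler, stripping the bit
once so the rest of the MMU only ever sees the "real" GFN (again only a sketch,
and assuming gfn_shared_mask is simply 0 for non-TDX guests):

	fault->is_private = kvm_fault_is_private(vcpu->kvm, gpa, err);
	/* Drop the TDX shared bit so the rest of the MMU never sees it. */
	gpa &= ~gfn_to_gpa(vcpu->kvm->arch.gfn_shared_mask);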

Btw, one minor issue is, if I recall correctly, for TDX the shared bit must be
applied to the GFN for a shared mapping in the normal EPT. I guess AMD doesn't
need that for shared mappings. So "gfn_shared_mask" may be useful in this case,
but w/o it I believe we can also achieve the same in another way via a vendor
callback.
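
E.g., roughly like below (the hook name is made up, only to show the idea): let
the vendor code add the shared bit back when the MMU builds a shared mapping,
so common code never needs to know about gfn_shared_mask:

	static gfn_t kvm_mmu_map_gfn(struct kvm_vcpu *vcpu, gfn_t gfn,
				     bool is_private)
	{
		/* Let TDX re-apply the shared bit; a no-op for AMD. */
		if (!is_private)
			gfn = static_call(kvm_x86_adjust_shared_gfn)(vcpu->kvm, gfn);

		return gfn;
	}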

>
> >
> > Perhaps converting SEV-SNP makes more sense because if I recall correctly SEV
> > guest also has a C-bit, correct?
>
> That's correct, but the C-bit doesn't show up in the GPA that gets passed
> up to KVM during an #NPF, and instead gets encoded into the fault's
> error_flags.
>
> >
> > Or, ...
> >
> > >
> > >         return false;
> > > }
> > >
> > > kvm_mmu_do_page_fault(vcpu, gpa, err, ...)
> > > {
> > >         struct kvm_page_fault fault = {
> > >                 ...
> > >                 .is_private = kvm_fault_is_private(vcpu->kvm, gpa, err)
> >
> > ... should we do something like:
> >
> > .is_private = static_call(kvm_x86_fault_is_private)(vcpu->kvm, gpa, err);
>
> We actually had exactly this in v7 of SNP hypervisor patches:
>
> https://lore.kernel.org/linux-coco/20221214194056.161492-7-michael.roth@xxxxxxx/T/#m17841f5bfdfb8350d69d78c6831dd8f3a4748638
>
> but Sean was hoping to avoid a callback, which is why we ended up using
> a bitmask in this version, since that's basically all that callback would
> need to do. It's unfortunate that we don't have a common scheme
> between SNP/TDX, but maybe that's still possible, I just think that
> whatever that ends up being, it should live and be contained inside
> whatever helper ends up setting fault->is_private.

Sure. If the function call can be avoided in the fault handler then we should
avoid it.

>
> There's some other awkwardness with a callback approach. It sort of ties
> into your question about selftests so I'll address it below...
>
>
> >
> > ?
> >
> > >         };
> > >
> > >         ...
> > > }
> > >
> > > And then arch.mmu_private_fault_mask and arch.gfn_shared_mask would be
> > > set per-KVM-instance, just like they are now with current SNP and TDX
> > > patchsets, since stuff like KVM self-test wouldn't be setting those
> > > masks, so it makes sense to do it per-instance in that regard.
> > >
> > > But that still gets a little awkward for the KVM self-test use-case where
> > > .is_private should sort of be ignored in favor of whatever the xarray
> > > reports via kvm_mem_is_private(). 
> > >
> >
> > I must have missed something. Why do the KVM self-tests have any impact on
> > how KVM handles private faults?
>
> The self-tests I'm referring to here are the ones from Vishal that shipped with
> v10 of Chao's UPM/fd-based private memory series, and also as part of Sean's
> gmem tree:
>
> https://github.com/sean-jc/linux/commit/a0f5f8c911804f55935094ad3a277301704330a6
>
> These exercise gmem/UPM handling without the need for any SNP/TDX
> hardware support. They do so by "trusting" the shared/private state
> that the VMM sets via KVM_SET_MEMORY_ATTRIBUTES. So if the VMM says it
> should be private, the KVM MMU will treat it as private, so we'd never
> get a mismatch, and KVM_EXIT_MEMORY_FAULT will never be generated.

If KVM supports a test mode, or, judging by another thread, KVM wants to
support a software-based KVM_X86_PROTECTED_VM and distinguish it from
hardware-based confidential VMs such as TDX and SEV-SNP:

https://lore.kernel.org/lkml/9e3e99f78fcbd7db21368b5fe1d931feeb4db567.1686858861.git.isaku.yamahata@xxxxxxxxx/T/#me627bed3d9acf73ea882e8baa76dfcb27759c440

then it's fine to handle it when setting up fault->is_private. But my point is
that the KVM self-tests themselves should not impact how KVM implements the
fault handler -- it is KVM itself that wants to support additional
software-based things.
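
For such a software-based VM, setting up fault->is_private could simply follow
whatever the VMM set via KVM_SET_MEMORY_ATTRIBUTES, e.g. roughly (a sketch
only, assuming a vm_type field like the one discussed in that thread):

	/* No hardware bit to look at, just trust the xarray. */
	if (kvm->arch.vm_type == KVM_X86_PROTECTED_VM)
		fault->is_private = kvm_mem_is_private(kvm, gpa_to_gfn(gpa));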

[...]

(Thanks for explaining the additional background.)