Re: [PATCH v19 058/130] KVM: x86/mmu: Add a private pointer to struct kvm_mmu_page

From: Isaku Yamahata
Date: Thu Mar 14 2024 - 14:10:17 EST


On Wed, Mar 13, 2024 at 08:51:53PM +0000,
"Edgecombe, Rick P" <rick.p.edgecombe@xxxxxxxxx> wrote:

> On Mon, 2024-02-26 at 00:26 -0800, isaku.yamahata@xxxxxxxxx wrote:
> > From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> >
> > For private GPA, CPU refers a private page table whose contents are
> > encrypted.  The dedicated APIs to operate on it (e.g.
> > updating/reading its
> > PTE entry) are used and their cost is expensive.
> >
> > When KVM resolves KVM page fault, it walks the page tables.  To reuse
> > the
> > existing KVM MMU code and mitigate the heavy cost to directly walk
> > private
> > page table, allocate one more page to copy the dummy page table for
> > KVM MMU
> > code to directly walk.  Resolve KVM page fault with the existing
> > code, and
> > do additional operations necessary for the private page table. 
>
> > To
> > distinguish such cases, the existing KVM page table is called a
> > shared page
> > table (i.e. not associated with private page table), and the page
> > table
> > with private page table is called a private page table.
>
> This makes it sound like the dummy page table for the private alias is
> also called a shared page table, but in the drawing below it looks like
> only the shared alias is called "shared PT".

How about this,
Call the existing KVM page table associated with shared GPA as shared page table.
Call the KVM page table associate with private GPA as private page table.

> >   The relationship
> > is depicted below.
> >
> > Add a private pointer to struct kvm_mmu_page for private page table
> > and
> > add helper functions to allocate/initialize/free a private page table
> > page.
> >
> >               KVM page fault                     |
> >                      |                           |
> >                      V                           |
> >         -------------+----------                 |
> >         |                      |                 |
> >         V                      V                 |
> >      shared GPA           private GPA            |
> >         |                      |                 |
> >         V                      V                 |
> >     shared PT root      dummy PT root            |    private PT root
> >         |                      |                 |           |
> >         V                      V                 |           V
> >      shared PT            dummy PT ----propagate---->   private PT
> >         |                      |                 |           |
> >         |                      \-----------------+------\    |
> >         |                                        |      |    |
> >         V                                        |      V    V
> >   shared guest page                              |    private guest
> > page
> >                                                  |
> >                            non-encrypted memory  |    encrypted
> > memory
> >                                                  |
> > PT: page table
> > - Shared PT is visible to KVM and it is used by CPU.
> > - Private PT is used by CPU but it is invisible to KVM.
> > - Dummy PT is visible to KVM but not used by CPU.  It is used to
> >   propagate PT change to the actual private PT which is used by CPU.
> >
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> > Reviewed-by: Binbin Wu <binbin.wu@xxxxxxxxxxxxxxx>
> > ---

..snip...

> > diff --git a/arch/x86/kvm/mmu/mmu_internal.h
> > b/arch/x86/kvm/mmu/mmu_internal.h
> > index e3f54701f98d..002f3f80bf3b 100644
> > --- a/arch/x86/kvm/mmu/mmu_internal.h
> > +++ b/arch/x86/kvm/mmu/mmu_internal.h
> > @@ -101,7 +101,21 @@ struct kvm_mmu_page {
> >                 int root_count;
> >                 refcount_t tdp_mmu_root_count;
> >         };
> > -       unsigned int unsync_children;
> > +       union {
> > +               struct {
> > +                       unsigned int unsync_children;
> > +                       /*
> > +                        * Number of writes since the last time
> > traversal
> > +                        * visited this page.
> > +                        */
> > +                       atomic_t write_flooding_count;
> > +               };
>
> I think the point of putting these in a union is that they only apply
> to shadow paging and so can't be used with TDX. I think you are putting
> more than the sizeof(void *) in there as there are multiple in the same
> category.

I'm not sure if I'm following you.
On x86_64, sizeof(unsigned int) = 4, sizeof(atomic_t) = 4, sizeof(void *) = 8.
I moved write_flooding_count to have 8 bytes.


> But there seems to be a new one added, *shadowed_translation.
> Should it go in there too? Is the union because there wasn't room
> before, or just to be tidy?

Originally TDX MMU support was implemented for legacy tdp mmu. It used
shadowed_translation. It was not an option at that time. Later we switched to
(new) TDP MMU. Now we have choice to which member to overlay.


> I think the commit log should have more discussion of this union and
> maybe a comment in the struct to explain the purpose of the
> organization. Can you explain the reasoning now for the sake of
> discussion?

Sure. We'd like to add void * pointer to struct kvm_mmu_page. Given some
members are used only for legacy KVM MMUs and not used for TDP MMU, we can save
memory overhead with union. We have options.
- u64 *shadowed_translation
This was not chosen for the old implementation. Now this is option.
- pack unsync_children and write_flooding_count for 8 bytes
This patch chosen this for historical reason. Other two option is possible.
- unsync_child_bitmap
Historically it was unioned with other members. But now it's not.

I don't have strong preference for TDX support as long as we can have void *.
--
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>