Re: [PATCH v2 8/9] KVM: X86: Optimize pte_list_desc with per-array counter

From: Peter Xu
Date: Fri Jul 30 2021 - 11:46:03 EST


On Wed, Jul 28, 2021 at 09:04:30PM +0000, Sean Christopherson wrote:
> > struct pte_list_desc {
> > u64 *sptes[PTE_LIST_EXT];
> > + /*
> > + * Stores number of entries stored in the pte_list_desc. No need to be
> > + * u64 but just for easier alignment. When PTE_LIST_EXT, means full.
> > + */
> > + u64 spte_count;
>
> Per my feedback to the previous patch, this should be above sptes[] so that rmaps
> with <8 SPTEs only touch one cache line. No idea if it actually matters in
> practice, but I can't see how it would harm anything.
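The C sketch below works through the cache-line arithmetic behind this suggestion. It assumes x86-64 (8-byte pointers), 64-byte cache lines, PTE_LIST_EXT == 14 and a descriptor that starts on a cache-line boundary. This mail does not state any of these values. The struct name pte_list_desc_count_first and the helper count_first_spte_line() are hypothetical. With the counter placed first, sptes[0..6] share cache line 0 with spte_count, so an rmap with fewer than 8 SPTEs touches a single line. `more` still lands in line 1, though.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumptions for illustration only: x86-64 (8-byte pointers), 64-byte
 * cache lines, PTE_LIST_EXT == 14, and a descriptor that starts on a
 * cache-line boundary.
 */
#define PTE_LIST_EXT 14
#define CACHE_LINE 64

typedef uint64_t u64;

/* Hypothetical name for the ordering suggested in review: counter first, `more' last. */
struct pte_list_desc_count_first {
	u64 spte_count;
	u64 *sptes[PTE_LIST_EXT];
	struct pte_list_desc_count_first *more;
};

/* With this ordering, `more' falls outside cache line 0. */
_Static_assert(offsetof(struct pte_list_desc_count_first, more) >= CACHE_LINE,
	       "expected `more' outside cache line 0");

/* Hypothetical helper: 0-based index of the cache line holding sptes[i]. */
static size_t count_first_spte_line(size_t i)
{
	assert(i < PTE_LIST_EXT);
	return (offsetof(struct pte_list_desc_count_first, sptes) +
		i * sizeof(u64 *)) / CACHE_LINE;
}
```

Under these assumptions, count_first_spte_line(6) is 0 and count_first_spte_line(7) is 1. That matches the "<8 SPTEs" boundary.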

While at it, I'll also move "more" to the start of the struct, which I think
optimizes the full-descriptor case as well.

/*
 * Slightly optimize the cacheline layout by putting `more' and `spte_count'
 * at the start: accessing a descriptor then touches only a single cacheline,
 * either when it is full (spte_count == PTE_LIST_EXT, where only `more' and
 * `spte_count' need to be read) or when it holds at most 6 entries.
 */
struct pte_list_desc {
	struct pte_list_desc *more;
	/*
	 * Number of entries stored in this pte_list_desc.  It does not need
	 * to be a u64; that width is used only for easier alignment.  A value
	 * of PTE_LIST_EXT means the descriptor is full.
	 */
	u64 spte_count;
	u64 *sptes[PTE_LIST_EXT];
};
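The layout proposed above can be checked the same way. This sketch uses the same assumptions: x86-64, 64-byte cache lines, PTE_LIST_EXT == 14 and a cache-line-aligned descriptor. It checks three things:

- `more` and `spte_count` both fall in cache line 0.
- sptes[0..5] share that line.
- The whole struct spans exactly two lines.

spte_line() and first_non_full() are hypothetical helpers. first_non_full() is a simplified stand-in for a descriptor walk and is not the kernel's pte_list_add(). It shows that skipping a full descriptor reads only `spte_count` and `more`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumptions for illustration only: x86-64 (8-byte pointers), 64-byte
 * cache lines, PTE_LIST_EXT == 14, and a descriptor that starts on a
 * cache-line boundary.
 */
#define PTE_LIST_EXT 14
#define CACHE_LINE 64

typedef uint64_t u64;

/* The layout proposed above: `more' and `spte_count' first. */
struct pte_list_desc {
	struct pte_list_desc *more;
	u64 spte_count;
	u64 *sptes[PTE_LIST_EXT];
};

/* Both header fields sit in cache line 0 ... */
_Static_assert(offsetof(struct pte_list_desc, spte_count) + sizeof(u64) <= CACHE_LINE,
	       "header fields span two cache lines");
/* ... and the whole descriptor is exactly two cache lines. */
_Static_assert(sizeof(struct pte_list_desc) == 2 * CACHE_LINE,
	       "unexpected descriptor size");

/* Hypothetical helper: 0-based index of the cache line holding sptes[i]. */
static size_t spte_line(size_t i)
{
	assert(i < PTE_LIST_EXT);
	return (offsetof(struct pte_list_desc, sptes) + i * sizeof(u64 *)) /
	       CACHE_LINE;
}

/*
 * Hypothetical, simplified walk (not the kernel's pte_list_add()): return
 * the first descriptor with a free slot, or NULL if all are full.  For each
 * full descriptor it skips, only `spte_count' and `more' are read, both of
 * which live in cache line 0.
 */
static struct pte_list_desc *first_non_full(struct pte_list_desc *desc)
{
	while (desc && desc->spte_count == PTE_LIST_EXT)
		desc = desc->more;
	return desc;
}
```

With this ordering, spte_line(5) is 0 and spte_line(6) is 1, which gives the "entries<=6" figure in the comment. Compared with putting only the counter first, one inline SPTE slot moves to the second line. In exchange, `more` moves into the first line.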

Thanks,

--
Peter Xu