Re: [PATCH 0/7] KVM: x86: guest MAXPHYADDR and C-bit fixes

From: Sean Christopherson
Date: Thu Jun 24 2021 - 19:48:04 EST


On Thu, Jun 24, 2021, Sean Christopherson wrote:
> On Thu, Jun 24, 2021, Tom Lendacky wrote:
> > On 6/24/21 12:39 PM, Tom Lendacky wrote:
> > >
> > >
> > > On 6/24/21 12:31 PM, Sean Christopherson wrote:
> > >> On Thu, Jun 24, 2021, Tom Lendacky wrote:
> > >>>>
> > >>>> Here's an explanation of the physical address reduction for bare-metal and
> > >>>> guest.
> > >>>>
> > >>>> With MSR 0xC001_0010[SMEE] = 0:
> > >>>> No reduction in host or guest max physical address.
> > >>>>
> > >>>> With MSR 0xC001_0010[SMEE] = 1:
> > >>>> - Reduction in the host is enumerated by CPUID 0x8000_001F_EBX[11:6],
> > >>>> regardless of whether SME is enabled in the host or not. So, for example
> > >>>> on EPYC generation 2 (Rome) you would see a reduction from 48 to 43.
> > >>>> - There is no reduction in physical address in a legacy guest (non-SEV
> > >>>> guest), so the guest can use a 48-bit physical address
> > >>
> > >> So the behavior I'm seeing is either a CPU bug or user error. Can you verify
> > >> the unexpected #PF behavior to make sure I'm not doing something stupid?
> > >
> > > Yeah, I saw that in patch #3. Let me see what I can find out. I could just
> > > be wrong on that myself - it wouldn't be the first time.
> >
> > From patch #3:
> > SVM: KVM: CPU #PF @ rip = 0x409ca4, cr2 = 0xc0000000, pfec = 0xb
> > KVM: guest PTE = 0x181023 @ GPA = 0x180000, level = 4
> > KVM: guest PTE = 0x186023 @ GPA = 0x181000, level = 3
> > KVM: guest PTE = 0x187023 @ GPA = 0x186000, level = 2
> > KVM: guest PTE = 0xffffbffff003 @ GPA = 0x187000, level = 1
> > SVM: KVM: GPA = 0x7fffbffff000
> >
> > I think you may be hitting a special HT region that is at the top 12GB of
> > the 48-bit memory range and is reserved, even for GPAs. Can you somehow
> > get the test to use an address below 0xfffd_0000_0000? That would show
> > that bit 47 is valid for the legacy guest while staying out of the HT region.
>
> I can make that happen.

Ah, hilarious. That indeed does the trick. 0xfffd00000000 = #PF,
0xfffcfffff000 = good.
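
For anyone following along, the boundary is easy to sanity-check. Below is a
minimal sketch, assuming the reserved HT region is exactly the top 12GB of the
48-bit space as Tom describes. The macro and function names are made up for
illustration and are not KVM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bounds inferred from this thread, not taken from the APM. */
#define HT_RESERVED_START 0xfffd00000000ULL
#define HT_RESERVED_END   0xffffffffffffULL	/* top of the 48-bit space */

/* Hypothetical helper: true if @gpa lands in the reserved HT region. */
static bool gpa_in_ht_region(uint64_t gpa)
{
	return gpa >= HT_RESERVED_START && gpa <= HT_RESERVED_END;
}

/* Hypothetical helper: true if @gpa has bit 47 set. */
static bool gpa_uses_bit_47(uint64_t gpa)
{
	return gpa & (1ULL << 47);
}

/* Self-check using the two addresses reported above. */
static void ht_region_selfcheck(void)
{
	/* The region spans 12GB. */
	assert(HT_RESERVED_END - HT_RESERVED_START + 1 == (12ULL << 30));

	/* 0xfffd00000000 faulted: it is the first address inside the region. */
	assert(gpa_in_ht_region(0xfffd00000000ULL));

	/* 0xfffcfffff000 worked: bit 47 is set, but it sits just below the region. */
	assert(!gpa_in_ht_region(0xfffcfffff000ULL));
	assert(gpa_uses_bit_47(0xfffcfffff000ULL));
}
```

The second address is the interesting one: it has bit 47 set yet stays out of
the reserved range, which is what shows bit 47 is a legal GPA bit for a legacy
guest.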

I'll send a revert shortly. There's another C-bit bug that needs fixing, too :-/
The unconditional __sme_clr() in npf_interception() is wrong and breaks non-SEV
guests. Based on this from the APM:

  If the C-bit is an address bit, this bit is masked from the guest
  physical address when it is translated through the nested page tables.
  Consequently, the hypervisor does not need to be aware of which pages
  the guest has chosen to mark private.

I assume it's not needed for SEV either? I'm about to find out shortly, but if
you happen to know for sure... :-)
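
To illustrate why the unconditional clear hurts a non-SEV guest, here is a
rough sketch. It assumes the usual CPUID 0x8000_001F EBX layout (C-bit position
in EBX[5:0], physical address reduction in EBX[11:6]) and uses an example EBX
value of 0x16f, chosen to match the Rome numbers Tom quoted (reduction of 5).
The helper names are hypothetical; clear_c_bit() only mimics what an
unconditional __sme_clr() would do to a fault address.

```c
#include <assert.h>
#include <stdint.h>

/* C-bit position, CPUID 0x8000_001F EBX[5:0]. */
static unsigned int sme_c_bit_pos(uint32_t ebx)
{
	return ebx & 0x3f;
}

/* Physical address bit reduction, CPUID 0x8000_001F EBX[11:6]. */
static unsigned int sme_pa_reduction(uint32_t ebx)
{
	return (ebx >> 6) & 0x3f;
}

/* Hypothetical stand-in for an unconditional C-bit clear on an address. */
static uint64_t clear_c_bit(uint64_t addr, uint32_t ebx)
{
	return addr & ~(1ULL << sme_c_bit_pos(ebx));
}

static void c_bit_selfcheck(void)
{
	const uint32_t ebx = 0x16f;	/* example value, assumed for illustration */

	assert(sme_c_bit_pos(ebx) == 47);
	assert(sme_pa_reduction(ebx) == 5);
	assert(48 - sme_pa_reduction(ebx) == 43);	/* host: 48 -> 43 */

	/*
	 * A legacy guest may legitimately use bit 47 in a GPA.  Clearing
	 * it turns a valid GPA into a different address.
	 */
	assert(clear_c_bit(0xfffcfffff000ULL, ebx) == 0x7ffcfffff000ULL);
}
```

With these numbers, a legacy guest's GPA of 0xfffcfffff000 would be rewritten
to 0x7ffcfffff000, i.e. the fault would be handled for the wrong address.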