Re: [PATCH] x86/cpu: Add a VMX flag to enumerate 5-level EPT support to userspace

From: Sean Christopherson
Date: Thu Jan 11 2024 - 11:25:12 EST


On Thu, Jan 11, 2024, Tao Su wrote:
> On Wed, Jan 10, 2024 at 08:26:25AM -0800, Sean Christopherson wrote:
> > On Wed, Jan 10, 2024, Chao Gao wrote:
> > > On Tue, Jan 09, 2024 at 04:23:40PM -0800, Sean Christopherson wrote:
> > > >Add a VMX flag in /proc/cpuinfo, ept_5level, so that userspace can query
> > > >whether or not the CPU supports 5-level EPT paging. EPT capabilities are
> > > >enumerated via MSR, i.e. aren't accessible to userspace without help from
> > > >the kernel, and knowing whether or not 5-level EPT is supported is sadly
> > > >necessary for userspace to correctly configure KVM VMs.
> > >
> > > This assumes procfs is enabled in Kconfig and userspace has permission to
> > > access /proc/cpuinfo. But it isn't always true. So, I think it is better to
> > > advertise max addressable GPA via KVM ioctls.
> >
> > Hrm, so the help for PROC_FS says:
> >
> >   Several programs depend on this, so everyone should say Y here.
> >
> > Given that this is working around something that is borderline an erratum, I'm
> > inclined to say that userspace shouldn't simply assume the worst if /proc isn't
> > available. Practically speaking, I don't think a "real" VM is likely to be
> > affected; AFAIK, there's no reason for QEMU or any other VMM to _need_ to expose
> > a memslot at GPA[51:48] unless the VM really has however much memory that is
> > (hundreds of terabytes?). And if someone is trying to run such a massive VM on
> > such a goofy CPU...
>
> It is unusual to assign that much RAM to a guest, but passing through a device can
> also trigger this issue, which we have hit: the memslot allocated for a 64-bit BAR
> can have bits[51:48] set. The BIOS controls the BAR address, e.g. SeaBIOS moved the
> 64-bit PCI window to the end of the address space based on the advertised physical
> address bits[1].

Drat. Do you know if these CPUs are going to be productized? We'll still need
something in KVM either way, but whether or not the problems are more or less
limited to funky software setups might influence how we address this.
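
For illustration only, here is a rough sketch of the userspace side that the
patch description has in mind: look for ept_5level on the "vmx flags" line of
/proc/cpuinfo, then check whether a memslot (e.g. one backing a 64-bit BAR)
fits in the GPA space that EPT can map. The helper names are made up for this
example, and the 48-bit and 57-bit widths assume 4-level and 5-level EPT
respectively. Note that a missing flag cannot distinguish "no 5-level EPT"
from "kernel too old to report it" or "no /proc", which is the concern raised
above.

```python
def has_vmx_flag(cpuinfo_text, flag="ept_5level"):
    """Return True if `flag` is listed on a 'vmx flags' line (hypothetical helper)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match only the "vmx flags" field, not the regular "flags" field.
        if sep and key.strip() == "vmx flags" and flag in value.split():
            return True
    return False


def max_ept_gpa_bits(cpuinfo_text):
    """Assumed GPA width: 57 bits with 5-level EPT, 48 bits with 4-level EPT."""
    return 57 if has_vmx_flag(cpuinfo_text) else 48


def gpa_fits(gpa, size, gpa_bits):
    """True if the range [gpa, gpa + size) lies below 2**gpa_bits."""
    return gpa + size <= (1 << gpa_bits)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Return the file contents, or None if procfs is unavailable or unreadable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None


if __name__ == "__main__":
    text = read_cpuinfo()
    if text is None:
        print("cannot read /proc/cpuinfo; EPT depth unknown")
    else:
        bits = max_ept_gpa_bits(text)
        # Example: a 4 KiB region placed with GPA bit 48 set.
        print("EPT GPA bits:", bits, "fits:", gpa_fits(1 << 48, 0x1000, bits))
```

A VMM could run a check like gpa_fits() before creating a memslot and either
relocate the region or fail early, instead of discovering the problem when the
guest first touches the address.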