Re: 5.10.189 and 5.10.190 breaks nested virtualization

From: Sean Christopherson
Date: Wed Aug 16 2023 - 10:17:41 EST


On Wed, Aug 16, 2023, Dr. David Alan Gilbert wrote:
> * Blair Strater (strater@xxxxxxxxxxxxxx) wrote:
> > Per the request at https://lwn.net/Articles/940798/, I'm reaching out to
> > let you know that this patch breaks nested virtualization on AMD
> > processors.
> >
> > I've tested 5.10.189 and 5.10.190 on the "outer" virtual host, with both
> > Debian 12 running libvirt and Proxmox 7 as the "inner" hosts. For Debian, the
> > nested VM fails to start at all, and consumes the entirety of one CPU core.
> > For Proxmox, 100-200MB/second of memory is allocated and never released,
> > and also the guest fails to start. The problems go away when taking the
> > outer host back to 5.10.188.
> >
> > The processor in question is a Ryzen 7 2700. The kernel revision for
> > Proxmox is 5.15.108-1-pve, and the kernel revision for Debian 12 is
> > 6.1.0-11-amd64. I've run into another person who can confirm that this bug
> > also occurs in the 6.4 series, "somewhere between 6.4.3 and 6.4.9", 6.4.9
> > being the likely culprit.
> >
> > Please let me know if you need any other information.
> >
> > I apologize for bothering an entire mailing list; Greg's email bot told me to.
>
> cc'd in Vitaly (who I notice was working on another regression bug in
> the recent stables), and Sean (who I notice has an L2 patch in the
> 5.10.189..190 set).

Does running with "spec_rstack_overflow=off" fix things for you? If so, can you
then try testing the fix for the guest RFLAGS corruption[1]? It's a bit of a long
shot, but I'm hoping we'll get lucky and all of these nested SVM errors[2] are
just weird symptoms of branches going awry.

[1] https://lore.kernel.org/all/20230811155255.250835-1-seanjc@xxxxxxxxxx
[2] https://bugzilla.kernel.org/show_bug.cgi?id=217796