Re: [PATCH] KVM:SVM: Flush cache only on CPUs running SEV guest

From: Sean Christopherson
Date: Wed Mar 06 2024 - 11:45:30 EST


On Wed, Mar 06, 2024, Tom Lendacky wrote:
> On 3/5/24 18:19, Sean Christopherson wrote:
> > On Tue, Mar 05, 2024, Tom Lendacky wrote:
> > > On 3/4/24 11:55, Sean Christopherson wrote:
> > > > +Tom
> > > >
> > > > "KVM: SVM:" for the shortlog scope.
> > > >
> > > > On Fri, Mar 01, 2024, Zheyun Shen wrote:
> > > > > On AMD CPUs without ensuring cache consistency, each memory page reclamation in
> > > > > an SEV guest triggers a call to wbinvd_on_all_cpus, thereby affecting the
> > > > > performance of other programs on the host.
> > > > >
> > > > > Typically, an AMD server may have 128 cores or more, while the SEV guest might only
> > > > > utilize 8 of these cores. Meanwhile, host can use qemu-affinity to bind these 8 vCPUs
> > > > > to specific physical CPUs.
> > > > >
> > > > > Therefore, keeping a record of the physical core numbers each time a vCPU runs
> > > > > can help avoid flushing the cache for all CPUs every time.
> > > >
> > > > This needs an unequivocal statement from AMD that flushing caches only on CPUs
> > > > that do VMRUN is sufficient. That sounds like it should be obviously correct,
> > > > as I don't see how else a cache line can be dirtied for the encrypted PA, but
> > > > this entire non-coherent caches mess makes me more than a bit paranoid.
> > >
> > > As long as the wbinvd_on_all_cpus() related to the ASID flushing isn't
> > > changed, this should be ok. And the code currently flushes the source pages
> > > when doing LAUNCH_UPDATE commands and adding encrypted regions, so should be
> > > good there.
> >
> > Nice, thanks!
> >
> > > Would it make sense to make this configurable, with the current behavior the
> > > default, until testing looks good for a while?
> >
> > I don't hate the idea, but I'm inclined to hit the "I'm feeling lucky" button.
> > I would rather we put in effort to all but guarantee we can do a clean revert in
> > the future, at which point a kill switch doesn't add all that much value. E.g.
> > it would allow for a non-disruptive fix, and maybe a slightly faster confirmation
> > of a bug, but that's about it.
> >
> > And since the fallout from this would be host data corruption, _not_ rebooting
> > hosts that may have been corrupted is probably a bad idea, i.e. the whole
> > non-disruptive fix benefit is quite dubious.
> >
> > The other issue is that it'd be extremely difficult to know when we could/should
> > remove the kill switch. It might be months or even years before anyone starts
> > running high volume of SEV/SEV-ES VMs with this optimization.
>
> I can run the next version of the patch through our CI and see if it
> uncovers anything. I just worry about corner cases... but then that's just
> me.

Heh, it's definitely not just you, this scares me more than a bit.

Doh, I realize I misread your suggestion (several times). You're suggesting we
make this opt-in. Hmm, that's definitely more valuable than a kill switch, though
it has the same problem of us having no idea when it's safe to enable by default.
And I'm not sure I like the idea of having a knob that basically says, "we think
this works, but we're too scared to enable it by default, so _you_ should totally
enable it and let us know if we've corrupted your data" :-)