Re: [PATCH v5 00/10] KVM: xen: update shared_info and vcpu_info handling

From: David Woodhouse
Date: Fri Sep 22 2023 - 11:37:28 EST


On Fri, 2023-09-22 at 14:59 +0000, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@xxxxxxxxxx>
>
> The following part of the original cover letter still applies...
>
> "Currently we treat the shared_info page as guest memory and the VMM
> informs KVM of its location using a GFN. However it is not guest memory as
> such; it's an overlay page. So we pointlessly invalidate and re-cache a
> mapping to the *same page* of memory every time the guest requests that
> shared_info be mapped into its address space. Let's avoid doing that by
> modifying the pfncache code to allow activation using a fixed userspace
> HVA as well as a GPA."
>
> However, this version of the series has dropped the other changes that
> tried to handle the default vcpu_info location directly in KVM. With
> all the corner cases, it was getting sufficiently complex that the
> functionality is better off staying in the VMM. So, instead of that
> code, two new patches have been added:

I think there's key information missing from this cover letter (and
since cover letters don't get preserved, it probably wants to end up in
one of the commits too).

This isn't *just* an optimisation; it's not just that we're pointlessly
invalidating and re-caching it. The problem is the time in *between*
those two, because we don't have atomic memslot updates (qv).

If we have to break apart a large memslot which contains the
shared_info GPA, then add back the two pieces plus whatever overlay we
put in the middle that broke it in two... there are long periods of
time in which an interrupt might arrive and find the shared_info GPA
simply *absent*.
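
To make that window concrete, here's roughly what a VMM has to do today
with plain KVM_SET_USER_MEMORY_REGION to punch an overlay page into an
existing slot. The helper, slot numbers and sizes are made up for
illustration; the ordering (and hence the gap) is the point:

#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL

/* Hypothetical VMM helper: split slot 0 to overlay a single page. */
static int overlay_page(int vm_fd, uint64_t base_gpa, uint64_t size,
			uint64_t base_hva, uint64_t ovl_gpa, uint64_t ovl_hva)
{
	struct kvm_userspace_memory_region r = { .slot = 0 };
	uint64_t lo_size = ovl_gpa - base_gpa;

	/* 1. Delete the original slot (memory_size == 0 deletes it). */
	if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &r) < 0)
		return -1;

	/*
	 * From here until step 4 completes, every GPA in the original
	 * slot is unmapped. If shared_info lives in that range, an
	 * event channel delivery attempted now has nowhere to land.
	 */

	/* 2. Re-add the lower piece. */
	r = (struct kvm_userspace_memory_region){
		.slot = 0, .guest_phys_addr = base_gpa,
		.memory_size = lo_size, .userspace_addr = base_hva,
	};
	if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &r) < 0)
		return -1;

	/* 3. The overlay page itself. */
	r = (struct kvm_userspace_memory_region){
		.slot = 1, .guest_phys_addr = ovl_gpa,
		.memory_size = PAGE_SIZE, .userspace_addr = ovl_hva,
	};
	if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &r) < 0)
		return -1;

	/* 4. The upper piece, after the overlay page. */
	r = (struct kvm_userspace_memory_region){
		.slot = 2, .guest_phys_addr = ovl_gpa + PAGE_SIZE,
		.memory_size = size - lo_size - PAGE_SIZE,
		.userspace_addr = base_hva + lo_size + PAGE_SIZE,
	};
	if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &r) < 0)
		return -1;

	return 0;
}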

Using the HVA for the shinfo page makes a whole bunch of sense since
it's kind of supposed to be a xenheap page anyway and not defined by
the guest address it may — or may NOT — be mapped at. But more to the
point, using the HVA means that the kernel can continue to deliver
event channel interrupts (e.g. timer virqs, MSI pirqs from passthrough
devices, etc.) to it even when it *isn't* mapped.
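
Just for reference, the VMM side with this series ends up being
something like the below. This assumes the KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA
attribute and the u.shared_info.hva field as proposed in the series, so
the names may still change:

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Hypothetical VMM helper: point KVM at the shinfo page by HVA. */
static int set_shinfo_hva(int vm_fd, void *shinfo)
{
	struct kvm_xen_hvm_attr ha = {
		.type = KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA,
		.u.shared_info.hva = (unsigned long)shinfo,
	};

	/*
	 * The pfncache is keyed on the userspace address of the
	 * xenheap-style page, so memslot churn on whatever GPA the
	 * guest currently has it mapped at doesn't stop the kernel
	 * delivering event channels to it.
	 */
	return ioctl(vm_fd, KVM_XEN_HVM_SET_ATTR, &ha);
}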

We don't have the same problem for the vcpu_info because there's a per-
vcpu *shadow* of evtchn_pending_sel for that very purpose, which the
vCPU itself will OR into the real vcpu_info on the way into guest mode.

So since we have to stop all vCPUs before changing the memslots anyway,
the events can gather in that evtchn_pending_sel and it all works out
OK.
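
(For completeness, that mechanism already exists; a very simplified
paraphrase of kvm_xen_inject_pending_events() in arch/x86/kvm/xen.c,
glossing over the pfncache locking, the 32-bit compat layout and the
atomic OR, looks like this:)

static void inject_pending_events(struct kvm_vcpu *vcpu)
{
	/* Grab and clear the per-vCPU shadow of evtchn_pending_sel... */
	u64 pending = xchg(&vcpu->arch.xen.evtchn_pending_sel, 0);
	struct vcpu_info *vi = vcpu->arch.xen.vcpu_info_cache.khva;

	if (!pending || !vi)
		return;

	/* ...and fold it into the real vcpu_info before guest entry. */
	vi->evtchn_pending_sel |= pending;
	WRITE_ONCE(vi->evtchn_upcall_pending, 1);
}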

It would still be nice to have a way of atomically sticking an overlay
page over the middle of an existing memslot and breaking it apart, but
we can live without it. And even if we *did* get that, what you're
doing here makes a lot of sense anyway.
