Re: [PATCH] KVM: SVM: Modify AVIC GATag to support max number of 512 vCPUs

From: Sean Christopherson
Date: Mon Feb 06 2023 - 16:56:24 EST


TL;DR: I'm going to post v2 of this along with some other fixes+hardening.

On Tue, Jan 17, 2023, Suravee Suthikulpanit wrote:
> AVIC GATag is managed by the SVM driver, and is used by the IOMMU driver

s/SVM driver/KVM

> to program the AMD IOMMU IRTE to provide reference back to SVM in case

s/SVM/KVM

We do sometimes use "VMX" and "SVM" to refer to KVM, but usually only when
differentiating betwen "KVM x86" and "KVM <vendor>". In most cases I don't think
it would matter, but in this particular case, since the GATag is kinda sorta
consumed by hardware, but IIUC is purely software-defined, knowing whether "SVM"
means "KVM" or "SVM the architecture" is an important distinction.

> the IOMMU cannot inject interrupt to the non-running vCPU. In such case,
> the IOMMU hardware notify the IOMMU driver by creating GALog entry with

Definitely a nit, but I would probably omit the info about the IOMMU driver. That
information matters if someone is trying to understand _all_ of the pieces, but
for this specific bug I think it just ends up introducing noise.

> the corresponded GATag. The IOMMU driver processes the GALog entry and
> notifies SVM to schedule in the target vCPU.
>
> Currently, SVM uses 8-bit vCPU ID and 24-bit VM ID to encode 32-bit GATag.
> Since x2AVIC supports upto 512 vCPUs, it requires to use at least 9-bit

"up to"

Nit, x2AVIC doesn't "require" anything. What matters is what KVM allows. I get
what you're saying, but I want to cleanly separate "what's allowed by the spec"
and "what does KVM actually do".

> vCPU ID.
>
> Therefore, modify the GATag enconding to use the number of bits required

s/enconding/encoding

> to support the maximum vCPUs.

Thank you for the explanation of how this all works! IIUC, this is missing one
key point though: the GATag is 100% software-defined and never interpreted by
hardware. I.e. KVM can shove whatever it wants into the tag.

And while I'm nitpicking, please lead with the "what". For the changelog as a whole,
some maintainers/subsystems prefer leading with the "why", but I strongly prefer that
changelogs state what the patch actually does and then provide the background/justification.

Copy-pasting a prior copy-paste (I really need to save this as a Vim macro formletter)

: To some extent, it's a personal preference, e.g. I
: find it easier to understand the details (why something is a problem) if I have
: the extra context of how a problem is fixed (or: what code was broken).
:
: But beyond personal preference, there are less subjective reasons for stating
: what a patch does before diving into details. First and foremost, what code is
: actually being changed is the most important information, and so that information
: should be easy to find. Changelogs that bury the "what's actually changing" in a
: one-liner after 3+ paragraphs of background make it very hard to find that information.
:
: Maybe for initial review one could argue that "what's the bug" is more important,
: but for skimming logs and git archeology, the gory details matter less and less.
: E.g. when doing a series of "git blame", the details of each change along the way
: are useless, the details only matter for the culprit; I just want to quickly
: determine whether or not a commit might be of interest.
:
: Another argument for stating "what's changing" first is that it's almost always
: possible to state "what's changing" in a single sentence. Conversely, all but the
: most simple bugs require multiple sentences or paragraphs to fully describe the
: problem. If both the "what's changing" and "what's the bug" are super short then
: the order doesn't matter. But if one is shorter (almost always the "what's changing),
: then covering the shorter one first is advantageous because it's less of an
: inconvenience for readers/reviewers that have a strict ordering preference. E.g.
: having to skip one sentence to get to the stuff you care about is less painful than
: me having to skip three paragraphs to get to the stuff that I care about.

[*] https://lore.kernel.org/all/YurKx+gFAWPvj35L@xxxxxxxxxx

> Reported-by: Alejandro Jimenez <alejandro.j.jimenez@xxxxxxxxxx>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx>
> ---
> arch/x86/include/asm/svm.h | 3 ++-
> arch/x86/kvm/svm/avic.c | 9 ++++-----
> 2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index 0361626841bc..6738faf155e4 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -256,7 +256,8 @@ enum avic_ipi_failure_cause {
> AVIC_IPI_FAILURE_INVALID_BACKING_PAGE,
> };
>
> -#define AVIC_PHYSICAL_MAX_INDEX_MASK GENMASK_ULL(9, 0)

Isn't the existing mask wrong? The high bit is inclusive, i.e. this is defining
a mask of 10 bits, not 9.

Actually, neither 10 nor 9 bits is correct if we are going by the most recent APM,
i.e. the mask should be 8:0 if we want to tie this to what is actually defined in
the architecture.

All the addresses point to 4-Kbyte aligned data structures. Bits 11:0 are reserved
(except for offset 0F8h) and should be set to zero. The lower 8 bits of offset 0F8h
are used for the field AVIC_PHYSICAL_MAX_INDEX. VMRUN fails with #VMEXIT(VMEXIT_INVALID)
if AVIC_PHYSICAL_MAX_INDEX is greater than 255 in xAVIC mode or greater than 511 in
x2AVIC mode.

Looking through the discussion history of the x2AVIC enabling, this appears to be
an off-by-one goof, i.e. not a deliberate speculation on how bits 11:9 will be
used.

> +#define AVIC_PHYSICAL_MAX_INDEX_BITS 9

This name is misleading/wrong. As above, the high bit is inclusive. If we wanted
to specify the high bit, it would be something like MAX_INDEX_MAX_BIT, which is pretty
awful.

Ha! We don't need a separate define. const_hweight.h provides compile-time
hweight, a.k.a. popcount, macros. We can use that to compute the shift of the
VM ID portion of the GATag, then we don't need to come up with a name.

I'll post the below as v2 on top of the AVIC_PHYSICAL_MAX_INDEX_MASK fix.

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index ca684979e90d..326341a22153 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -27,19 +27,29 @@
#include "irq.h"
#include "svm.h"

-/* AVIC GATAG is encoded using VM and VCPU IDs */
-#define AVIC_VCPU_ID_BITS 8
-#define AVIC_VCPU_ID_MASK ((1 << AVIC_VCPU_ID_BITS) - 1)
+/*
+ * Encode the arbitrary VM ID and the vCPU's default APIC ID, i.e the vCPU ID,
+ * into the GATag so that KVM can retrieve the correct vCPU from a GALog entry
+ * if an interrupt can't be delivered, e.g. because the vCPU isn't running.
+ *
+ * For the vCPU ID, use however many bits are currently allowed for the max
+ * guest physical APIC ID (limited by the size of the physical ID table), and
+ * use whatever bits remain to assign arbitrary AVIC IDs to VMs. Note, the
+ * size of the GATag is defined by hardware (32 bits), but is an opaque value
+ * as far as hardware is concerned.
+ */
+#define AVIC_VCPU_ID_MASK AVIC_PHYSICAL_MAX_INDEX_MASK

-#define AVIC_VM_ID_BITS 24
-#define AVIC_VM_ID_NR (1 << AVIC_VM_ID_BITS)
-#define AVIC_VM_ID_MASK ((1 << AVIC_VM_ID_BITS) - 1)
+#define AVIC_VM_ID_SHIFT HWEIGHT32(AVIC_PHYSICAL_MAX_INDEX_MASK)
+#define AVIC_VM_ID_MASK (GENMASK(31, AVIC_VM_ID_SHIFT) >> AVIC_VM_ID_SHIFT)

-#define AVIC_GATAG(x, y) (((x & AVIC_VM_ID_MASK) << AVIC_VCPU_ID_BITS) | \
+#define AVIC_GATAG(x, y) (((x & AVIC_VM_ID_MASK) << AVIC_VM_ID_SHIFT) | \
(y & AVIC_VCPU_ID_MASK))
-#define AVIC_GATAG_TO_VMID(x) ((x >> AVIC_VCPU_ID_BITS) & AVIC_VM_ID_MASK)
+#define AVIC_GATAG_TO_VMID(x) ((x >> AVIC_VM_ID_SHIFT) & AVIC_VM_ID_MASK)
#define AVIC_GATAG_TO_VCPUID(x) (x & AVIC_VCPU_ID_MASK)

+static_assert(AVIC_GATAG(AVIC_VM_ID_MASK, AVIC_VCPU_ID_MASK) == -1u);
+
static bool force_avic;
module_param_unsafe(force_avic, bool, 0444);