Re: [PART1 RFC v2 07/10] svm: Add VMEXIT handlers for AVIC

From: Radim Krčmář
Date: Wed Mar 09 2016 - 15:55:24 EST


2016-03-04 14:46-0600, Suravee Suthikulpanit:
> From: Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx>
>
> Introduce VMEXIT handlers, avic_incp_ipi_interception() and
> avic_noaccel_interception().
>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx>
> ---
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> @@ -3690,6 +3690,264 @@ static int mwait_interception(struct vcpu_svm *svm)
> + case AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN: {
> + kvm_for_each_vcpu(i, vcpu, kvm) {
> + if (!kvm_apic_match_dest(vcpu, apic,
> + icrl & APIC_SHORT_MASK,
> + GET_APIC_DEST_FIELD(icrh),
> + icrl & APIC_DEST_MASK))
> + continue;
> +
> + kvm_vcpu_kick(vcpu);

KVM shouldn't kick VCPUs that are running. (Imagine a broadcast when
most VCPUs are in guest mode.)

I think a new helper might be useful here: we only want to wake the
VCPU up from its wait queue, but never force it out of guest mode ...
kvm_vcpu_kick() does both.
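
To illustrate the difference, here is a minimal userspace model, not
kernel code. The struct, the state names and model_vcpu_wake_up() are
all hypothetical; model_vcpu_kick() only imitates the "does both"
behaviour described above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a VCPU; not the kernel's struct kvm_vcpu. */
enum vcpu_state { VCPU_BLOCKED, VCPU_IN_GUEST, VCPU_OUTSIDE_GUEST };

struct model_vcpu {
	enum vcpu_state state;
	int wakeups;		/* times woken from the wait queue */
	int forced_exits;	/* times forced out of guest mode */
};

/* Models a kick: wakes a blocked VCPU and forces a running one out. */
static void model_vcpu_kick(struct model_vcpu *v)
{
	if (v->state == VCPU_BLOCKED) {
		v->state = VCPU_OUTSIDE_GUEST;
		v->wakeups++;
	} else if (v->state == VCPU_IN_GUEST) {
		v->state = VCPU_OUTSIDE_GUEST;
		v->forced_exits++;
	}
}

/*
 * Hypothetical wake-only helper: wakes a blocked VCPU and leaves a
 * VCPU that is in guest mode alone.  Returns true if it woke one.
 */
static bool model_vcpu_wake_up(struct model_vcpu *v)
{
	if (v->state != VCPU_BLOCKED)
		return false;

	v->state = VCPU_OUTSIDE_GUEST;
	v->wakeups++;
	return true;
}
```

With a helper shaped like this, a broadcast would cost nothing for the
VCPUs that are already in guest mode.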

> +static int avic_noaccel_trap_write(struct vcpu_svm *svm)
> +{
> + switch (offset) {
> + case APIC_ID: {
> + case APIC_LDR: {
> + case APIC_DFR: {
> + }

It's not enough to modify the AVIC map here. Userspace can also change
the APIC page with kvm_vcpu_ioctl_set_lapic, so AVIC had better hook
into a common path.

I think that AVIC map should be connected to recalculate_apic_map() and
'struct kvm_apic_map' as we already have the mode and a coupling of
LAPICs and VCPUs there.

recalculate_apic_map() is currently quite wasteful as it recomputes the
whole map on every change, but given its simplicity the cost should be
bearable.
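
A rough userspace sketch of the "recompute everything" idea, under
the assumption that every path that changes APIC state (guest write or
userspace ioctl) ends up calling one rebuild function. The types and
model_recalculate_map() are hypothetical and far simpler than
'struct kvm_apic_map'; only the physical ID lookup is modelled.

```c
#include <stdint.h>

#define MODEL_MAX_APIC_ID	255
#define MODEL_NO_VCPU		(-1)

/* Hypothetical per-VCPU LAPIC state; only the physical ID is kept. */
struct model_lapic {
	uint8_t apic_id;
};

/* Hypothetical map from physical APIC ID to VCPU index. */
struct model_apic_map {
	int phys_map[MODEL_MAX_APIC_ID + 1];
};

/*
 * Rebuild the whole map from the current LAPIC state.  Wasteful, but
 * no stale entry can survive because nothing is updated incrementally.
 */
static void model_recalculate_map(struct model_apic_map *map,
				  const struct model_lapic *lapics,
				  int nr_vcpus)
{
	int i;

	for (i = 0; i <= MODEL_MAX_APIC_ID; i++)
		map->phys_map[i] = MODEL_NO_VCPU;

	for (i = 0; i < nr_vcpus; i++)
		map->phys_map[lapics[i].apic_id] = i;
}
```

An AVIC table derived inside such a function would pick up changes from
both sources without a second hook.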

> +static int avic_noaccel_interception(struct vcpu_svm *svm)
> +{
> + int ret = 0;
> + u32 offset = svm->vmcb->control.exit_info_1 & 0xFF0;
> + u32 rw = (svm->vmcb->control.exit_info_1 >> 32) & 0x1;

Change "u32 rw" to "bool write"

> + u32 vector = svm->vmcb->control.exit_info_2 & 0xFFFFFFFF;

and please #define those masks.
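
Something along these lines, as a standalone sketch. The macro names,
the struct and avic_decode_noaccel() are hypothetical; the mask values
are just the literals from the quoted lines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; values are the literals used in the patch. */
#define AVIC_NOACCEL_OFFSET_MASK	0xFF0ULL
#define AVIC_NOACCEL_WRITE_MASK		(1ULL << 32)
#define AVIC_NOACCEL_VECTOR_MASK	0xFFFFFFFFULL

/* Hypothetical container for the decoded exit information. */
struct avic_noaccel_info {
	uint32_t offset;	/* APIC register offset */
	bool write;		/* true for a write, false for a read */
	uint32_t vector;
};

/* Decode exit_info_1 and exit_info_2 using the named masks. */
static struct avic_noaccel_info
avic_decode_noaccel(uint64_t exit_info_1, uint64_t exit_info_2)
{
	struct avic_noaccel_info info = {
		.offset = (uint32_t)(exit_info_1 & AVIC_NOACCEL_OFFSET_MASK),
		.write  = (exit_info_1 & AVIC_NOACCEL_WRITE_MASK) != 0,
		.vector = (uint32_t)(exit_info_2 & AVIC_NOACCEL_VECTOR_MASK),
	};

	return info;
}
```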

> + pr_debug("%s: offset=%#x, rw=%#x, vector=%#x, vcpu_id=%#x, cpu=%#x\n",
> + __func__, offset, rw, vector, svm->vcpu.vcpu_id, svm->vcpu.cpu);
> +
> + BUG_ON(offset >= 0x400);

These are valid faulting registers, so our implementation has to handle
them. (And the rule is to never BUG if a recovery is simple.)

> + switch (offset) {
> + case APIC_ID:
> + case APIC_EOI:
> + case APIC_RRR:
> + case APIC_LDR:
> + case APIC_DFR:
> + case APIC_SPIV:
> + case APIC_ESR:
> + case APIC_ICR:
> + case APIC_LVTT:
> + case APIC_LVTTHMR:
> + case APIC_LVTPC:
> + case APIC_LVT0:
> + case APIC_LVT1:
> + case APIC_LVTERR:
> + case APIC_TMICT:
> + case APIC_TDCR: {

(Try a helper that returns true/false for trap/fault registers, the code
might look nicer.)
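
A possible shape for that helper, as a standalone sketch. The name
avic_is_trap_register() is hypothetical, and the register offsets are
defined locally on the assumption that they match
arch/x86/include/asm/apicdef.h; in svm.c the existing definitions
would be used instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror arch/x86/include/asm/apicdef.h. */
#define APIC_ID		0x20
#define APIC_EOI	0xB0
#define APIC_RRR	0xC0
#define APIC_LDR	0xD0
#define APIC_DFR	0xE0
#define APIC_SPIV	0xF0
#define APIC_ESR	0x280
#define APIC_ICR	0x300
#define APIC_LVTT	0x320
#define APIC_LVTTHMR	0x330
#define APIC_LVTPC	0x340
#define APIC_LVT0	0x350
#define APIC_LVT1	0x360
#define APIC_LVTERR	0x370
#define APIC_TMICT	0x380
#define APIC_TDCR	0x3E0

/*
 * Hypothetical helper: true for the registers the patch lists as
 * traps.  Any other offset, including those at or above 0x400,
 * returns false so the caller takes the fault path instead of
 * hitting a BUG.
 */
static bool avic_is_trap_register(uint32_t offset)
{
	switch (offset) {
	case APIC_ID:
	case APIC_EOI:
	case APIC_RRR:
	case APIC_LDR:
	case APIC_DFR:
	case APIC_SPIV:
	case APIC_ESR:
	case APIC_ICR:
	case APIC_LVTT:
	case APIC_LVTTHMR:
	case APIC_LVTPC:
	case APIC_LVT0:
	case APIC_LVT1:
	case APIC_LVTERR:
	case APIC_TMICT:
	case APIC_TDCR:
		return true;
	default:
		return false;
	}
}
```

The interception handler would then reduce to a single if/else on the
helper's result.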

> + /* Handling Trap */
> + if (!rw) /* Trap read should never happens */
> + BUG();
> + ret = avic_noaccel_trap_write(svm);
> + break;
> + }
> + default: {
> + /* Handling Fault */
> + if (rw)
> + ret = avic_noaccel_fault_write(svm);
> + else
> + ret = avic_noaccel_fault_read(svm);
> + skip_emulated_instruction(&svm->vcpu);

AVIC doesn't tell us what it wanted to write, so KVM has to emulate the
instruction.