Re: [PATCH V3] KVM: x86: Sync the pending Posted-Interrupts

From: Paolo Bonzini
Date: Thu Jan 31 2019 - 04:25:33 EST


On 31/01/19 09:52, Luwei Kang wrote:
> Some Posted-Interrupts from passthrough devices may be lost or
> overwritten when the vCPU is in runnable state.
>
> The SN (Suppress Notification) bit of the PID (Posted Interrupt
> Descriptor) is set when the vCPU is preempted (vCPU in
> KVM_MP_STATE_RUNNABLE state but not running on a physical CPU). If a
> posted interrupt arrives at this time, the IRQ remapping facility sets
> the corresponding bit in PIR (Posted Interrupt Requests) without
> setting ON (Outstanding Notification). The interrupt is then never
> synced to the APIC virtualization registers and is not handled by the
> guest, because ON is zero.
>
> Signed-off-by: Luwei Kang <luwei.kang@xxxxxxxxx>
> ---
> arch/x86/kvm/vmx/vmx.c | 5 +++++
> arch/x86/kvm/x86.c | 2 +-
> 2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 4341175..8ed9634 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1221,6 +1221,11 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
> new.sn = 0;
> } while (cmpxchg64(&pi_desc->control, old.control,
> new.control) != old.control);
> +

/*
 * Clear SN before reading the bitmap. The VT-d firmware
 * writes the bitmap and reads SN atomically (5.2.3 in the
 * spec), so it doesn't really have a memory barrier that
 * pairs with this, but we cannot do the same and so we
 * need one.
 */

> + smp_mb__after_atomic();
> +
> + if (!bitmap_empty((unsigned long *)pi_desc->pir, NR_VECTORS))
> + pi_test_and_set_on(pi_desc);

You can add a pi_set_on() helper and use it here. The fast path with
pi_clear_sn() should also be removed.
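
For illustration only, here is a minimal userspace sketch of the ordering discussed above: clear SN, issue a full barrier, then set ON if any PIR bit is pending. It is not the kernel code. The struct pi_desc_sketch layout, the pi_is_pir_empty() and pi_load_sync() names, and the use of C11 atomics in place of the kernel's bitops and smp_mb__after_atomic() are all assumptions made so the example compiles on its own. It also clears SN with a single atomic AND rather than inside the cmpxchg64 loop that the real vmx_vcpu_pi_load() uses.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_VECTORS	256
#define PIR_WORDS	(NR_VECTORS / 32)

/* Bit positions in the control word (ON = bit 0, SN = bit 1). */
#define POSTED_INTR_ON	0
#define POSTED_INTR_SN	1

/* Simplified stand-in for the posted-interrupt descriptor (hypothetical). */
struct pi_desc_sketch {
	_Atomic uint32_t pir[PIR_WORDS];	/* one pending bit per vector */
	_Atomic uint64_t control;		/* ON, SN and other fields */
};

/* Plain "set ON": no test needed when the old value is not used. */
static inline void pi_set_on(struct pi_desc_sketch *pi)
{
	atomic_fetch_or(&pi->control, UINT64_C(1) << POSTED_INTR_ON);
}

static inline void pi_clear_sn(struct pi_desc_sketch *pi)
{
	atomic_fetch_and(&pi->control, ~(UINT64_C(1) << POSTED_INTR_SN));
}

/* Hypothetical helper: true if no vector is pending in PIR. */
static inline bool pi_is_pir_empty(struct pi_desc_sketch *pi)
{
	for (int i = 0; i < PIR_WORDS; i++)
		if (atomic_load(&pi->pir[i]))
			return false;
	return true;
}

/* Hypothetical name for the tail of the vCPU load path. */
static void pi_load_sync(struct pi_desc_sketch *pi)
{
	pi_clear_sn(pi);

	/* Order the SN clear before the PIR read (stands in for the smp_mb). */
	atomic_thread_fence(memory_order_seq_cst);

	/* Interrupts posted while SN was set left ON clear; fix that up. */
	if (!pi_is_pir_empty(pi))
		pi_set_on(pi);
}
```

With SN set and one PIR bit pending, pi_load_sync() leaves SN clear and ON set; with an empty PIR it only clears SN.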

> }

> /*
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3d27206..5bcf2c4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7794,7 +7794,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> * 1) We should set ->mode before checking ->requests. Please see
> * the comment in kvm_vcpu_exiting_guest_mode().
> *
> - * 2) For APICv, we should set ->mode before checking PIR.ON. This
> + * 2) For APICv, we should set ->mode before checking PID.PIR. This

This should be PID.ON.

Paolo

> * pairs with the memory barrier implicit in pi_test_and_set_on
> * (see vmx_deliver_posted_interrupt).
> *
>