[bug report] GICv4.1: vSGI remains pending across the guest reset

From: Kunkun Jiang
Date: Thu Dec 14 2023 - 07:14:38 EST


Hi list,

We have observed on GICv4.1 systems that, after a guest reset, the
secondary VCPU can accidentally receive an IPI_CPU_STOP and then fail
to come online.

| Guest User-space
|
| reset (with a vSGI pending in the
| hardware) [0]
|
| disable the distributor (write 0
| into GICD_CTLR) [1]
|
| clear pending status (write 0 into
| GICR_ISPENDR0 for each RD) [2]
|
| disable the distributor (write 0
| into GICD_CTLR) [3]
|
| enable the distributor with ARE,
| Group1 and nASSGIreq [4]

The problem is that even if user-space tries to disable the distributor
and clear pending bits for all SGIs, we don't actually propagate it into
HW (we only record it via vgic_dist->{enabled, nassgireq} and
vgic_irq->pending_latch), so the vSGI remains pending across the guest
reset.

And when we're at [4], all vSGIs already have vgic_irq->hw set to *true*,
so vgic_v4_enable_vsgis() becomes a no-op. That's not good.
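
To make the sequence easier to reason about, here is a minimal user-space
toy model of steps [0], [2] and [4]. It is only a sketch of the behaviour
described above, not kernel code: struct toy_vsgi, its hw_pending field and
all toy_* functions are hypothetical names, and the early return in
toy_enable_vsgi() is an assumption that stands in for the "becomes a no-op"
behaviour of vgic_v4_enable_vsgis().

```c
#include <stdbool.h>

/* Toy model only: field names mirror the report, nothing here is kernel code. */
struct toy_vsgi {
	bool hw;            /* models vgic_irq->hw: vSGI is backed by hardware */
	bool pending_latch; /* models vgic_irq->pending_latch: software state */
	bool hw_pending;    /* hypothetical stand-in for the HW pending bit */
};

/*
 * Step [2]: user-space clears the pending bit.
 * With propagate == false only the software latch is cleared (the
 * behaviour described in the report); with propagate == true the
 * hardware state is cleared as well.
 */
static void toy_uaccess_clear_pending(struct toy_vsgi *irq, bool propagate)
{
	if (propagate && irq->hw)
		irq->hw_pending = false;
	irq->pending_latch = false;
}

/*
 * Step [4]: enabling vSGIs. Assumed to do nothing when the interrupt is
 * already marked as hardware-backed, so stale HW state is never rewritten.
 */
static void toy_enable_vsgi(struct toy_vsgi *irq)
{
	if (irq->hw)
		return;
	irq->hw = true;
	irq->hw_pending = irq->pending_latch;
	irq->pending_latch = false;
}

/* Runs [0] -> [2] -> [4] and reports whether the vSGI is still pending in HW. */
static bool toy_pending_after_reset(bool propagate)
{
	/* [0]: reset arrives while a vSGI is pending in the hardware. */
	struct toy_vsgi irq = {
		.hw = true,
		.pending_latch = false,
		.hw_pending = true,
	};

	toy_uaccess_clear_pending(&irq, propagate); /* [2] */
	toy_enable_vsgi(&irq);                      /* [4] */

	return irq.hw_pending;
}
```

In this model toy_pending_after_reset(false) leaves the vSGI pending, while
toy_pending_after_reset(true) does not, which is the difference the guest
would see on its secondary VCPU.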

The following change fixes the problem for us, but we are not sure it is
the right approach. Sending it out early to ask for suggestions or
alternative solutions.

diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
index 89117ba2528a..3678ab33f5b9 100644
--- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
@@ -374,6 +374,10 @@ static int vgic_v3_uaccess_write_pending(struct kvm_vcpu *vcpu,
             irq->pending_latch = true;
             vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
         } else {
+             if (irq->hw && vgic_irq_is_sgi(irq->intid))
+                 irq_set_irqchip_state(irq->host_irq,
+                              IRQCHIP_STATE_PENDING,
+                              false);
             irq->pending_latch = false;
             raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
         }