Re: [RFC] irq-gic-v3-its: fix occasional VLPI drop

From: Heyi Guo
Date: Tue Jan 22 2019 - 07:44:41 EST


Hi Marc,

Thanks for your feedback. Please see my comments below.


On 2019/1/22 17:53, Marc Zyngier wrote:
Hi Heyi,

On Mon, 21 Jan 2019 11:51:48 +0000,
Heyi Guo <guoheyi@xxxxxxxxxx> wrote:
Every VLPI will temporarily be mapped to the first CPU in system
(normally CPU0) and then moved to the real scheduled CPU later. There
is a time window so a VLPI may be sent to CPU0 instead of the real
scheduled vCPU, in a multi-CPU virtual machine. However, CPU0 may have
not been scheduled as a virtual CPU after system boots up, so the
value of GICR_VPROPBASER still be the reset value. According to GIC
spec, the reset value of IDbits in GICR_VPROPBASER is architecturally
UNKNOWN, and the GIC will behave as if all virtual LPIs are out of
range if it is less than 0b1101. On our platform the GICR will simply
drop the incoming VLPI, which results in interrupt missing in Guest.
OK, it took me some time to page this horror back in. So let's see if
I can sum-up the issue correctly:
Sorry for not explaining the whole thing clearly...


- When a VM gets created, all the vPEs are mapped to CPU0's
redistributor.
Not exactly on VM geting created, but when the passthru PCI device driver in Guest tries to enable MSI interrupts. The specific code is in its_map_vm().

- If a device starts emitting VLPIs targeting a vPE that has not run
yet, these VLPIs are forwarded to CPU0's redistributor.

- If CPU0 has itself never run any vPE, its GICR_PROPBASER is not
initialised, meaning that the IDbits field may contain a value that
makes the redistributor drop the interrupt on the floor.
Yes.

Is that a correct assessment of the issue you're seeing? If so, I
think you have a very good point here, and this looks like a hole in
the driver.

Comments below:

As no code will clear GICR_VPROPBASER at runtime, we can safely
initialize the IDbits field at boot time for each CPU to get rid of
this issue.

Signed-off-by: Heyi Guo <guoheyi@xxxxxxxxxx>
Signed-off-by: Heyi Guo <heyi.guo@xxxxxxxxxx>
---
drivers/irqchip/irq-gic-v3-its.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index db20e99..6116215 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -2144,6 +2144,20 @@ static void its_cpu_init_lpis(void)
val |= GICR_CTLR_ENABLE_LPIS;
writel_relaxed(val, rbase + GICR_CTLR);
+ /*
+ * Temporary workaround for vlpi drop on Hi1620.
Why is this specific to this implementation? Isn't this an issue that
affects every GICv4 implementations?
This was an internal patch and I forgot to modify the comment before sending out, either not 100% sure that it is the common behavior of GICv4 to drop VLPI if IDbits is not correctly configured.
I can change it in V2.


+ * IDbits must be set before any VLPI is sent to this CPU, or else the
+ * VLPI will be considered as out of range and dropped.
+ */
+ if (gic_rdists->has_vlpis) {
+ void __iomem *vlpi_base = gic_data_rdist_vlpi_base();
+
+ val = (LPI_NRBITS - 1) & GICR_VPROPBASER_IDBITS_MASK;
+ pr_info("GICv4: CPU%d: Init IDbits to 0x%llx for GICR_VPROPBASER\n",
+ smp_processor_id(), val);
I don't think this pr_info is useful to a normal user, as it is only
debug information. I'm actually minded to demote a bunch of the GICv3
prints to pr_debug.
OK.
+ gits_write_vpropbaser(val, vlpi_base + GICR_VPROPBASER);
+ }
+
I think we need to clear GICR_VPENDBASER.Valid too (you can probably
reuse part of its_vpe_deschedule for that), so that we don't get into
a bizarre situation where CPU0's redistributor has some ancient
programming left in, and could start corrupting memory.
I can do that for safety. But is it possible of corrupting memory? Even if GICR_VPENDBASER.Valid==1, I don't think it is possible that GICR_VPENDBASER.Physical_Address equals to VPT_addr, so kernel should consider vPE is not sheculed on CPU0 and only memory pointed by VPT_addr will be modified. Please let me know if I'm wrong :)

/* Make sure the GIC has seen the above */
dsb(sy);
out:
--
1.8.3.1

Can you please respin this quickly with the above changes?
Sure.

Thanks,
Heyi

Thanks,

M.