Re: [PATCH] genirq: Fix software resend lockup and nested resend

From: Marc Zyngier
Date: Sat Aug 26 2023 - 12:32:17 EST


Hi Johan,

On Sat, 26 Aug 2023 16:40:04 +0100,
Johan Hovold <johan+linaro@xxxxxxxxxx> wrote:
>
> The switch to using hlist for managing software resend of interrupts
> broke resend in at least two ways:
>
> First, unconditionally adding interrupt descriptors to the resend list
> can corrupt the list when the descriptor in question has already been
> added. This causes the resend tasklet to loop indefinitely with
> interrupts disabled as was recently reported with the Lenovo ThinkPad
> X13s after threaded NAPI was disabled in the ath11k WiFi driver. [1]

Gah, of course. Making the descriptor pending again isn't an
idempotent operation anymore (while setting an already set bit in the
bitmap was). You'll need the right timing, but it looks like you've
managed to achieve that reliably.

>
> This bug is easily fixed by restoring the old semantics of
> irq_sw_resend() so that it can be called also for descriptors that have
> already been marked for resend.
>
> Second, the offending commit also broke software resend of nested
> interrupts by simply discarding the code that made sure that such
> interrupts are retriggered using the parent interrupt.
>
> Add back the corresponding code that adds the parent descriptor to the
> resend list. Note that this bit is untested, but I decided to include it
> to avoid having to revert the offending commit and the maple tree
> conversion that depends on it.

Indeed, and that one is not timing related, but 100% broken.

>
> [1] https://lore.kernel.org/lkml/20230809073432.4193-1-johan+linaro@xxxxxxxxxx/
>
> Fixes: bc06a9e08742 ("genirq: Use hlist for managing resend handlers")
> Cc: Shanker Donthineni <sdonthineni@xxxxxxxxxx>
> Signed-off-by: Johan Hovold <johan+linaro@xxxxxxxxxx>
> ---
>
> Hi Thomas and Marc,
>
> This patch fixes a severe regression in the resend code in 6.5-rc1 that
> breaks machines like the Lenovo X13s and which ideally should be
> addressed before 6.5 is released tomorrow.

This relies on Thomas seeing this email before tomorrow. Unless you
directly reach out to Linus to try and intercept the release.

>
> I hesitated about including the fix for nested interrupts as I've not
> had time to test this bit, but I ultimately decided to include it to
> avoid having to suggest a revert of the maple tree conversion. Let me
> know if you prefer to go this route and I'll post a (prepared) revert
> series instead.

I think it is too late for that revert to happen, and we might as well
plough along and take the fix.

>
> Johan
>
>
> kernel/irq/resend.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/irq/resend.c b/kernel/irq/resend.c
> index edec335c0a7a..5f2c66860ac6 100644
> --- a/kernel/irq/resend.c
> +++ b/kernel/irq/resend.c
> @@ -68,11 +68,16 @@ static int irq_sw_resend(struct irq_desc *desc)
> */
> if (!desc->parent_irq)
> return -EINVAL;
> +
> + desc = irq_to_desc(desc->parent_irq);
> + if (!desc)
> + return -EINVAL;
> }
>
> /* Add to resend_list and activate the softirq: */
> raw_spin_lock(&irq_resend_lock);
> - hlist_add_head(&desc->resend_node, &irq_resend_list);
> + if (hlist_unhashed(&desc->resend_node))
> + hlist_add_head(&desc->resend_node, &irq_resend_list);
> raw_spin_unlock(&irq_resend_lock);
> tasklet_schedule(&resend_tasklet);
> return 0;

FWIW:

Reviewed-by: Marc Zyngier <maz@xxxxxxxxxx>

M.

--
Without deviation from the norm, progress is not possible.