Re: [PATCH 2/2] irqchip/sifive-plic: Fix the interrupt was enabled accidentally issue.

From: Thomas Gleixner
Date: Fri Oct 16 2020 - 09:42:33 EST


On Thu, Oct 15 2020 at 22:23, Marc Zyngier wrote:
> On 2020-10-12 14:57, Greentime Hu wrote:
>> In commit 2ca0b460bbcb ("genirq/affinity: Make affinity setting if
>> activated opt-in"),

This commit sha does not exist in mainline. Referencing a random stable
kernel tree is useless.

>> it added irqd_affinity_on_activate() checking in the function
>> irq_set_affinity_deactivated() so it will return false here.
>> In that case, it will call irq_try_set_affinity() -> plic_irq_toggle()
>> which will enable the interrupt to cause the CPU hang.
>> @@ -183,10 +183,14 @@ static int plic_irqdomain_map(struct irq_domain
>> *d, unsigned int irq,
>> irq_hw_number_t hwirq)
>> {
>> struct plic_priv *priv = d->host_data;
>> + struct irq_data *irqd;
>>
>> irq_domain_set_info(d, irq, hwirq, &plic_chip, d->host_data,
>> handle_fasteoi_irq, NULL, NULL);
>> irq_set_noprobe(irq);
>> + irqd = irq_get_irq_data(irq);
>> + irqd_set_single_target(irqd);
>> + irqd_set_affinity_on_activate(irqd);
>> irq_set_affinity(irq, &priv->lmask);
>> return 0;
>> }
>
> How does this fix anything? The plic driver doesn't have an activate
> callback, so how does it make any difference? I get the feeling this
> papers over another issue.

The existence of the activate callback is not really the interesting
part. And of course the analysis is completely bogus.

The referenced commit is a follow up to:

baedb87d1b53 ("genirq/affinity: Handle affinity setting on inactive interrupts correctly")

which was added on Jul 17 and the fix was added on Jul 24:

f0c7baca1800 ("genirq/affinity: Make affinity setting if activated opt-in")

The latter restored the original behaviour within 7 days, except for
x86. And the original behaviour was that the core did not care about the
activated state at all. It just invoked irqchip->irq_set_affinity()
unconditionally, which is still does except when the interrupt is marked
with irqd_set_affinity_on_activate().

So the plic code had this problem before that change already:

irq_set_noprobe(irq);
irq_set_affinity(irq, &priv->lmask);

Mapping happens way before anything else, so Hu is definitely barking up
the wrong tree.

What the patch works around is the irq_set_affinity() callback which is
completely bogus and still broken even after that duct tape which"
fixes" the boot fail.

irq_set_affinity() can be called on masked interrupts so this:

plic_irq_toggle(&priv->lmask, d, 0);
plic_irq_toggle(cpumask_of(cpu), d, 1);

will always unconditionally unmask the interrupt when affinity is
changed.

plic_irq_toggle(cpumask_of(cpu), d, !irqd_irq_masked(d));

is all it needs to work everywhere. Obvious, right?

The changelog wants to be "...: Fix broken irq_set_affinity() callback"
and the Fixes: tag referencing the commit which introduced that
unconditional unmask. Oh well.

Thanks,

tglx