Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq

From: Eric W. Biederman
Date: Fri May 18 2007 - 10:21:21 EST


Andi Kleen <ak@xxxxxxx> writes:

> On Friday 18 May 2007 01:03, Siddha, Suresh B wrote:
>
>> Normally, the EOI generated by local APIC for level trigger interrupt
>> contains vector number. The IOAPIC will take this vector number and
>> search the IOAPIC RTE entries for an entry with matching vector number and
>> clear the remote IRR bit (indicate EOI). However, if the vector number is
>> changed (as in step 3) the IOAPIC will not find the RTE entry when the EOI
>> is received later. This will cause the remote IRR to get stuck causing the
>> interrupt hang (no more interrupt from this RTE).
>
> Does this happen often or did you only see it in some extreme or obscure
> case?

It is obscure. Someone may have a case that makes it easy to
reproduce but it won't happen frequently.

So we should have enough time to review and fix this carefully,
instead of needing to rush a fix out.

>> + /*
>> + * If the EOI still didn't reach the RTE corresponding to the
>> + * level triggered irq, postpone the irq migration to the next
>> + * irq arrival event.
>> + */
>> + if (pending_eoi(irq)) {
>> + irq_desc[irq].status |= IRQ_MOVE_PENDING;
>> + return;
>
> Other code seems to have similar problems, but we don't have any lock
> protecting that bitmap against parallel updates outside the irq itself, don't
> we? Perhaps it needs to be all set_bit()

Huh? The irq lock is sufficient for this.

This is not a code software problem, this is a software hardware
interaction problem. This is a problem in that there is no way to
guarantee that ioapics see a series of manipulations in the same
order that the cpu issues those manipulations.

In particular the ioapic is seeing the ack after we reprogram
the ioapic, which causes the ioapic to not accept the ack (because
it's vector number changed) and then the ioapic waits forever
for an ack that isn't coming.

The other half of the problem is pending_eoi() is testing a bit in the
hardware that in my experience is not sufficient to guarantee that
the hardware is in a state where action may be taken.

This is a very delicate problem because there is little if any guaranteed
ordering between cpu operations and ioapic operations, especially because
there are two channels of communication the ``apic bus'' and the memory
mapped ioapic registers. So what happens on one channel has no ordering
requirements of any kind with what happens on the other channel.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/