What are the real ioapic rte programming constraints?

From: Eric W. Biederman
Date: Sat Feb 10 2007 - 18:54:25 EST



I have recently been investigating why we reprogram ioapic irqs in the
interrupt handler, because it significantly complicates the code, and
makes things more fragile. Eventually I found the commit with the
justification, see below.

There are not enough details in the justification to really understand
the issue so I'm asking to see if someone has some more details.

The description makes the assertion that reprograming the ioapic
when an interrupt is pending is the only safe way to handle this.
Since edge triggered interrupts cannot be pending at the ioapic I know
it is not talking level triggered interrupts.

However it is not possible to fully reprogram a level triggered
interrupt when the interrupt is pending as the ioapic will not
receive the interrupt acknowledgement. So it turns out I have
broken this change for several kernel releases without people
screaming at me about io_apic problems.

Currently I am disabling the irq on the ioapic before reprogramming
it so I do not run into issues. Does that solve the concerns that
were patched around by only reprogramming interrupt redirection
table entry in interrupt handlers?

If it does I can solve and simply the code by moving it all back
into process context.

commit 54d5d42404e7705cf3804593189e963350d470e5
Author: Ashok Raj <ashok.raj@xxxxxxxxx>
Date: Tue Sep 6 15:16:15 2005 -0700

[PATCH] x86/x86_64: deferred handling of writes to /proc/irqxx/smp_affinity

When handling writes to /proc/irq, current code is re-programming rte
entries directly. This is not recommended and could potentially cause
chipset's to lockup, or cause missing interrupts.

CONFIG_IRQ_BALANCE does this correctly, where it re-programs only when the
interrupt is pending. The same needs to be done for /proc/irq handling as well.
Otherwise user space irq balancers are really not doing the right thing.

- Changed pending_irq_balance_cpumask to pending_irq_migrate_cpumask for
lack of a generic name.
- added move_irq out of IRQ_BALANCE, and added this same to X86_64
- Added new proc handler for write, so we can do deferred write at irq
handling time.
- Display of /proc/irq/XX/smp_affinity used to display CPU_MASKALL, instead
it now shows only active cpu masks, or exactly what was set.
- Provided a common move_irq implementation, instead of duplicating
when using generic irq framework.

Tested on i386/x86_64 and ia64 with CONFIG_PCI_MSI turned on and off.
Tested UP builds as well.

MSI testing: tbd: I have cards, need to look for a x-over cable, although I
did test an earlier version of this patch. Will test in a couple days.

Signed-off-by: Ashok Raj <ashok.raj@xxxxxxxxx>
Acked-by: Zwane Mwaikambo <zwane@xxxxxxxxxxxxxx>
Grudgingly-acked-by: Andi Kleen <ak@xxxxxx>
Signed-off-by: Coywolf Qi Hunt <coywolf@xxxxxxxxxx>
Signed-off-by: Ashok Raj <ashok.raj@xxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxx>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/