Re: [PATCH] x86: warn on apic error

From: Maciej W. Rozycki
Date: Fri Jul 18 2008 - 15:03:02 EST


On Fri, 18 Jul 2008, Vegard Nossum wrote:

> There are certain APIC errors which are obviously programmer errors,
> e.g. writing to illegal APIC registers, or sending invalid interrupt
> vectors. Since the error interrupt happens spot on the erroneous code,
> we might as well make a bit of noise about it and display the stack-
> trace.
[...]
> @@ -1317,6 +1317,7 @@ void smp_error_interrupt(struct pt_regs *regs)
> */
> printk(KERN_DEBUG "APIC error on CPU%d: %02lx(%02lx)\n",
> smp_processor_id(), v , v1);
> + WARN_ON(v1 & ((1 << 0) | (1 << 2) | (1 << 5) | (1 << 7)));

Hmm, I think there is no point in dumping state on send checksum errors
(bit 0), as these normally signify a problem such as a data line being
driven low on the inter-APIC bus by another agent while the sender drives
it high. Barring a faulty chip, this would normally result from a problem
with arbitration, such as duplicate IDs being present on the bus. In that
case finding the triggering command would not help at all, as the culprit
lies elsewhere. It could also result from crosstalk or some other problem
with the board hardware.

The rest is fine with me.

You need to check (v | v1) though: on the Pentium processor's integrated
APIC the error that triggered the interrupt is returned by the first read
of the ESR, whereas on later implementations it is returned by the second
read, after the latches have been updated by the write in between.
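
Roughly what I have in mind, as a stand-alone user-space sketch rather
than a tested patch. The macro and function names below are made up for
illustration and do not exist in the tree; the bit positions are the ones
used in your patch, minus bit 0:

```c
#include <stdbool.h>

/* ESR bits of interest; names are hypothetical, not kernel definitions. */
#define ESR_SEND_CS_ERROR     (1UL << 0)  /* send checksum error: not warned on */
#define ESR_SEND_ACCEPT_ERROR (1UL << 2)  /* send accept error */
#define ESR_SEND_ILLEGAL_VEC  (1UL << 5)  /* send illegal vector */
#define ESR_ILLEGAL_REG_ADDR  (1UL << 7)  /* illegal register address */

/* Errors that point at the code running when the interrupt fired. */
#define ESR_WARN_MASK \
	(ESR_SEND_ACCEPT_ERROR | ESR_SEND_ILLEGAL_VEC | ESR_ILLEGAL_REG_ADDR)

/*
 * v is the first ESR read, v1 the second one taken after the write.
 * Either may hold the triggering error depending on the APIC
 * implementation, so test the union of both.
 */
static bool apic_error_worth_warning(unsigned long v, unsigned long v1)
{
	return ((v | v1) & ESR_WARN_MASK) != 0;
}
```

In smp_error_interrupt() this would presumably reduce to a single
WARN_ON((v | v1) & mask) after the printk, with the mask spelled out
however you prefer.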

Maciej