RE: APIC error on CPUx

From: Mikael Pettersson
Date: Wed May 24 2006 - 04:10:45 EST


Brown, Len writes:
> Vladimir's SCB2/Serverworks boots with and without "acpi=off",
> and in both cases the IOAPICs are set up properly,
> the devices work, and there are the following messages:
>
> APIC error on CPU1: 00(40)
> APIC error on CPU0: 00(40)
> APIC error on CPU0: 40(40)
> APIC error on CPU0: 40(40)
> APIC error on CPU0: 40(40)
>
> These are the now infamous "Receive illegal vector" messages.
> I expect this chipset has a physical APIC bus (rather than
> the FSB delivery used today) which is misbehaving.
>
> I've never heard of this being associated with an actual
> failure (such as a lost interrupt). This message is already
> KERN_DEBUG -- can't get any lower priority than that.
> Maybe we should put this message under apic_printk()?

The default must be that APIC errors are logged. They are valid
indicators of hardware brokenness, and have in the past been
traced to bad mobos, bad PSUs, bad cooling, and even kernel bugs.
The fact that many systems manage to limp along in their presence
doesn't matter.
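
For reading the quoted messages: the two hex numbers are the APIC
Error Status Register (ESR) as read before and after the handler
writes to it, and bit 6 (0x40) is "Receive illegal vector". A minimal
userspace sketch of that decoding follows. The bit names follow the
comment in the kernel's error interrupt handler; the helper names
apic_esr_bit_name() and apic_esr_first_error() are made up for this
illustration and are not kernel functions.

```c
/* Userspace sketch only: decode a local APIC ESR value into a name.
 * The helper names below are hypothetical, not kernel APIs. */

/* Meaning of ESR bits 0..7. */
static const char *const esr_bit_names[8] = {
    "Send CS error",            /* bit 0 */
    "Receive CS error",         /* bit 1 */
    "Send accept error",        /* bit 2 */
    "Receive accept error",     /* bit 3 */
    "Reserved",                 /* bit 4 */
    "Send illegal vector",      /* bit 5 */
    "Receive illegal vector",   /* bit 6 */
    "Illegal register address", /* bit 7 */
};

/* Return the name of one ESR bit, or "Unknown" if out of range. */
const char *apic_esr_bit_name(unsigned int bit)
{
    return bit < 8 ? esr_bit_names[bit] : "Unknown";
}

/* Return the index of the lowest set error bit, or -1 if none is set. */
int apic_esr_first_error(unsigned int esr)
{
    unsigned int bit;

    for (bit = 0; bit < 8; bit++) {
        if (esr & (1u << bit))
            return (int)bit;
    }
    return -1;
}
```

With the value 0x40 from the messages above, apic_esr_first_error()
picks bit 6, which apic_esr_bit_name() maps to "Receive illegal
vector".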

The default for apic_printk is to print nothing (apic_verbosity
== APIC_QUIET). You could add a fourth level (ERROR or WARNING)
between QUIET and VERBOSE, make apic_verbosity initialise to that
level, and use that level for the APIC error messages. That would
make the messages visible by default while letting knowledgeable
users suppress them.
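
Roughly what I have in mind, as a userspace mock rather than a patch.
APIC_ERROR is the hypothetical new level; apic_log() stands in for
apic_printk() (with printf in place of printk), and
report_apic_error() is a made-up wrapper. Inserting a level also
shifts the numeric values of VERBOSE and DEBUG, which a real patch
would have to account for.

```c
#include <stdarg.h>
#include <stdio.h>

/* Verbosity levels; APIC_ERROR is the hypothetical addition. */
enum apic_verbosity_level {
    APIC_QUIET = 0,
    APIC_ERROR,   /* hypothetical: errors only, proposed default */
    APIC_VERBOSE,
    APIC_DEBUG,
};

/* Default to the new level so errors are logged unless suppressed. */
int apic_verbosity = APIC_ERROR;

/* Stand-in for apic_printk(): print only if the message level is
 * at or below the current verbosity. Returns characters printed,
 * or 0 if the message was suppressed. */
int apic_log(int level, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (level > apic_verbosity)
        return 0;

    va_start(ap, fmt);
    n = vprintf(fmt, ap);
    va_end(ap);
    return n;
}

/* Hypothetical wrapper: emit the APIC error line at the new level. */
int report_apic_error(int cpu, unsigned int esr_before,
                      unsigned int esr_after)
{
    return apic_log(APIC_ERROR, "APIC error on CPU%d: %02x(%02x)\n",
                    cpu, esr_before, esr_after);
}
```

With the default, report_apic_error(0, 0x40, 0x40) prints the line;
after setting apic_verbosity to APIC_QUIET it prints nothing, and
APIC_VERBOSE messages stay hidden in both cases.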

/Mikael