Re: Unhandled IRQs on AMD E-450

From: Clemens Ladisch
Date: Sun Dec 04 2011 - 12:00:22 EST


Jeroen Van den Keybus wrote:
> The problem occurs with the e1000 idle (unplugged) and under heavy
> usage (plugged). Time to failure is also in the same order of
> magnitude (i.e. 1..30 minutes). As of now, I never had IRQ 19 disabled
> with the e1000 removed. The e1000 delivered with Ubuntu isn't
> particularly recent (7.3.21-k8-NAPI).

That version number doesn't mean much; there have been many changes to
the kernel driver since it was last updated.

One interesting patch is <http://git.kernel.org/linus/4c11b8adbc48>;
please check if you have it (the file was recently moved into
drivers/net/ethernet/intel/e1000/). But it's from January, so your 3.2-rc*
kernel should already have it.

> I have succeeded in catching a lspci on the SATA controller with INTx+
> while IRQ 19 is disabled. [...]
> The fact that the next lspci's showed INTx- shows that its pin is
> definitely not stuck, does it not ?

Indeed; that SATA controller appears to work fine.

> I already lost IRQ 16 after 2 minutes. This kernel doesn't have
> support for any audio, so there was only firewire_ohci on this line.
> However, lspci for this device shows a firm INTx+.

Your VT6308 is a widely-used chip, and there are no known interrupt-
related problems with it.

This PCI status register is part of the device itself, i.e., the
FireWire controller chip; there is nothing in the rest of the system,
hardware or software, that could affect this INTx value. This means
that the controller itself thinks that there is some FireWire-related
reason for the interrupts.
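
For reference, the INTx+ flag that lspci prints in its Status line is bit 3
of the device's PCI Status register (offset 0x06 in configuration space).
A minimal Python sketch of checking it directly, assuming the configuration
space is readable through sysfs; the function names are made up for
illustration, and the device address must be filled in for your system:

```python
import struct

PCI_COMMAND = 0x04                 # offset of the 16-bit Command register
PCI_STATUS = 0x06                  # offset of the 16-bit Status register
PCI_COMMAND_INTX_DISABLE = 0x0400  # Command bit 10: INTx assertion disabled
PCI_STATUS_INTERRUPT = 0x0008      # Status bit 3: device is requesting INTx


def intx_state(config: bytes) -> dict:
    """Decode the INTx-related bits from raw PCI configuration space."""
    if len(config) < 8:
        raise ValueError("need at least the first 8 bytes of config space")
    # Command and Status are adjacent little-endian 16-bit registers.
    command, status = struct.unpack_from("<HH", config, PCI_COMMAND)
    return {
        "intx_pending": bool(status & PCI_STATUS_INTERRUPT),
        "intx_disabled": bool(command & PCI_COMMAND_INTX_DISABLE),
    }


def read_intx_state(address: str) -> dict:
    """Read config space via sysfs; address is e.g. the one lspci -D prints."""
    with open(f"/sys/bus/pci/devices/{address}/config", "rb") as f:
        return intx_state(f.read(64))
```

Polling read_intx_state() for the FireWire controller would show the same
information as repeated lspci runs, just without the rest of the output.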

To instruct the firewire-ohci driver to log all interrupts and what the
device thinks the reason for them is, please run:

echo 4 > /sys/module/firewire_ohci/parameters/debug

As long as there is nothing connected, there should be nothing but
a timing interrupt every 64 seconds, like this:

  firewire_ohci: IRQ 00200000 cycle64Seconds
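
If the log becomes long, something like the following Python sketch could
pick out the interesting lines. It assumes the line format shown above and
treats anything other than cycle64Seconds as unexpected; the names
unexpected_events and EXPECTED_WHEN_IDLE are only illustrative:

```python
import re

# Matches the line format shown above, e.g.
# "firewire_ohci: IRQ 00200000 cycle64Seconds"
# (an optional dmesg timestamp in front of it does not matter).
IRQ_LINE = re.compile(r"firewire_ohci.*?\bIRQ ([0-9a-fA-F]{8})((?: \S+)*)")

# With nothing connected, only the 64-second timer event is expected.
EXPECTED_WHEN_IDLE = {"cycle64Seconds"}


def unexpected_events(log_lines):
    """Yield (mask, event_names) for IRQ lines with events beyond the idle set."""
    for line in log_lines:
        match = IRQ_LINE.search(line)
        if match is None:
            continue  # not a firewire_ohci IRQ line
        mask = int(match.group(1), 16)
        names = match.group(2).split()
        if not set(names) <= EXPECTED_WHEN_IDLE:
            yield mask, names
```

Feeding it the saved kernel log (for example the lines of a file written by
dmesg) leaves only the interrupts that need explaining.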

> After 20 min. IRQ 19 was lost again.
>
> Now _I_ am lost. The only thing that IRQ 16 and IRQ 19 have in common
> is that there are devices on them that do have an INTx line but do not
> use it (MSI instead). However, I ran this kernel with pci=nomsi
> (earlier post) and IRQs 16 and 19 went down as well.

From the information available so far, it appears that you have two
similar but _independent_ problems with the e1000 and firewire devices.
(It might be possible that static electricity zapped both your PCI card
and the FireWire controller (which is directly near the first PCI slot),
or something like that.)


Regards,
Clemens