Re: noisy edac

From: Eric W. Biederman
Date: Mon Jan 30 2006 - 22:27:07 EST


Doug Thompson <norsk5@xxxxxxxxx> writes:

> that driver should be refactored to only output NON-FATALs with debug turned on.

We need to root cause this before that decision is made.

> --- Dave Jones <davej@xxxxxxxxxx> wrote:
>
>> On Sun, Jan 29, 2006 at 04:52:06PM -0500, Alan Cox wrote:
>> > On Thu, Jan 26, 2006 at 08:41:05PM -0500, Dave Jones wrote:
>> > > e752x_edac is very noisy on my PCIE system..
>> > > my dmesg is filled with these...
>> > >
>> > > [91671.488379] Non-Fatal Error PCI Express B
>> > > [91671.492468] Non-Fatal Error PCI Express B
>> > > [91901.100576] Non-Fatal Error PCI Express B
>> > > [91901.104675] Non-Fatal Error PCI Express B
>> >
>> > Pre-production system or final release ?
>>
>> Currently shipping Dell Precision 470.
>>

Actually I remember something very weird with this error.
I think the Non-Fatal was some weird designation by Intel,
that came right out of the data sheet but didn't have a useful
interpretation.

I suspect that you have an actual hardware problem there.
Quite possibly that a riser or card is not plugged in all of
the way.

As I recall most of the checks clear the error bit when they
detect the error so the fact that it is reoccurring suggests
that the error is reoccurring this frequently.

So before we go off and give up and interpreting this error
can we please root cause at least what the driver is doing?

It has been about a year since I saw something like this and
someone else was tracking the error but I seem to remember
an error message like this resulting from an actual fixable problem.

Eric


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/