Re: [E1000-devel] 2.6.36 abrupt total e1000e carrier loss (cured by reboot)

From: Nix
Date: Thu Nov 04 2010 - 17:35:58 EST


On 4 Nov 2010, Jesse Brandeburg outgrape:

> On Mon, 2010-11-01 at 16:08 -0700, Nix wrote:
>> 03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
>
>> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>
> This is a problem, L0s and L1 don't work on these adapters, make sure

Ah. That sounds like my problem, then :) I must say, spontaneous link loss
is a rather nasty failure mode. (I suspect the hardware simply fails to
save its state properly when it drops into a low-power link state, am I
right? So it comes back up effectively turned off and uninitialized...)

> The above could be responsible for your issue. If you don't want to
> disable ASPM system wide, then you could just make sure to run a recent
> kernel with the ASPM patches, or get our e1000.sf.net e1000e driver and
> try it, as it will work around the issue whether or not aspm is enabled.

I was planning to simply split up CONFIG_PCIEASPM to allow me to turn
it off for e1000e only, but this sounds a lot less kludgy. :)

(For now, it's probably simplest to just turn ASPM off. A quick grep of
the kernel tree shows that this machine has no other hardware for which
ASPM would do a blessed thing; in fact, there *is* no other hardware in
2.6.36 for which it would. So if the only workaround is to turn ASPM off
for those adapters, I might as well turn it off completely for now.)

Thanks for the advice!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/