mass "tulip_stop_rxtx() failed", network stops

From: Tomasz Chmielewski
Date: Tue Aug 23 2005 - 04:11:49 EST


We are running almost 20 Fujitsu-Siemens Scenic machines, 2.6.8.1 kernel, equipped with a onboard card that uses a tulip module:

02:01.0 Ethernet controller: Linksys NC100 Network Everywhere Fast Ethernet 10/100 (rev 11)

No problem with those.


We are running four more machines like that, the only difference is the kernel they are running (2.6.11.4).

On some of them, there are serious problems with a network, and they usually happen when the traffic is bigger than usual (i.e., some big software deployment to several workstations, remote backup, etc.).

The syslog is then full of entries like that:

Aug 21 04:04:30 SERVER-B-HS kernel: NETDEV WATCHDOG: eth0: transmit timed out
Aug 21 04:04:30 SERVER-B-HS kernel: 0000:00:06.0: tulip_stop_rxtx() failed

and it's filling logs for hours; network doesn't work anymore, and someone has to restart the network or the machine itself.

It doesn't always happen with a big traffic - sometimes you can fill the 100 Mbit link and do lots of reads from the disk, but nothing bad happens for hours.


I saw some posts on this issue ("2.6.10-rc3: tulip-driver: tulip_stop_rxtx() failed"), but it seemed to me that it wasn't similar to my problems; I looked into >2.6.10 kernel changelog, but there were no descriptions of that problem, either.


Any help appreciated, because rebooting machines which are 500 km away and are not responding is no fun :)


--
Tomek
http://wpkg.org

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/