e1000e NVM corruption issue status

From: Brandeburg, Jesse
Date: Thu Sep 25 2008 - 21:51:20 EST


A quick summary of the issue follows. If you think you have more data,
please reply. If you have had this issue, please reply with the output of
"cat /proc/iomem" and "lspci"; it will help us correlate data.

Problem: some users report that with many of the latest beta distros, when
e1000e loads during a reboot it says "NVM checksum is not valid" and the
driver fails to load.
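
For anyone who wants to check an NVM image by hand: as far as I understand
the driver's generic check, it sums the 16-bit words at offsets 0x00 through
0x3F and expects the result to be 0xBABA. Below is a minimal Python sketch of
that arithmetic. It assumes the image is available as raw little-endian
bytes, the function name is made up for illustration, and it is not the
driver code itself.

```python
import struct

NVM_SUM = 0xBABA           # expected 16-bit sum of the checked words
NVM_CHECKSUM_WORD = 0x3F   # word offset of the checksum word (assumed layout)

def nvm_checksum_ok(image: bytes) -> bool:
    """Return True if words 0x00..0x3F sum to 0xBABA modulo 2**16."""
    n_words = NVM_CHECKSUM_WORD + 1
    if len(image) < 2 * n_words:
        raise ValueError("need at least 128 bytes of NVM data")
    # Interpret the first 128 bytes as 64 little-endian 16-bit words.
    words = struct.unpack("<%dH" % n_words, image[: 2 * n_words])
    return (sum(words) & 0xFFFF) == NVM_SUM
```

Under these assumptions an image of all ff bytes sums to 0xFFC0, so it
fails the check.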

Result: At this point it appears that most users can load the e1000e
driver if the exit on the NVM validation error is skipped. LAN traffic may
or may not work after that. Some users report they can dump their EEPROM
using ethtool -e and see some varying data; most report the EEPROM read
returns all ff ff ff.

NOTE: if you have not had this problem but wish to continue using e1000e,
I strongly suggest you save a copy of your EEPROM contents now with
"ethtool -e eth0 > savemyeep.txt"
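
To sanity-check a saved dump such as savemyeep.txt, something like the
following Python sketch could flag a file that already reads back as all ff.
It assumes the text layout ethtool -e normally prints (an offset such as
"0x0000:" followed by two-digit hex bytes); both function names are
hypothetical.

```python
import re

# One data line: an offset like "0x0010:" followed by two-digit hex bytes.
_LINE = re.compile(r"^\s*0x[0-9a-fA-F]+:?\s+((?:[0-9a-fA-F]{2}\s*)+)$")

def parse_ethtool_dump(text: str) -> bytes:
    """Collect the byte values from the text output of 'ethtool -e'."""
    data = bytearray()
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:  # header and separator lines do not match and are skipped
            data.extend(int(tok, 16) for tok in m.group(1).split())
    return bytes(data)

def looks_erased(data: bytes) -> bool:
    """True if the dump is non-empty and every byte is 0xff."""
    return len(data) > 0 and all(b == 0xFF for b in data)
```

Typical use would be to read savemyeep.txt, pass its text to
parse_ethtool_dump(), and treat a True result from looks_erased() as a sign
that the saved copy is not a usable backup.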

Many of the reports seem to be related in time to a graphics crash, but no
one has been able to give us more detail about how to reproduce it. We NEED
HELP reproducing this: steps, hints, anything. We are trying rebooting and
suspending on openSUSE, Fedora and Ubuntu, across several hardware platforms.

This seems to affect both 32-bit and 64-bit kernels, but we haven't heard
much either way.

Hardware affected:
laptops and desktops with 82566 or 82567 based LAN parts, which means
machines with the ICH8 and ICH9 chipsets and a variety of processors.
The machines I know of that have reported the issue include:
Lenovo X300
HP 2510p
Intel DP35JO
Lenovo T61 (possibly)
Lenovo X61 (possibly)

Next steps:
We are still trying to reproduce the issue locally. We should have a
machine here tomorrow that reportedly had the issue with Ubuntu.

We have a series of kernel patches that may help users willing to test; I
will post them as replies to this mail.

We should have ready (hopefully tomorrow) an app that can restore EEPROMs
as long as the driver can still load.

We also have a band-aid patch that should allow "locking" of the NVM area
to prevent an errant write; we are looking to post that tomorrow. It
should prevent the damage, but it will not find the culprit.

Jesse