Re: [bug] e100 bug: checksum mismatch on 82551ER rev10

From: Molle Bestefich
Date: Mon Jul 10 2006 - 13:39:33 EST


Auke Kok wrote:
>> Every single IP130 I've had my hands on has had an EEPROM that the
>> Linux driver declared bad.
>
> that means that whoever is selling you the IP130's is consistently putting on
> bad EEPROMs, which is *very* bad. Which vendor is that? They can fix this
> problem for you and for *everyone* else they have sold and will sell IP130's
> to in the future.

Nokia.

Maybe they've changed the BABA magic, or the checksum logic entirely,
to prevent other software than their own OS from running.
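
For anyone following along, here is a rough user-space sketch of the check as I understand it: the 16-bit words of the EEPROM, including the final checksum word, are expected to sum to 0xBABA modulo 2^16. The function names and the in-memory word array are my own for illustration and are not taken from the driver source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EEPROM_MAGIC 0xBABAu

/* Value the last word must hold so that all words sum to 0xBABA (mod 2^16).
 * Hypothetical helper; "words" is an EEPROM image already read into memory. */
uint16_t eeprom_expected_checksum(const uint16_t *words, size_t word_count)
{
    assert(word_count > 0);
    uint16_t sum = 0;
    /* Sum every word except the last one, which stores the checksum. */
    for (size_t i = 0; i + 1 < word_count; i++)
        sum = (uint16_t)(sum + words[i]);
    return (uint16_t)(EEPROM_MAGIC - sum);
}

/* True if the stored checksum word matches the computed one. */
bool eeprom_checksum_ok(const uint16_t *words, size_t word_count)
{
    return words[word_count - 1] == eeprom_expected_checksum(words, word_count);
}
```

If a vendor used a different magic value or a different algorithm, every image written with their tools would fail a check like this while still being internally consistent.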

>> I'm afraid that it's not the board that's at fault, it's the driver.
>
> No it is not. The NIC is supported (you can even call Intel for first line
> support) but if your vendor put a bad EEPROM image on it then all bets are
> off. Intel provides the vendors with the proper tools to make valid EEPROMs,
> the driver checks them for a very good reason.

You're completely sure that the EEPROM check is correct for this
particular revision of this particular chip?

(Do you happen to know where the EEPROM is located, by the way?
Just out of interest.
I can spot the three Intel chips but not the EEPROM.
http://chrisbuechler.com/m0n0wall/nokia/images/9.png
http://chrisbuechler.com/m0n0wall/nokia/images/11.png
http://chrisbuechler.com/m0n0wall/nokia/images/10.png
)

>> The NICs are working perfectly.
>
> How can you tell? Do you know if jumbo frames work correctly? Is the device
> properly checksumming? is flow control working properly? These and many, many
> more settings are determined by the EEPROM. Seemingly it may work correctly,
> but there is no guarantee whatsoever that it will work correctly at all if the
> checksum is bad. Again, you can lose data, or worse, you could corrupt memory
> in the system causing massive failure (DMA timings, etc). Unlikely? sure, but
> not impossible.

They've been used in production environments for years.

>> (Also, it seems mighty odd to refuse to drive the hardware based on an
>> EEPROM checksum failure, when the e100 driver will happily load for a
>> device where for example IRQ routing is broken. Just another
>> indication that erroring out in this situation is overkill.)
>
> That is another discussion. All wifi drivers bail out if the firmware is
> corrupted, why shouldn't e1000 be allowed to do so either? Are you willing to
> risk your data?

Yes.
Perhaps an "ignorechecksum" switch would be appropriate.
I'd like to hear from anyone else who has IP130s and is experiencing
this problem (or isn't!).
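
To make the "ignorechecksum" idea concrete, here is a minimal sketch of the decision I have in mind, written as plain C rather than as a driver patch. The enum, the function, and the flag name are all hypothetical; nothing here is an existing e100 option.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the EEPROM check at probe time. */
enum probe_action {
    PROBE_OK,                /* checksum valid, load normally */
    PROBE_WARN_AND_CONTINUE, /* checksum bad, user opted in to loading anyway */
    PROBE_ABORT              /* checksum bad, refuse to drive the device */
};

/* ignore_checksum stands in for a user-supplied override switch. */
enum probe_action eeprom_probe_action(bool checksum_ok, bool ignore_checksum)
{
    if (checksum_ok)
        return PROBE_OK;
    return ignore_checksum ? PROBE_WARN_AND_CONTINUE : PROBE_ABORT;
}
```

The default stays as strict as it is today; only someone who explicitly sets the switch accepts the risk of running with an EEPROM the driver considers bad.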