Re: Linux & ECC memory

Tim W. Janes (
Fri, 15 Nov 1996 22:03:53 +0000 (GMT)

> > A more subtle issue is whether the ECC memory controller could report
> > instances where ECC detection and successful correction took place. It
> > would seem to be useful to provide a way for the OS to recognize that
> > non-fatal memory errors have occurred, even though they were repaired.
> >
> This is actually the more interesting issue. It isn't so important that
> the kernel know how to work around a bad spot that has been ECC corrected.
> It is much more interesting to me to have the kernel tell me that a
> correction was made (soft error) so that I have the opportunity to replace
> the module before it degrades and a hard error occurs.
> This is really the whole advantage of ECC. It saves you from those pesky
> one bit errors and reports them so you can act before it worsens.
> So my question would be "Does Linux log ECC corrections?". And from the
> responses I'd infer that one or more of the following applies: 1) nobody
> really knows; 2) the boards are too new yet and it may come; 3) the
> boards are too brain dead to have a way to report this info and it will
> never come.
> I'd be interested in an informed answer as we've got 45 new linux server
> boxes on order with ECC memory spec'd. Lack of kernel knowledge of ECC
> won't hamper us but this support would sure be nice.
> --Eric

On some Pentium Pro motherboards (sorry, not sure of chip set or
details) there is a page in the BIOS setup that records the ECC
corrections. My guess is that ECC corrections are not reported to the
O/S but it should be possible to probe the CMOS and retrieve the ECC
correction info.
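
As a rough sketch of what probing the CMOS might look like under Linux:
the code below reads one CMOS byte through the standard PC index/data
ports (0x70/0x71) via /dev/port. The offset ECC_COUNT_OFFSET and the
helper names are purely hypothetical; where (or whether) a given BIOS
stores its ECC correction count is board specific and would have to be
found out for each motherboard.

```python
CMOS_INDEX_PORT = 0x70  # standard PC RTC/CMOS index port
CMOS_DATA_PORT = 0x71   # standard PC RTC/CMOS data port

# HYPOTHETICAL offset: the real location of any ECC counter is
# BIOS/board specific and is not known from this discussion.
ECC_COUNT_OFFSET = 0x40


def read_cmos_byte(port, index):
    """Select a CMOS register via the index port, then read the data port.

    `port` is any seekable binary file object addressing I/O ports,
    such as an open handle on /dev/port.
    """
    port.seek(CMOS_INDEX_PORT)
    # Bit 7 of port 0x70 masks NMI; keep it clear so NMIs stay enabled.
    port.write(bytes([index & 0x7F]))
    port.seek(CMOS_DATA_PORT)
    return port.read(1)[0]


def dump_ecc_count(path="/dev/port"):
    """Print the byte at the assumed ECC counter offset (needs root)."""
    with open(path, "r+b", buffering=0) as port:
        count = read_cmos_byte(port, ECC_COUNT_OFFSET)
    print("ECC corrections (assumed offset 0x%02x): %d" % (ECC_COUNT_OFFSET, count))
```

Calling dump_ecc_count() as root on an x86 box would print whatever
byte lives at that offset; it only means something once the real
offset for the board in question is known.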

We have 128 Mbyte of parity memory installed on each of 10 machines and
in 3 months have seen one NMI.