Re: radeon KMS causes GART Table Walk Errors (was: K8 ECC error with linux-2.6.32)

From: Johannes Hirte
Date: Thu Dec 17 2009 - 14:03:49 EST


On Thursday 17 December 2009 08:22:32, Borislav Petkov wrote:
> On Thu, Dec 17, 2009 at 04:07:04AM +0100, Johannes Hirte wrote:
> > > > Disabling it in the BIOS didn't make any difference. The errors were
> > > > still reported.
> > >
> > > Hmm. It would be interesting to know what the BIOS does exactly
> > > on your machine. We could easily find that out by installing the
> > > x86info tool (either prepackaged for your distro or from here:
> > > git://git.choralone.org/git/x86info) and doing as root:
> > >
> > > lsmsr MC4 -V3
> > >
> > > and sending me the output. Make sure the amd64_edac module is not
> > > loaded.
> >
> > datengrab ~ # lsmsr MC4 -V3
> > MC4_CTL = 0x0000000000003bff
> > CorrEccEn=0x1
> > UnCorrEccEn=0x1
> > CrcErr0En=0x1
> > CrcErr1En=0x1
> > CrcErr2En=0x1
> > SyncPkt0En=0x1
> > SyncPkt1En=0x1
> > SyncPkt2En=0x1
> > MstrAbrtEn=0x1
> > TgtAbrtEn=0x1
> > GartTblWkEn=0
>
> Was the BIOS setting for GART table walk error reporting enabled or
> disabled? If it was enabled then, according to the above output, your
> BIOS doesn't seem to do the workaround described in the BKDG. If it
> was disabled, you'd have to enable it and run "lsmsr MC4 -V3" again.
>
> Thanks.

GART Error Reporting was disabled. Here is the output after enabling it:

datengrab ~ # lsmsr MC4 -V3
MC4_CTL = 0x0000000000003bff
CorrEccEn=0x1
UnCorrEccEn=0x1
CrcErr0En=0x1
CrcErr1En=0x1
CrcErr2En=0x1
SyncPkt0En=0x1
SyncPkt1En=0x1
SyncPkt2En=0x1
MstrAbrtEn=0x1
TgtAbrtEn=0x1
GartTblWkEn=0
AtomicRMWEn=0x1
WchDogTmrEn=0x1
DramParEn=0
MC4_STATUS = 0x0000000000000000
ErrorCode=0
ErrorCodeExt=0
Syndrome=0
ErrCpu0=0
ErrCpu1=0
LDTLink=0
ErrScrub=0
DramChannel=0
UnCorrECC=0
CorrECC=0
ECC_Synd=0
PCC=0
ErrAddrVal=0
ErrMiscVal=0
ErrEn=0
ErrUnCorr=0
ErrOver=0
ErrValid=0
MC4_ADDR = 0x0000000090063a20
ADDR=0x1200c744
MC4_MISC = 0x0000000000000000
ErrCount=0
Ovrflw=0
IntType=0
CntEn=0
LvtOff=0
Locked=0
CtrP=0
Val=0
MC4_CTL_MASK = 0x0000000000000000
CorrEccEn=0
UnCorrEccEn=0
CrcErr0En=0
CrcErr1En=0
CrcErr2En=0
SyncPkt0En=0
SyncPkt1En=0
SyncPkt2En=0
MstrAbrtEn=0
TgtAbrtEn=0
GartTblWkEn=0
AtomicRMWEn=0
WchDogTmrEn=0
DramParEn=0
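
For anyone who wants to cross-check the MC4_CTL decode above by hand, here is a
minimal sketch. It assumes the low MC4_CTL fields sit in consecutive bits
starting at bit 0, in the order lsmsr prints them; that layout, the field list
and the helper name decode_mc4_ctl are assumptions for illustration and are not
taken from x86info. DramParEn is left out because its bit position cannot be
inferred from the output.

```python
# Assumed layout: lsmsr lists the low MC4_CTL fields in bit order,
# starting at bit 0. DramParEn is omitted (position not evident here).
MC4_CTL_FIELDS = [
    "CorrEccEn", "UnCorrEccEn", "CrcErr0En", "CrcErr1En", "CrcErr2En",
    "SyncPkt0En", "SyncPkt1En", "SyncPkt2En", "MstrAbrtEn", "TgtAbrtEn",
    "GartTblWkEn", "AtomicRMWEn", "WchDogTmrEn",
]


def decode_mc4_ctl(value):
    """Return a {field name: 0 or 1} mapping for the assumed low bits."""
    return {name: (value >> bit) & 1
            for bit, name in enumerate(MC4_CTL_FIELDS)}


# Raw value reported above after enabling GART error reporting in the BIOS.
fields = decode_mc4_ctl(0x3bff)
for name, bit in fields.items():
    print(f"{name}={bit}")
```

Under that assumption, 0x3bff has bit 10 (GartTblWkEn) clear and the other
twelve listed bits set, which agrees with the field values lsmsr printed.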


regards,
Johannes