AMD 760MPX dma_intr: error=0x40 { UncorrectableError }

From: Steven Timm (timm@fnal.gov)
Date: Tue Nov 19 2002 - 16:36:53 EST

Next message: David Woodhouse: "Re: [RFC/CFT] Separate obj/src dir"
Previous message: Con Kolivas: "[BENCHMARK] 2.5.48-mm1 with contest"
Next in thread: Alan Cox: "Re: AMD 760MPX dma_intr: error=0x40 { UncorrectableError }"
Reply: Alan Cox: "Re: AMD 760MPX dma_intr: error=0x40 { UncorrectableError }"
Maybe reply: Manish Lachwani: "RE: AMD 760MPX dma_intr: error=0x40 { UncorrectableError }"

I have recently seen this error very frequently on
a number of compute servers with brand-new disks.

Nov 15 01:42:52 fnd0172 kernel: hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 15 01:42:52 fnd0172 kernel: hdb: dma_intr: error=0x40 { UncorrectableError }, LBAsect=44763517, sector=11235856
Nov 15 01:42:52 fnd0172 kernel: end_request: I/O error, dev 03:42 (hdb), sector 11235856
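The kernel prints the raw ATA status and error register bytes next to its own decoding of them. As a minimal sketch (Python is chosen here only for illustration), the two bytes can be split into individual bits using the ATA register mnemonics; the comments give the kernel's wording for the bits that appear in the log above. The `decode` helper and the table names are hypothetical, not part of any kernel or smartctl interface.

```python
# ATA Status register bits (mnemonics from the ATA specification).
STATUS_BITS = [
    (0x80, "BSY"),
    (0x40, "DRDY"),  # kernel prints "DriveReady"
    (0x20, "DF"),
    (0x10, "DSC"),   # kernel prints "SeekComplete"
    (0x08, "DRQ"),
    (0x04, "CORR"),
    (0x02, "IDX"),
    (0x01, "ERR"),   # kernel prints "Error"
]

# ATA Error register bits (only meaningful when Status.ERR is set).
ERROR_BITS = [
    (0x80, "ICRC"),
    (0x40, "UNC"),   # kernel prints "UncorrectableError"
    (0x20, "MC"),
    (0x10, "IDNF"),
    (0x08, "MCR"),
    (0x04, "ABRT"),
    (0x02, "TK0NF"),
    (0x01, "AMNF"),
]


def decode(value, table):
    """Return the mnemonics of every bit set in an 8-bit register value."""
    return [name for mask, name in table if value & mask]


if __name__ == "__main__":
    print("status=0x51", decode(0x51, STATUS_BITS))
    print("error=0x40", decode(0x40, ERROR_BITS))
```

Here 0x51 is 0x40 + 0x10 + 0x01, so the drive reported itself ready and seek-complete but flagged an error, and the error byte 0x40 has only the "uncorrectable data" bit set.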

The configuration is as follows:
Tyan 2466 motherboard with the AMD 760MPX chipset (supports UltraATA/100),
dual Athlon MP 2000+ processors

hda=Seagate ST340016A 40 GB drive, ext2 FS
hdb=Seagate ST380021A 80 GB drive, ext2 FS.

Many posts on this mailing list say that the above error
is a sign of a bad disk, and Seagate's diagnostics say so too.
It is just hard to believe that 30 hard drives could go bad
in less than a month.

I know errors of this type were common on machines with ServerWorks
OSB4 chipsets. Has anyone else seen this error on non-ServerWorks
chipsets such as VIA or AMD? And is the drive really bad, or will
a low-level format clear the bad blocks and let the drive operate
again?

Steve Timm

------------------------------------------------------------------

SMART reports the following error log:

SMART Error Log:
SMART Error Logging Version: 1
Error Log Data Structure Pointer: 03
ATA Error Count: 13
Non-Fatal Count: 0

Error Log Structure 1:
DCR FR SC SN CL SH D/H CR Timestamp
00 00 08 57 09 ab f2 c8 40315
00 00 08 5f 09 ab f2 c8 40315
00 00 08 67 09 ab f2 c8 40315
00 00 08 6f 09 ab f2 c8 40315
00 00 08 77 09 ab f2 c8 40315
00 40 00 7d 09 ab f2 51 922746
Error condition: 33 Error State: 3
Number of Hours in Drive Life: 1021 (life of the drive in hours)

Error Log Structure 2:
DCR FR SC SN CL SH D/H CR Timestamp
00 00 08 07 d5 55 f1 ca 40320
00 00 08 3f 00 5c f1 ca 40320
00 00 08 97 33 5d f1 ca 40320
00 00 08 87 97 0f f2 ca 40320
00 00 08 77 09 ab f2 c8 40320
00 40 00 7d 09 ab f2 51 922746
Error condition: 33 Error State: 3
Number of Hours in Drive Life: 1021 (life of the drive in hours)

Error Log Structure 3:
DCR FR SC SN CL SH D/H CR Timestamp
00 00 28 bf 8f 52 f1 c8 23662
00 00 98 e7 8f 52 f1 c8 23662
00 00 68 ff 9a 52 f1 c8 23662
00 00 d8 67 9b 52 f1 c8 23662
00 00 28 07 a3 52 f1 c8 23662
00 40 00 25 a3 52 f1 51 1124073
Error condition: 161 Error State: 3
Number of Hours in Drive Life: 1040 (life of the drive in hours)

Error Log Structure 4:
DCR FR SC SN CL SH D/H CR Timestamp
00 00 e0 4f 09 ab f2 c8 40280
00 00 d8 57 09 ab f2 c8 40285
00 00 d0 5f 09 ab f2 c8 40290
00 00 c8 67 09 ab f2 c8 40296
00 00 c0 6f 09 ab f2 c8 40301
00 40 00 7d 09 ab f2 51 922746
Error condition: 33 Error State: 3
Number of Hours in Drive Life: 1021 (life of the drive in hours)

Error Log Structure 5:
DCR FR SC SN CL SH D/H CR Timestamp
00 00 d8 57 09 ab f2 c8 40285
00 00 d0 5f 09 ab f2 c8 40290
00 00 c8 67 09 ab f2 c8 40296
00 00 c0 6f 09 ab f2 c8 40301
00 00 b8 77 09 ab f2 c8 40306
00 40 00 7d 09 ab f2 51 922746
Error condition: 33 Error State: 3
Number of Hours in Drive Life: 1021 (life of the drive in hours)
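The last row of each Error Log Structure holds the register contents at the moment the drive reported the error. The sketch below assumes smartctl's legacy column layout (SN, CL and SH carry LBA bits 0-23, the low nibble of D/H carries bits 24-27, FR holds the error register and CR the status register) and reassembles the 28-bit LBA. Under that assumption, structures 1, 2, 4 and 5 all decode to LBA 44763517, the same LBAsect the kernel logged for hdb, while structure 3 points at LBA 22192933. `ERROR_ROWS` and `decode_row` are hypothetical names used only for this example.

```python
from collections import Counter

# Final register row "DCR FR SC SN CL SH D/H CR" of each Error Log Structure,
# copied from the SMART output above (column meanings assumed, see lead-in).
ERROR_ROWS = {
    1: "00 40 00 7d 09 ab f2 51",
    2: "00 40 00 7d 09 ab f2 51",
    3: "00 40 00 25 a3 52 f1 51",
    4: "00 40 00 7d 09 ab f2 51",
    5: "00 40 00 7d 09 ab f2 51",
}


def decode_row(row):
    """Split one register row into error/status bytes and a 28-bit LBA."""
    dcr, fr, sc, sn, cl, sh, dh, cr = (int(x, 16) for x in row.split())
    lba = ((dh & 0x0F) << 24) | (sh << 16) | (cl << 8) | sn
    return {
        "error": fr,               # 0x40 = UNC (uncorrectable data)
        "status": cr,              # 0x51 = DRDY | DSC | ERR
        "lba": lba,
        "device": (dh >> 4) & 1,   # DEV bit: 0 = master, 1 = slave
        "lba_mode": bool(dh & 0x40),
    }


if __name__ == "__main__":
    counts = Counter()
    for n, row in sorted(ERROR_ROWS.items()):
        info = decode_row(row)
        counts[info["lba"]] += 1
        print(f"structure {n}: LBA {info['lba']} (device {info['device']}, "
              f"error=0x{info['error']:02x}, status=0x{info['status']:02x})")
    for lba, hits in counts.most_common():
        print(f"LBA {lba}: {hits} logged error(s)")
```

The DEV bit decodes to 1 in every row, which is consistent with these entries coming from hdb, the slave drive. Four of the five entries landing on one LBA suggest the drive keeps failing to read the same sector rather than failing at random locations.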

Steven C. Timm (630) 840-8525 timm@fnal.gov http://home.fnal.gov/~timm/
Fermilab Computing Division/Operating Systems Support
Scientific Computing Support Group--Computing Farms Operations

This archive was generated by hypermail 2b29 : Sat Nov 23 2002 - 22:00:29 EST