On Sunday, 30 March 2008, Tejun Heo wrote:
> Hello,
>
> Hans-Peter Jansen wrote:
> > > > Should I be worried? smartd doesn't show anything suspicious on
> > > > those.
> > >
> > > Can you please post the result of "smartctl -a /dev/sdX"?
> >
> > Here's the last SMART report from two of the offending drives. As noted
> > before, I did the hardware reorganization: I replaced the dog-slow 3ware
> > 9500S-8 and the SiI 3124 with a single Areca 1130 and retired the
> > drives for now, but a nephew has already shown interest. What do you
> > think, can I pass those drives on with a clear conscience? The
> > Hardware_ECC_Recovered values are really worrisome, aren't they?
>
> Different vendors use different scales for the raw values. The value is
> still pegged at the highest, so it could be that those raw values are okay
> or that the vendor just doesn't update the value field accordingly. My
> P120 says 0 for the raw value and 904635 for hardware ECC recovered, so
> there is some difference. What do other non-failing drives say about
> those values?
The only non-failing drive was sdf, as it was in standby mode in this md RAID 5 array:
20080323-011337-sdc.log:195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 162956700
20080323-011337-sdc.log:196 Reallocated_Event_Count 0x0032 253 253 000 Old_age Always - 0
20080323-011337-sdc.log:197 Current_Pending_Sector 0x0012 253 253 000 Old_age Always - 0
20080323-011337-sdc.log:198 Offline_Uncorrectable 0x0030 253 253 000 Old_age Offline - 0
20080323-011337-sdc.log:199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
20080323-011338-sdd.log:195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 162520674
20080323-011338-sdd.log:196 Reallocated_Event_Count 0x0032 253 253 000 Old_age Always - 0
20080323-011338-sdd.log:197 Current_Pending_Sector 0x0012 253 253 000 Old_age Always - 0
20080323-011338-sdd.log:198 Offline_Uncorrectable 0x0030 253 253 000 Old_age Offline - 0
20080323-011338-sdd.log:199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
20080323-011338-sde.log:195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 148429049
20080323-011338-sde.log:196 Reallocated_Event_Count 0x0032 253 253 000 Old_age Always - 0
20080323-011338-sde.log:197 Current_Pending_Sector 0x0012 253 253 000 Old_age Always - 0
20080323-011338-sde.log:198 Offline_Uncorrectable 0x0030 253 253 000 Old_age Offline - 0
20080323-011338-sde.log:199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
20080323-011339-sdf.log:195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 1559
20080323-011339-sdf.log:196 Reallocated_Event_Count 0x0032 253 253 000 Old_age Always - 0
20080323-011339-sdf.log:197 Current_Pending_Sector 0x0012 253 253 000 Old_age Always - 0
20080323-011339-sdf.log:198 Offline_Uncorrectable 0x0030 253 253 000 Old_age Offline - 0
20080323-011339-sdf.log:199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
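
Tejun's point is that the normalized VALUE is compared against the threshold, while the vendor-scaled RAW_VALUE is not. That point can be checked against the rows above. The following Python sketch parses the grep output: a log file name, then one `smartctl` attribute row with the columns ID, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE. It then classifies each attribute, roughly mirroring smartctl's WHEN_FAILED column. The helper names are hypothetical. Treating a THRESH of 000 as "no threshold" is an assumption, based on the common SMART convention that a zero threshold never trips.

```python
# Minimal sketch: classify SMART attributes from grep'd `smartctl -a` attribute rows.
# Assumptions (not from the thread): helper names are hypothetical, RAW_VALUE is a
# plain integer (true for the rows above), and THRESH == 0 means "no threshold".
from dataclasses import dataclass


@dataclass
class SmartAttr:
    drive: str   # e.g. "sdc", taken from the log file name
    ident: int   # attribute ID, e.g. 195
    name: str
    value: int   # normalized current value (vendor scale; 100, 200 and 253 appear above)
    worst: int   # lowest normalized value recorded so far
    thresh: int  # vendor failure threshold for the normalized value
    raw: int     # vendor-specific raw counter


def parse_row(row: str) -> SmartAttr:
    """Parse 'YYYYMMDD-HHMMSS-sdX.log:ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW'."""
    filename, rest = row.split(":", 1)
    drive = filename.rsplit("-", 1)[1].removesuffix(".log")  # "...-sdc.log" -> "sdc"
    f = rest.split()
    return SmartAttr(drive, int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]), int(f[9]))


def status(a: SmartAttr) -> str:
    """Compare only the normalized values (never the raw one) against the threshold."""
    if a.thresh == 0:
        return "no_threshold"
    if a.value <= a.thresh:
        return "failing_now"
    if a.worst <= a.thresh:
        return "failed_in_past"
    return "ok"


ROWS = """\
20080323-011337-sdc.log:195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 162956700
20080323-011339-sdf.log:195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 1559
"""

for line in ROWS.splitlines():
    a = parse_row(line)
    print(f"{a.drive} {a.name}: VALUE={a.value} THRESH={a.thresh} RAW={a.raw} -> {status(a)}")
```

Every row in the report has THRESH 000, so every attribute classifies as `no_threshold`. None of them can trip, and a normalized VALUE of 100 says nothing either way. That leaves only the raw counts to compare between drives: 1559 on sdf versus roughly 148 to 163 million on sdc, sdd and sde.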
> Hmmm... If the drive is failing FLUSHs, I would expect to see elevated
> reallocation counters and maybe some pending counts. Aieee.. weird.

But there are no reallocations or any pending sectors on any of them.
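
The check behind that statement is simple: no drive has a non-zero raw count for reallocation events, pending sectors or offline-uncorrectable sectors. It can be written as a small filter over the grep output above. This is a hypothetical helper, not part of smartmontools. It assumes the `file:row` format shown above, with RAW_VALUE as the tenth field.

```python
# Minimal sketch (hypothetical helper, not part of smartmontools): report any
# non-zero raw media-error counters per drive from grep'd smartctl attribute rows.
MEDIA_ERROR_ATTRS = {"Reallocated_Event_Count", "Current_Pending_Sector", "Offline_Uncorrectable"}


def nonzero_media_errors(rows: str) -> dict[str, dict[str, int]]:
    """Return {drive: {attribute: raw}} for media-error attributes with a non-zero raw value."""
    found: dict[str, dict[str, int]] = {}
    for row in rows.splitlines():
        if not row.strip():
            continue  # skip blank lines
        filename, rest = row.split(":", 1)
        drive = filename.rsplit("-", 1)[1].removesuffix(".log")  # "...-sdc.log" -> "sdc"
        fields = rest.split()
        name, raw = fields[1], int(fields[9])  # assumes RAW_VALUE is a plain integer
        if name in MEDIA_ERROR_ATTRS and raw != 0:
            found.setdefault(drive, {})[name] = raw
    return found


ROWS = """\
20080323-011337-sdc.log:196 Reallocated_Event_Count 0x0032 253 253 000 Old_age Always - 0
20080323-011337-sdc.log:197 Current_Pending_Sector 0x0012 253 253 000 Old_age Always - 0
20080323-011337-sdc.log:198 Offline_Uncorrectable 0x0030 253 253 000 Old_age Offline - 0
20080323-011338-sde.log:196 Reallocated_Event_Count 0x0032 253 253 000 Old_age Always - 0
20080323-011338-sde.log:197 Current_Pending_Sector 0x0012 253 253 000 Old_age Always - 0
"""

print(nonzero_media_errors(ROWS))  # prints {} for these rows
```

An empty result means none of the counters one would expect to rise on a drive with media write problems has moved.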
> > > FLUSH_EXT timing out usually indicates that the drive is having
> > > problems writing out what it has in its cache to the media. There was
> > > one case where a FLUSH_EXT timeout was caused by the driver failing to
> > > switch the controller back from NCQ mode before issuing FLUSH_EXT, but
> > > that was on sata_nv. There hasn't been any similar problem on
> > > sata_sil24.
> >
> > It's been 4 Samsung drives in all, hanging on a SATA SiI 3124:
> > ^^^^ write (I guess)
>
> It should have appeared as read errors. Maybe the drive successfully
> wrote those sectors after the 30+ secs timeout.

Hmm, I didn't notice any data corruption, and if there was any, it now
lives on in the copies in their new home.

That would point to some driver issue, wouldn't it? Roger Heflin also
experienced similar behavior with that controller, which he couldn't
reproduce with another controller.

I can offer to rebuild that md array in a test environment and give you
access to it, if you're interested.

Anyway, thanks for caring, Tejun,
Pete