Re: IO errors after some uptime on 2.6.17.7

From: Joseph Pingenot
Date: Sun Aug 06 2006 - 11:30:06 EST


From Chuck Ebbert on Sunday, 06 August, 2006:
>In-Reply-To: <20060806041700.GA5157@xxxxxxxxxxxxxx>
>On Sat, 5 Aug 2006 23:17:01 -0500, Joseph Pingenot wrote:
>> I've now seen two boxes with 2.6.17.7 go down with IO errors after some
>> time running. There doesn't seem to necessarily be a fixed amount of
>> time, nor can I precisely figure out what's causing it.
>How do you know they were IO errors?

That's what everything was reporting, e.g. whenever I tried to run any
program. Things that were already running stayed running (e.g. I have
irssi on one box showing symptoms and can still chat), but opening up a
new window in screen fails:

Could not write /var/run/utmp: No such process
cannot exec '/bin/bash': Input/output error

On reboot everything was fixed. The SMART self-test log on one of the drives:
# 1 Extended offline Completed without error 00% 12221 -

Some relevant SMART data:
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always   -   0
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always   -   0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always   -   0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline  -   0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always   -   0
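
For anyone who wants to check the same attributes mechanically, here is a
rough sketch. It assumes the usual ten-column attribute layout (ID, name,
flag, value, worst, thresh, type, updated, when_failed, raw value); the
function names parse_attributes and suspicious are made up for this
example. A non-zero raw count is only a hint that something may be wrong,
not proof, and some drives report large raw values as normal.

```python
# Sketch only: flag SMART attributes that look worrying.
# Assumed column order: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW

SAMPLE = """\
5 Reallocated_Sector_Ct 0x0033 100 100 050 Pre-fail Always - 0
1 Raw_Read_Error_Rate 0x000b 100 100 050 Pre-fail Always - 0
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
"""


def parse_attributes(text):
    """Return one dict per well-formed attribute line; skip everything else."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Need all ten columns and a numeric ID; headers and blanks are skipped.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "thresh": int(fields[5]),
            "raw": fields[9],  # kept as text; raw formats vary by vendor
        })
    return rows


def suspicious(rows):
    """Names of attributes with a non-zero raw value, or with a normalized
    value at or below a non-zero threshold."""
    flagged = []
    for row in rows:
        nonzero_raw = row["raw"] != "0"
        below_thresh = row["thresh"] > 0 and row["value"] <= row["thresh"]
        if nonzero_raw or below_thresh:
            flagged.append(row["name"])
    return flagged


if __name__ == "__main__":
    print(suspicious(parse_attributes(SAMPLE)))
```

Run against the five lines above, nothing is flagged, which matches the
drive looking healthy as far as SMART is concerned.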

I'll send the .config from the other downed box when I can get to it
to reboot it.

>Did previous versions of 2.6.17 work?

They seemed to, but I'm far from certain.

>Please try to get debug output with netconsole if you can.

That will be very tough atm. I'll see what I can do, though.