Re: Possible disk failure

From: Steven Rostedt
Date: Wed Nov 14 2012 - 09:39:03 EST


On Tue, 2012-11-13 at 23:07 -0600, Robert Hancock wrote:

> The important part being:
>
> [ 11.974811] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
> [ 11.982816] ata1.00: irq_stat 0x40000008
> [ 11.987512] ata1.00: failed command: READ FPDMA QUEUED
> [ 11.993407] ata1.00: cmd 60/08:00:00:20:92/00:00:07:00:00/40 tag 0 ncq 4096 in
> [ 11.993407] res 41/40:00:04:20:92/00:00:07:00:00/40 Emask 0x409 (media error) <F>
> [ 12.010367] ata1.00: status: { DRDY ERR }
> [ 12.015146] ata1.00: error: { UNC }
>
> ..
>
> [ 16.527065] end_request: I/O error, dev sda, sector 127016964
>
> i.e. the drive reported an uncorrected read error on sector 127016964.
>
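The sector number in the end_request line can be cross-checked against the
"res" line of the log. A minimal sketch, assuming libata prints the result
taskfile in the order
status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
(the variable names below are only for illustration):

```shell
#!/bin/sh
# Result taskfile copied from the log:
#   res 41/40:00:04:20:92/00:00:07:00:00/40
# Assumed field order:
#   status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
lbal=0x04
lbam=0x20
lbah=0x92
hob_lbal=0x07
hob_lbam=0x00
hob_lbah=0x00

# Assemble the 48-bit LBA: the hob_* bytes are the upper 24 bits,
# the plain lba* bytes are the lower 24 bits.
lba=$(( ($hob_lbah << 40) | ($hob_lbam << 32) | ($hob_lbal << 24) \
      | ($lbah << 16) | ($lbam << 8) | $lbal ))

echo "failing LBA: $lba"
```

That gives 0x07922004, which is 127016964 in decimal and matches the sector
in the end_request line. Under the same assumption, the "cmd" line decodes to
a start LBA of 0x07922000, so the bad sector is the fifth one of the 8-sector
(4096 byte) read.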

> So it looks like the drive reports there's 1 sector that will be
> reallocated once it gets rewritten. It could be that the drive is
> actually OK but that sector just got mis-written (due to a hard
> power-off while it was being written, perhaps) and will be fine once it
> gets written successfully.
>
> You could try using hdparm commands to overwrite that sector, or just
> boot from a live CD, zero out the entire disk with "dd if=/dev/zero
> of=/dev/sda" and try a reinstall. If the errors go away and a long SMART
> self test reports no errors, the drive is likely OK. If not, a
> replacement is likely in order.
>

Ugh, I didn't want to reinstall. I've spent way too much time on setting
up this box to start over :-p

Anyway, I booted into a pxe rescue image, and performed a hdparm
--repair-sector on that bad sector, and it worked! It's back up and
running.
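
A sketch of what such a single-sector repair can look like, assuming the
device and sector number from the log above. The read check and the smartctl
steps are assumptions added for illustration (they follow the suggestion
about a long SMART self test), not a record of the exact commands that were
run. The run() helper is hypothetical and only prints the commands unless
RUN=1 is set, because the repair step overwrites the sector with zeros.

```shell
#!/bin/sh
# Assumptions: device and sector are taken from the kernel log quoted above.
dev=/dev/sda
sector=127016964

# Hypothetical helper: print each command by default,
# execute it for real only when RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$@"
    fi
}

# 1. Check whether the sector still fails to read (non-destructive).
run hdparm --read-sector "$sector" "$dev"

# 2. Overwrite that one sector with zeros so the drive can rewrite or
#    remap it. Whatever data was stored in that sector is lost.
run hdparm --yes-i-know-what-i-am-doing --repair-sector "$sector" "$dev"

# 3. Start a long SMART self test, then look at the self-test log
#    once the test has had time to finish.
run smartctl -t long "$dev"
run smartctl -l selftest "$dev"
```

Run as root from a rescue environment with the filesystems on the disk
unmounted, and double-check the sector number before setting RUN=1.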

Thank you very much!

I'm back off to bitching about systemd and grub2 on this box ;-)

-- Steve

