Re: Updated on UDMA BadCRC errors + subsequent problems (was: Is it safe to ignore UDMA BadCRC errors?)

From: John Bradford
Date: Fri Jan 16 2004 - 11:41:52 EST


Quote from Jonathan Kamens <jik@xxxxxxxxxxxxxxxxxxxxxx>:
> John Bradford writes:
>
> > Maybe not - the most common cause I've seen for that message in the
> > logs is trying to access S.M.A.R.T. information when S.M.A.R.T. is
> > disabled.
> >
> > I.E. the error should be reproducible with:
> >
> > # smartctl -d /dev/hda
> > # smartctl -a /dev/hda
> >
> > Are you sure you weren't trying to get S.M.A.R.T. info from the
> > drive at the time the error was logged?
>
> My smartctl wants "-s off" rather than "-d", but other than that,
> you're correct, that sequence of commands does cause the same error to
> appear in the logs. But why/how would SMART be disabled on the drive?
> I've been running smartd on the drive for weeks with no errors of this
> sort, and I fail to see how SMART would suddenly be disabled on the
> drive with no action on my part,

Some motherboard BIOSes disable S.M.A.R.T. on drives connected to
their on-board controllers on each boot. Quite possibly some PCI IDE
cards do as well. It's possible, though probably not likely, that
while you were trying the drive on different controllers, a BIOS
somewhere disabled S.M.A.R.T.
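
A quick way to check is to ask the drive directly. The following is
only a rough sketch: it assumes your smartctl prints a
"SMART support is: Enabled" or "SMART support is: Disabled" line in its
"-i" output, and that it accepts "-s on" in the same way that yours
accepts "-s off". The smart_state helper is just an illustrative name,
not part of smartmontools.

```shell
# Read `smartctl -i` output on stdin and print enabled/disabled/unknown.
# Assumption: the output contains a "SMART support is: Enabled" or
# "SMART support is: Disabled" line; other versions may word it differently.
smart_state() {
    awk '/^SMART support is:/ && /Enabled/  { state = "enabled" }
         /^SMART support is:/ && /Disabled/ { state = "disabled" }
         END { print (state ? state : "unknown") }'
}

# Example use (needs root and a real device; /dev/hda is the drive
# discussed in this thread):
#   smartctl -i /dev/hda | smart_state
#   smartctl -s on /dev/hda    # re-enable if the line above says "disabled"
```

If that reports "disabled" straight after a reboot, with smartd not yet
started, the BIOS or controller would be the obvious suspect.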

> so it seems more likely that some
> other condition caused the error.

Quite possibly, but I can only really guess as to what that might be
at this point.

I _think_ that UDMA CRC checking is only done on data transfers, not
commands. I've CC'ed Alan in the hope of getting some confirmation on
this. Maybe a command being corrupted on the wire could theoretically
cause that error.

John.