Re: Defective Disk not reported as Dead

From: Oktay Akbal (oktay.akbal@s-tec.de)
Date: Sat May 24 2003 - 07:43:16 EST


Hi Denis,

you should understand, that I really try not to reproduce the Problem,
since this is a VERY productive Database- and Fileserver.

So am seeking hints on known Bugs or just some hints.

For the Kernel-traces I assume that I would need some messages.
But for the kernel everything seems to be normal. At least there are
no messages in syslog.

Oktay

On Fri, 23 May 2003, Denis Vlasenko wrote:

> On 23 May 2003 10:23, Oktay Akbal wrote:
> > Hello !
> >
> > We do have some strange problem here.
> > A Server with some qlogic qla2002f-Adapter (2-channel)is connected to
> > 2 external Raid-Arrays via multipathing and raid1 on top of it. But
> > the multipathing and raid should not be the problem here.
> >
> > The Raid-Arrays itself are Fibre-to-Ide and present themselfs as 1
> > disks each. Now for the problem: Due to a firmware bug the raid-boxes
> > sometimes seems to loose the ability to write (and read i think) to
> > their internal disks. The effect is, that the cache fills up but does
> > not get flushed to disks. When full, the box is in a strange state.
> > It gets detected when reloading the kernel-modules but linux can no
> > longer access the disks. when accessing the partition all processes
> > on that disk hang (like ls or ps -efa etc.)
>
> This sounds like bug. Can you determine where exactly those processes
> are nailed? I bet you'll see them in D state, but folks will need
> more details, more precisely kernel stack backtraces.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/