Re: MD/RAID time out writing superblock

From: Robert Hancock
Date: Sun Sep 20 2009 - 14:46:40 EST


On 09/17/2009 09:44 AM, Tejun Heo wrote:
> > Thanks Neil. This implies that when we see these fifteen second
> > hangs reading /proc/mdstat without write errors, there are genuinely
> > successful superblock writes which are taking fifteen seconds to
> > complete, presumably corresponding to flushes which complete but
> > take a full 15s to do so.
> >
> > Would such very slow (but ultimately successful) flushes be
> > consistent with the theory of power supply issues affecting the
> > drives? It feels like the 30s timeouts on flush could be just a more
> > severe version of the 15s very slow flushes.
>
> Probably not. Power problems usually don't resolve themselves with
> longer timeout. If the drive genuinely takes longer than 30s to
> flush, it would be very interesting tho. That's something people have
> been worrying about but hasn't materialized yet. The timeout is
> controlled by SD_TIMEOUT in drivers/scsi/sd.h. You might want to bump
> it up to, say, 60s and see whether anything changes.
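
For anyone trying the suggested experiment, the change would be roughly
the one-liner below. This is only a sketch: it assumes SD_TIMEOUT in your
tree is still defined in jiffies as a multiple of HZ, so check the
surrounding lines before editing, and remember the kernel has to be
rebuilt for it to take effect.

```c
/* drivers/scsi/sd.h (sketch; surrounding lines may differ in your tree) */
#define SD_TIMEOUT		(60 * HZ)	/* assumed previous value: (30 * HZ) */
```

If the 30s flush timeouts turn into slow but successful flushes with this
in place, that would suggest the drive really is taking that long.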

It's possible that if the power dip only slightly disrupted the drive,
the write might just take longer to complete. I've also seen reports of
vibration causing problems in RAID arrays (there's a video on YouTube of
a guy yelling at a Sun disk array during heavy I/O, with the resulting
vibrations causing an immediate spike in I/O service times). Something
like that could be interfering with simultaneous media access to all
drives in the array, too.
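
To correlate the stalls with anything external (load, vibration, power
events), it may help to log how long each read of /proc/mdstat actually
blocks. Below is a minimal user-space sketch of that idea. The helper
name time_read(), the WITH_MAIN guard, the one-second polling interval
and the one-second reporting threshold are all my own assumptions, not
anything taken from the md code.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/*
 * Read `path` to EOF and return how long that took, in seconds.
 * Returns -1.0 if the file could not be opened or read.
 * (Hypothetical helper, for illustration only.)
 */
static double time_read(const char *path)
{
	char buf[4096];
	struct timespec t0, t1;
	ssize_t n;
	int fd;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1.0;
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		;	/* discard the contents; only the latency matters */
	close(fd);
	if (n < 0)
		return -1.0;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	return (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

#ifdef WITH_MAIN
int main(void)
{
	for (;;) {
		double s = time_read("/proc/mdstat");

		if (s < 0.0) {
			perror("/proc/mdstat");
			return 1;
		}
		if (s >= 1.0)	/* arbitrary threshold; tune to taste */
			printf("%ld: read blocked for %.1fs\n",
			       (long)time(NULL), s);
		fflush(stdout);
		sleep(1);
	}
}
#endif
```

Build it with something like "cc -DWITH_MAIN -o mdstat-timer mdstat-timer.c"
(the file name is arbitrary) and leave it running; the timestamps of the
long reads can then be lined up against the kernel log.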