Re: RAID extremely slow

From: David Dillow
Date: Thu Jul 26 2012 - 22:58:10 EST


On Wed, 2012-07-25 at 18:55 -0700, Kevin Ross wrote:
> On 07/25/2012 06:00 PM, Phil Turmel wrote:
> > Piles of small reads scattered across multiple drives, and a
> > concentration of queued writes to /dev/sda. What's on /dev/sda?
> > It's not a member of the raid, so it must be some other system task
> > involved.

> After rebooting, MythTV is currently recording two shows, and the resync
> is running at full speed.
>
> # cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid6 sdh1[0] sdd1[9] sde1[10] sdb1[6] sdi1[7] sdc1[4]
> sdf1[3] sdg1[8] sdj1[1]
> 6837311488 blocks super 1.2 level 6, 512k chunk, algorithm 2
> [9/9] [UUUUUUUUU]
> [=>...................] resync = 9.3% (91363840/976758784)
> finish=1434.3min speed=10287K/sec
>
> unused devices: <none>
>
> atop shows the avio of all the drives to be less than 1ms, where before
> they were much higher. It will run for a couple days under load just
> fine, and then it will come to a halt.
>
> It's a 3.2.21 kernel. I'm running Debian Testing, and the exact Debian
> package version is:

I suspect you are being hit by same bug I was -- delayed stripes never
got processed. If you get into the state where the rebuild isn't
progressing, and you find that increasing the size of the stripe cache
allows the rebuild to proceed (but the filesystem stays wedged), then
that cinches it.

If you can, upgrade to the latest 3.4 stable kernel (3.4.6 right now).
As far as I can see, the latest 3.2 stable does not contain the delayed
stripe fix.

After applying those fixes to my kernel, my MythTV setup over a 5 disk
RAID5 has been pretty solid, where before I was getting lockups every
few days. It still seems to be getting slower over time, but I've not
looked into it yet as it is not as catastrophic as the wedging.

HTH,
Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/