Re: [PATCH 001 of 6] md: Fix an occasional deadlock in raid5

From: Neil Brown
Date: Wed Jan 16 2008 - 16:54:53 EST


On Tuesday January 15, akpm@xxxxxxxxxxxxxxxxxxxx wrote:
> On Wed, 16 Jan 2008 00:09:31 -0700 "Dan Williams" <dan.j.williams@xxxxxxxxx> wrote:
>
> > > heheh.
> > >
> > > it's really easy to reproduce the hang without the patch -- i could
> > > hang the box in under 20 min on 2.6.22+ w/XFS and raid5 on 7x750GB.
> > > i'll try with ext3... Dan's experiences suggest it won't happen with ext3
> > > (or is even more rare), which would explain why this has is overall a
> > > rare problem.
> > >
> >
> > Hmmm... how rare?
> >
> > http://marc.info/?l=linux-kernel&m=119461747005776&w=2
> >
> > There is nothing specific that prevents other filesystems from hitting
> > it, perhaps XFS is just better at submitting large i/o's. -stable
> > should get some kind of treatment. I'll take altered performance over
> > a hung system.
>
> We can always target 2.6.25-rc1 then 2.6.24.1 if Neil is still feeling
> wimpy.

I am feeling wimpy. There've been a few too many raid5 breakages
recently and it is very hard to really judge the performance impact of
this change. I even have a small uncertainty of correctness - could
it still hang in some other way? I don't think so, but this is
complex code...

If it were really common I would have expected more noise on the
mailing list. Sure, there has been some, but not much. However maybe
people are searching the archives and finding the "increase stripe
cache size" trick, and not reporting anything .... seems unlikely
though.

How about we queue it for 2.6.25-rc1 and then about when -rc2 comes
out, we queue it for 2.6.24.y? Any one (or any distro) that really
needs it can of course grab the patch them selves...

??

NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/