Re: 2.4.20: Process stuck in __lock_page ...

From: Jens Axboe (axboe@suse.de)
Date: Wed May 28 2003 - 02:18:47 EST


On Wed, May 28 2003, Con Kolivas wrote:
> On Wed, 28 May 2003 16:04, Jens Axboe wrote:
> > On Wed, May 28 2003, Con Kolivas wrote:
> > > On Wed, 28 May 2003 04:04, Marc-Christian Petersen wrote:
> > > > On Tuesday 27 May 2003 19:50, manish wrote:
> > > >
> > > > Hi Manish,
> > > >
> > > > > It is not a system hang but the processes hang showing the same stack
> > > > > trace. This is certainly not a pause since the bonnie processes that
> > > > > were hung (or deadlocked) never completed after several hrs. The
> > > > > stack trace was the same.
> > > >
> > > > > then you are hitting a different bug or a bug related to the issues
> > > > > Christian Klose and I and $tons of others were complaining about.
> > > >
> > > > The bug you are hitting might be the problem with "process stuck in D
> > > > state" Andrea Arcangeli fixed, let me guess, over half a year ago or
> > > > so.
> > > >
> > > > In case you have a good mind to try to address your issue, you might
> > > > want to try out the patch you can find here:
> > > >
> > > > > http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.21rc2aa1/9980_fix-pausing-2
> > > >
> > > > ALL: Anyone who has this kind of pauses/stops/mouse is dead/keyboard is
> > > > dead/: speak _NOW_ please, doesn't matter who you are!
> > >
> > > Yo!
> > >
> > > I'll throw my babushka into the ring too. I think it's obvious from MCP's
> > > comments that I've been involved in testing this problem. I've spent
> > > hours, possibly days trying to find a way to fix the pauses introduced
> > > since 2.4.19pre1. I agree with what MCP describes that the machine can
> > > come to a standstill under any sort of disk i/o and is unusable for a
> > > variable length of time. I've been playing with all sorts of numbers in
> > > my patchset to try and limit it with only mild success. The best results
> > > I've had without a major decrease in throughput was using akpm's read
> > > > latency 2 patch but by significantly reducing the nr_requests. It was
> > > > while changing the number of requests that I discovered dropping them
> > > > to 4 fixed the problem but destroyed write throughput. I was pleased to see AA
> > > give the problem recognition after my contest results on his kernel but
> > > disappointed that the problem only was reduced, not fixed.
> >
> > Does the problem change at all if you force batch_requests to 0?
>
> I've tried batch_requests to 1 by itself (without changing the
> nr_requests) and that didn't fix it, but I recall dropping nr_requests to
> 2 (which would make batch_requests == 0) made the machine fail to boot
> so I haven't tried batch_requests 0 by itself. Should it boot with it
> == 0?

If you leave nr_requests as it is, I don't see why it should not boot
with batch_requests == 0.
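
A minimal sketch of the arithmetic behind the "nr_requests of 2 would make
batch_requests == 0" remark quoted above, assuming the batch size is derived
from the queue depth by integer division. The divisor of 4, the cap of 32,
and the helper name batch_from_nr_requests are assumptions for illustration,
not taken from this thread:

```c
/*
 * Hypothetical helper: derive a batch size from the request queue depth.
 * ASSUMPTION: the batch is nr_requests / 4, capped at 32. Neither value
 * is stated in this thread; they are used here only to show how integer
 * division can produce a batch of 0 for very small queue depths.
 */
static int batch_from_nr_requests(int nr_requests)
{
    int batch = nr_requests / 4;   /* integer division truncates toward 0 */

    if (batch > 32)
        batch = 32;                /* assumed upper bound on the batch */

    return batch;
}
```

Under these assumptions a depth of 2 yields a batch of 0 and a depth of 4
yields a batch of 1, which matches the values mentioned in the quoted text.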

I can't see in all of these mails whether backing out akpm's starvation
patch makes the problem go away. Does it?

--
Jens Axboe
