Re: [PATCH] writeback: permit through good bdi even when globaldirty exceeded

From: Wu Fengguang
Date: Fri Dec 02 2011 - 03:29:53 EST

Next message: Christoph Hellwig: "Re: [RFC PATCH] ext4: auto batched discard support at kernel thread"
Previous message: Linus Walleij: "Re: [PATCH 2/3] clocksource: dbx500: convert to clocksource_register_hz()"
In reply to: Andrew Morton: "Re: [PATCH] writeback: permit through good bdi even when globaldirty exceeded"
Next in thread: Wu Fengguang: "Re: [PATCH] writeback: permit through good bdi even when globaldirty exceeded"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Dec 02, 2011 at 03:03:59PM +0800, Andrew Morton wrote:
> On Fri, 2 Dec 2011 14:36:03 +0800 Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:
>
> > --- linux-next.orig/mm/page-writeback.c 2011-12-02 10:16:21.000000000 +0800
> > +++ linux-next/mm/page-writeback.c 2011-12-02 14:28:44.000000000 +0800
> > @@ -1182,6 +1182,14 @@ pause:
> >  		if (task_ratelimit)
> >  			break;
> >
> > +		/*
> > +		 * In the case of an unresponding NFS server and the NFS dirty
> > +		 * pages exceeds dirty_thresh, give the other good bdi's a pipe
> > +		 * to go through, so that tasks on them still remain responsive.
> > +		 */
> > +		if (bdi_dirty < 8)
> > +			break;
>
> What happens if the local disk has nine dirty pages?

The 9 dirty pages will be cleaned by the flusher (likely in one shot),
so after a while the dirtier task can dirty 8 more pages. This
producer-consumer workflow can keep going as long as the magic
number chosen is >= 1.

> Also: please, no more magic numbers. We have too many in there already.

Good point. How about adding a comment to explain the number chosen?

> What to do instead? Perhaps arrange for devices which can block in
> this fashion to be identified as such in their backing_device and then
> prevent the kernel from ever permitting such devices to fully consume
> the dirty-page pool.

Yeah, that was considered too. Unfortunately it's not as simple and
elegant as the proposed patch. For example, if all NFS mounts are given
the same "lowered" limit, there is still the problem that when one NFS
mount breaks, the other NFS mounts are all impacted.

> If someone later comes along and decreases the dirty limits mid-flight,
> I guess the same problem occurs. This can perhaps be handled by not
> permitting the limit to be set that low at that time.

Yes! Not long ago we introduced @global_dirty_limit and
update_dirty_limit() precisely to fix that case. The comment says:

/*
 * The global dirtyable memory and dirty threshold could be suddenly knocked
 * down by a large amount (eg. on the startup of KVM in a swapless system).
 * This may throw the system into deep dirty exceeded state and throttle
 * heavy/light dirtiers alike. To retain good responsiveness, maintain
 * global_dirty_limit for tracking slowly down to the knocked down dirty
 * threshold.
 */
static void update_dirty_limit(unsigned long thresh, unsigned long dirty)
{
...
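
The "tracking slowly down" idea can be shown with a small user-space
sketch. This is not the elided function body: toy_track_limit() is a
hypothetical helper, and both the 1/32 step per update and the use of
`dirty` as a floor for the target are assumptions made for illustration:

```c
#include <assert.h>

/*
 * Hypothetical sketch: return the next value of a slowly tracking limit.
 * - If the threshold rose above the limit, follow it up in one step.
 * - If the threshold dropped, move down by 1/32 of the gap per call
 *   (assumed step size), and never aim below the current dirty count.
 */
static unsigned long toy_track_limit(unsigned long limit,
				     unsigned long thresh,
				     unsigned long dirty)
{
	unsigned long target = thresh > dirty ? thresh : dirty;

	if (limit < thresh)
		return thresh;				/* follow up at once */
	if (limit > target)
		return limit - (limit - target) / 32;	/* follow down slowly */
	return limit;					/* nothing to do */
}
```

With this shape, a sudden drop of the threshold lowers the limit only a
little on each call, so light dirtiers are not thrown into a deep
dirty-exceeded state all at once.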

Thanks,
Fengguang
