Re: [PATCH 10/12] mm: page_alloc_wait

From: Peter Zijlstra
Date: Fri Apr 06 2007 - 02:38:06 EST


On Thu, 2007-04-05 at 15:57 -0700, Andrew Morton wrote:
> On Thu, 05 Apr 2007 19:42:19 +0200
> root@xxxxxxxxxxxxxxxxxxxxxxxxx wrote:
>
> > Introduce a mechanism to wait on free memory.
> >
> > Currently congestion_wait() is abused to do this.
>
> Such a very small explanation for such a terrifying change.

Yes, I suck at writing changelogs, bad me. Normally I would take a day
to write them, but I just wanted to get this code out there. Perhaps a
bad decision.

> > ...
> >
> > --- linux-2.6-mm.orig/mm/vmscan.c 2007-04-05 16:29:46.000000000 +0200
> > +++ linux-2.6-mm/mm/vmscan.c 2007-04-05 16:29:49.000000000 +0200
> > @@ -1436,6 +1436,7 @@ static int kswapd(void *p)
> > finish_wait(&pgdat->kswapd_wait, &wait);
> >
> > balance_pgdat(pgdat, order);
> > + page_alloc_ok();
> > }
> > return 0;
> > }
>
> For a start, we don't know that kswapd freed pages which are in a suitable
> zone. And we don't know that kswapd freed pages which are in a suitable
> cpuset.
>
> congestion_wait() is similarly ignorant of the suitability of the pages,
> but the whole idea behind congestion_wait is that it will throttle page
> allocators to some speed which is proportional to the speed at which the IO
> systems can retire writes - view it as a variable-speed polling operation,
> in which the polling frequency goes up when the IO system gets faster.
> This patch changes that philosophy fundamentally. That's worth more than a
> 2-line changelog.
>
> Also, there might be situations in which kswapd gets stuck in some dark
> corner. Perhaps the process which is waiting in the page allocator holds
> filesystem locks which kswapd is blocked on. Or kswapd might be blocked on
> a particular request queue, or a dead NFS server or something. The timeout
> will save us, but things will be slow.
>
> There could be other problems too, dunno - this stuff is tricky. Why are
> you changing it, what problems are being solved, etc?
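The wait-for-free-memory mechanism being debated can be modelled in userspace. This is a minimal sketch under stated assumptions, not the patch's code: the `model_` names, the pthread condition variable standing in for a kernel wait queue, and the generation counter are all invented for illustration. Sleepers block until a `model_page_alloc_ok()` event (what kswapd would signal after `balance_pgdat()`) or until their timeout expires, which is the fallback Andrew mentions if kswapd gets stuck. The wakeup is tied to reclaim finishing rather than to the rate at which the IO system retires writes.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of page_alloc_wait()/page_alloc_ok(); not kernel code. */
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t alloc_wq = PTHREAD_COND_INITIALIZER;
static unsigned long alloc_ok_seq; /* bumped on every ok event */

/* Waker side: what kswapd would do after balance_pgdat(). It wakes every
 * waiter, with no knowledge of which zone or cpuset the freed pages are in. */
void model_page_alloc_ok(void)
{
	pthread_mutex_lock(&alloc_lock);
	alloc_ok_seq++;
	pthread_cond_broadcast(&alloc_wq);
	pthread_mutex_unlock(&alloc_lock);
}

/* Sleeper side: returns true if at least one ok event arrived after the
 * call started, false if timeout_ms elapsed (or the wait failed). */
bool model_page_alloc_wait(long timeout_ms)
{
	struct timespec deadline;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&alloc_lock);
	unsigned long seen = alloc_ok_seq; /* snapshot under the lock */
	int rc = 0;
	/* Loop to absorb spurious wakeups; stop on an event or on ETIMEDOUT. */
	while (alloc_ok_seq == seen && rc == 0)
		rc = pthread_cond_timedwait(&alloc_wq, &alloc_lock, &deadline);
	bool woken = alloc_ok_seq != seen;
	pthread_mutex_unlock(&alloc_lock);
	return woken;
}
```

Using a generation counter rather than a boolean flag means a waiter that snapshots the counter under the lock cannot miss an event that fires before it goes to sleep, and a single broadcast releases every sleeper at once.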

Let's start with the why: it is because of 12/12. I wanted to introduce
per-BDI congestion feedback, and hence needed a BDI context for
congestion_wait(). These specific callers weren't waiting on a particular
BDI but on something more global: free memory.

Perhaps I could call page_alloc_ok() from bdi_congestion_end()
irrespective of which BDI actually became uncongested? That would more or
less give the old semantics.
