Re: [PATCH] 2.6.4-rc1-mm1: vm-kswapd-incremental-min (was Re: MMVM patches was: 2.6.3-mm4)

From: Nick Piggin
Date: Mon Mar 01 2004 - 05:32:27 EST




Andrew Morton wrote:

Nick Piggin <piggin@xxxxxxxxxxxxxxx> wrote:

Andrew, I think you had kswapd scanning in the direction opposite the
one indicated by your comments. Or maybe I've just confused myself?



Nope, the node_zones[] array is indexed by



OK. Mike, could you try this patch instead after testing rc1-mm1.
Thanks.

linux-2.6-npiggin/mm/vmscan.c | 34 +++++++++++++++++++++++-----------
1 files changed, 23 insertions(+), 11 deletions(-)

diff -puN mm/vmscan.c~vm-kswapd-incremental-min mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vm-kswapd-incremental-min 2004-03-01 20:29:18.000000000 +1100
+++ linux-2.6-npiggin/mm/vmscan.c 2004-03-01 21:27:24.000000000 +1100
@@ -889,6 +889,8 @@ out:
return ret;
}

+extern int sysctl_lower_zone_protection;
+
/*
* For kswapd, balance_pgdat() will work across all this node's zones until
* they are all at pages_high.
@@ -907,12 +909,9 @@ out:
* dead and from now on, only perform a short scan. Basically we're polling
* the zone for when the problem goes away.
*
- * kswapd scans the zones in the highmem->normal->dma direction. It skips
- * zones which have free_pages > pages_high, but once a zone is found to have
- * free_pages <= pages_high, we scan that zone and the lower zones regardless
- * of the number of free pages in the lower zones. This interoperates with
- * the page allocator fallback scheme to ensure that aging of pages is balanced
- * across the zones.
+ * balance_pgdat tries to coexist with the INFAMOUS "incremental min" by
+ * trying to free lower zones a bit harder if higher zones are low too.
+ * See mm/page_alloc.c
*/
static int balance_pgdat(pg_data_t *pgdat, int nr_pages, struct page_state *ps)
{
@@ -930,8 +929,10 @@ static int balance_pgdat(pg_data_t *pgda
}

for (priority = DEF_PRIORITY; priority; priority--) {
+ unsigned long min;
int all_zones_ok = 1;
int pages_scanned = 0;
+ min = 0; /* Shut up gcc */

for (i = pgdat->nr_zones - 1; i >= 0; i--) {
struct zone *zone = pgdat->node_zones + i;
@@ -939,15 +940,26 @@ static int balance_pgdat(pg_data_t *pgda
int max_scan;
int reclaimed;

- if (zone->all_unreclaimable && priority != DEF_PRIORITY)
- continue;
-
if (nr_pages == 0) { /* Not software suspend */
- if (zone->free_pages <= zone->pages_high)
- all_zones_ok = 0;
+ /* "incremental min" right here */
if (all_zones_ok)
+ min = zone->pages_high;
+ else
+ min += zone->pages_high;
+
+ if (zone->free_pages <= min)
+ all_zones_ok = 0;
+ else
continue;
+
+ min += zone->pages_high *
+ sysctl_lower_zone_protection;
}
+
+ /* Note: this is checked *after* min is incremented */
+ if (zone->all_unreclaimable && priority != DEF_PRIORITY)
+ continue;
+
zone->temp_priority = priority;
max_scan = zone->nr_inactive >> priority;
reclaimed = shrink_zone(zone, max_scan, GFP_KERNEL,

_