[PATCH] vmscan.c bugfix / vm idea's (fwd)

Rik van Riel (H.H.vanRiel@fys.ruu.nl)
Mon, 10 Nov 1997 01:30:19 +0100 (MET)


Hello all,

I have recently run into (another) vmscan.c bug...
It happens every time the system is low on memory and big
chunks are being allocated at once. The bug occurs because
kswapd may have swapped out some pages asynchronously,
but the pages haven't been freed yet.

When, together with this, some programs are allocating loads
of memory, you might end up with the situation in which the
number of async pages is large (no problem with that, it reduces
disk-seek times) and the nr_free_pages is low (that is a problem,
of course).

Adding these two numbers is right in some cases, but when
nr_free_pages is below free_pages_low it clearly isn't...

The following patch alleviates this problem.

-------------cut here-------------
--- vmscan.c.2160 Tue Nov 4 12:14:47 1997
+++ vmscan.c Tue Nov 4 15:49:27 1997
@@ -489,7 +489,7 @@
int want_wakeup = 0, memory_low = 0;
int pages = nr_free_pages + atomic_read(&nr_async_pages);

- if (pages < free_pages_low)
+ if (nr_free_pages < free_pages_low)
memory_low = want_wakeup = 1;
else if (pages < free_pages_high && jiffies >= next_swap_jiffies)
want_wakeup = 1;
--------------cut here-----------

Another large kswapd 'bug' is that it doesn't do aging on mmapped
files (see shrink_mmap) and that it uses too much CPU time
(try_to_free_page IS run under a kernel lock!!).
I fixed those two things this spring for version 2.1.42, but
unfortunately this patch:
- depended on the 2.1-mem-mgt memory wait-queue patch (a good
piece of code, but it sometimes broke under _extreme_ load)
- got erased when I upgraded after the summer holidays :-(

It basically consisted of 3 things:
- implementation of balancing buffer and cache memory: if
more than x% of memory is buffer/cache, we take that memory
first, if less than y% of memory is buffer/cache, we leave
it alone. (50% and 10% seemed to be good default values...)
- a new kernel thread, vhand, which used the linear table to
(efficiently!) scan and age memory. It also tuned itself
to make sure that a 'good' fraction of the pages had age 0.
(tunable by sysctl)
- shrink_mmap in mmap.c also used page-aging... this gave
quite some improvement, especially under higher loads.

... I haven't taken a look at shared memory yet, and I
don't know what effect the usage of aging could have
on shm-performance. (since these pages are used by
more programs, we should look even more carefully at
these???)
... We could improve fairness (and performance) by taking
pages from programs based on a faults/megabyte or other ratio;
currently we just take pages from whatever program we
happen to be at. (would this be worth the effort / cost?)
... Would making an inactive list (pages to swap out next),
so kswapd doesn't have to do its inefficient scanning, be
worth it? (probably not?)
... Would implementing RSS_LIMIT for users/programs be
good for system performance?? Freeing up memory for (small,
interactive) programs of other users might be good, but not
if we thrash the I/O system by continuously swapping some
big (background) number crunching application.
... KEYWORD: balancing. which is worth it, which isn't?
other keywords: small kernel, fast kernel.

I'm now asking all of you what features should be implemented
first. Also, the wait-queue for memory was a very good piece
of code. If someone would care porting it to the newer kernels,
we'd all live in a happier world.

Rik.

ps. the From: header isn't correct yet; I will have to get
sendmail working first :-)
pps. if you think this posting should be forwarded to the
kernel list or to other core-mm people, please do so.

----------
small sig.