Hmmm ... the current code (2.0/2.1) has some design problems. On one hand you
can use your own try_to_free_page() with your state fix ... and you run
into deep trouble at a load higher than 1 due to exponentially growing swap
I/O. Any system running this becomes unusable if the load goes higher than 1
with a memory-consuming job.
On the other hand, direct swapping for buffer/cache is complicated. If one does
this (e.g. applying David Miller's buffer/swapping patch _without_ the state
fix) there _must_ be a limit in try_to_free_page() that stops the
intensity/depth of freeing a page (e.g. setting `stop' to 2 for priority ==
GFP_BUFFER or something similar). The reason is very simple: process/shared
pages should have a small precedence over buffer/cache pages, or in some
cases the swap I/O grows beyond any (software) limit.
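
To make the idea of such a limit concrete, here is a toy user-space model.
It is only a sketch and not the kernel code: apart from `stop' and
GFP_BUFFER, every name is made up, and the rule that pass i only looks at
(pool >> i) pages is an assumption chosen to mimic "a lower i scans harder".

```c
/* Toy user-space model; NOT kernel code. Apart from `stop' and GFP_BUFFER
 * (both from the text above), every name here is hypothetical. */
enum req_prio { REQ_BUFFER, REQ_USER };  /* REQ_BUFFER stands in for GFP_BUFFER */

struct toy_mem {
    unsigned cache_pages;    /* freeable buffer/cache pages */
    unsigned process_pages;  /* process/shared pages; freeing one costs swap I/O */
    unsigned swap_writes;    /* swap I/O caused so far */
};

/* Assumption: pass i only examines (pool >> i) pages, so lower i scans harder. */
int shrink_cache(struct toy_mem *m, int i)
{
    if ((m->cache_pages >> i) == 0)
        return 0;
    m->cache_pages--;
    return 1;
}

int swap_out_one(struct toy_mem *m, int i)
{
    if ((m->process_pages >> i) == 0)
        return 0;
    m->process_pages--;
    m->swap_writes++;
    return 1;
}

/* Returns 1 if a page was freed, 0 if the scan gave up at `stop'. */
int toy_try_to_free_page(struct toy_mem *m, enum req_prio prio)
{
    /* The limit suggested above: buffer requests stop two passes early. */
    int stop = (prio == REQ_BUFFER) ? 2 : 0;
    int i;

    for (i = 6; i - stop >= 0; i--) {
        if (shrink_cache(m, i))   /* cheap: drop a buffer/cache page */
            return 1;
        if (swap_out_one(m, i))   /* expensive: write a page to swap */
            return 1;
    }
    return 0;
}
```

With no cache pages and only three process pages left, a buffer request
gives up without any swap I/O in this model, while a normal request goes on
to the deeper passes and swaps a page out.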
Note: If one combines your state fix in try_to_free_page() with David's
buffer/swapping patch, any system becomes unusable simply by running
and using it. This is the conclusion from the early patch attempts
I've done.
kswapd: What I'm missing is a fair and global usage score system or something
similar for _virtual_ pages or clusters of virtual pages. The current
age-on-demand system only ages the physical pages that are reached before
try_to_free_page() frees a page. This is a really fast solution for a mostly
idle system ... but unfair, slow, and risky under high stress. With a usage
score system, in other words a system which counts page usages across
one or more kswapd wakeups, plus a try_to_free_page() which directly swaps
out the pages with the lowest usage count in comparison, the swap I/O would
be reduced to the necessary amount. The importance sequence of cache/buffer,
(dentry,) shared, and process pages is clearly useful too.
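
Roughly what I mean by a usage score, as a user-space sketch only. All names
are hypothetical. Halving the counters at each wakeup is my assumption for
letting old references fade, and using the class order (cache/buffer first)
only as a tie-breaker is just one possible reading of the sequence above.

```c
#include <stddef.h>

/* Hypothetical page classes, listed in the order they would be evicted. */
enum page_class { PC_CACHE, PC_DENTRY, PC_SHARED, PC_PROCESS };

struct vpage {
    enum page_class cls;
    unsigned usage;    /* usage score collected between kswapd wakeups */
    int resident;      /* nonzero while the page is in memory */
};

/* Called whenever a use of the page is observed. */
void note_usage(struct vpage *p)
{
    p->usage++;
}

/* Assumption: halve every score per wakeup so stale usage fades out. */
void kswapd_wakeup(struct vpage *pages, size_t n)
{
    size_t k;

    for (k = 0; k < n; k++)
        pages[k].usage /= 2;
}

/* Index of the resident page with the lowest score, or -1 if none.
 * Ties go to the class that comes first in enum page_class. */
long pick_victim(const struct vpage *pages, size_t n)
{
    long best = -1;
    size_t k;

    for (k = 0; k < n; k++) {
        if (!pages[k].resident)
            continue;
        if (best < 0 ||
            pages[k].usage < pages[best].usage ||
            (pages[k].usage == pages[best].usage &&
             pages[k].cls < pages[best].cls))
            best = (long)k;
    }
    return best;
}
```

The point is that the victim is chosen by comparing scores over all pages,
not by whichever physical page the scan happens to reach first.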
>
> What to do? I'm playing around with various tuning parameters in
> /proc/sys/vm/*, but no luck so far. Any help appreciated.
>
> NB: Of the RAM in this machine, admittedly, half (RSS output of ps) is
> taken up by squid and half of the rest is taken by INN. On a whim, I
> SIGSTOPped the cache server. Five minutes later, squid still had a constant
> RSS of 42 MBytes and the first annoyed users were calling in, so I had to
> continue it. :-/ vmstat, too, shows almost no swap activity.
>
... it's trivial, but if two processes hold more than half of the physical
memory most of the time and the system needs RAM to perform ...
it's a truism (`Binsenweisheit').
Werner