Re: Very aggressive swapping after 2 hours rest

From: Andrea Arcangeli (andrea@suse.de)
Date: Wed Sep 20 2000 - 11:34:05 EST

Next message: Chris Meadors: "Re: Question about signals"
Previous message: Dr. Kelsey Hudson: "Re: Linux-2.4.0-test9-pre2"
In reply to: Rik van Riel: "Re: Very aggressive swapping after 2 hours rest"
Next in thread: Rik van Riel: "Re: Very aggressive swapping after 2 hours rest"
Reply: Rik van Riel: "Re: Very aggressive swapping after 2 hours rest"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, Sep 20, 2000 at 12:53:18PM -0300, Rik van Riel wrote:
> On Tue, 19 Sep 2000, Andrea Arcangeli wrote:
> > On Tue, Sep 19, 2000 at 05:29:29AM -0300, Rik van Riel wrote:
> > > what I wanted to do in the new VM, except that I didn't
> > > see why we would want to restrict it to swap pages only?
> >
> > You _must_ do that _only_ for the swap_cache, that's a specific
> > issue of the swap_cache during swapout (notenote: not during
> > swapin!).
>
> Which part of "why" did you not understand?
>
> I see no reason why we should not do the same trick for
> mmap()ed pages and maybe other memory too...

It's quite obvious we not talking about the same thing (and it's also quite
obvious the problem isn't addressed in test9), I'll restart from scratch trying
to be more clear.

There are two cases:

1) pageins
2) pageouts

When we get a major page fault (pagein/swapin) and we create a swap cache page
or a page-cache page, we must consider it a _more_ important page to keep in
the working set. So it's fine to put it at the head->next of the LRU and to
age it properly.

So far so good.

Now there's a very special (subtle) case that I addressed in classzone
and that is only related to the swapout of a swap cache (well, strictly
speaking the pageout of shared pages could take advantage of it as well
but I didn't wrote a mechamism generic enough to do that for MAP_SHARED as well
yet and that's much less important because the dirty page cache is just in
the LRU and it have less chances to be in the lru_cache->next position).

When we choose to swapout a page that's the less interesting page we have
in the VM. If that wouldn't be true then our selection algorithm that chooses
which page to swapout would be flawed.

Ok, so now we have a page that we want to throw away by swapping it out.

What we do now? We put it in the lru_cache->next position and we write
it to disk. Then we left it in the lru_cache waiting shrink_mmap to free it.

What happens now? Before shrink_mmap will have a chance to free this page,
shrink_mmap will have to first throw away all the working set that we have in
cache from lru_cache->next->next to lru_cache->prev.

This is an obvious design bug that we have since 2.2.x and that degenerated
with the proper LRU in 2.4.x. The bug is that to free 1 page with the swapout
mechanism we first have to throw away all the working set in cache.

In classzone I have a proper lru_swap_cache where those swapped out pages
are put, and I always free those pages immediatly as soon as they're unlocked.
This avoids to throw away the working set for each single swapout.

One thing I will do to decrease the CPU usage in classzone is to make this
lru_swap_cache LRU a IRQ spinlock LRU, so that I can move the pages into this
lru_swap_cache only once the I/O is completed. As said this is a further
optimization not strictly necessary to save the working set of the kernel.

Aggressive aging can alleviate the problem indeed.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

Next message: Chris Meadors: "Re: Question about signals"
Previous message: Dr. Kelsey Hudson: "Re: Linux-2.4.0-test9-pre2"
In reply to: Rik van Riel: "Re: Very aggressive swapping after 2 hours rest"
Next in thread: Rik van Riel: "Re: Very aggressive swapping after 2 hours rest"
Reply: Rik van Riel: "Re: Very aggressive swapping after 2 hours rest"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2b29 : Sat Sep 23 2000 - 21:00:23 EST