Hi Buddy,
> Even for systems that don't *need* the extra memory space, swap can
> actually provide performance improvements by allowing unused memory
> to be replaced with often-used memory.
>
> For example, I have 57MB swapped right now. It allows me to instantly
> grep the kernel tree. If I turned swap off, each grep would probably
> take 30 seconds.
Your analogy is flawed. There are many reasons why this doesn't work in the
real world.
Your grep analogy incorrectly assumes that you have a bunch of vacant memory
just waiting to store those filesystem pages, but that simply isn't the
case. Rather, 57MB of anonymous memory was evicted to make room for 57MB of
anonymous or filesystem-backed pages. Unless you have freed anonymous
memory on the system by closing applications, your physical memory pages are
still mostly occupied.
This means your grep is only going to run faster if you read those files
recently and they are still in the pagecache. Otherwise you still have the
burden of pushing pages that have not been used recently out of RAM before
you can read in the new ones. And as long as you are performing a sufficient
amount of filesystem I/O, this is guaranteed to happen.
Heavy filesystem I/O flushes important pages from memory, such as pages from
shared libraries and executables, only for them to fault back in as soon as
the processes that use them become runnable. One thing that can be done to
minimize this problem is to implement something similar to what Sun
implemented in Solaris 8, called the cyclical page cache. The idea is that
the pagecache pages against itself (it recycles its own pages) and is
considered free memory from an anonymous memory perspective. The pagecache
is free to grow all it wants, but since it is counted as free memory, any
anonymous memory allocation simply causes the pagecache to shrink.
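To make the accounting concrete, here is a rough toy model in Python. It is
only a sketch of the idea as I described it, not Solaris code; the class and
method names (MemoryModel, read_file_pages, alloc_anon) are made up for
illustration, and memory is counted in abstract pages.

```python
class MemoryModel:
    """Toy accounting model: page cache pages count as free memory."""

    def __init__(self, total_pages):
        self.total = total_pages
        self.anon = 0        # pages held by anonymous memory
        self.pagecache = 0   # pages currently holding file data

    @property
    def free(self):
        # From the anonymous-memory point of view, everything that is not
        # anonymous is free, including pages that hold cached file data.
        return self.total - self.anon

    def read_file_pages(self, n):
        # The page cache grows into unused memory. Once nothing is unused,
        # new file pages overwrite old page cache pages, so the cache pages
        # against itself and anonymous memory is never touched.
        unused = self.total - self.anon - self.pagecache
        self.pagecache += min(n, unused)

    def alloc_anon(self, n):
        if n > self.free:
            raise MemoryError("not enough free memory in this toy model")
        # Use truly unused pages first, then shrink the page cache.
        unused = self.total - self.anon - self.pagecache
        shortfall = max(0, n - unused)
        self.pagecache -= shortfall
        self.anon += n


m = MemoryModel(total_pages=100)
m.alloc_anon(40)
m.read_file_pages(500)   # heavy file I/O: cache fills the other 60 pages
m.alloc_anon(20)         # anonymous allocation shrinks the cache to 40
```

Note that in this model the 500-page read never evicts a single anonymous
page, which is the whole point.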
As these pages are evicted from the pagecache, they are placed on the cache
list (a linked list that stores pages that still have a vnode+offset), at
the end opposite the one where pages are being overwritten. This way,
frequently re-accessed pages that were placed on the cache list and were
eligible to be reclaimed are found when the next minor fault occurs for
that vnode+offset, and are moved back to the far end of the list so that
they are not evicted.
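Here is a minimal Python sketch of that list behaviour, again purely
illustrative and not how Solaris implements it. The CacheList class and its
method names are hypothetical, and an OrderedDict stands in for the linked
list: the front is the end where pages get overwritten, the back is the safe
end.

```python
from collections import OrderedDict


class CacheList:
    """Toy cache list keyed by (vnode, offset)."""

    def __init__(self):
        # Front of the dict = reclaim end, back of the dict = safe end.
        self._pages = OrderedDict()

    def release(self, vnode, offset, data):
        """A page leaves the page cache but keeps its vnode+offset identity."""
        key = (vnode, offset)
        self._pages[key] = data
        self._pages.move_to_end(key)  # place it far from the reclaim end

    def minor_fault(self, vnode, offset):
        """Return the page if it is still on the list, and protect it."""
        key = (vnode, offset)
        if key not in self._pages:
            return None  # contents are gone; a real read would be needed
        self._pages.move_to_end(key)  # back to the safe end of the list
        return self._pages[key]

    def reclaim(self):
        """Take the page at the reclaim end so it can be overwritten."""
        if not self._pages:
            return None
        key, _ = self._pages.popitem(last=False)
        return key

    def __len__(self):
        return len(self._pages)


cl = CacheList()
cl.release("libc.so", 0, b"text")
cl.release("bigfile", 0, b"aaaa")
cl.release("bigfile", 4096, b"bbbb")
hit = cl.minor_fault("libc.so", 0)   # re-accessed, so it moves to safety
first_victim = cl.reclaim()          # the bigfile pages go first
second_victim = cl.reclaim()
```

Even though the libc.so page was released first, the minor fault moves it
behind both bigfile pages, so the streaming data is overwritten before the
page that is actually being reused.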
Since the cache list is counted as free memory, filesystem I/O by itself
cannot wake up the LRU mechanism to scan physical memory. Scanning starts
only once anonymous memory has consumed enough that free memory (cache list
included) falls below 1/64 of physical memory.