Re: why swap at all?

From: Bill Davidsen
Date: Fri Jun 04 2004 - 12:13:12 EST


Buddy Lumpkin wrote:
But swap behaviour kills performance even when memory is more than adequate. Consider building a DVD image in a 4GB system. The i/o forces all of the unused programs out, in spite of the fact that an extra 100MB doesn't make a measurable difference in performance. But when I click Mozilla paging most of it in from disk make a big difference in performance to the user.



We really need a server option. Something that ages out file backed pages
naturally with less overhead than kswapd. One method would be to keep the
pagecache on it's own list, and move pages to the head of the list any time
they are modified or referenced, and reclaim from the tail.

All pages on this list can be considered as "free memory", because any new
memory requests would just cause pages to be evicted from the tail of the
list.

Anonymous memory would *not* be on this list. This way any time anonymous
memory is allocated, the pages can be readily stolen from the pagecache
list.

Lastly one nifty configuration parameter that could exist as a knob for
sys-admins is the ability to tell the VM not to add file backed pages with
the execute bit set to the page cache list but rather, leave them to be
reclaimed if kswapd wakes up in a true low memory situation (pagecache is
exhausted and memory is still low). This would require a sys-admin to make
sure only executables have the execute bit set and "data files", etc... do
not have the execute bit set.

Or have the exec() call set a "part of a process" flag. That means that if I read an executable in as data it doesn't get locked, other than what part might be in my i/o buffers. And mmap can produce different effects than read/write which may be good, if they are GOOD different effects ;-) Before you ask, thing 'strings' as why avg user does this.

But I fail to make my point... I want to limit how much memory is used for i/o buffers, cache, or anything else which will produce memory pressure of my programs. The quick solution might be just a number from the admin, like the 2.2 patch, but some kernel logic to understand that while 20MB is much better than 10MB in a tiny system, 2GB is not a lot better than 1GB in a large memory system, and having a sync() bog the system for tens of seconds is undesirable. Well, maybe some folks don't agree, it could be that the admin set limit is really the way to go.

I regard this as a desktop issue, trading some i/o performance to keep window changes fast.


A system that works like this is nice for the following reasons:

1) The system administrator can size a system so that all programs
Safely run within physical RAM. Extra RAM
Could be added and sized based on the need
for caching files.

2) Anonymous pages (and possibly executable if you read the last paragraph above) will only be evicted if kswapd is
awaken due to a true memory shortage (1/128th pagable memory?).


I like to view the VM system as always being full, because if enough unique
file system IO takes place, that is exactly what eventually happens. A
system that counts page cache as free memory and uses a gentler mechanism to
evict pages from the page cache would benefit IO bound servers significantly
IMHO.

That's what would be nice with tuning, the admin can optimize what is important on that system. I am usually happy with what the system does on i/o, but I want my 500MB or so of programs to stay resident in a 2GB machine, and if that adds a ms or two to i/o I can live with it, so that when I change windows it happens now, not eventually. And I bet there are a lot of others who would like better response to focus changes aswell.

--
-bill davidsen (davidsen@xxxxxxx)
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/