Date: Tue, 11 Apr 2000 17:24:48 -0400
From: "Joseph A. Martin" <jmartin@linux08.mro.dec.com>
- Fragmentation of buddy blocks appears to degrade performance over
the life of the system, but so far I am surprised at how little;
performance remains significantly above no coloring.
Can you give some data points describing why you believe
the fragmentation is "little"? How much RAM is in your machine?
I've been working for the 4th time on a coloring page allocator,
so I know which tests stress this characteristic of any change
made to page allocation in the Linux kernel. It's very simple:
take two kernels, one with coloring and one without. Boot each to
single user mode, then immediately perform the following:
bash# cd /usr/wherever/src/linux
bash# cat `find . -type f -name "*.[ch]"` >/dev/null
Then hit Right-Shift+ScrollLock and look at the buddy lists.
Here is what I see:
14 3 0 0 1 0 1 1 0 30 : without color
5 4 2 262 38 3 1 0 1 23 : compaq color
39 10 3 2 1 0 0 1 0 30 : color
The lines read from left to right, from order 0 pages onward.
The first line is a stock 2.3.99-pre4 kernel, the next line
is with your coloring patch applied, and the final line is from
the current coloring patch I am working on.
The machine has a 256K L2 cache, an 8K page size, and 256MB of ram.
I believe the level of higher-order page fragmentation is problematic.
But before I could come to this conclusion I wanted to run the next
level of page fragmentation testing: repeatedly performing a full kernel
build (i.e. (make mrproper; make oldconfig; make clean; make vmlinux);
perhaps 3 or 4 times) to see what the fragmentation looks like with your
patch after that much activity.
Unfortunately, I could not even boot the machine fully with your
patch. I found the problem, though.
The OOPS was at rmqueue() in the first BAD_RANGE(zone,page) test
inside the loop, which clearly indicates buddy list corruption. The
locking in coloring.c looked correct, and besides, my test system is a
uniprocessor; rather, the page tracking in get_named_block seems to be
buggy. Firstly, if the alloc_pages call succeeds, "addr" can never be
zero, but it can happen that none of the pages obtained are of the
desired color.
This situation would make the following occur:
1) All pages are added to the reject list; if one of
   the alloc_pages calls fails, we break out of the loop
   with page == NULL and addr != 0
(note this also indicates that memory has been
violently fragmented, but the OOPS locked up
my box so I couldn't get a page allocator debugging
dump to be sure)
2) The recursive get_named_block calls are not made
because (addr != 0)
3) All the pages are freed up (from the reject list made
in #1)
4) The order 0 pages within the "addr" block are added to the
   single_pages list.
At this point we have corrupted system state since there are now
pages which are both freed and in the page coloring lists.
- Multiple processes play well together if they don't fill the cache.
Actually, to get this truly right, one needs to do the bucket walking
on a per-address-space basis, with each address space starting at
a random bucket.
The page coloring patch I sent you has code which does this.
- compatible with future superpage (e.g. MIPS, Alpha) work
I've already implemented superpage support, and it is working.
The only thing left in my implementation is page reservation,
to make the superpage promotion more intelligent and efficient.
Give it a try on your favorite cache-filling application and let me
know what you think. Thanks.
I think we should investigate three issues before we really
take your patch seriously:
1) Page fragmentation.
Keep in mind that if lots of interleaved page cache and user
anonymous page allocations fragment the buddy queues badly,
there is no point in implementing superpage support, since
you won't be able to get superpages in such a case.
A good way to test this is to use the largest page coloring
bucket set with progressively smaller amounts of system
memory. For example, can the allocator handle a 4MB L2
cache and 32MB of memory and not thrash itself to death?
I honestly believe that your coloring patch lacks this
quality, but I'll be fixing the bug I found above to
see for sure.
2) The effect of this coloring allocator on kernel TLB miss
performance.
Alpha and MIPS do not have this issue themselves, but I believe it
is important to bring up because many other architectures we
support do.
Alpha and MIPS have a KSEG0 which statically maps the kernel
into a region of the virtual address space.
Most other architectures map the kernel using real TLB entries,
and as such kernel mappings compete with user mappings for the
TLB resources on the cpu. Increasing the TLB footprint of the
kernel will decrease the effective TLB reach obtained by user
applications. It will also make the kernel run slower.
I believe that a coloring allocator scheme like yours is conducive
to larger kernel TLB miss rates. The buddy allocator is more
conducive to keeping kernel-accessed memory within a smaller number
of kernel TLB mappings (even more so if the architecture uses large
(e.g. 4MB) TLB mappings for kernel memory).
3) The effect on user/kernel time used for real-life applications.
(Sorry, I know you want improved SpecFP/INT numbers on Alpha,
but something which screws performance for what people mostly
use a general purpose OS like Linux for is simply unacceptable.)
For example, what does "time make -s vmlinux" give with and
without page coloring?
The only way I have found to fight both the fragmentation and kernel
TLB miss problems is to not try to disassociate the coloring
allocator from the normal buddy system allocator. Yes, this is
a slightly more expensive implementation, but such an allocator can
be used to solve totally unrelated problems, such as virtual cache
coloring, and it just naturally supports superpages with no changes
whatsoever.
Later,
David S. Miller
davem@redhat.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/