Re: [PATCH] Simple Page Coloring (2.3.99pre3 diffs)

From: David S. Miller (davem@redhat.com)
Date: Tue Apr 11 2000 - 19:58:54 EST


   Date: Tue, 11 Apr 2000 17:24:48 -0400
   From: "Joseph A. Martin" <jmartin@linux08.mro.dec.com>

    - Fragmentation of buddy blocks appears to degrade performance over
      the life of the system, but so far I am surprised at how little;
      performance remains significantly above no coloring.

Can you give some data points showing why you believe
the fragmentation is "little"? How much RAM is in your machine?

I've been working for the fourth time on a coloring page allocator,
so I know which tests stress this characteristic of any change
made to page allocation in the Linux kernel. It's very simple:
take two kernels, one with coloring and one without. Boot each to
single-user mode and immediately perform the following:

bash# cd /usr/wherever/src/linux
bash# cat `find . -type f -name "*.[ch]"` >/dev/null

Then hit "Right-Shift+ScrollLock" and look at the buddy lists.
Here is what I see:

  14 3 0 0 1 0 1 1 0 30 : without color
   5 4 2 262 38 3 1 0 1 23 : compaq color
  39 10 3 2 1 0 0 1 0 30 : color

The lines read from left to right, from order 0 pages onward.
The first line is a stock 2.3.99-pre4 kernel, the next line
is with your coloring patch applied, and the final line is from
the current coloring patch I am working on.

The machine has a 256K L2 cache, an 8K page size, and 256MB of RAM.

I believe the level of higher-order page fragmentation is problematic.
But before I could come to this conclusion I wished to run the next
level of page fragmentation testing: repeatedly performing a full
kernel build (i.e. (make mrproper; make oldconfig; make clean;
make vmlinux), perhaps 3 or 4 times) to see what the fragmentation
looks like with your patch after that much activity.

Unfortunately, I could not even boot the machine fully with your
patch. I found the problem though.

The OOPS was at rmqueue(), in the first BAD_RANGE(zone,page) test
inside the loop; this clearly indicates buddy list corruption. The
locking in coloring.c looked correct, and besides, my test system
is a uniprocessor, but the page tracking in get_named_block seems
to be buggy. Firstly, if the alloc_pages call succeeds, "addr" can
never be zero, yet it can happen that none of the pages obtained
are of the desired color.

This situation would make the following occur:

1) All pages are added to the reject list; if one of
   the alloc_pages calls fails, we break out of the loop
   with page == NULL and addr != 0
   (note this also indicates that memory has been
    violently fragmented, but the OOPS locked up
    my box so I couldn't get a page allocator debugging
    dump to be sure)
2) The recursive get_named_block calls are not made
   because (addr != 0)
3) All the pages are freed up (from the reject list made
   in #1)
4) The order 0 pages within the "addr" block are added to the
   single_pages list.

At this point we have corrupted system state since there are now
pages which are both freed and in the page coloring lists.
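
In sketch form, the failing path looks something like this (the
names and structure below are my reconstruction for illustration,
they are not quoted from your patch):

static unsigned long get_named_block(int order, int color)
{
        LIST_HEAD(reject_list);
        struct page *page = NULL;
        unsigned long addr = 0;

        while ((page = alloc_pages(GFP_KERNEL, order)) != NULL) {
                addr = (unsigned long) page_address(page);
                if (page_color(addr) == color)
                        break;
                /* 1) wrong color, stash the block on the reject list */
                list_add(&page->list, &reject_list);
                page = NULL;
        }

        /* 2) the refill recursion is guarded by addr, so the
         *    (page == NULL && addr != 0) failure case skips it */
        if (page == NULL && addr == 0)
                return get_named_block(order, color);

        /* 3) the reject list goes back to the buddy allocator,
         *    including the block addr points into... */
        free_reject_list(&reject_list);

        /* 4) ...yet addr's order 0 pages still land on the
         *    single_pages list: freed AND on a color list */
        if (addr)
                add_to_single_pages(addr, order);

        return addr;
}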

    - Multiple processes play well together if they don't fill the cache.

Actually, to truly get this right, one needs to do the bucket
walking on a per-address-space basis, with each address space
starting at a random bucket.

The page coloring patch I sent you has code which does this.
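
Roughly, in sketch form (the cursor field and the helpers below
are made-up names for illustration, not the actual code):

/*
 * Give each address space its own cursor into the color buckets,
 * starting at a pseudo-random bucket, so that concurrent address
 * spaces spread out over the cache instead of all contending for
 * the same buckets.
 */
struct mm_struct {
        /* ... existing members ... */
        unsigned int color_cursor;      /* next color bucket to try */
};

        /* at mm creation time: cheap pseudo-random starting bucket */
        mm->color_cursor = (unsigned int) jiffies % NR_COLORS;

static struct page *alloc_colored_page(struct mm_struct *mm)
{
        unsigned int i, bucket;
        struct page *page;

        for (i = 0; i < NR_COLORS; i++) {
                bucket = (mm->color_cursor + i) % NR_COLORS;
                if ((page = take_from_bucket(bucket)) != NULL) {
                        mm->color_cursor = (bucket + 1) % NR_COLORS;
                        return page;
                }
        }
        return NULL;    /* fall back to the ordinary allocator */
}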

    - compatible with future superpage (e.g. MIPS, Alpha) work

I've already implemented superpage support, which is working.
The only thing left in my implementation is page reservation,
to make superpage promotion more intelligent and efficient.

   Give it a try on your favorite cache-filling application and let me
   know what you think. Thanks.

I think we should really investigate three issues before we
take your patch seriously:

1) Page fragmentation.

   Keep in mind that if lots of interleaved page cache and user
   anonymous page allocations fragment the buddy queues badly,
   there is no point in implementing superpage support, since
   you won't be able to get superpages in such a case.

   A good way to test this is to use the largest page coloring
   bucket set, with decreasingly smaller amounts of system
   memory. For example, can the allocator handle a 4MB L2
   cache and 32MB of memory and not thrash itself to death?
   I honestly believe that your coloring patch lacks this
   quality, but I'll be fixing the bug I found above to
   see for sure.
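
   To make the scale of that test concrete (simple arithmetic on
   the numbers above, not measured data):

       4MB cache  / 8K page size  =  512 colors
       32MB ram   / 8K page size  = 4096 pages
       4096 pages / 512 colors    =    8 pages per color

   With only 8 pages of each color in the whole machine, an
   allocator that insists on exact colors has almost no slack
   before it starts thrashing.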

2) The effect of this coloring allocator on kernel TLB miss
   performance.

   Alpha and MIPS do not have this issue, but I believe it is
   important to bring up because many of the other architectures
   we support do.

   Alpha and MIPS have a KSEG0 which statically maps the kernel
   into a region of the virtual address space.

   Most other architectures map the kernel using real TLB entries,
   and as such kernel mappings compete with user mappings for the
   TLB resources on the cpu. Increasing the TLB footprint of the
   kernel will decrease the effective TLB reach obtained by user
   applications. It will also make the kernel run slower.

   I believe that a coloring allocator scheme like yours is conducive
   to larger kernel TLB miss rates. The buddy allocator is more
   conducive to keeping kernel-accessed memory within a smaller number
   of kernel TLB mappings (even more so if the architecture uses large
   (i.e. 4MB) TLB mappings for kernel memory).
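
   To put rough numbers on this (again just arithmetic): one 4MB
   kernel TLB mapping covers

       4MB mapping / 8K page = 512 pages per TLB entry

   so if kernel-touched pages cluster within a few such regions, a
   handful of TLB entries covers the kernel's working set; scatter
   those same pages across all of memory for the sake of color, and
   the working set can span dozens of mappings.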
   
3) The effect on user/kernel time for real-life applications.
   (Sorry, I know you want improved SpecFP/INT numbers on Alpha,
    but something which hurts performance for what people mostly
    use a general-purpose OS like Linux for is simply unacceptable.)
   For example, what does "time make -s vmlinux" give with and
   without page coloring?

The only way I have found to fight both the fragmentation and kernel
TLB miss problems is to not try to disassociate the coloring
allocator from the normal buddy system allocator. Yes, this is
a slightly more expensive implementation, but such an allocator can
be used to solve totally unrelated problems, such as virtual cache
coloring, and it just naturally supports superpages with no changes
whatsoever.
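
To sketch what I mean (illustrative code against the 2.3.99-era
buddy structures; COLOR_MASK and split_for_color are made-up names,
and the buddy bitmap bookkeeping is elided):

static struct page *rmqueue_colored(zone_t *zone,
                                    unsigned long order, int color)
{
        free_area_t *area = zone->free_area + order;
        struct list_head *head = &area->free_list;
        struct list_head *curr;

        for (curr = head->next; curr != head; curr = curr->next) {
                struct page *page = list_entry(curr, struct page, list);

                /* A block's offset within the zone determines its
                 * color; take it only if it is the one asked for. */
                if (((page - zone->zone_mem_map) & COLOR_MASK) == color) {
                        list_del(curr);
                        /* buddy bitmap update elided */
                        return page;
                }
        }

        /* Nothing of the right color at this order.  A large
         * enough block contains every color, so splitting a
         * higher order block always yields the color we need;
         * the same walk hands back superpages for free. */
        return split_for_color(zone, order + 1, color);
}

The point being that a colored request is satisfied from, and
returned to, the one buddy structure, so no page is ever carried
on a side list the buddy system does not know about.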

Later,
David S. Miller
davem@redhat.com
