Re: [RFC] buddy optimizations.

From: Manfred Spraul
Date: Thu May 25 2000 - 04:18:27 EST

Linus Torvalds wrote:
> In article <>,
> Manfred Spraul <> wrote:
> >
> >I'm still testing our memory allocators, and I added a per-cpu linked
> >list for order==0 to page_alloc:
> Manfred, if you _really_ want to speed up the buddy allocator on an SMP
> machine, there's a much simpler way: make sure that the
> "test_and_change_bit()" thing is not run with the "lock" prefix.

Around 45 cpu ticks faster:

before:                                    631
without "lock;":                           586
without "lock;", and with a per-cpu list:  385
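
For illustration, a minimal sketch of what a bit toggle without the "lock;" prefix amounts to, written as portable C rather than the real x86 inline assembly. The name nonatomic_test_and_change_bit() is hypothetical, and the sketch assumes the caller already holds the spinlock that protects the bitmap, which is what would make the atomic variant redundant:

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Toggle bit nr in the bitmap at addr and return its previous value.
 * This is a plain read-modify-write with no bus locking, so it is only
 * safe if the caller already serializes access to the bitmap (assumed
 * here: the allocator spinlock is held).
 */
int nonatomic_test_and_change_bit(unsigned long nr, unsigned long *addr)
{
    unsigned long *word = addr + nr / BITS_PER_LONG;
    unsigned long mask = 1UL << (nr % BITS_PER_LONG);
    unsigned long old = *word;

    *word = old ^ mask;          /* flip the bit */
    return (old & mask) != 0;    /* report what it was before */
}
```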

One gfp/free contains 2 superfluous "lock;" cycles.
But a per-cpu list hit saves another 200 cpu cycles, and it avoids touching
the spinlock - no cache line thrashing.
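
The idea can be sketched in user-space C as follows. This is not the actual patch: all names (pcp_list, alloc_page_fast(), free_page_fast(), buddy_alloc_locked(), buddy_free_locked()) and NR_CPUS == 4 are assumptions, the buddy allocator is replaced by a trivial global list, and a counter stands in for taking the spinlock. A real kernel version would also have to keep interrupts from touching the per-cpu list.

```c
#include <stddef.h>

#define NR_CPUS      4   /* assumption for the sketch */
#define PCP_LIST_MAX 8   /* list length used in the measurements */

struct page {
    struct page *next;   /* link while the page sits on a free list */
};

/* One small LIFO list of order-0 pages per cpu, touched without locking. */
struct pcp_list {
    struct page *head;
    int count;
};

static struct pcp_list pcp[NR_CPUS];

/* Stand-ins for the locked buddy paths; the counter models spinlock use. */
static struct page *global_free;
static unsigned long global_lock_taken;

struct page *buddy_alloc_locked(void)
{
    struct page *p = global_free;

    global_lock_taken++;
    if (p)
        global_free = p->next;
    return p;
}

void buddy_free_locked(struct page *p)
{
    global_lock_taken++;
    p->next = global_free;
    global_free = p;
}

/* Fast path: pop from this cpu's list, fall back to the locked allocator. */
struct page *alloc_page_fast(int cpu)
{
    struct pcp_list *l = &pcp[cpu];
    struct page *p = l->head;

    if (p) {
        l->head = p->next;
        l->count--;
        return p;
    }
    return buddy_alloc_locked();
}

/* Fast path: push onto this cpu's list unless it is already full. */
void free_page_fast(int cpu, struct page *p)
{
    struct pcp_list *l = &pcp[cpu];

    if (l->count < PCP_LIST_MAX) {
        p->next = l->head;
        l->head = p;
        l->count++;
        return;
    }
    buddy_free_locked(p);
}
```

Every hit on the per-cpu list leaves global_lock_taken unchanged, which corresponds to the saved spinlock and cache line traffic.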

But I'm mainly collecting stats:

* during kernel compile, 99.6% of all allocations were single page
allocs, and the rest were 2 pages. An 8-entry (per-cpu) list optimized
~30% of all allocations.

* Web serving a static page generates 99% hits, but they probably come
from getname().

--> first we should decide if getname() can use kmalloc(), then I'll
retest the gfp changes.

* using kmalloc(PAGE_SIZE) for getname() might be dangerous: kmalloc()
internally calls gfp(order==1).

Perhaps we should modify the slab allocator:

* kmalloc(PAGE_SIZE) should internally use gfp(order==0)

* if a slab contains only one entry, then both the bufctl and the slab
structure are superfluous: there cannot be any internal fragmentation, a
simple singly linked list is sufficient, and the pointers could be stored
somewhere in "struct page"
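
A rough sketch of that second point, assuming a hypothetical next_free field in "struct page" and hypothetical names (one_obj_cache, cache_put(), cache_get()); locking is left out, and the real slab code looks different:

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct page. */
struct page {
    struct page *next_free;   /* hypothetical field: link to next free slab */
};

/*
 * A cache whose slabs hold exactly one object each: no bufctl and no
 * slab descriptor, just a singly linked list threaded through the pages.
 */
struct one_obj_cache {
    struct page *free_list;
};

/* Return a free one-object slab to the cache. */
void cache_put(struct one_obj_cache *c, struct page *pg)
{
    pg->next_free = c->free_list;
    c->free_list = pg;
}

/* Take a free one-object slab, or NULL if the cache must be grown. */
struct page *cache_get(struct one_obj_cache *c)
{
    struct page *pg = c->free_list;

    if (pg) {
        c->free_list = pg->next_free;
        pg->next_free = NULL;
    }
    return pg;
}
```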


