Re: [RFC PATCH] greatly reduce SLOB external fragmentation

From: Linus Torvalds
Date: Thu Jan 10 2008 - 14:42:40 EST




On Thu, 10 Jan 2008, Matt Mackall wrote:
>
> One idea I've been kicking around is pushing the boundary for the buddy
> allocator back a bit (to 64k, say) and using SL*B under that. The page
> allocators would call into buddy for larger than 64k (rare!) and SL*B
> otherwise. This would let us greatly improve our handling of things like
> task structs and skbs and possibly also things like 8k stacks and jumbo
> frames.

Yes, something like that may well be reasonable. It could possibly solve
some of the issues for bigger page cache sizes too, but one issue is that
many things actually end up having those power-of-two alignment
constraints too - so an 8kB allocation would often still have to be
naturally aligned, which then removes some of the freedom.

> Crazy?

It sounds like it might be worth trying out - there's just no way to know
how well it would work. Buddy allocators sure as hell have problems too,
no question about that. It's not like the page allocator is perfect.

It's not even clear that a buddy allocator even for the high-order pages
is at all the right choice. Almost nobody actually wants >64kB blocks, and
the ones that *do* want bigger allocations tend to want *much* bigger
ones, so it's quite possible that it could be worth it to have something
like a three-level allocator:

- huge pages (superpages for those crazy db people)

Just a simple linked list of these things is fine, we'd never care
about coalescing large pages together anyway.

- "large pages" (on the order of ~64kB) - with *perhaps* a buddy bitmap
setup to try to coalesce back into huge-pages, but more likely just
admitting that you'd need something like migration to ever get back a
hugepage that got split into large-pages.

So maybe a simple bitmap allocator per huge-page for large pages. Say
you have a 4MB huge-page, and just a 64-bit free-bitmap per huge-page
when you split it into large pages.

- slab/slub/slob for anything else, and "get_free_page()" ends up being
just a shorthand for saying "naturally aligned kmalloc of size
"PAGE_SIZE<<order"

and maybe it would all work out ok.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/