Re: [MODSLAB 3/7] A Kmalloc subsystem

From: Christoph Lameter
Date: Fri Aug 18 2006 - 02:15:47 EST


On Thu, 17 Aug 2006, Manfred Spraul wrote:

> I'm not sure that the current approach with virt_to_page()/vmalloc_to_page()
> is the right thing(tm): Both functions are slow.

I am not so sure.

C code

typedef union ia64_va {
struct {
unsigned long off : 61; /* intra-region offset */
unsigned long reg : 3; /* region number */
} f;
unsigned long l;
void *p;
} ia64_va;

#define __pa(x) ({ia64_va _v; _v.l = (long) (x); _v.f.reg = 0; _v.l;})
#define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
# define pfn_to_page(pfn) (vmem_map + (pfn))


Here is a disassembly of kfree() (just relevant instructions, compiler
creates a mess and wastes scores of instructions repeating the same
thing about 3 times due to the failure to properly optimize (gcc 3.3) the
debugging crud in the existing slab):

0xa000000100144850 <kfree+48>: [MII] addl r11=-1995424,r1

r11 = &vmem_map

0xa000000100144870 <kfree+80>: [MII] mov r10=r11

r10 = &vmem_map

0xa000000100144881 <kfree+97>: ld8 r18=[r10]

r18 = [vmem_map]

0xa000000100144890 <kfree+112>: [MMI] shladd r16=r3,3,r18;;
0xa000000100144891 <kfree+113>: ld4.acq r2=[r16]

struct page * = r16 = vmem_map + pfn
r2 = flags;

0xa0000001001448a0 <kfree+128>: [MII] nop.m 0x0
0xa0000001001448a1 <kfree+129>: tbit.z p8,p9=r2,14;;

if(PageCompound(page))
.....

(And some more crud: A useless check for PageCompound (in the hot path!)
although page compound is not used for higher order slabs (only for non
MMU arches))

So we only have a single lookup of vmem_map from memory in order to
calculate the address of struct page. The cacheline for vmem_map is
heavily used and certainly in memory. virt_to_page seems to be a very
efficient means to get to struct page. The problem scope may simply be
to minimize the cachelines touched during free and alloc.

vmalloc_to_addr is certainly slower due to the page table walking. But the
user already is aware of the fact that vmalloc memory is not as fast as
direct mapped.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/