Re: aim7 scalability issue on 4 socket machine

From: Ingo Molnar
Date: Thu Sep 17 2009 - 06:35:35 EST



* Zhang, Yanmin <yanmin_zhang@xxxxxxxxxxxxxxx> wrote:

> Aim7 results are bad on my new Nehalem machines (4*8*2 logical CPUs).
> Perf counters show that spinlocks consume 70% of CPU time on the machine.
> Lock_stat shows that anon_vma->lock causes most of the spinlock contention.
> The function tracer shows that the call chain below takes the spinlock.
>
> do_brk => vma_merge => vma_adjust
>
> Aim7 consists of lots of subtests. One test forks lots of processes,
> and every process calls sbrk 1000 times to grow/shrink the heap. The
> heap vmas of all the sub-processes point to the same anon_vma and use
> the same anon_vma->lock. When sbrk is called, the kernel calls
> do_brk => vma_merge => vma_adjust and takes anon_vma->lock, which
> causes the spinlock contention.
>
> There is a comment in front of spin_lock(&anon_vma->lock). It says
> the anon_vma lock can be optimized when just changing vma->vm_end. As
> a matter of fact, anon_vma->lock is used to protect anon_vma->list
> when an entry is deleted/inserted or the list is accessed. There is no
> such deletion/insertion if only vma->vm_end is changed in function
> vma_adjust.
>
> The patch below fixes it.
>
> Tested with kernel 2.6.31-rc8. The improvement on the machine is
> about 150%.

Impressive speedup!

[ Also, the array of tools you used to debug this is impressive as well
;-) ]
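
For reference, the sbrk subtest described in the quoted report can be
approximated in user space with a sketch like the one below. It is not
the AIM7 source: the function names, the 4096-byte step and the
process count are assumptions made for illustration. Each child
inherits the parent's heap mapping across fork() and then grows and
shrinks it repeatedly, which is the pattern the report says ends up in
do_brk on the kernel side.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: grow and then shrink the heap `iterations`
 * times by `step` bytes. Returns 0 on success, -1 if sbrk fails. */
static int sbrk_loop(int iterations, intptr_t step)
{
    for (int i = 0; i < iterations; i++) {
        if (sbrk(step) == (void *)-1)
            return -1;
        if (sbrk(-step) == (void *)-1)
            return -1;
    }
    return 0;
}

/* Hypothetical driver: fork `nproc` children that each run
 * sbrk_loop(), then reap them. Returns the number of children that
 * exited with status 0. */
int run_sbrk_workers(int nproc, int iterations)
{
    int started = 0;
    int ok = 0;

    for (int i = 0; i < nproc; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                      /* stop forking on failure */
        if (pid == 0)                   /* child: exercise the heap */
            _exit(sbrk_loop(iterations, 4096) == 0 ? 0 : 1);
        started++;
    }

    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) &&
            WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

Calling run_sbrk_workers(64, 1000) from a small main() would give a
rough stand-in for the workload; whether it reproduces the contention
depends on the kernel version and the number of CPUs.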

Ingo