Re: [patch 00/19] mutex subsystem, -V11

From: Arjan van de Ven
Date: Tue Jan 03 2006 - 10:13:41 EST


On Tue, 2006-01-03 at 15:07 +0000, David Howells wrote:
> Ingo Molnar <mingo@xxxxxxx> wrote:
>
> > this is version -V11 of the generic mutex subsystem, against v2.6.15.
>
> When compiling for x86 with no mutex debugging, I see:
>
> (gdb) disas mutex_lock
> Dump of assembler code for function mutex_lock:
> 0xc02950d0 <mutex_lock+0>: lock decl (%eax)
> 0xc02950d3 <mutex_lock+3>: js 0xc02950ef <.text.lock.mutex>
> 0xc02950d5 <mutex_lock+5>: ret
> End of assembler dump.
> (gdb) disas 0xc02950ef
> Dump of assembler code for function .text.lock.mutex:
> 0xc02950ef <.text.lock.mutex+0>: call 0xc0294ffb <__mutex_lock_noinline>
> 0xc02950f4 <.text.lock.mutex+5>: jmp 0xc02950d5 <mutex_lock+5>
> 0xc02950f6 <.text.lock.mutex+7>: call 0xc029509f <__mutex_unlock_noinline>
> 0xc02950fb <.text.lock.mutex+12>: jmp 0xc02950db <mutex_unlock+5>
> End of assembler dump.
>
> Can you arrange .text.lock.mutex+0 here to just jump directly to
> __mutex_lock_noinline? Otherwise we have an unnecessarily extended return
> path.

jmp is free on x86. eg zero cycles. Any trickery is more likely to cost
because of doing unexpected things.

>
> You may not want to make the JS go directly there, but you could have that go
> to a JMP to __mutex_lock_noinline rather than having a CALL followed by a JMP
> back to a return instruction.

unbalanced call/ret pairs are REALLY expensive on x86. The current x86
processors all do branch prediction on the ret based on a special
internal call stack, breaking the symmetry is thus a branch prediction
miss, eg 40+ cycles


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/