Re: [patch 0/9] mutex subsystem, -V4

From: Nicolas Pitre
Date: Mon Dec 26 2005 - 10:29:50 EST


On Sun, 25 Dec 2005, Andrew Morton wrote:

> Ingo has told me offline that he thinks that we can indeed remove the
> current semaphore implementation.
>
> 90% of existing semaphore users get migrated to mutexes.
>
> 9% of current semaphore users get migrated to completions.
>
> The remaining 1% of semaphore users are using the counting feature. We
> reimplement that in a mostly-arch-independent fashion and then remove the
> current semaphore implementation. Note that there are no sequencing
> dependencies in all the above.

Great! I'm glad you're amenable to such changes, especially the
"remove the current semaphore implementation" part, which many people
have been complaining about for all sorts of reasons.

> Another example: Ingo's VFS stresstest which is hitting i_sem hard: it only
> does ~8000 ops/sec on an 8-way, and it's an artificial microbenchmark which
> is _designed_ to hit that lock hard. So if/when i_sem is converted to a
> mutex, I figure that the benefits to ARM in that workload will be about a
> 0.01% performance increase. ie: about two hours' worth of Moore's law in a
> dopey microbenchmark.

Cycles are not the only question. There is also kernel text size, and
_that_ is significant. You could argue that nothing can be smaller
than a single function call for every mutex lock/unlock. However, on
ARM a function call is quite a costly operation with frame pointers
(and frame pointers are on by default on ARM, otherwise the kernel is
undebuggable). The current semaphore fast path is inlined to save the
function call overhead, and on ARM the transition from semaphores to
mutexes shrinks that inlined fast path from 8 instructions down to 3,
or even 2 in some cases. That saves not only cycles but also a bunch
of kernel .text bytes, without compromising the cycle count. This is
therefore an all-win situation for ARM.
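
To illustrate what an inlined fast path with an out-of-line slow path
looks like, here is a minimal userspace sketch in C11. It is not the
code from the patch series: the names (sketch_mutex, sketch_mutex_lock,
sketch_lock_slowpath) are hypothetical, the atomic-exchange scheme is
an assumption chosen for brevity, and the slow path is a stub that only
records that it was reached instead of queueing and sleeping.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical type for illustration: 1 = unlocked, 0 = locked. */
typedef struct {
    atomic_int count;
} sketch_mutex;

/* Counts how often the contended path was taken (sketch only). */
static int sketch_slowpath_hits;

/*
 * Out-of-line slow path. A real implementation would queue the task
 * and sleep here; this stub only records the call.
 */
static void sketch_lock_slowpath(sketch_mutex *m)
{
    (void)m;
    sketch_slowpath_hits++;
}

/*
 * Inlined fast path: one atomic exchange, one compare, and a branch
 * that is only taken under contention. The function call cost is
 * paid only when the lock was already held.
 */
static inline void sketch_mutex_lock(sketch_mutex *m)
{
    if (atomic_exchange(&m->count, 0) != 1)
        sketch_lock_slowpath(m);
}

/* Simplified unlock: a real one would also wake any waiters. */
static inline void sketch_mutex_unlock(sketch_mutex *m)
{
    atomic_store(&m->count, 1);
}

/* Usage: an uncontended lock/unlock never leaves the fast path. */
static void sketch_selftest(void)
{
    sketch_mutex m;

    atomic_init(&m.count, 1);
    sketch_mutex_lock(&m);
    assert(sketch_slowpath_hits == 0);
    sketch_mutex_unlock(&m);
}
```

The point of the split is that the uncontended case costs only the few
inlined instructions at each call site, while the bulky contended code
exists once in the kernel text. How many instructions the fast path
really compiles to depends on the architecture and compiler.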

> Also, there's very little point in adding lots of tricky arch-dependent
> code and generally mucking up the kernel source to squeeze the last drop of
> performance out of the sleeping lock fastpath. Because if those changes
> actually make a difference, we've already lost - the code needs to be
> changed to use spinlocking.

Consider my text size argument above, which is something I believe you
value more.


Nicolas