Re: [PATCH v2 0/4] futex: Wakeup optimizations

From: Davidlohr Bueso
Date: Wed Dec 18 2013 - 10:32:35 EST


ping?

If no one has any objections, could this patchset be picked up?

Thanks,
Davidlohr

On Tue, 2013-12-03 at 01:45 -0800, Davidlohr Bueso wrote:
> Changes from v1 [https://lkml.org/lkml/2013/11/22/525]:
> - Removed patch "futex: Check for pi futex_q only once".
>
> - Cleaned up ifdefs for larger hash table.
>
> - Added a doc patch from tglx that describes the futex
> ordering guarantees.
>
> - Improved the lockless plist check for the wake calls.
> Based on community feedback, the necessary abstractions
> and barriers were added to maintain the ordering guarantees.
> The code documentation was also updated.
>
> - Removed patch "sched,futex: Provide delayed wakeup list".
> Based on feedback from PeterZ, I will look into this as
> a separate issue once the other patches are settled.
>
>
> We have been dealing with a customer database workload on a large
> 12TB, 240-core, 16-socket NUMA system that exhibits high amounts
> of contention on some of the locks that serialize the internal futex
> data structures. This workload especially suffers in the wakeup
> paths, where waiting on the corresponding hb->lock can account for
> up to ~60% of the time. The result of such calls can mostly be
> classified as (i) nothing to wake up and (ii) waking up a large
> number of tasks.
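>
> As a rough userspace illustration (not taken from the actual workload;
> the helper below is only a sketch of the usual glibc-style pattern), an
> unlocker issues FUTEX_WAKE whenever the futex word was marked contended,
> so a large fraction of those calls reach the kernel only to find an
> empty hash bucket and return 0:
>
>         #include <linux/futex.h>
>         #include <stdatomic.h>
>         #include <sys/syscall.h>
>         #include <unistd.h>
>
>         static long futex_wake(atomic_int *uaddr, int nr)
>         {
>                 return syscall(SYS_futex, uaddr, FUTEX_WAKE, nr, NULL, NULL, 0);
>         }
>
>         /* 0 = unlocked, 1 = locked, 2 = locked with (possible) waiters */
>         static void mutex_unlock(atomic_int *lock)
>         {
>                 if (atomic_exchange(lock, 0) == 2)
>                         futex_wake(lock, 1);    /* may well find nobody queued */
>         }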
>
> Before these patches are applied, we can see this pathological behavior:
>
> 37.12% 826174 xxx [kernel.kallsyms] [k] _raw_spin_lock
> --- _raw_spin_lock
> |
> |--97.14%-- futex_wake
> | do_futex
> | sys_futex
> | system_call_fastpath
> | |
> | |--99.70%-- 0x7f383fbdea1f
> | | yyy
>
> 43.71% 762296 xxx [kernel.kallsyms] [k] _raw_spin_lock
> --- _raw_spin_lock
> |
> |--53.74%-- futex_wake
> | do_futex
> | sys_futex
> | system_call_fastpath
> | |
> | |--99.40%-- 0x7fe7d44a4c05
> | | zzz
> |--45.90%-- futex_wait_setup
> | futex_wait
> | do_futex
> | sys_futex
> | system_call_fastpath
> | 0x7fe7ba315789
> | syscall
>
>
> With these patches, contention is practically non-existent:
>
> 0.10% 49 xxx [kernel.kallsyms] [k] _raw_spin_lock
> --- _raw_spin_lock
> |
> |--76.06%-- futex_wait_setup
> | futex_wait
> | do_futex
> | sys_futex
> | system_call_fastpath
> | |
> | |--99.90%-- 0x7f3165e63789
> | | syscall
> ...
> |--6.27%-- futex_wake
> | do_futex
> | sys_futex
> | system_call_fastpath
> | |
> | |--54.56%-- 0x7f317fff2c05
> ...
>
> Patch 1 is a cleanup.
>
> Patch 2 addresses the well known issue of the global hash table.
> By creating a larger and NUMA-aware table, we can reduce false
> sharing and collisions, thus reducing the chance of unrelated futexes
> hashing to the same hb->lock.
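>
> A minimal sketch of the idea (the sizing, allocator and fallback here
> are illustrative, not the exact patch):
>
>         /*
>          * Scale the futex hash table with the machine instead of the
>          * fixed 256-bucket global array, so that unrelated futexes are
>          * less likely to end up serialized on the same hb->lock.
>          */
>         static unsigned long futex_hashsize;
>         static struct futex_hash_bucket *futex_queues;
>
>         static int __init futex_init(void)
>         {
>                 unsigned long i;
>
>                 futex_hashsize = roundup_pow_of_two(256 * num_possible_cpus());
>                 futex_queues = vzalloc(futex_hashsize * sizeof(*futex_queues));
>                 if (!futex_queues)
>                         return -ENOMEM; /* real code would fall back to a static table */
>
>                 for (i = 0; i < futex_hashsize; i++) {
>                         plist_head_init(&futex_queues[i].chain);
>                         spin_lock_init(&futex_queues[i].lock);
>                 }
>                 return 0;
>         }
>
> hash_futex() would then simply mask the hash value with
> (futex_hashsize - 1) instead of a compile-time constant.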
>
> Patch 3 documents the futex ordering guarantees.
>
> Patch 4 reduces contention on the corresponding hb->lock by not trying to
> acquire it if there are no blocked tasks in the waitqueue.
> This particularly addresses point (i) above, where we see that it is not
> uncommon for up to 90% of wakeup calls to return 0, indicating that no
> tasks were woken.
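>
> A simplified view of the wake-side fast path (hb_waiters_pending() is a
> stand-in name; the exact check and the barriers it needs against the
> waiter side are what the patch actually adds):
>
>         hb = hash_futex(&key);
>
>         /*
>          * Cheap check: if nobody is queued on this bucket there is
>          * nothing to wake and no reason to take hb->lock.  The check
>          * must be ordered against the waiter queueing itself and
>          * re-reading the futex value, otherwise a wakeup could be
>          * missed.
>          */
>         if (!hb_waiters_pending(hb))
>                 goto out_put_key;
>
>         spin_lock(&hb->lock);
>         /* ... normal path: walk the plist and wake matching waiters ... */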
>
> This patchset has also been tested on smaller systems with a variety of
> benchmarks, including Java workloads, kernel builds and custom
> bang-the-hell-out-of-hb-locks programs. So far, no functional or performance
> regressions have been seen. Furthermore, no issues were found when running the
> different tests in the futextest suite:
> http://git.kernel.org/cgit/linux/kernel/git/dvhart/futextest.git/
>
> This patchset applies on top of Linus' tree as of v3.13-rc2 (2e7babfa).
>
> Special thanks to Scott Norton, Tom Vanden, Mark Ray and Aswin Chandramouleeswaran
> for help presenting, debugging and analyzing the data.
>
> futex: Misc cleanups
> futex: Larger hash table
> futex: Document ordering guarantees
> futex: Avoid taking hb lock if nothing to wakeup
>
> kernel/futex.c | 230 ++++++++++++++++++++++++++++++++++++++++++++++++---------
> 1 file changed, 194 insertions(+), 36 deletions(-)
>

