Re: [PATCH v4 00/15] Add futex2 syscalls

From: Sebastian Andrzej Siewior
Date: Tue Jun 08 2021 - 10:58:01 EST


On 2021-06-08 16:23:45 [+0200], Peter Zijlstra wrote:
> There's more futex users than glibc, and some of them are really hurting
> because of the NUMA issue. Oracle used to (I've no idea what they do or
> do not do these days) use sysvsem because the futex hash table was a
> massive bottleneck for them.
>
> And as Nick said, other vendors are having the same problems.

I just wanted to do a brief summary of last events. The implementation
tglx did with the cookie resulting in a quick lookup did not have any
downsides except that the user-API had to change glibc couldn't. So if
we are back to square one why not start with that.

> And if you don't extend the futex to store the nid you put the waiter in
> (see all the problems above) you will have to do wakeups on all nodes,
> which is both slower than it is today, and scales possibly even worse.
>
> The whole numa-aware qspinlock saga is in part because of futex.

sure.

> That said; if we're going to do the whole futex-vector thing, we really
> do need a new interface, because the futex multiplex monster is about to
> crumble (see the fun wrt timeouts for example).

This might have been a series of unfortunate events leading to this. The
sad part is that glibc has a comment that the kernel does not support
this and nobody bother to change it (until recently).

> And if we're going to do a new interface, we ought to make one that can
> solve all these problems. Now, ideally glibc will bring forth some
> opinions, but if they don't want to play, we'll go back to the good old
> days of non-standard locking libraries.. we're halfway there already due
> to glibc not wanting to break with POSIX were we know POSIX was just
> dead wrong broken.
>
> See: https://github.com/dvhart/librtpi

I'm aware of that, I hacked on it, too :) This was the unfortunate
result of a ~8y old bug which was not fixed instead and part of the code
was rewritten and a bit-spinlock was added in user-land. You may
remember the discussion regarding spins in userland…
That said, REQUEUE_PI is no longer used by glibc.

Sebastian