Re: [patch 0/3] futex/rtmutex: Fix issues exposed by trinity

From: Peter Zijlstra
Date: Wed May 14 2014 - 05:22:46 EST


On Wed, May 14, 2014 at 02:58:05AM -0400, Carlos O'Donell wrote:
> >> The handling of -EDEADLOCK is even more impressive. Instead of
> >> propagating it to the caller something in the guts of glibc calls pause().
> >>
> >> futex(0x601300, FUTEX_LOCK_PI_PRIVATE, 1) = -1 EDEADLK (Resource deadlock avoided)
> >> pause(
> >>
> >
> > Gotta love comments like these though - such trust!:
> >
> > /* The mutex is locked. The kernel will now take care of
> > everything. */
> >
> > IIRC, glibc takes the approach that if this operation fails, there is no way for
> > it to recover "properly", and so it chooses to:
> >
> > /* Delay the thread indefinitely. */
> >
> > I believe the thinking goes that if we get to here, then the lock is in an
> > inconsistent state (between kernel and userspace). I don't have an answer for
> > why pausing forever would be preferable to returning an error however...
>
> What error would we return?

EDEADLK is a valid user return for pthread_mutex_lock() as per:

http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_mutex_lock.html
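
A minimal sketch of the simplest case, assuming an error-checking mutex:
POSIX specifies that relocking a PTHREAD_MUTEX_ERRORCHECK mutex from the
owning thread fails with EDEADLK, and the caller sees it as an ordinary return
value. The helper name relock_errorcheck_mutex() is hypothetical, and this is
the plain self-deadlock case, not the FUTEX_LOCK_PI path from the strace
output above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: lock an error-checking mutex twice from the same
 * thread and return the result of the second pthread_mutex_lock() call. */
static int relock_errorcheck_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&m, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&m);    /* first lock succeeds */
    assert(rc == 0);
    rc = pthread_mutex_lock(&m);    /* relock by the owner: EDEADLK expected */

    pthread_mutex_unlock(&m);       /* release the one lock we actually hold */
    pthread_mutex_destroy(&m);
    return rc;
}
```

The second call returns instead of blocking, so the caller can decide what to
do with the error.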

> This particular case is a serious error for which we have no good error code
> to return to userspace. It's an implementation defect, a bug, we should probably
> assert instead of pausing.

No, it's perfectly fine to have a lock sequence abort with -EDEADLK.
Userspace should release its locks and re-attempt.

You can implement usable locking schemes using this error, like
wound/wait locking.
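
A rough sketch of the "release and re-attempt" pattern, under stated
assumptions: lock_both() and errorcheck_mutex_init() are hypothetical names,
the retry bound is arbitrary, and this is a plain back-off-and-retry loop, not
a wound/wait implementation (which would also decide who backs off by
transaction age).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: initialise an error-checking mutex so that a
 * relock by the owner reports EDEADLK instead of blocking. */
static void errorcheck_mutex_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(m, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);
}

/* Take `first`, then `second`. If the second lock fails, drop the first.
 * On EDEADLK, yield and retry up to `max_tries` times.
 * Returns 0 with both locks held, or an error code with neither held. */
static int lock_both(pthread_mutex_t *first, pthread_mutex_t *second,
                     int max_tries)
{
    int rc = EINVAL;                 /* returned if max_tries <= 0 */

    for (int i = 0; i < max_tries; i++) {
        rc = pthread_mutex_lock(first);
        if (rc != 0)
            return rc;               /* nothing held */

        rc = pthread_mutex_lock(second);
        if (rc == 0)
            return 0;                /* both held */

        pthread_mutex_unlock(first); /* release what we hold */
        if (rc != EDEADLK)
            return rc;               /* not a deadlock report: give up */

        sched_yield();               /* let the other side make progress */
    }
    return rc;                       /* still EDEADLK after max_tries */
}
```

The caller gets either both locks or none, so a persistent EDEADLK can be
passed further up without leaving a lock behind.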

> We can't cancel the stuck thread because pthread_mutex_lock is not a cancellation
> point.
>
> In practice the rest of the application can make forward progress with a single
> thread stuck. You can attach the debugger and inspect state, so it's useful
> from that perspective.

That's just totally braindead. Return EDEADLK to userspace already, let
the user deal with it.
