Re: [patch 0/3] futex/rtmutex: Fix issues exposed by trinity

From: Peter Zijlstra
Date: Wed May 14 2014 - 06:07:36 EST


On Wed, May 14, 2014 at 11:53:44AM +0200, Thomas Gleixner wrote:
> > What error would we return?
> >
> > This particular case is a serious error for which we have no good error code
> > to return to userspace. It's an implementation defect, a bug, we should probably
> > assert instead of pausing.
>
> Errm.
>
> http://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_mutex_lock.html
>
> The pthread_mutex_lock() function may fail if:
>
> [EDEADLK]
> The current thread already owns the mutex.
>
> That's exactly the error code which the kernel returns when it
> detects a deadlock.
>
> And glibc returns EDEADLK at a lot of places already. So in that case
> it's not a serious error? Because it's detected by glibc. You can't be
> serious about that.
>
> So why is a kernel detected deadlock different? Because it detects not
> only AA, it detects ABBA and more. But it's still a deadlock. And
> while the POSIX spec only talks about AA, it's the very same issue.
>
> So why not propagate this to the caller so he gets an alert right away
> instead of letting him attach a debugger, scratch his head, and
> look up the glibc source to find out why the hell glibc called pause.

http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_mutex_lock.html

The pthread_mutex_lock() function may fail if:

[EDEADLK]
A deadlock condition was detected or the current thread already owns the mutex.

That wording is explicitly wider than AA recursion and fully supports the
full lock graph traversal we do.
