Re: [patch 0/3] futex/rtmutex: Fix issues exposed by trinity

From: Thomas Gleixner
Date: Wed May 14 2014 - 05:53:54 EST

Next message: Ulf Hansson: "Re: [PATCH v3] mmc: sdhci-pci-o2micro: Add SeaBird SeaEagle SD3 support"
Previous message: Ulf Hansson: "Re: [PATCH RFC] mmc: add slot argument to mmc_of_parse"
In reply to: Peter Zijlstra: "Re: [patch 0/3] futex/rtmutex: Fix issues exposed by trinity"
Next in thread: Peter Zijlstra: "Re: [patch 0/3] futex/rtmutex: Fix issues exposed by trinity"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, 14 May 2014, Carlos O'Donell wrote:
> I agree. Something is wrong. There are *few* cases where we might probe the
> kernel to test for things, but you'd only see that failed futex syscall once.
> If this is repeating for each call to pthread_mutex_lock, then I would appreciate
> a test case I could use to debug this and then stick into the regression tests
> for glibc.

That happens if user space corrupts the lock->data. Here you go:

#define _GNU_SOURCE
#include <pthread.h>
#include <unistd.h>
#include <stdint.h>
#include <stdio.h>

#include <sys/syscall.h>

#define gettid() syscall(__NR_gettid)

static pthread_mutex_t m1;

static void *fn1(void *d)
{
pthread_mutex_lock(&m1);
sleep(60);
return NULL;
}

static void *fn2(void *d)
{
uint32_t *p = (uint32_t *) &m1;

*p = (uint32_t) 0x80000000 | gettid();
sleep(60);
return NULL;
}

int main(void)
{
pthread_mutexattr_t ma;
pthread_t t1, t2;
uint32_t *p;
int ret;

pthread_mutexattr_init(&ma);
pthread_mutexattr_setprotocol(&ma, PTHREAD_PRIO_INHERIT);
pthread_mutex_init(&m1, &ma);

p = (uint32_t *) &m1;
*p = (uint32_t) gettid();

pthread_create(&t1, NULL, fn1, NULL);
sleep(1);
pthread_create(&t2, NULL, fn2, NULL);
sleep(1);

ret = pthread_mutex_lock(&m1);
printf("ret %d %08x\n", ret, *p);
return 0;
}

# strace ./fail
...
futex(0x600e20, FUTEX_LOCK_PI_PRIVATE, 1) = -1 EINVAL (Invalid argument)
...

# ./fail
ret 0 80002991

And note, you cannot detect the inconsistency of this in user
space. Simply because in case of PI futexes the kernel manipulates the
user space value and you have no idea what's going on in paralell.

So you need to handle the return value from the kernel proper. Whether
you assert or return -EINVAL to the caller is a different story, but
returning 0 is horribly wrong.

I prefer EINVAL as this matches the spec in the widest sense:

The pthread_mutex_lock(), pthread_mutex_trylock() and
pthread_mutex_unlock() functions may fail if:

[EINVAL]
The value specified by mutex does not refer to an initialised mutex object.

And a wreckaged mutex is not a proper initialized one.

> > IIRC, glibc takes the approach that if this operation fails, there is no way for
> > it to recovery "properly", and so it chooses to:
> >
> > /* Delay the thread indefinitely. */
> >
> > I believe the thinking goes that if we get to here, then the lock is in an
> > inconsistent state (between kernel and userspace). I don't have an answer for
> > why pausing forever would be preferable to returning an error however...
>
> What error would we return?
>
> This particular case is a serious error for which we have no good error code
> to return to userspace. It's an implementation defect, a bug, we should probably
> assert instead of pausing.

Errm.

http://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_mutex_lock.html

The pthread_mutex_lock() function may fail if:

[EDEADLK]
The current thread already owns the mutex.

That's a exactly the error code, which the kernel returns when it
detects a deadlock.

And glibc returns EDEADLK at a lot of places already. So in that case
it's not a serious error? Because it's detected by glibc. You can't be
serious about that.

So why is a kernel detected deadlock different? Because it detects not
only AA, it detects ABBA and more. But it's still a dead lock. And
while posix spec only talks about AA, it's the very same issue.

So why not propagate this to the caller so he gets an alert right away
instead of letting him attach a debugger, and scratch his head and
lookup glibc source to find out why the hell glibc called pause.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Ulf Hansson: "Re: [PATCH v3] mmc: sdhci-pci-o2micro: Add SeaBird SeaEagle SD3 support"
Previous message: Ulf Hansson: "Re: [PATCH RFC] mmc: add slot argument to mmc_of_parse"
In reply to: Peter Zijlstra: "Re: [patch 0/3] futex/rtmutex: Fix issues exposed by trinity"
Next in thread: Peter Zijlstra: "Re: [patch 0/3] futex/rtmutex: Fix issues exposed by trinity"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]