Re: [PATCH v3 7/7] locking/rtmutex: Acquire the hb lock via trylock after wait-proxylock.

From: Jiri Slaby
Date: Mon Jan 15 2024 - 07:54:45 EST


On 15. 01. 24, 12:52, Jiri Slaby wrote:
On 15. 01. 24, 12:40, Jiri Slaby wrote:
On 15. 09. 23, 17:19, Peter Zijlstra wrote:
On Fri, Sep 15, 2023 at 02:58:35PM +0200, Thomas Gleixner wrote:

I spent quite some time to convince myself that this is correct. I was
not able to poke a hole into it. So that really should be safe to
do. Famous last words ...

IKR :-/

Something like so then...

---
Subject: futex/pi: Fix recursive rt_mutex waiter state

So this breaks some random test in APR:

From https://build.opensuse.org/package/live_build_log/openSUSE:Factory:Staging:G/apr/standard/x86_64:
testprocmutex       :  Line 122: child did not terminate with success

The child in fact terminates on https://github.com/apache/apr/blob/trunk/test/testprocmutex.c#L93:
                 while ((rv = apr_proc_mutex_timedlock(proc_lock, 1))) {
                     if (!APR_STATUS_IS_TIMEUP(rv))
                         exit(1); <----- here

The test creates 6 children which repeatedly (200 times) call pthread_mutex_timedlock() and pthread_mutex_unlock() in parallel, sleeping 1 us while holding the lock. The timeout above is also 1 us. The test expects all of them to fail (to time out). But the timeout does not always happen in 6.7 (it's racy, so the failure is semi-random: about 1 in 1000 attempts is bad).

That was not precise, as I misinterpreted the test. It accepts either outcome: the lock attempt succeeds or it times out.

But since the commit, futex() yields 22/EINVAL, i.e. fails.
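
For illustration only (this is not taken from APR or from the attached reproducer), the check the test effectively applies to the return value can be sketched as below. The names classify_timedlock_rv and enum lock_outcome are made up here, and the static_assert merely documents the assumption that errno 22 is EINVAL, as it is on Linux.

```c
#include <assert.h>
#include <errno.h>

/* Assumption: Linux errno numbering, where 22 is EINVAL. */
static_assert(EINVAL == 22, "errno 22 is expected to be EINVAL");

/* Hypothetical names, used only for this sketch. */
enum lock_outcome { LOCK_ACQUIRED, LOCK_TIMED_OUT, LOCK_FAILED };

/*
 * Classify the return value of pthread_mutex_timedlock(): 0 and
 * ETIMEDOUT are both acceptable to the test, anything else
 * (such as 22/EINVAL) is a failure.
 */
static enum lock_outcome classify_timedlock_rv(int rv)
{
	if (rv == 0)
		return LOCK_ACQUIRED;
	if (rv == ETIMEDOUT)
		return LOCK_TIMED_OUT;
	return LOCK_FAILED;
}
```

With this, classify_timedlock_rv(22) yields LOCK_FAILED, which corresponds to the exit(1) path in the APR test quoted above.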

A simplified reproducer is attached (in particular, it no longer needs APR). Build with -pthread, obviously. If you see
BADx rv=22

that's bad.

regards,
--
js
suse labs
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#include <sys/mman.h>
#include <sys/time.h>
#include <sys/wait.h>

#define MAX_WAIT_USEC (1000*1000)
#define CHILDREN 16
#define MAX_ITER 200

#define NS_PER_S 1000000000

/* Lives in a MAP_SHARED mapping so that all forked children use the same mutex. */
static pthread_mutex_t *proc_lock;

static void child(void)
{
	int rv, i = 0;

	do {
		int wait_usec = 0;
		struct timespec abstime;

		/* Absolute deadline: 1 us (1000 ns) from now. */
		clock_gettime(CLOCK_REALTIME, &abstime);

		abstime.tv_nsec += 1000;
		if (abstime.tv_nsec >= NS_PER_S) {
			abstime.tv_sec++;
			abstime.tv_nsec -= NS_PER_S;
		}

		/* Retry on ETIMEDOUT; any other error is reported as BADx. */
		while ((rv = pthread_mutex_timedlock(proc_lock, &abstime))) {
			if (rv != ETIMEDOUT) {
				fprintf(stderr, "BADx rv=%d\n", rv);
				abort();
			}
			if (++wait_usec >= MAX_WAIT_USEC)
				abort();
		}
		//fprintf(stderr, "[%d] rv=%d\n", getpid(), rv);

		i++;
		/* Hold the lock briefly so that the other children contend on it. */
		usleep(1);
		if (pthread_mutex_unlock(proc_lock))
			abort();
	} while (i < MAX_ITER);

	exit(0);
}

int main(void)
{
	proc_lock = mmap(NULL, sizeof(*proc_lock),
			 PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED,
			 -1, 0);
	if (proc_lock == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	pthread_mutexattr_t mattr;

	/* Process-shared, robust, priority-inheriting (PI futex) mutex. */
	pthread_mutexattr_init(&mattr);
	pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
	pthread_mutexattr_setrobust(&mattr, PTHREAD_MUTEX_ROBUST);
	pthread_mutexattr_setprotocol(&mattr, PTHREAD_PRIO_INHERIT);

	pthread_mutex_init(proc_lock, &mattr);

	pthread_mutexattr_destroy(&mattr);

	/* child() never returns, so only the parent keeps forking. */
	for (unsigned a = 0; a < CHILDREN; a++)
		if (!fork())
			child();

	for (unsigned a = 0; a < CHILDREN; a++)
		wait(NULL);

	pthread_mutex_destroy(proc_lock);

	return 0;
}