Re: Several tst-robust* tests time out with recent Linux kernel

From: Edgecombe, Rick P
Date: Thu Nov 16 2023 - 20:23:17 EST


A bit more info...

The error returned to userspace is originating from:
https://github.com/torvalds/linux/blob/master/kernel/futex/pi.c#L295

'uval' is often zero in that error case, but sometimes it is just a
mismatching value, e.g. uval=0x567 while task_pid_vnr()=0x564.
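
For reference, a minimal sketch of how such a mismatch can be decoded in
userspace, assuming the usual PI futex word layout from the uapi
<linux/futex.h> header (owner TID in the low 30 bits, FUTEX_WAITERS and
FUTEX_OWNER_DIED in the top two). The helper names are made up for
illustration, and this is not the kernel code at the linked line:

```c
#include <assert.h>   /* only needed for the assert() usage shown below */
#include <stdint.h>
#include <stdio.h>

/* PI futex word layout; values match the uapi <linux/futex.h> header. */
#define FUTEX_WAITERS    0x80000000u
#define FUTEX_OWNER_DIED 0x40000000u
#define FUTEX_TID_MASK   0x3fffffffu

/* Hypothetical helper: extract the owner TID from a futex word. */
static inline uint32_t futex_tid(uint32_t uval)
{
	return uval & FUTEX_TID_MASK;
}

/* Hypothetical helper: 1 if the word names 'tid' as its owner, else 0. */
static inline int futex_word_owned_by(uint32_t uval, uint32_t tid)
{
	return futex_tid(uval) == tid;
}

/* Hypothetical helper: print the decoded fields next to the expected TID. */
static inline void futex_word_dump(uint32_t uval, uint32_t tid)
{
	printf("uval=0x%x owner=0x%x waiters=%d owner_died=%d expected=0x%x match=%d\n",
	       (unsigned)uval, (unsigned)futex_tid(uval),
	       (uval & FUTEX_WAITERS) != 0,
	       (uval & FUTEX_OWNER_DIED) != 0,
	       (unsigned)tid, futex_word_owned_by(uval, tid));
}
```

With the values above, assert(!futex_word_owned_by(0x567, 0x564)) holds,
and a zero uval never matches a real TID either.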


Whether it reproduces depends on the number of CPUs the VM is running
on. When it does reproduce, the newly added path here is taken:
https://github.com/torvalds/linux/blob/master/kernel/futex/pi.c#L1185
That path is taken a lot during the test, sometimes >400 times before
the error linked above is returned from the syscall. When it doesn't
reproduce, I never saw the new path taken.

Adding more print statements makes the reproduction less reliable, so
there does seem to be a race involved. Beyond that, I haven't tried to
understand what is going on here with all this high-wire locking.

Hope it helps.