Avi Kivity wrote:
An interesting (but perhaps difficult to achieve) optimization would be to spin in userspace.
I couldn't think of a lightweight way to determine, from userspace, when the owner has been scheduled out. Kernel assistance is required. You could do this on the schedule() side of things, but I figured I'd get some strong pushback if I tried to add a hook into descheduling that flipped a bit in the futex value stating the owner was about to be descheduled. Still, that might be something to explore.
In the futex value it's hopeless (since a thread can hold many locks),
It can, but there is a futex value per lock. If the task_struct had a list of held futex locks (as it does for pi futex locks) the deschedule() path could walk that and mark the FUTEX_OWNER_SLEEPING bit.
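
For illustration, a rough userspace model of that idea in C. Nothing here is kernel code: struct task stands in for a per-task list of held futexes, task_deschedule() for the hook in the scheduling path, and the value chosen for FUTEX_OWNER_SLEEPING is purely an assumption (the existing futex word already uses its top two bits, so a real patch would have to take a bit from the TID field).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit, not part of the real futex ABI. */
#define FUTEX_OWNER_SLEEPING 0x20000000u
#define MAX_HELD 8 /* arbitrary cap for this sketch */

/* Stand-in for a per-task list of held futex locks. */
struct task {
    _Atomic uint32_t *held[MAX_HELD];
    size_t nheld;
};

/* Record that the task now owns this futex. */
void task_hold(struct task *t, _Atomic uint32_t *futex)
{
    assert(t->nheld < MAX_HELD);
    t->held[t->nheld++] = futex;
}

/* What the deschedule path would do: flag every lock the task owns. */
void task_deschedule(struct task *t)
{
    for (size_t i = 0; i < t->nheld; i++)
        atomic_fetch_or(t->held[i], FUTEX_OWNER_SLEEPING);
}

/* Counterpart for when the task is scheduled back in. */
void task_schedule_in(struct task *t)
{
    for (size_t i = 0; i < t->nheld; i++)
        atomic_fetch_and(t->held[i], ~FUTEX_OWNER_SLEEPING);
}

/* Waiter side: spinning only makes sense while the bit is clear. */
int owner_is_running(_Atomic uint32_t *futex)
{
    return (atomic_load(futex) & FUTEX_OWNER_SLEEPING) == 0;
}
```

The cost that would draw the pushback is visible in task_deschedule(): one atomic read-modify-write per held lock on every context switch.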
but I don't think it's unreasonable to set a bit in the thread local storage area. The futex format would then need to be extended to contain a pointer to this bit.
This appears to be 1 bit per task instead of 1 bit per lock.
Also, the value is thread-specific, so how would a potential waiter be able to determine whether the owner of a particular lock was running with this method? Maybe I'm missing some core bit about TLS. Are you talking about pthread_key_create() and pthread_getspecific()?
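
One possible reading of the TLS proposal, sketched in C11 to make the question concrete. This is a guess at the intent, not anything from a posted patch: struct xlock, self_running, and the sim_*() helpers are all made-up names. The flag is one per thread, but each lock carries a pointer to its current owner's flag, so a waiter reads the owner's flag through the lock rather than through its own TLS.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* One flag per thread; in the proposal the kernel would clear it on
 * deschedule and set it again when the thread runs. */
static _Thread_local atomic_int self_running = 1;

/* Hypothetical extended lock: the usual word plus a pointer to the
 * owner's per-thread flag. */
struct xlock {
    atomic_int word;                     /* 0 = free, 1 = held */
    _Atomic(atomic_int *) owner_running; /* NULL when no owner is published */
};

int xlock_try_acquire(struct xlock *l)
{
    int expected = 0;
    if (!atomic_compare_exchange_strong(&l->word, &expected, 1))
        return 0;
    /* Publish where waiters can find this thread's flag. */
    atomic_store(&l->owner_running, &self_running);
    return 1;
}

void xlock_release(struct xlock *l)
{
    assert(atomic_load(&l->word) == 1); /* releasing a free lock is a bug */
    atomic_store(&l->owner_running, (atomic_int *)NULL);
    atomic_store(&l->word, 0);
}

/* Waiter side: follow the pointer stored in the lock. */
int xlock_owner_running(struct xlock *l)
{
    atomic_int *flag = atomic_load(&l->owner_running);
    return flag != NULL && atomic_load(flag);
}

/* Stand-ins for what the scheduler would do to the current thread. */
void sim_deschedule(void)  { atomic_store(&self_running, 0); }
void sim_schedule_in(void) { atomic_store(&self_running, 1); }
```

Two gaps in this sketch: a waiter that looks between the compare-exchange and the pointer store sees NULL and simply does not spin, and the pointer is only safe to follow while the owning thread, and therefore its TLS block, still exists.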