Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

From: Steven Rostedt
Date: Fri Nov 16 2007 - 18:37:47 EST





On Sat, 17 Nov 2007, Remy Bohmer wrote:

> Hello Steven,
>
> Thanks for your reply
>
> > The above sounds more like you need a completion.
> Funny, I first started with completion structures, but that did
> not work either. I get similar OOPSes with all of these kinds of locking
> mechanisms, as long as I use the _interruptible() type. I tried every
> work-around I can think of, but none worked :-((
> Even if I block on an ordinary rt-mutex in the same routine, using an
> _interruptible() type wait, I get the same problem.

The taker of a mutex must also be the one that releases it. I don't see
how you could use a mutex for this. It really requires some kind of
completion, or a compat_semaphore.

>
> > What's used to wake up the caller of down_interruptible?
> A call to up() is used from inside an interrupt(thread) context, but
> this is not relevant for the problem, because only blocking on a
> semaphore with down_interruptible() and waking the thread by CTRL-C is
> enough to get this Oops.
>
> I saw that the code is trying to wake 'other waiters', while there is
> only 1 thread waiting on the semaphore at most. I feel that the root
> cause of this problem has to be searched there.
>
> I believe that executing any PI code on semaphores is a strange
> behavior anyway, because a semaphore is never 'owned' by a thread, and
> it is always another thread that wakes the thread that blocks on a
> semaphore, and because the waker is unknown, the PI code will always
> boost the prio of the wrong thread.

That is exactly why it should be a completion or a compat_semaphore. The
only reason we did PI on semaphores is that they were used as mutexes
before Ingo pushed to get an actual mutex primitive into the kernel. Since
then, we've been trying to replace all semaphores with either a mutex or a
completion.

>
> Strange is also, that I get different behavior on ARM if I use
> sema_init(&sema, 1) versus sema_init(&sema,0). The latter seems to
> crash less, it will not crash until the first up(); while the first
> will crash even without any up().
>
> Attached is a sample driver I hacked together a few
> minutes ago. It is NOT the driver that generated the oops in the
> previous mail, but I have stripped a scull-driver down so much that
> it will be much easier to talk about, and to keep us focused on the
> part of the code that is causing this.
> Besides: I tested this driver on X86 with 2.6.23.1-rt5 and I also get
> OOPSes there, although slightly different than on ARM. See the attached
> dummy.txt file.
>
>
> Beware: The up(&sema) is NOT required to get this OOPS, I get it even
> without any up(&sema) !
>
> I hope you can look at the attached driver source and help me with this...
>

down_interruptible(&dummy);
printk("We will block now, and if you press CTRL-C from here, we get an OOPS.\n");
down_interruptible(&dummy);


This double down is actually illegal with rt semaphores, because we treat
semaphores as mutexes unless they are declared as compat_semaphores, in
which case we don't do PI.
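
To make the ownership point concrete, here is a rough userspace analogy,
not the rt-mutex code itself. It assumes a POSIX error-checking pthread
mutex as a stand-in for an owner-tracked lock. All model_* names are
hypothetical and exist only for this sketch. POSIX specifies that a second
lock by the owner fails with EDEADLK, and that an unlock by a non-owner
fails with EPERM.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: set up an owner-tracked (error-checking) mutex. */
void model_init_owned_lock(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Analogue of the double down: the owner tries to take the lock again. */
int model_double_lock(void)
{
    pthread_mutex_t m;
    int err;

    model_init_owned_lock(&m);
    pthread_mutex_lock(&m);        /* first "down": we become the owner */
    err = pthread_mutex_lock(&m);  /* second "down": owner waits on itself */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return err;
}

struct model_ctx {
    pthread_mutex_t *m;
    int err;
};

/* Stand-in for a waker that never took the lock but tries to release it. */
void *model_foreign_unlocker(void *arg)
{
    struct model_ctx *ctx = arg;

    ctx->err = pthread_mutex_unlock(ctx->m);
    return NULL;
}

/* Analogue of up() from another thread on an owner-tracked lock. */
int model_foreign_unlock(void)
{
    pthread_mutex_t m;
    struct model_ctx ctx = { &m, 0 };
    pthread_t t;
    int rc;

    model_init_owned_lock(&m);
    pthread_mutex_lock(&m);
    rc = pthread_create(&t, NULL, model_foreign_unlocker, &ctx);
    assert(rc == 0);
    pthread_join(t, NULL);
    pthread_mutex_unlock(&m);      /* only the owner may release */
    pthread_mutex_destroy(&m);
    return ctx.err;
}
```

Both calls fail in this model for the same reason the pattern is a problem
with PI: the lock has an owner, and only that owner can make progress on it.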

It seems that you need to work out how to use a completion in your code.
If that doesn't work, then use a compat_semaphore. But beware that a
compat_semaphore can cause unbounded latencies. Then again, so can
completions.
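
As a minimal sketch of why a completion fits this wait-and-wake pattern,
here is a userspace model built from a pthread mutex and condition
variable. It is not the kernel's struct completion, and every model_* name
is hypothetical. It only mirrors the idea behind complete() and
wait_for_completion(): the waker is a different thread from the waiter,
and there is no owner to track or boost. Signal handling (the
_interruptible part) is deliberately left out.

```c
#include <assert.h>
#include <pthread.h>

/* Userspace model of a completion: a "done" count guarded by a lock. */
struct model_completion {
    pthread_mutex_t lock;
    pthread_cond_t  wait;
    unsigned int    done;
};

void model_init_completion(struct model_completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->wait, NULL);
    c->done = 0;
}

/* Any thread may signal; nobody "owns" the completion. */
void model_complete(struct model_completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done++;
    pthread_cond_signal(&c->wait);
    pthread_mutex_unlock(&c->lock);
}

/* Block until some other thread has called model_complete(). */
void model_wait_for_completion(struct model_completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->done == 0)
        pthread_cond_wait(&c->wait, &c->lock);
    c->done--;                     /* consume one completion event */
    pthread_mutex_unlock(&c->lock);
}

/* Stand-in for the interrupt thread that would call up()/complete(). */
void *model_waker(void *arg)
{
    model_complete(arg);
    return NULL;
}

/* Returns the number of unconsumed events left after one wait/wake pair. */
int model_demo(void)
{
    struct model_completion c;
    pthread_t t;
    int rc;

    model_init_completion(&c);
    rc = pthread_create(&t, NULL, model_waker, &c);
    assert(rc == 0);
    model_wait_for_completion(&c); /* the waiter never "takes" anything */
    pthread_join(t, NULL);
    return (int)c.done;
}
```

The waiter blocks on an event count rather than on a lock it holds, so
there is no owner for PI code to get wrong.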


-- Steve
