Re: [BUG] threaded processes get stuck in rt_sigsuspend/fillonedir/exit_notify

From: David Ford (david@kalifornia.com)
Date: Thu Sep 14 2000 - 01:11:34 EST


Mark Kettenis wrote:

> From: Ulrich Drepper <drepper@redhat.com>
> Date: 2000-09-13 6:35:16
>
> tytso@mit.edu writes:
>
> > I didn't realize things had changed that broke the old threading model.
> > Did Linus do more than add support for the new thread groups? I didn't
> > think any that had changed that would break the old LinuxThread
> > programs.
>
> First he introduces CLONE_THREAD (or how it was called). This was
> fine. But in pre2 ore pre3 he unified CLONE_SIGHAND and CLONE_THREAD
> under the new name CLONE_SIGNAL which makes perfect if CLONE_SIGHAND
> would be used. But it is. Simply undo this change, separate the two
> flags.
>
> This has been fixed in test8 (final). CLONE_SIGHAND and CLONE_THREAD
> are seperate bits again, and CLONE_SIGNAL is both of them or'ed
> together.

Something else is broken then which only happens to threaded programs on test8
stuff.

$ pse|grep defun
   88 [nscd <defunct>] exit_notify
   89 [nscd <defunct>] exit_notify
   90 [nscd <defunct>] exit_notify
   91 [nscd <defunct>] exit_notify
 1097 [host <defunct>] exit_notify
 1098 [host <defunct>] exit_notify
 1110 [host <defunct>] exit_notify
 1111 [host <defunct>] exit_notify
 1115 [nslookup <defun exit_notify
 1116 [nslookup <defun exit_notify
 1117 [nslookup <defun exit_notify
 1118 [nslookup <defun exit_notify
 1124 [nslookup <defun exit_notify
 1125 [nslookup <defun exit_notify
 1126 [nslookup <defun exit_notify
 1170 [host <defunct>] exit_notify
 1175 [host <defunct>] exit_notify
 1177 [host <defunct>] exit_notify
 1182 [host <defunct>] exit_notify
 1184 [host <defunct>] exit_notify
 1190 [host <defunct>] exit_notify
 1192 [host <defunct>] exit_notify
 1197 [host <defunct>] exit_notify
 1199 [host <defunct>] exit_notify
 1168 [host <defunct>] exit_notify
14899 [xmms <defunct>] exit_notify
14902 [xmms <defunct>] exit_notify
14903 [xmms <defunct>] exit_notify
[...] goes on and on. almost 100 processes stuck now.

here's one that is currently stuck:

 8243 pan rt_sigsuspend
 8244 pan rt_sigsuspend
 8249 pan rt_sigsuspend

gdb pan 8243
[...]
(gdb) bt
#0 0x40498d05 in __sigsuspend (set=0xbf7ffab4)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:48
#1 0x4044f2f4 in __pthread_wait_for_restart_signal (self=0xbf7ffe40)
    at pthread.c:785
#2 0x4044cfd2 in pthread_cond_timedwait_relative_new (cond=0x80eb4f8,
    mutex=0x80eebd0, abstime=0xbf7ffd64) at restart.h:26
#3 0x4044d054 in pthread_cond_timedwait (cond=0x80eb4f8, mutex=0x80eebd0,
    abstime=0xbf7ffd64) at condvar.c:381
#4 0x808ade6 in queue_mainloop ()
#5 0x4044d938 in pthread_start_thread (arg=0xbf7ffe40) at manager.c:241

a kill -9 is the only thing that will kick it, but that yields a zombie.

-d

--
"The difference between 'involvement' and 'commitment' is like an
eggs-and-ham breakfast: the chicken was 'involved' - the pig was
'committed'."


- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Sep 15 2000 - 21:00:22 EST