Re: 2.6.25.3: su gets stuck for root

From: Joe Peterson
Date: Sat Jun 14 2008 - 13:44:20 EST


Vegard Nossum wrote:
> So this clearly shows what's wrong; 7036 is the "controlling process"
> group id. But only "su foo" is in this group, the bash and stty
> processes have their own group, 7037.
>
> On my own system, when I do "su", I get this:
> 2891 2891 2892 root su temp
> 2892 2892 2892 temp bash
>
> ...and here the "bash" process is in the right group, 2892, while "su"
> is the one in the background!

Hmm.
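For anyone following along, the comparison described above (is a process's own group the terminal's foreground group?) can be sketched in Python. This is only an illustration under stated assumptions: the helper name group_status is made up for this example, and it inspects the calling process on fd 0 rather than su or bash.

```python
import os


def group_status(fd=0):
    """Compare this process's group with the terminal's foreground group.

    group_status is a hypothetical helper, not part of su or bash.
    Returns (pgid, foreground_pgid, in_foreground). If fd is not a
    terminal, foreground_pgid is None and in_foreground is False.
    """
    pgid = os.getpgrp()
    try:
        fg = os.tcgetpgrp(fd)
    except OSError:
        # fd is not a terminal (e.g. redirected input), so there is no
        # foreground group to compare against.
        return pgid, None, False
    return pgid, fg, pgid == fg


if __name__ == "__main__":
    pgid, fg, in_fg = group_status()
    print("pgid=%s foreground=%s in_foreground=%s" % (pgid, fg, in_fg))
```

Run from an interactive shell prompt, a foreground job would be expected to report in_foreground=True. A process whose group differs from the terminal's foreground group is a background job as far as the tty is concerned.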

> Can you try to run strace on the su to see where things go wrong, i.e.
>
> $ strace -f -e trace=process su foo
>
> ...and we're only interested in what happens up to the point where it
> hangs. That should hopefully tell us which process is doing the wrong
> thing. In either case, as Alan pointed out, this seems unlikely to be
> a kernel problem.

OK, I have attached the output as a text file at the end. But
(*bummer*) running su under strace makes the hang impossible to
reproduce (it figures; I believe someone earlier in the thread ran into
the same thing).

As for whether the kernel is at fault, I am not sure (i.e., does this
hang automatically implicate the kernel, or can a user-space process
get itself into this state on its own?). But I *do* see different
behavior depending on the kernel version: there were a couple of git
kernels on which I could not reproduce it. Still, if it is a race or
something similar, it may be that the conditions were just slightly
perturbed on those kernels.

The strace log below is from a run that did not hang, but I attached it
anyway just in case it is of help.

-Joe

[attached strace log]
7009 execve("/bin/su", ["su", "foo"], [/* 32 vars */]) = 0
7009 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7e3d708) = 7010
7010 execve("/bin/bash", ["bash"], [/* 31 vars */]) = 0
7010 clone( <unfinished ...>
7009 waitpid(-1, <unfinished ...>
7010 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7db0708) = 7011
7011 exit_group(0) = ?
7010 --- SIGCHLD (Child exited) @ 0 (0) ---
7010 waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 7011
7010 waitpid(-1, 0xbff58cec, WNOHANG) = -1 ECHILD (No child processes)
7010 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7db0708) = 7012
7012 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7db0708) = 7013
7013 execve("/usr/bin/dircolors", ["dircolors", "-b", "/etc/DIR_COLORS"], [/* 31 vars */]) = 0
7013 exit_group(0) = ?
7012 --- SIGCHLD (Child exited) @ 0 (0) ---
7012 waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 7013
7012 waitpid(-1, 0xbff585ec, WNOHANG) = -1 ECHILD (No child processes)
7012 exit_group(0) = ?
7010 waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 7012
7010 --- SIGCHLD (Child exited) @ 0 (0) ---
7010 waitpid(-1, 0xbff5873c, WNOHANG) = -1 ECHILD (No child processes)
7010 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7db0708) = 7014
7014 execve("/bin/sleep", ["sleep", "2"], [/* 31 vars */]) = 0
7010 waitpid(-1, <unfinished ...>
7014 exit_group(0) = ?
7010 <... waitpid resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 7014
7010 --- SIGCHLD (Child exited) @ 0 (0) ---
7010 waitpid(-1, 0xbff593dc, WNOHANG) = -1 ECHILD (No child processes)
7010 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7db0708) = 7015
7015 execve("/bin/stty", ["stty", "-ixany"], [/* 31 vars */]) = 0
7015 exit_group(0) = ?
7010 --- SIGCHLD (Child exited) @ 0 (0) ---
7010 waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 7015
7010 waitpid(-1, 0xbff5936c, WNOHANG) = -1 ECHILD (No child processes)
7010 exit_group(0) = ?
7009 <... waitpid resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WSTOPPED) = 7010
7009 exit_group(0) = ?
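
To make a log like the one above easier to read, the parent/child relationships can be pulled out of the clone() and execve() lines. The following is a rough sketch, not a general strace parser: the function name parse and the regular expressions are assumptions made for this example, and they only handle the line shapes that appear in this particular log (including the "<... clone resumed>" form).

```python
import re

# First few lines of the attached log, used here as sample input.
SAMPLE = """\
7009 execve("/bin/su", ["su", "foo"], [/* 32 vars */]) = 0
7009 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7e3d708) = 7010
7010 execve("/bin/bash", ["bash"], [/* 31 vars */]) = 0
7010 clone( <unfinished ...>
7010 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7db0708) = 7011
"""

# A completed clone, either on one line or as a "resumed" continuation.
CLONE_RE = re.compile(r"^(\d+) (?:clone\(|<\.\.\. clone resumed>).*\) = (\d+)$")
# An execve line; captures the pid and the path of the program.
EXEC_RE = re.compile(r'^(\d+) execve\("([^"]+)"')


def parse(log):
    """Return (parents, programs) extracted from strace -f output.

    parents maps child pid -> parent pid; programs maps pid -> exec path.
    """
    parents = {}
    programs = {}
    for line in log.splitlines():
        m = CLONE_RE.match(line)
        if m:
            parents[int(m.group(2))] = int(m.group(1))
            continue
        m = EXEC_RE.match(line)
        if m:
            programs[int(m.group(1))] = m.group(2)
    return parents, programs


if __name__ == "__main__":
    parents, programs = parse(SAMPLE)
    for child in sorted(parents):
        parent = parents[child]
        print("%d (%s) -> %d" % (parent, programs.get(parent, "?"), child))
```

Applied to the full log, this should show 7009 (su) forking 7010 (bash), which in turn forks the short-lived children that run dircolors, sleep and stty.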