Re: 2.6.27.9: splice_to_pipe() hung (blocked for more than 120 seconds)

From: Vegard Nossum
Date: Sun Jan 18 2009 - 08:44:35 EST


On Sun, Jan 18, 2009 at 1:12 PM, Vegard Nossum <vegard.nossum@xxxxxxxxx> wrote:
> On Fri, Jan 16, 2009 at 9:48 PM, Eric Dumazet <dada1@xxxxxxxxxxxxx> wrote:
>>
>> I tried your program on latest git tree and could not reproduce any problem.
>>
>> (changed to 9 threads since I have 8 cpus)
>
> Hm. My machine has 2 CPUs. I just reproduced it on a more recent
> kernel, this time from:
>
> commit a6525042bfdfcab128bd91fad264de10fd24a55e
> Date: Tue Jan 13 14:53:16 2009 -0800
>
> with lockdep enabled, and no bad messages. So it seems that it is not
> a deadlock at least...
>
>> Problem might be that your threads all fight on the same pipe, with
>> a mutex protecting its inode.
>>
>>
>> So mutex_lock() could possibly starve for more than 120 seconds?
>
> Much longer. Last time this happened, the zombies stayed for many
> hours (until I rebooted the machine).
>
>> Maybe you can reproduce the problem using standard read()/write() syscalls...
>
> I can try...
>
> Tasks: 7 total, 0 running, 6 sleeping, 0 stopped, 1 zombie
> Cpu(s): 0.4%us, 6.0%sy, 0.0%ni, 92.6%id, 1.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 1016180k total, 54596k used, 961584k free, 4084k buffers
> Swap: 104380k total, 0k used, 104380k free, 20412k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 3808 500 20 0 0 0 0 Z 0 0.0 0:00.00 a.out <defunct>
> 3809 500 20 0 0 0 0 D 0 0.0 0:00.00 a.out
> 3810 500 20 0 0 0 0 D 0 0.0 0:00.00 a.out
> 3813 500 20 0 0 0 0 D 0 0.0 0:00.00 a.out
> 3815 500 20 0 0 0 0 D 0 0.0 0:00.00 a.out
> 3817 500 20 0 0 0 0 D 0 0.0 0:00.00 a.out
> 3821 500 20 0 0 0 0 D 0 0.0 0:00.00 a.out
>
> root@ubuntu:/home/debian# strace -p 3808
> attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
>
> root@ubuntu:/home/debian# strace -p 3809
> Process 3809 attached - interrupt to quit
> <nothing>
> ^C^C^C^C^C^C^C^C
> <doesn't quit>
>
> root@ubuntu:/home/debian# cat /proc/3808/syscall
> 0 0xbfa1f5b4 0xbfa1f5b4 0xc8bff4 0xbfa1f5b4 0x0 0xbfa1f5c8 0xbfa1f3f8 0xb8020424
> root@ubuntu:/home/debian# cat /proc/3809/syscall
> 313 0x9 0x0 0x7 0x0 0x3 0x0 0xb80012cc 0xb8020424
> root@ubuntu:/home/debian# cat /proc/3810/syscall
> 313 0x6 0x0 0x5 0x0 0x17f 0x0 0xb780037c 0xb8020424
> root@ubuntu:/home/debian# cat /proc/3813/syscall
> 313 0xa 0x0 0x7 0x0 0x3 0x0 0xb67fe2cc 0xb8020424
> root@ubuntu:/home/debian# cat /proc/3815/syscall
> 313 0x6 0x0 0x5 0x0 0x17f 0x0 0xb5ffd37c 0xb8020424
> root@ubuntu:/home/debian# cat /proc/3817/syscall
> 313 0x8 0x0 0x7 0x0 0x3 0x0 0xb4ffb2cc 0xb8020424
> root@ubuntu:/home/debian# cat /proc/3821/syscall
> 313 0x6 0x0 0x5 0x0 0x17f 0x0 0xb47fa37c 0xb8020424
>
> Also managed to grab this, this time:
>
> SysRq : Show Locks Held
>
> Showing all locks held in the system:
> 1 lock held by getty/2130:
> #0: (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
> 1 lock held by getty/2131:
> #0: (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
> 1 lock held by getty/2134:
> #0: (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
> 1 lock held by getty/2138:
> #0: (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
> 1 lock held by getty/2142:
> #0: (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
> 1 lock held by getty/2143:
> #0: (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
> 1 lock held by a.out/3809:
> #0: (&sb->s_type->i_mutex_key#4){--..}, at: [<c10d5515>] splice_to_pipe+0x25/0x260
> 2 locks held by a.out/3810:
> #0: (&sb->s_type->i_mutex_key#11/2){--..}, at: [<c10d548c>] splice_from_pipe+0x5c/0x90
> #1: (&sb->s_type->i_mutex_key#4){--..}, at: [<c10be5ac>] pipe_wait+0x6c/0x80
> 1 lock held by a.out/3813:
> #0: (&sb->s_type->i_mutex_key#4){--..}, at: [<c10d5515>] splice_to_pipe+0x25/0x260
> 2 locks held by a.out/3815:
> #0: (&sb->s_type->i_mutex_key#4/1){--..}, at: [<c10c97a2>] inode_double_lock+0x32/0xb0
> #1: (&sb->s_type->i_mutex_key#11/2){--..}, at: [<c10d548c>] splice_from_pipe+0x5c/0x90
> 1 lock held by a.out/3817:
> #0: (&sb->s_type->i_mutex_key#4){--..}, at: [<c10d5515>] splice_to_pipe+0x25/0x260
> 1 lock held by a.out/3821:
> #0: (&sb->s_type->i_mutex_key#4/1){--..}, at: [<c10c97a2>] inode_double_lock+0x32/0xb0
> 2 locks held by bash/3916:
> #0: (sysrq_key_table_lock){....}, at: [<c12297f7>] __handle_sysrq+0x17/0x140
> #1: (tasklist_lock){..--}, at: [<c1059354>] debug_show_all_locks+0x34/0x180
>
> =============================================
>

I have one theory. We have this skeleton:

ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out,
                         loff_t *ppos, size_t len, unsigned int flags,
                         splice_actor *actor)
{
        ...
        inode_double_lock(inode, pipe->inode);
        ret = __splice_from_pipe(pipe, &sd, actor);
        inode_double_unlock(inode, pipe->inode);
        ...
}

ssize_t __splice_from_pipe(struct pipe_inode_info *pipe, struct splice_desc *sd,
                           splice_actor *actor)
{
        ...
        pipe_wait(pipe);
        ...
}

void pipe_wait(struct pipe_inode_info *pipe)
{
        if (pipe->inode)
                mutex_unlock(&pipe->inode->i_mutex);
        ...
        if (pipe->inode)
                mutex_lock(&pipe->inode->i_mutex);
}

So in short: Is it possible that inode_double_lock() in
splice_from_pipe() first locks the pipe mutex, THEN locks the
file/socket mutex? In that case there would be a locking imbalance,
because pipe_wait() would unlock (and later re-lock) the pipe mutex
while the file/socket mutex is still held.

That would possibly explain the sporadic nature of the lockup; it
depends on the actual order in which the double lock takes the two
mutexes.
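
For reference, inode_double_lock() picks the locking order by comparing
the two inode pointers, so whether the pipe mutex or the file/socket
mutex is taken first is essentially an accident of which address is
lower. From memory of fs/inode.c around this version (a sketch, not a
verbatim quote):

void inode_double_lock(struct inode *inode1, struct inode *inode2)
{
        if (inode1 == NULL || inode2 == NULL || inode1 == inode2) {
                if (inode1)
                        mutex_lock(&inode1->i_mutex);
                else if (inode2)
                        mutex_lock(&inode2->i_mutex);
                return;
        }

        /* the order depends only on the pointer values */
        if (inode1 < inode2) {
                mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
                mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
        } else {
                mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
                mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
        }
}

So whenever the pipe inode happens to have the lower address, the pipe
mutex is taken first and the file/socket mutex second, which is exactly
the case the theory above needs.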

Why doesn't lockdep report that? Hm. I guess it is because these are
both inode mutexes and lockdep can't detect a locking imbalance within
the same lock class?

Anyway, that's just a theory. :-) Will try to confirm by simplifying
the test-case.
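
In case it is useful, a stripped-down test along these lines is what I
have in mind (a hypothetical sketch, not the program from the original
report; the file names and thread count are made up, build with
-lpthread):

#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

static int pfd[2];

/* splice from a regular file into the shared pipe (splice_to_pipe path) */
static void *filler(void *arg)
{
        int in = open("/tmp/in", O_RDONLY);

        if (in < 0) {
                perror("open /tmp/in");
                return NULL;
        }

        for (;;) {
                loff_t off = 0;

                if (splice(in, &off, pfd[1], NULL, 65536, 0) < 0) {
                        perror("splice to pipe");
                        break;
                }
        }
        return NULL;
}

/* splice from the shared pipe into a regular file (splice_from_pipe path) */
static void *drainer(void *arg)
{
        int out = open("/tmp/out", O_WRONLY | O_CREAT | O_TRUNC, 0644);

        if (out < 0) {
                perror("open /tmp/out");
                return NULL;
        }

        for (;;) {
                if (splice(pfd[0], NULL, out, NULL, 65536, 0) < 0) {
                        perror("splice from pipe");
                        break;
                }
        }
        return NULL;
}

int main(void)
{
        pthread_t threads[6];
        int i;

        if (pipe(pfd))
                return 1;

        /* several fillers and drainers all fighting over the same pipe */
        for (i = 0; i < 6; i++)
                pthread_create(&threads[i], NULL,
                               (i & 1) ? filler : drainer, NULL);

        for (i = 0; i < 6; i++)
                pthread_join(threads[i], NULL);

        return 0;
}

That keeps several tasks in splice_to_pipe() and splice_from_pipe() on
the same pipe at once, which matches what the lock dump above shows.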


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036