[BUG linux-2.6.15-rc] process events connector - soft lockupdetected

From: Guillaume Thouvenin
Date: Fri Nov 25 2005 - 08:42:20 EST


Hello,

I compiled a kernel 2.6.15-rc1 and a 2.6.15-rc2 with Process Events
Connector enabled. The machine has two processors. When I use the
process events connector, a soft lockup is detected (for both releases):

---------B<---------------------------------------
Pid: 2770, comm: sh
EIP: 0060:[<c039ae28>] CPU: 1
EIP is at __read_lock_failed+0x8/0x14
EFLAGS: 00000297 Not tainted (2.6.15-rc2)
EAX: c046fc00 EBX: f5946000 ECX: f5947f7c EDX: 00000286
ESI: 00000000 EDI: ffffffff EBP: f5946000 DS: 007b ES: 007b
CR0: 8005003b CR2: 080e173c CR3: 358a0000 CR4: 000006d0
[<c039c566>] _read_lock+0xb/0xc
[<c011e7ee>] do_wait+0x9d/0x40c
[<c0116fd0>] default_wake_function+0x0/0x12
[<c0116fd0>] default_wake_function+0x0/0x12
[<c011ec32>] sys_wait4+0x43/0x45
[<c0102e39>] syscall_call+0x7/0xb
BUG: soft lockup detected on CPU#1!

Pid: 2770, comm: sh
EIP: 0060:[<c039ae28>] CPU: 1
EIP is at __read_lock_failed+0x8/0x14
EFLAGS: 00000297 Not tainted (2.6.15-rc2)
EAX: c046fc00 EBX: f5946000 ECX: f5947f7c EDX: 00000286
ESI: 00000000 EDI: ffffffff EBP: f5946000 DS: 007b ES: 007b
CR0: 8005003b CR2: 080e173c CR3: 358a0000 CR4: 000006d0
[<c039c566>] _read_lock+0xb/0xc
[<c011e7ee>] do_wait+0x9d/0x40c
[<c0116fd0>] default_wake_function+0x0/0x12
[<c0116fd0>] default_wake_function+0x0/0x12
[<c011ec32>] sys_wait4+0x43/0x45
[<c0102e39>] syscall_call+0x7/0xb
BUG: soft lockup detected on CPU#1!

Pid: 2770, comm: sh
EIP: 0060:[<c039ae28>] CPU: 1
EIP is at __read_lock_failed+0x8/0x14
EFLAGS: 00000297 Not tainted (2.6.15-rc2)
EAX: c046fc00 EBX: f5946000 ECX: f5947f7c EDX: 00000286
ESI: 00000000 EDI: ffffffff EBP: f5946000 DS: 007b ES: 007b
CR0: 8005003b CR2: 080e173c CR3: 358a0000 CR4: 000006d0
[<c039c566>] _read_lock+0xb/0xc
[<c011e7ee>] do_wait+0x9d/0x40c
[<c0116fd0>] default_wake_function+0x0/0x12
[<c0116fd0>] default_wake_function+0x0/0x12
[<c011ec32>] sys_wait4+0x43/0x45
[<c0102e39>] syscall_call+0x7/0xb
BUG: soft lockup detected on CPU#1!

Pid: 2770, comm: sh
EIP: 0060:[<c039ae25>] CPU: 1
EIP is at __read_lock_failed+0x5/0x14
EFLAGS: 00000297 Not tainted (2.6.15-rc2)
EAX: c046fc00 EBX: f5946000 ECX: f5947f7c EDX: 00000286
ESI: 00000000 EDI: ffffffff EBP: f5946000 DS: 007b ES: 007b
CR0: 8005003b CR2: 080e173c CR3: 358a0000 CR4: 000006d0
[<c039c566>] _read_lock+0xb/0xc
[<c011e7ee>] do_wait+0x9d/0x40c
[<c0116fd0>] default_wake_function+0x0/0x12
[<c0116fd0>] default_wake_function+0x0/0x12
[<c011ec32>] sys_wait4+0x43/0x45
[<c0102e39>] syscall_call+0x7/0xb

.... etc, etc where EIP is sometimes equal to 0060:[<c039ae25>] CPU: 1
and sometimes equal to 0060:[<c039ae28>] CPU: 1
---------------------------------------------------

I think that the problem is in kernel/fork.c. The function
proc_fork_connector(p) is called inside a
write_lock_irq(&tasklist_lock). The
cn_netlink_send(msg,CN_IDX_PROC,GFP_KERNEL) is called by the
proc_fork_connector(). Thus, the alloc_skb(size,GFP_KERNEL) is called
within a write_lock_irq(&tasklist_lock)... is it the problem?

If I replace GFP_KERNEL by GFP_ATOMIC the soft lockup disappear (but I
don't know if this solution is right...)


Best regards,

Guillaume
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/