NMI deadlock detection works

Petr Vandrovec (vandrove@vc.cvut.cz)
Tue, 12 Oct 1999 18:56:08 +0200


unfortunately... I was just listening to some mp3 songs played
through the es1938 driver from alsa 0.4.1b on 2.3.20. I decided
to start some real work, so I typed

umount /asd

where /asd was a dead ncpfs filesystem (the server had died...).
Unfortunately, the music started repeating the last second (64KB?)
and after a few seconds I got:
NMI Watchdog detected lockup on CPU1, registers:
EIP: 0010:[<d288da71>]
EFLAGS: 00000002
EAX: 1 EBX: cfad4244 ECX: 2 EDX: 0
ESI: 86 EDI: cfcafc04 EBP: 0 ESP: c8fafde4 (all hexa)
DS: 18 ES: 18 SS: 18
Process umount (pid: 18965, stackpage = c8faf000)
Stack: 10101 0 0 30002 1 d288b627 cfad4244 0 1 d29220b4 cfad4244
cfcafc04 c08f8000 d29221e8 cfad4244
CallTrace: c0157c22 : kfree_skbmem + x (after call to kmem_cache_free)
c0115757 : timer_bh + 291 / 951 (just after return from __global_sti)
c010afa5 : handle_IRQ_event + 81 (return from call intproc)
c010b1bb : do_IRQ (just after return from handle_IRQ_event)
c0109a2c : ret_from_intr + 0
c013bed1 : invalidate_list + 117 / 139
c013bf1c : invalidate_inodes + 52 / 111 (just after call
to invalidate_list(inode_in_use))
c013174d : do_umount + 225 / 283 (call to ^)
c0131828 : umount_dev + 160 / 291 (call to ^)
c0131975 : sys_umount + 201 / 263 (call to ^)
c01319c0 : sys_oldumount
c0109990 : system_call + 52
Code: f6 83 bc 00 00 00 00 01 75 f7 e9 c9 da ff ff f6 86 bc 00 00 00
console shuts up...

I searched my module binaries for the byte sequence from the 'Code:' line
and found only one match. It was in snd-es1938.o:
testb $0x1,0xbc(%ebx)
jne [line_above]
jmp <snd_solo_trigger+20>
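
For anyone who wants to repeat that search, here is a rough sketch of how it could be scripted. It is not the method I actually used; the helper names (parse_code_bytes, find_all, scan_modules) are made up for illustration, and it assumes the modules are plain .o files somewhere under a single directory. Searching for only the first few bytes may work better than the whole line, since operands that get relocated at load time can differ between the file and the running kernel.

```python
from pathlib import Path
from typing import Dict, List

# The "Code:" line copied verbatim from the oops above.
OOPS_CODE = "f6 83 bc 00 00 00 00 01 75 f7 e9 c9 da ff ff f6 86 bc 00 00 00"


def parse_code_bytes(code_line: str) -> bytes:
    """Convert a space-separated hex dump ('f6 83 bc ...') into raw bytes."""
    return bytes(int(token, 16) for token in code_line.split())


def find_all(haystack: bytes, needle: bytes) -> List[int]:
    """Return every offset at which needle occurs (overlaps included)."""
    offsets = []
    pos = haystack.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = haystack.find(needle, pos + 1)
    return offsets


def scan_modules(module_dir: str, needle: bytes) -> Dict[str, List[int]]:
    """Map each *.o file under module_dir to the offsets where needle occurs.

    Files without a match are left out of the result.
    """
    hits = {}
    for path in sorted(Path(module_dir).rglob("*.o")):
        offsets = find_all(path.read_bytes(), needle)
        if offsets:
            hits[str(path)] = offsets
    return hits
```

A call such as scan_modules(some_module_dir, parse_code_bytes(OOPS_CODE)[:10]) would then list the candidate modules, where some_module_dir is whatever directory holds the modules on the machine in question. The matching file still has to be disassembled by hand to see which function the bytes belong to.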

So, just to let you know: I'll bother the alsa developers once I find a way
to send a bug report by email rather than through the WWW interface.
But I do not understand how the EIP matches the stack trace. Or is the
EIP from CPU1 and the stack trace from CPU0? (*)
I do not know what CPU0 was doing when this happened. Is it possible
to get the other CPU's state through an IPI, or to detect that another CPU
NMI-oopsed and then do an NMI-oops too (at worst at the next tick...)?
Best regards,
Petr Vandrovec
vandrove@vc.cvut.cz

(*) I already asked this question about a month ago, but no one replied.
The EIP in my oopses has not matched the stack ever since I got an SMP board.
