I have finally reproduced my lockup on 2.2.14 with the IKD patches.
Here are the backtraces (sans arguments) for the two CPUs as reported
by kdb.

Backtrace for CPU 1:
stext_lock + 0x5bb
__wait_on_buffer + 0xd9
sync_block + 0x9f
sync_direct + 0x22
ext2_sync_file + 0x4b
sys_fsync + 0x85

Backtrace for CPU 0:
add_timer + 0x3a
tcp_send_delayed_ack + 0x34
tcp_delack_timer + 0x3a
timer_bh + 0x37a
do_bottom_half + 0x89
do_IRQ + 0x52
common_interrupt + 0x18
do_no_page + 0x42
handle_mm_fault + 0x107
do_page_fault + 0x12d
error_code + 0x2d
memcpy_toiovec + 0x38
tcp_recvmsg + 0x377
inet_recvmsg + 0x72
sock_recvmsg + 0x37
sock_read + 0x82
sys_read + 0xc8

So one process is calling fsync() and the other is calling read() on
a TCP socket. It is not obvious to me why this is deadlocked.
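
For anyone who wants to poke at the same two paths from user space, here
is a rough sketch. It is NOT the commercial program (no source is
available for that); the helper names fsync_loop and
read_into_fresh_pages are made up for illustration. The first mimics the
CPU 1 path (write followed by fsync). The second mimics the CPU 0 path:
it read()s into a freshly mmap'd anonymous buffer, so the kernel's copy
to user space has to fault pages in, as the do_page_fault / do_no_page
frames under memcpy_toiovec suggest. Whether this is enough to trigger
the lockup is unknown.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `rounds` 4 KB blocks to `path`, calling
 * fsync() after each write. Returns 0 on success, -1 on any error. */
int fsync_loop(const char *path, int rounds)
{
    char block[4096];
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    memset(block, 'x', sizeof block);
    for (int i = 0; i < rounds; i++) {
        if (write(fd, block, sizeof block) != (ssize_t)sizeof block ||
            fsync(fd) != 0) {
            close(fd);
            return -1;
        }
    }
    return close(fd);
}

/* Hypothetical helper: read up to `len` bytes from `sock` into a newly
 * mapped anonymous buffer whose pages have not been touched yet, so the
 * kernel must fault them in while copying. Returns the number of bytes
 * read (less than `len` on EOF), or -1 on error. */
long read_into_fresh_pages(int sock, size_t len)
{
    size_t total = 0;
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (buf == MAP_FAILED)
        return -1;
    while (total < len) {
        ssize_t n = read(sock, buf + total, len - total);
        if (n < 0) {
            munmap(buf, len);
            return -1;
        }
        if (n == 0)     /* peer closed the connection */
            break;
        total += (size_t)n;
    }
    munmap(buf, len);
    return (long)total;
}
```

To approximate the situation in the traces, one process would call
fsync_loop() on an ext2 file in a loop while another calls
read_into_fresh_pages() repeatedly on a connected TCP socket that is
receiving data.
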
When I do "go" and then hit Pause again, CPU 1 is always stuck at
exactly the same place. CPU 0 is also exactly the same except for the
most recent 5 or 6 frames; it seems like I always catch it while
handling the interrupt and attempting to send the delayed ack, which
then sets itself up to fire again a little later.
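
Comparing successive samples by eye gets tedious, so here is a small
sketch of how the comparison could be automated. It assumes each
backtrace has been saved as an array of "symbol + offset" strings in the
order kdb prints them (innermost frame first); the function name
common_outer_frames is made up for illustration. It counts how many
frames two samples share, starting from the outermost frame.

```c
#include <stddef.h>
#include <string.h>

/* Count how many frames two backtraces have in common, starting from
 * the outermost (last-listed) frame and walking toward the innermost.
 * Stops at the first pair of frames that differ. */
size_t common_outer_frames(const char *const *a, size_t na,
                           const char *const *b, size_t nb)
{
    size_t n = 0;

    while (n < na && n < nb &&
           strcmp(a[na - 1 - n], b[nb - 1 - n]) == 0)
        n++;
    return n;
}
```

If the count for two CPU 0 samples is always the total depth minus the
few interrupt-handling frames, that would support the observation that
the underlying read() never makes progress.
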
Note that "do_no_page + 0x42" is the instruction immediately following
a call to do_anonymous_page. I suspect do_anonymous_page is where I
am stuck, and the backtrace is being confused by the presence of the
interrupt. But I am not sure.
I am hoping a wizard can just look at these backtraces and see the
problem. Failing that, I would appreciate ideas for what to try next.
This crash is not easy to reproduce; this time it took almost a week
of continuously running the offending operations. The program which
elicits the crash is (unfortunately) commercial, so I do not have the
source. It runs entirely as a regular user, however, so this is
definitely a kernel bug.
I would be glad to provide any additional information (e.g., snippets
of disassembly) which would be useful.
Help, please?
- Pat
This archive was generated by hypermail 2b29 : Fri Mar 31 2000 - 21:00:30 EST