partial oops on 2.6.6-bk9 (netconsole + tulip dmfe ethernet)

From: Aldy Hernandez
Date: Mon May 24 2004 - 07:25:44 EST


Hi folks.

I'm trying to track down a kernel lockup on my box. What looked like
an FC2 issue now seems to be an actual kernel problem. My box is dying
after 2 days of uptime.

I've enabled nmi_watchdog and netconsole to track the problem. All I
get is what looks like a partial Oops, which is not much use:

Unable to handle kernel paging request at virtual address 8cffffe8
printing eip:
c017bff0
*pde = 00000000
Oops: 0000 [#1]
SMP

No stack trace. No register dump. Nothing.

I have compiled the kernel with debugging, and the offending code is
here:

(gdb) l *0xc017bff0
0xc017bff0 is in sys_poll (fs/select.c:517).
512             err = -EFAULT;
513             while(walk != NULL) {
514                     struct pollfd *fds = walk->entries;
515                     int j;
516
517                     for (j=0; j < walk->len; j++, ufds++) {
518                             if(__put_user(fds[j].revents, &ufds->revents))
519                                     goto out_fds;
520                     }
521                     walk = walk->next;
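
For reference, here is how I read that loop, as a user-space sketch. The
struct layout is my assumption of what struct poll_list looks like in
fs/select.c, chunk_new() and copy_revents() are names I made up for
illustration, and a plain store stands in for __put_user(). As far as I
can tell, the only kernel pointer dereferenced on line 517 itself is
walk (for walk->len), so a corrupted walk would fault there, but that is
only my guess:

```c
#include <stddef.h>
#include <stdlib.h>
#include <poll.h>

/* Assumed layout, modelled on struct poll_list in fs/select.c. */
struct poll_list {
	struct poll_list *next;
	int len;
	struct pollfd entries[];
};

/* Hypothetical helper: allocate one chunk holding `len` entries. */
static struct poll_list *chunk_new(int len, short revents)
{
	struct poll_list *c;
	int j;

	c = malloc(sizeof(*c) + (size_t)len * sizeof(struct pollfd));
	if (c == NULL)
		return NULL;
	c->next = NULL;
	c->len = len;
	for (j = 0; j < len; j++) {
		c->entries[j].fd = j;
		c->entries[j].events = POLLIN;
		c->entries[j].revents = revents;
	}
	return c;
}

/*
 * Same shape as the loop at fs/select.c:513-521, with a plain store
 * in place of __put_user(). Returns the number of entries copied.
 */
static int copy_revents(const struct poll_list *walk, struct pollfd *ufds)
{
	int copied = 0;

	while (walk != NULL) {
		const struct pollfd *fds = walk->entries;
		int j;

		/* walk->len is re-read on every pass of this loop */
		for (j = 0; j < walk->len; j++, ufds++) {
			ufds->revents = fds[j].revents;
			copied++;
		}
		walk = walk->next;
	}
	return copied;
}
```

Chaining two chunks of 2 and 3 entries and calling copy_revents() on the
first one should fill 5 consecutive pollfd slots in the output array.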

My box is a P4 with a Tulip/Davicom (dmfe.o) network card. I noticed
that CONFIG_NET_POLL_CONTROLLER support was recently added to the
dmfe.c driver, so I'm wondering whether this particular Oops is caused
by the driver rather than by the actual problem I'm chasing down.

My kernel parameters are:

kernel /bzImage ro root=LABEL=/ rhgb quiet vdso=0 netconsole=6665@xxxxxxxxxxx/eth0,6666@xxxxxxxxxxxx/00:30:65:02:c4:c3 nmi_watchdog=1

I'm not very kernel savvy, so any pointers would be greatly
appreciated. Perhaps I should try configuring a serial console?

How can I be of use in helping to track down this problem? BTW, 2.4.x
worked fine with my box.

Cheers.
Aldy