UMSDOS oops:kernel v2.2.13 null ptr boot problem with ksymoops.

M Sweger (mikesw@whiterose.net)
Fri, 10 Dec 1999 10:04:27 -0500 (EST)


Hi,
I've provided a ksymoops trace for UMSDOS linux v2.2.13 using libc6 v2.0.7pre6
and gcc v2.95.2. Here are my setups in items (A-E).

QUESTION: For what kernel version is UMSDOS hard link rename() function
fixed in?

A). Hardware
Dell 333mhz optiplex GX1
adaptec 2940U2W AIC 7985
WDE 9 gig SCSI
QUANTUM 1.2 gig IDE
3com 3c59x ether card.

B). USMDOS OS:
stable linux v2.2.1 trying to run v2.2.13
all s/w pkg about the latest i.e. Mtools v3.9.4, util-linuxi,
binutils 2.9.1.0.24.tar.gz
bin86_0.13.0-4.deb
precompiled libc6-dev_2.0.7t-1.deb

modules insmod'd: loop.o 3c59x.o
daemons running: syslogd, klogd,inetd ---nothing else
kernel configuration: minimal with i586 option selected,
with MTRR not enabled.
with SMP support not enabled.
with no floating pt emulation.
with FAT,UMSDOS and VFAT filesys.

****** Listing of rc.M that I use ******

# Start system log daemon: syslogd

if [ -x /usr/sbin/syslogd ]; then
echo "Starting system log daemon: syslogd"
/usr/sbin/syslogd -m 0
fi

# Start kernel log daemon: klogd

if [ -x /usr/sbin/klogd ]; then
echo "Starting kernel log daemon: klogd"

# NOTE: I have to do all this since the stack trace doesn't write to a file and
, the /proc/ksyms aren't available due to the crash on boot.
The /proc/modules are dumped too. Therefore, to get a stack trace,
I need to force a kernel buffer flush or else I won't have any data
after the crash. The klogd outputting to kern.oops isn't detailed
enough.

Once the kernel crashes, the MSDOS filesystem must be repaired.
Then when I reboot to a stable v2.2.1 kernel, it copies the previous
data to a safe area for review whils't the new data is written out
for the next reboot.

cp -p /var/log/kern.oops /root/kern.oops.$$.`date +%s`
cat /proc/modules >/root/kmodules.`date +%s`
cat /proc/ksyms >/root/ksyms.`date +%s`
# force kernel flush of messages
sync
/usr/sbin/klogd -f /var/log/kern.oops -k /usr/src/linux/System.map -c 1
fi

C). booted the linux v2.2.13
NOTE: I had to hand type this data into a file to run through ksymoops.
To do this, I'll be running kernel v2.2.1 which is stable; however,
all the data fed to ksymoops is the previous kernel v2.2.13 crash
data.

****** The shell script I run on the data ******
#!/bin/sh

ksymoops -V -l /root/22da/kmodules.22d -k /root/22da/ksyms.22d -o /lib/modules/2.2.13/ /root/22da/kern.ops >/root/22da/kt.1 2>&1

******** The data dump trace *********
Note: the module warning might be due to be having v2.2.1 3c59x loaded
although the ksyms were dumped by rc.M start script above for
the linux v2.2.13.

ksymoops 0.7c on i686 2.2.13. Options used
-V (specified)
-k /root/22da/ksyms.22d (specified)
-l /root/22da/kmodules.22d (specified)
-o /lib/modules/2.2.13/ (specified)
-m /usr/src/linux/System.map (default)

Warning (compare_ksyms_lsmod): module 3c59x is in lsmod but not in ksyms, probably no symbols exported
Oops: 0
CPU: 0
EIP: 0010:[<c012e592>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010007
eax: 00000000 ebx: c3cab010 ecx: c7dc8098 edx: 00000000
esi: c3cab00c edi: c3cab000 ebp: 00000287 esp: c7ffbeec
ds: 0018 es: 0018 ss: 0018
process init (pid:1, process nr:1, stackpage=c7ffb000)
Stack: 00000400 0000000b 00000004 c7ef0018 00000000 c012e8ff c3b47000 00000000
0000000b c012e8d0 00000001 00000004 c7ef0018 00000000 c7fbd860 c7fbd760
00000001 c7ffa000 c7ffa000 7fffffff 00000000 c3b470000 00000806 0000001f
call trace: [<c012e8ff>] [<c012e8d0>] [<c012ec89>] [<c010a0dd>] [<c0109fa4>]
code: 8b 42 04 39 d8 75 f7 89 4a 04 55 9d 83 c4 f4 8b 06 50 e8 27

>>EIP; c012e592 <free_wait+3a/74> <=====
Trace; c012e8ff <do_select+1fb/214>
Trace; c012e8d0 <do_select+1cc/214>
Trace; c012ec89 <sys_select+371/498>
Trace; c010a0dd <error_code+2d/40>
Trace; c0109fa4 <system_call+34/40>
Code; c012e592 <free_wait+3a/74>
00000000 <_EIP>:
Code; c012e592 <free_wait+3a/74> <=====
0: 8b 42 04 movl 0x4(%edx),%eax <=====
Code; c012e595 <free_wait+3d/74>
3: 39 d8 cmpl %ebx,%eax
Code; c012e597 <free_wait+3f/74>
5: 75 f7 jne fffffffe <_EIP+0xfffffffe> c012e590 <free_wait+38/74>
Code; c012e599 <free_wait+41/74>
7: 89 4a 04 movl %ecx,0x4(%edx)
Code; c012e59c <free_wait+44/74>
a: 55 pushl %ebp
Code; c012e59d <free_wait+45/74>
b: 9d popf
Code; c012e59e <free_wait+46/74>
c: 83 c4 f4 addl $0xfffffff4,%esp
Code; c012e5a1 <free_wait+49/74>
f: 8b 06 movl (%esi),%eax
Code; c012e5a3 <free_wait+4b/74>
11: 50 pushl %eax
Code; c012e5a4 <free_wait+4c/74>
12: e8 27 00 00 00 call 3e <_EIP+0x3e> c012e5d0 <__pollwait+4/98>

1 warning issued. Results may not be reliable.

D). Link problems.

After the kernel booted and umssync'd and the message "going multiuser"
is displayed, I get the following error message,

umsdos_readdir_x: linux/LINK2944 negative after link
umsdos_readdir_x: linux/LINK2944 negative after link

Then the daemons such as sylogd, klogd and inted are started. This is
followed then by a message [I put in] that says I'm exiting rc scripts.
After this, the crash dump appears, but before mingetty is started.

QUESTION: I don't know why this message is appearing twice. The
command being executed at this time is "ldconfig -q".
I've narrowed it down to the /lib directory by using the
-n option and doing reboots with different directories.
This isn't a problem with kernel v2.2.1. Now that I've
isolated the directory and --linux-.--- has ..LINK2944 in it,
how do I narrow down the exact file and linkage that is
causing this?

Moreover, this problem doesn't appear for linux kernel v2.2.7
although the kernel hangs with a trace dump similar to
the above trace for kernel v2.2.13.

E). kernels v2.2.1, v2.2.5, v2.2.6 run ok.
kernels v2.2.7, v2.2.8, v2.2.10, v2.2.12, v2.2.13 crash like this
all other kernels weren't tried.

IF MTRR is or is not compiled into the kernel, I'll get a stack
trace using the method above.

If SMP is compiled in, then I *don't* get a stack trace
at all although it will defaults to no SMP since I only have one processor.

The results here are due to SMP support turned off and MTTR turned on.
This problem was also present in the linux v2.2.7 kernels. I didn't
oops trace the other kernel versions.

MIke

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/