Re: 2.2.0(release): oops in kmem_cache_free (??)

Dr. Werner Fink (werner@suse.de)
Wed, 27 Jan 1999 13:35:58 +0100


On Tue, Jan 26, 1999 at 10:32:35PM +0100, Andi Kleen wrote:
> In article <19990126185555.A14971@Galois.suse.de>,
> werner@suse.de (Dr. Werner Fink) writes:
> > On Tue, Jan 26, 1999 at 12:26:25PM -0500, David J. Fred wrote:
> >>
> >> Summary: This morning I found my machine had crashed during the night
> >> on the release version of 2.2.0, apparently in kmem_cache_free.
>
> > Yep, I've seen a similar oops during a high stress test on a UP system
> > (two make -j10 in different kernel trees). The only problem was that
> > the oops in the kernel message wasn't fully readable for ksymoops
> > (garbage bits). After running fsck by hand to recover /dev/sda10 aka
> > /usr/src/, the system is up again.
>
>
> The oops is caused by this code in kmem_cache_free:
>
> #if 1
> /* FORCE A KERNEL DUMP WHEN THIS HAPPENS. SPEAK IN ALL CAPS. GET THE CALL CHAIN. */
> *(int *) 0 = 0;
> #endif
>
> There should be a kmem_alloc: ... message directly in front of the oops
> in the kernel log. What does it say?

I found only one long line of ASCII NUL bytes (shown as ^@), and at the end
of this line `3 = 01eaa000'.
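
A line like that can be made readable again by dropping the NUL bytes
before looking at what is left. A minimal sketch in plain C, assuming the
raw log line has already been read into a buffer; strip_nuls is a made-up
helper name for illustration, not part of klogd or ksymoops:

```c
#include <stddef.h>

/*
 * Hypothetical helper: copy `len` bytes from `in` to `out`, skipping
 * every NUL byte, and NUL-terminate the result. `out` must have room
 * for at least len + 1 bytes. Returns the number of bytes kept.
 */
size_t strip_nuls(const char *in, size_t len, char *out)
{
    size_t i;
    size_t kept = 0;

    for (i = 0; i < len; i++) {
        if (in[i] != '\0')
            out[kept++] = in[i];
    }
    out[kept] = '\0';
    return kept;
}
```

Called on a buffer of NUL bytes followed by some text, it leaves only the
text in `out`. The length is passed explicitly because strlen() would stop
at the first NUL.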

The messages before are:
-------------------------------------------------------------------------
Jan 26 16:38:49 boole kernel: klogd 1.3-3, log source = /proc/kmsg started.
Jan 26 16:38:49 boole kernel: Inspecting /boot/System.map
Jan 26 16:38:49 boole kernel: Symbol table has incorrect version number.
Jan 26 16:38:49 boole kernel: Inspecting /System.map
Jan 26 16:38:49 boole kernel: Loaded 5824 symbols from /System.map.
Jan 26 16:38:49 boole kernel: Symbols match kernel version 2.2.0.
Jan 26 16:38:49 boole kernel: No module symbols loaded.
Jan 26 16:38:49 boole kernel: eth0: Advertising 01e1 on PHY 0 (0).
-------------------------------------------------------------------------

No surprising messages.

> As far as I can see the problem can only happen when a process in the
> process list carries a bogus pointer in tsk->mm. The new locking in
> array.c has the side effect of noticing it in mmput.
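
To make that failure mode concrete, here is a small user-space model of
it. This is only a sketch under my own assumptions, not the kernel's slab
or mm code: every toy_* name is invented for illustration, the "cache" is
a fixed pool, and where 2.2.0 forces an oops the sketch just returns -1 so
that it can run as an ordinary program.

```c
#include <stddef.h>
#include <stdio.h>

#define TOY_OBJS 4

/* Stand-in for the kernel's mm_struct; only a use count. */
struct toy_mm {
    int count;
};

/* Toy "cache": a fixed pool plus a per-slot in-use flag. */
static struct toy_mm toy_pool[TOY_OBJS];
static int toy_in_use[TOY_OBJS];

/* Hand out a free slot with one reference, or NULL if the pool is full. */
struct toy_mm *toy_mm_alloc(void)
{
    int i;

    for (i = 0; i < TOY_OBJS; i++) {
        if (!toy_in_use[i]) {
            toy_in_use[i] = 1;
            toy_pool[i].count = 1;
            return &toy_pool[i];
        }
    }
    return NULL;
}

/*
 * Free an object. Returns 0 on success, or -1 if the pointer is not a
 * live object of this pool (the case where the real code forces an oops).
 */
int toy_cache_free(struct toy_mm *mm)
{
    int i;

    for (i = 0; i < TOY_OBJS; i++) {
        if (mm == &toy_pool[i] && toy_in_use[i]) {
            toy_in_use[i] = 0;
            return 0;
        }
    }
    fprintf(stderr, "toy_cache_free: bad pointer %p\n", (void *)mm);
    return -1;
}

/* Drop one reference; the last reference hands the object to the cache. */
int toy_mmput(struct toy_mm *mm)
{
    if (--mm->count == 0)
        return toy_cache_free(mm);
    return 0;
}
```

A pointer obtained from toy_mm_alloc() goes through toy_mmput() cleanly.
A pointer to anything else, or to an object that was already freed, is
only noticed at the final put, which is far away from whatever corrupted
the pointer in the first place.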

BTW: I found some NFS-related messages:

Jan 27 13:10:10 boole kernel: __nfs_fhget: inode 50499339 busy, i_count=2, i_nlink=1
Jan 27 13:10:10 boole kernel: nfs_free_dentries: found SRC/tobuild.glibc.i386.next, d_count=0, hashed=1
Jan 27 13:10:10 boole kernel: nfs_dentry_delete: SRC/tobuild.glibc.i386.next: ino=50499339, count=2, nlink=1
[...]

...

Werner
