Re: 2.4.0-test5 bug: invalid "shmid_kernel" passed to "shm_nopage_core"

From: Kevin Buhr (buhr@stat.wisc.edu)
Date: Sun Nov 26 2000 - 02:05:56 EST


Christoph Rohland <cr@sap.com> writes:
>
> This is the first report of such corruption. If it's real it is _not_
> fixed between test5 and test11. There is probably no way to reproduce
> it since you ask if it's fixed in test11, right?

I know no way to reproduce it. I've been using "test5" reliably since
just after its release, and I've triggered this bug only the one time.

I was running Mozilla, one of the few programs I run that uses shared
memory to communicate with the X server. If I recall correctly, the
machine had been idle for a few minutes when my ISP suddenly hung up
on me. Then, I discovered the machine had locked: CPU1 running "pppd"
got stuck waiting for the kernel lock in "sock_ioctl". I believe it
was the innocent victim. CPU0 (running "XF86_SVGA") had grabbed the
kernel lock and gotten stuck spinning on the invalid swap device
spinlock, as mentioned in my previous message.

I use a SysRq patch to do an oops-style dump instead of the usual
"showPc" function, so I was able to copy a stack dump down.

From the stack dump, I can be 100% positive that, in shm_nopage_core,
"shp" was 0xc218b240 on entry and "idx" was 0, but the line

        pte = SHM_ENTRY(shp,idx);

calculated a value of 0xc218b268, the memory location of
"shp->shm_dir". That is, I had shp->shm_dir == **shp->shm_dir, so I
*suspect* that shp->shm_dir == *shp->shm_dir.

In any event, the "shp" was corrupt (hadn't been initialized or had
been freed and reused).
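
To make the suspicion concrete, here is a minimal user-space sketch of
that double dereference. It is only a model, not kernel code: it
assumes SHM_ENTRY(shp,idx) expands to
shp->shm_dir[idx/PTRS_PER_PTE][idx%PTRS_PER_PTE], that PTRS_PER_PTE is
1024 (i386 without PAE), and that pointers are 32 bits wide. The 0x28
offset is just 0xc218b268 - 0xc218b240 from the dump. The helper names
(load32, store32, shm_entry, self_referential_pte) are made up for the
illustration, and the "memory" is a small byte array standing in for
the addresses around "shp".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PTRS_PER_PTE   1024u        /* assumption: i386 without PAE */
#define SHP_ADDR       0xc218b240u  /* "shp" on entry, from the stack dump */
#define SHM_DIR_OFFSET 0x28u        /* 0xc218b268 - 0xc218b240 */
#define WINDOW_SIZE    0x40u        /* simulated bytes starting at SHP_ADDR */

/* Simulated memory covering [SHP_ADDR, SHP_ADDR + WINDOW_SIZE). */
static uint8_t window[WINDOW_SIZE];

/* Read a 32-bit word at a simulated kernel address. */
static uint32_t load32(uint32_t addr)
{
    uint32_t v;
    assert(addr >= SHP_ADDR && addr - SHP_ADDR <= WINDOW_SIZE - sizeof v);
    memcpy(&v, &window[addr - SHP_ADDR], sizeof v);
    return v;
}

/* Write a 32-bit word at a simulated kernel address. */
static void store32(uint32_t addr, uint32_t v)
{
    assert(addr >= SHP_ADDR && addr - SHP_ADDR <= WINDOW_SIZE - sizeof v);
    memcpy(&window[addr - SHP_ADDR], &v, sizeof v);
}

/* Model of SHM_ENTRY(shp, idx): two dependent loads, then the pte load. */
static uint32_t shm_entry(uint32_t shp, uint32_t idx)
{
    uint32_t dir = load32(shp + SHM_DIR_OFFSET);             /* shp->shm_dir */
    uint32_t row = load32(dir + (idx / PTRS_PER_PTE) * 4u);  /* shm_dir[i]   */
    return load32(row + (idx % PTRS_PER_PTE) * 4u);          /* shm_dir[i][j] */
}

/* Suspected corrupt state: the shm_dir field holds its own address. */
static uint32_t self_referential_pte(void)
{
    store32(SHP_ADDR + SHM_DIR_OFFSET, SHP_ADDR + SHM_DIR_OFFSET);
    return shm_entry(SHP_ADDR, 0);
}
```

With the field pointing at itself, every load in the chain reads the
same word, so self_referential_pte() returns 0xc218b268, which matches
the value in the dump. This shows only that a self-pointing shm_dir is
consistent with what I saw, not that it is the only possible state.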

I'll fiddle around a bit more and see if I can find a way to reproduce
it reliably.

Thanks.

Kevin <buhr@stat.wisc.edu>