Re: knfsd and ext2? Huh?

From: Hans Reiser (hans@reiser.to)
Date: Tue Jun 20 2000 - 08:31:22 EST


"Alexei I. Adamovich" wrote:
>
> > Sender: root@dns.centro.ru
> Hans Reiser <hans@reiser.to> wrote on Tue, 20 Jun 2000 03:38:14 -0700:
> > "Alexei I. Adamovich" wrote:
> > >
> > > Alexander Viro <viro@math.psu.edu> wrote on 19 Jun 2000 09:06:21 -0400 (EDT)
> > > > On Mon, 19 Jun 2000, Alexei I. Adamovich wrote:
> > > >
> > > > > > Jun 15 10:33:38 adam kernel: iput: Aieee, semaphore in use inode 03:06/496185, count=-1012063976
> > > > >
> > > > > Seems, it's knfsd issue (03:06 is /dev/hda6). Am I right?
> > > >
> > > > Seems it's a severe memory corruption. Notice the "count" part.
> > > Yes, it's possible. Since the problem certainly isn't a hardware
> > > one, this can mean that somebody assigned something like 0xc3ad2118
> > > (== -1012063976) to the semaphore count. AFAIK that looks like a
> > > pointer into kernel memory space, so your guess looks like a good
> > > one.
> > > But it could also be an underflow: current processors are very fast
> > > compared to the total time the test had been running before the kernel
> > > issued this message (something like half an hour to an hour).
> > > Anyway, I can trigger the problem only when Stress.sh is running over
> > > a knfsd-served nfs tree.
> > I am confused, you say it could be bad memory but then it seems you argue to the
> > contrary?
> Thanks, Hans, for pointing that out. My apologies, I certainly meant a
> software problem here: the Russian equivalent of "memory"+"corruption" can
> mean not only bad memory, but also bad usage of memory by assigning
> inappropriate values to memory locations.
>
> The reason I think it is a software problem rather than a hardware
> one is that these kernel-related problems show up on my
> box, on several kernel versions, only when stress-testing a local
> knfs-exported, nfs-mounted partition.
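
The arithmetic behind the "looks like a kernel pointer" remark quoted above can
be checked with a small sketch. It assumes a 32-bit signed count and the
default i386 3G/1G split, where kernel virtual addresses start at 0xC0000000;
neither is stated in the log itself, and the helper names are made up for
illustration.

```python
# Assumption: default i386 3G/1G split, kernel space starts at 0xC0000000.
KERNEL_BASE = 0xC0000000


def as_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as its unsigned bit pattern."""
    return value & 0xFFFFFFFF


def looks_like_kernel_pointer(value: int) -> bool:
    """Crude heuristic: does the bit pattern fall at or above KERNEL_BASE?"""
    return as_unsigned32(value) >= KERNEL_BASE


if __name__ == "__main__":
    count = -1012063976  # the value from the iput message
    print(hex(as_unsigned32(count)))         # 0xc3ad2118
    print(looks_like_kernel_pointer(count))  # True
```

Note the heuristic is weak on its own: small negative values such as -1 also
map above 0xC0000000, so it cannot distinguish a stray pointer from a count
that was decremented a few times too many. A value about a billion below zero
is what makes the pointer reading more plausible than a slow underflow.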

OK, first, I think that we should just tell them that stress.sh is on our
website and that we got it from some other website, so I don't think the
copyright is realistically at issue unless one of us tries to sell it or some
such.

Second, if they try and fail to replicate it on their hardware, then we should
try different hardware, since we have found that heat-sensitivity bugs that
show up only under particular software do occur in real life. From your
description, though, this sounds extremely unlikely, so let's assume it is
software until they fail to replicate it.

Hans

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/


