again: access beyond end of device

Martin Schulz (schulz@iwrmm.math.uni-karlsruhe.de)
07 Dec 1999 10:55:30 +0100


"P.A.M. van Dam" <nucleus@ramoth.xs4all.nl> writes:

> On Thu, Oct 28, 1999 at 12:54:01PM +0100, Stephen C. Tweedie wrote:
> > Hi,
> >
> > On 28 Oct 1999 12:13:01 +0200, Martin Schulz
> > <schulz@iwrmm.math.uni-karlsruhe.de> said:
> >
> > > May be. My box in question runs knfsd (1.2.2-4). But the error message
> > > (=symptom) occurs on local operation. And the corruption goes much
> > > beyond that message if not fsck'ed soon. (Been there, done that.)

Well, since there have been no messages about this topic on this list for
a few weeks, let me tell you that it hit me again. I have no idea what the
real cause is, but it is really nasty, so let me restate the
symptoms I know so far:

o Some kind of file system corruption causes "access beyond end of device"
messages in /var/log/messages.

o These messages occur while doing a backup to tape (SCSI disk to
SCSI tape on the same controller) with a simple tar command. Tar exits
with error messages such as:

tar: File home/stoll/work/claw/applications/euler/3d/shockbubble/core shrunk by 20127744 bytes, padding with zeros
tar: Error exit delayed from previous errors

That core file was about 20 MB in size, but I remember the same thing
happening with some MPEG files and others.

o Though the error message occurs during local operation, it seems to
affect only disks exported via NFS.

o My box is a Pentium II uniprocessor system, but the effect has also been
reported on SMP boxes.

o It happens with both knfsd and the userland nfsd, but seems to be less
frequent with the latter. (I am now using Universal NFS Server 2.2beta40 on
a self-compiled 2.2.13 kernel without any fancy things.)

o Probably unnecessary to mention: the effect is completely erratic
and not reproducible.

o It does not affect similarly built NFS client boxes. (Well, they don't
do backups either.)

o It is absolutely necessary to do an immediate forced fsck on that
disk to avoid a growing loss of data.
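
To watch for the two symptoms listed above, something like the following
could be used. It is only a sketch under assumptions: the function names
are made up for illustration, the kernel marker is just the substring of
the message quoted above (the surrounding syslog format is not assumed),
and the tar pattern is modelled on the single error line shown above.

```python
import re

# Substring of the kernel message quoted above. The rest of the log line
# (timestamp, host, prefix) depends on the syslog setup and is not assumed.
KERNEL_MARKER = "access beyond end of device"

# Pattern modelled on the one tar error line quoted above (an assumption
# about the general form of that message).
TAR_SHRUNK = re.compile(r"^tar: File (?P<path>.+) shrunk by (?P<bytes>\d+) bytes")


def find_kernel_hits(log_lines):
    """Return (line_number, line) pairs for lines containing the marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if KERNEL_MARKER in line
    ]


def find_shrunk_files(tar_lines):
    """Return (path, missing_bytes) pairs parsed from tar's error output."""
    hits = []
    for line in tar_lines:
        match = TAR_SHRUNK.match(line)
        if match:
            hits.append((match.group("path"), int(match.group("bytes"))))
    return hits
```

Both functions take any iterable of lines, so an open file such as
/var/log/messages or a saved copy of tar's stderr can be passed directly.
A hit from either one would be the cue to run the forced fsck right away.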

Does anybody know more details about the problem? Can anybody give
me some hints on where to look for debugging output?

Yours,

-- 
Martin Schulz                             schulz@iwrmm.math.uni-karlsruhe.de
Uni Karlsruhe, Institut f. wissenschaftliches Rechnen u. math. Modellbildung
Engesser Str. 6, 76128 Karlsruhe
