Re: 2.2.16 crashes on ES40 (with spinlock messages...)

From: Martin Frey (frey@scs.ch)
Date: Mon Jul 24 2000 - 16:12:54 EST


Hi,

The ES40 is running Redhat 6.1, booting from an Intel Linux
Box with 2.2.14, with user space NFS server, that is known to
have problems. The NFS message might come from e.g. the Mail
the program is trying to send out.

The thing that kills the program is the:
> Time = map_shmem: error in shmget: Invalid argument
> test4_a : Error in szin_w-11-2.ALPHA : program crashed, status=1

Is your program doing a shmget?
shmget() itself works on the system, we use it in one of our
applications.
Which is the program causing that error message? I can try to
run it under strace.

Best regards,

Martin Frey

> Strange. I have run the code both from NFS mounted and ext2 partitions
> and never saw such a message. I'm running RH6.2 with 2.2.16 kernel
> (with knfsd.) The NFS disks are hard mounted and used to complain
> about nfs server not responding, but since the switch upgrade that
> message disappeared from the syslog. What's your configuration?
>
> --andrew
>
> Hi,
>
> on our ES40 2.2.14, NFS mounted directory I get in syslog:
> /.nfs0000000021364c6b0000000f, error=-13
> NFS: can't silly-delete mqueue/.nfs0000000021358c8700000015, error=-13
> NFS: can't silly-delete tmp/.nfs0000000021364c6f0000001a, error=-13
> NFS: can't silly-delete tmp/.nfs0000000021364c710000001c, error=-13
> NFS: can't silly-delete tmp/.nfs0000000021364c730000001e, error=-13
> NFS: can't silly-delete mqueue/.nfs0000000021358c8900000028, error=-13
> NFS: can't silly-delete tmp/.nfs0000000021364c700000001b, error=-13
> NFS: can't silly-delete tmp/.nfs0000000021364c720000001d, error=-13
> NFS: can't silly-delete tmp/.nfs0000000021364c740000001f, error=-13
> NFS: can't silly-delete mqueue/.nfs0000000021358c8c0000002b, error=-13
>
> The test4_out.1 file says:
>
> The e-mail address is frey
> The hostname of this machine is es0.scs.ch
> The architecture of this machine is ALPHA
> The process id is 873
> The executable is szin_w-11-2.ALPHA
> The reader is read-szin-11-15
>
> Time = map_shmem: error in shmget: Invalid argument
> test4_a : Error in szin_w-11-2.ALPHA : program crashed, status=1
>
> Exaclty the same (including the kernel messages) comes when
> running on a local SCSI disk with ext2 filesystem. What's
> wrong?
>
> Andrew Pochinsky wrote:
> >
> > Hi,
> >
> > Some time ago I posted a message to this list about misterious crashes
> > on Alpha ES40. Peter Rival, Pat O'Rourke and Michal Jaegermann made
> > some interesting suggestions for a possible cause. Unfortunately, I
> > was not able to fix the problem. This time, however, our user built
> > the code which reliably crashes the system after a few second run.
> > Each crash is accompanied by a 'spinlock ... stuck' message repeated
> > for every processor in the system. Once all the processors are stuck,
> > the system goes catatonic. To check that the problem is not related to
> > some flaky hardware, I rebooted the same kernel with nosmp flag. The
> > problem is gone (of course, the machine now runs four times slower ;(
> >
> > --andrew
> >
> > P.S. The tarball of the executable could be found at
> > <ftp://ftp.lns.mit.edu/pub/avp/smp-crash.tar.gz>. Simply start runme
> > and wait.
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.rutgers.edu
> > Please read the FAQ at http://www.tux.org/lkml/
>
> --
> Supercomputing Systems AG email: frey@scs.ch
> Martin Frey www: http://www.scs.ch/~frey
> Technoparkstrasse 1 phone: +41 (0)1 445 16 00
> CH-8005 Zurich fax: +41 (0)1 445 16 10

-- 
Supercomputing System AG		email:	frey@scs.ch
Martin Frey				web:	http://www.scs.ch/~frey/
Technoparkstrasse 1			phone:	+41 (0) 1 445 16 00
CH-8005 Zurich				fax:	+41 (0) 1 445 16 10

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon Jul 31 2000 - 21:00:18 EST