very strange issue with sata,<4G Ram, and ext3

From: Rick Warner
Date: Thu Apr 28 2005 - 11:17:57 EST


Hello,
We are having a very strange issue on some 64bit systems. We have a 32 node
cluster of EM64T's (supermicro boards). We are using our node restore
software to propagate a linux install onto them. We do a pxe boot to a
kernel and initrd image. The initrd has some config info, a basic root
filesystem, and a restore script. The kernel is passed init=/restore (the
restore script itself). The script runs dhcp, gets an ip, then nfs mounts
the master node of the cluster. The backup image is stored on the master
node's nfs mount. The script then applies a backed up partition table and
then mkfs's the partitions, mounts them, untars a backup tar to the drive,
and then makes it bootable with grub.

On these systems, we are getting ext2 errors from the initrd during the
untarring. Soon after, we start getting seg faults on random things (looks
like stuff caused by the still running dhcp client), and then a continuous
stream of segfaults on the restore script itself (restore[1]).

The systems being restored are dual em64t's with 2G of ram and 200G sata
drives. If we up the memory to 4G, the restores complete without error. If
we reduce down to 512M, the segfaults start at the mkfs stage instead of the
untar stage. We've tried different sata drives and controllers without
change. Switching to ide drives works. Switching to reiserfs instead of
ext3 for the destination drives works too. We've tried enabling the scsi
debug stuff as well as the jbd debug stuff for ext3 without getting any more
info. We also enabled the kernel debug options too. We've also tried using
the deprecated ide based sata drivers instead of the scsi based ones without
success. We have tried restoring to Intel's Jarell EM64T systems as well as
an Arima HDAMA opteron with the same errors. We've also tried adding swap
space ASAP in the inird image.

This problem is really baffling us and we're not quite sure what to check
into next. Any ideas?


--
Richard Warner
Lead Systems Integrator
Microway, Inc
(508)732-5517
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/