Re: [URGENT ASSISTANCE REQUESTED] production machines dying

Rik van Riel (H.H.vanRiel@fys.ruu.nl)
Mon, 24 Nov 1997 21:52:32 +0100 (MET)


On Mon, 24 Nov 1997, Thomas Schenk wrote:

> I need any assistance I can get on the following problem:
>
> Some of our machines are getting the following error message on the
> console and are locking up tight.
>
> Kernel panic: skput : over: nnnnnnnn:nnn
> In swapper task - not syncing
>
> I have checked both machines and can find no other error messages as
> to why it is crashing. I can't recall the exact value of nnnnnnnnn:nnn.

The value of nnnnnnnn:nnn can be used (with the /boot/System.map
that matches the running kernel) to look up which kernel function
is involved in the panic. Once we know which one it is, we can
(and will) gladly fix it for you.
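
The lookup itself is simple enough to script. Below is a rough Python
sketch, assuming the usual three-column System.map layout (hex address,
type letter, symbol name). The function names and the sample addresses
are made up for illustration; they are not taken from any real kernel
build.

```python
import bisect

def parse_system_map(text):
    """Return (address, name) pairs sorted by address."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        addr, _type, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols

def lookup(symbols, address):
    """Return (name, offset) for the last symbol at or below address."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, address) - 1
    if i < 0:
        return None  # address lies below every known symbol
    base, name = symbols[i]
    return name, address - base

# Hypothetical excerpt; a real System.map has thousands of lines.
SAMPLE_MAP = """\
c0100000 T _stext
c0110000 T schedule
c0150000 T ip_rcv
c0158000 T tcp_rcv
"""

symbols = parse_system_map(SAMPLE_MAP)
print(lookup(symbols, 0xc0150123))
```

In practice you would read the text from the System.map that was built
together with the crashing kernel; a map from a different build gives
misleading answers.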

> One machine hardware config is as follows:
>
> Dual Pentium Pro 200MHz
> 256M RAM
> Buslogic Model BT-958 PCI Wide Ultra SCSI Host Adapter
> 5 SEAGATE ST34572W (4 Gig) drives

Sounds great. What type of mainboard are you using?

> 3COM 3c509 Ethernet Card
> 3COM 3c905 Boomerang Ethernet Card

I heard rumours that the Boomerang exhibits some problems
that keep coming back...

> CONFIG_BLK_DEV_CMD640=y
> # CONFIG_BLK_DEV_CMD640_ENHANCED is not set
> CONFIG_BLK_DEV_RZ1000=y
> CONFIG_BLK_DEV_TRITON=y
> # CONFIG_IDE_CHIPSETS is not set

You only need to select the chipset you actually have on your
board. If you have a Triton, the RZ1000 and CMD640 options might
slow you down quite a lot.

> CONFIG_IP_ALWAYS_DEFRAG=y
> CONFIG_IP_ACCT=y

Linux has some problems with memory fragmentation, and using
the CONFIG_IP_ALWAYS_DEFRAG option might leave the kernel
stuck in some network-memory-allocation loop.

There are (most likely) some patches floating around to fix
this; I have some 'fixes' for it for the 2.1 kernels. If 2.0
doesn't work for you, you might as well try 2.1.42 (worked
rock-solid for me) or 2.1.6[345], with and without my patch
applied.

Or try going back to kernel 2.0.29, 2.0.30, 2.0.18 or
whichever kernels shipped with the major distributions.

> CONFIG_NO_PATH_MTU_DISCOVERY=y

Can any 2.0 config guru comment on this?

> CONFIG_SCSI_AIC7XXX=y
> CONFIG_AIC7XXX_TAGGED_QUEUEING=y
> # CONFIG_OVERRIDE_CMDS is not set
> # CONFIG_AIC7XXX_PAGE_ENABLE is not set
> CONFIG_AIC7XXX_PROC_STATS=y
> CONFIG_AIC7XXX_RESET_DELAY=15

Why compile in this driver when you have Buslogic controllers?
Not that it hurts, but it's just a waste of memory...
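
To spot drivers that are compiled in without matching hardware,
something like the following could list what a .config actually
enables. This is only a sketch: the function name and the prefix
filter are my own invention, and the sample text is just the fragment
quoted above.

```python
def enabled_options(config_text, prefix="CONFIG_"):
    """Return {name: value} for options that are set in a .config."""
    enabled = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blanks and "# CONFIG_FOO is not set" lines
        name, sep, value = line.partition("=")
        if sep and name.startswith(prefix):
            enabled[name] = value
    return enabled

SAMPLE_CONFIG = """\
CONFIG_SCSI_AIC7XXX=y
CONFIG_AIC7XXX_TAGGED_QUEUEING=y
# CONFIG_OVERRIDE_CMDS is not set
# CONFIG_AIC7XXX_PAGE_ENABLE is not set
CONFIG_AIC7XXX_PROC_STATS=y
CONFIG_AIC7XXX_RESET_DELAY=15
"""

# List only the SCSI low-level drivers that are switched on.
for name, value in sorted(enabled_options(SAMPLE_CONFIG, "CONFIG_SCSI_").items()):
    print(name, value)
```

Comparing that list against the controllers actually in the machine
shows which drivers can be dropped from the build.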

Success,

Rik.

----------
Send Linux memory-management wishes to me: I'm currently looking
for something to hack...