Re: mysterious 2.0.33 crashes

Doug Ledford (dledford@dialnet.net)
Thu, 19 Feb 1998 07:28:48 -0600


Alfredo Sanjuan wrote:
>
> I've recompiled my kernel disabling CONFIG_PCI_OPTIMIZE and CONFIG_SKB_LARGE.
> So far I have an uptime of 1 day, 18:05 (fingers crossed), with no Oopses
> at all. Everything is working fine under high network pressure. If anybody
> wants to have a look at my .config, I can send it.
>
> >> If it's in masquerading or IP-multicasting, I'll find out soon since I
> >> disabled these features yesterday because I don't really need them yet.
>
> /alfredo

OK, first off, thanks to everyone who sent me their .config files. I
received six different config files from machines that were crashing and,
for comparison, one config file from Jon Lewis from a machine that was not
crashing. Amongst the machines that were crashing, roughly 66% had
PCI_OPTIMIZE enabled. Roughly 66% had TRITON support enabled. The one
working config (from Jon Lewis) had IP_MASQUERADING enabled and in use.
That's not to completely exonerate the masquerading code, as the huge loop
that was noted by JuanJo Ciarlante could still have an impact depending on
processor speed and how other devices react to large amounts of time spent
with interrupts off. The one overriding factor in the failing .config
files, and the one that happened to differ in Jon Lewis' config file, was
the presence of CONFIG_SKB_LARGE. At this point, based on what I have in the
various .config files, I would be prepared to make the following
suggestions:

1. Disable PCI Bus Optimization (CONFIG_PCI_OPTIMIZE). We already know
that in at least two cases this single option has made a difference,
presumably because some PCI chipsets are a little flaky and don't handle
the optimizations properly.

2. Disable the CONFIG_SKB_LARGE option. This is too uniformly established
in the problem machines not to be suspect.

3. If you don't use it, disable IP_MULTICAST support. The kernel code may
be fine for multicast, but individual drivers vary in how they handle this
option: some cards simply go into promiscuous mode because they have no
hardware multicast filters, other drivers have network problems when using
multicast, and so on. This should only be an issue if some piece of
software is actually trying to send or receive multicast packets.

4. IP_MASQUERADING... well, this has shown up in the failing configs often
enough to be suspect, but we also know that it can work properly. Any
problems with this code may very well depend on the combination of load,
CPU speed, and SKB_LARGE. If you don't need it, disable it. If you do need
it, make sure you've done steps 1, 2, and 3 first to give it the best
chance of being reliable.
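
As a rough illustration of the four suggestions above, here is a minimal
sketch of a shell function that reports whether each suspect option is
enabled in a given .config. The function name check_config is hypothetical,
and the option names are taken from this message; the masquerading symbol
may be spelled differently in your tree, so it is matched by prefix. Treat
it as a starting point under those assumptions rather than a definitive
tool.

```shell
# check_config: hypothetical helper, not part of the kernel tree.
# Reports which of the suspect options are enabled in a .config file.
check_config() {
    config="${1:-/usr/src/linux/.config}"
    if [ ! -r "$config" ]; then
        echo "cannot read $config" >&2
        return 1
    fi
    for opt in CONFIG_PCI_OPTIMIZE CONFIG_SKB_LARGE \
               CONFIG_IP_MULTICAST CONFIG_IP_MASQUERAD; do
        case "$opt" in
            # Assumption: the masquerading symbol may be spelled
            # ..._MASQUERADE or ..._MASQUERADING, so match the prefix.
            CONFIG_IP_MASQUERAD) pattern="^${opt}[A-Z]*=[ym]" ;;
            *)                   pattern="^${opt}=[ym]" ;;
        esac
        if grep -q "$pattern" "$config"; then
            echo "$opt: enabled"
        else
            echo "$opt: not set"
        fi
    done
}
```

After defining the function, running "check_config /usr/src/linux/.config"
prints one line per option. Anything reported as enabled is a candidate for
turning off before rebuilding.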

Anyway, that's the starting point I would recommend. I would be interested
to know if these things actually solve some of these problems (and I know
several people are already reporting that so far the lack of SKB_LARGE and
MASQUERADING *appears* to have helped, but then again, it's too early to
really tell).
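
The comparison of submitted .config files described earlier comes down to
counting how many files enable a given option. A minimal sketch of that
tally follows; the function name tally_option and the file names in the
usage note are hypothetical, and it assumes each .config uses the usual
"CONFIG_FOO=y" or "CONFIG_FOO=m" form for enabled options.

```shell
# tally_option: hypothetical helper that counts how many of the given
# .config files have an option enabled (=y or =m).
tally_option() {
    opt="$1"
    shift
    total=0
    hits=0
    for f in "$@"; do
        total=$((total + 1))
        if grep -q "^${opt}=[ym]" "$f"; then
            hits=$((hits + 1))
        fi
    done
    echo "$opt: $hits of $total"
}
```

For example, "tally_option CONFIG_SKB_LARGE crash1.config crash2.config"
prints a single line giving the number of files with the option enabled out
of the number of files examined.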

FWIW, in the beginning of all of this, I noted that one reason I wanted to
look at people's .config files is that I never have these problems, so I
would compare these configs to my own working configs as well. I *never*
use items 1, 2, and 4, and I only use 3 on tulip based cards, which have
proper (actually, very good) support for IP_MULTICAST packets. My
particular PCI chipset didn't like item 1 when I last tried it. Item 2
caused problems on some of our servers when packet loss was an issue;
disabling it sped up the restart times on retransmit windows, at which
point customers said to me "Wow, the news server is much faster now". Item
4 I've never needed. Item 3 is only used by gated and only on tulip based
cards. With that configuration, I never see the TCP oopses or the lockups
other people are seeing. For reference, I'm including the output of
procinfo and a .config from the machine with the highest network and disk
load of all of my machines. This is our local news server, and also our ftp
server (ftp.dialnet.net). The low uptime is because I do repeated kernel
upgrades when I'm working on the aic7xxx driver, as I have been recently.

[dledford@news boot_disks]$ procinfo
Linux 2.0.33 (root@news) (gcc 2.7.2.1) #6 Tue Feb 10 18:33:31 CST 1998 [news]

Memory:      Total        Used        Free      Shared     Buffers      Cached
Mem:        127948      126828        1120       56128       34496       60076
Swap:       258744        3680      255064

Bootup: Thu Feb 12 05:06:19 1998    Load average: 1.69 1.90 1.87 4/41 29025

user  :      10:29:11.65   6.2%  page in :  85748310  disk 1:  6071675r  3090592w
nice  :       0:00:00.00   0.0%  page out:  95473245  disk 2: 12160023r  6006981w
system:   1d  9:59:29.13  20.0%  swap in :    118399  disk 3: 12545493r  5490147w
idle  :   5d  5:44:29.74  73.9%  swap out:    169896  disk 4: 12342970r  5759021w
uptime:   7d  2:13:10.50         context : 227545570

irq  0:  61279052 timer                 irq  8:         0 + rtc
irq  1:         6 keyboard              irq  9:         0
irq  2:         0 cascade               irq 10:  45114658 aic7xxx
irq  3:         0                       irq 11:  53814834 aic7xxx
irq  4:         0                       irq 12:         0
irq  5:         0                       irq 13:         1 math error
irq  6:         2                       irq 14:         0
irq  7:         0                       irq 15: 304845146 DS21140 Tulip

[dledford@news boot_disks]$ cat /usr/src/linux/.config | grep -v "^#"

CONFIG_EXPERIMENTAL=y

CONFIG_MODULES=y
CONFIG_MODVERSIONS=y

CONFIG_NET=y
CONFIG_PCI=y
CONFIG_SYSVIPC=y
CONFIG_BINFMT_AOUT=y
CONFIG_BINFMT_ELF=y
CONFIG_KERNEL_ELF=y
CONFIG_M586=y

CONFIG_BLK_DEV_FD=y

CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_STRIPED=y
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_INITRD=y

CONFIG_FIREWALL=y
CONFIG_NET_ALIAS=y
CONFIG_INET=y
CONFIG_IP_FORWARD=y
CONFIG_IP_MULTICAST=y
CONFIG_RST_COOKIES=y
CONFIG_IP_FIREWALL=y
CONFIG_IP_ACCT=y
CONFIG_IP_MROUTE=y
CONFIG_IP_ALIAS=y

CONFIG_IP_NOSR=y

CONFIG_IPX=m
CONFIG_ATALK=m

CONFIG_SCSI=y

CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=m
CONFIG_BLK_DEV_SR=m
CONFIG_CHR_DEV_SG=m

CONFIG_SCSI_AIC7XXX=y
CONFIG_OVERRIDE_CMDS=y
CONFIG_AIC7XXX_CMDS_PER_LUN=28
CONFIG_AIC7XXX_PROC_STATS=y
CONFIG_AIC7XXX_RESET_DELAY=8
CONFIG_SCSI_BUSLOGIC=m

CONFIG_NETDEVICES=y
CONFIG_DUMMY=m
CONFIG_EQUALIZER=m
CONFIG_PPP=m

CONFIG_SLIP=m
CONFIG_SLIP_COMPRESSED=y
CONFIG_NET_ETHERNET=y
CONFIG_NET_VENDOR_3COM=y
CONFIG_EL3=m
CONFIG_NET_ISA=y
CONFIG_NE2000=m
CONFIG_NET_EISA=y
CONFIG_DEC_ELCP=y

CONFIG_QUOTA=y
CONFIG_EXT2_FS=y
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_PROC_FS=y
CONFIG_NFS_FS=m
CONFIG_SMB_FS=m
CONFIG_SMB_WIN95=y
CONFIG_NCP_FS=m
CONFIG_ISO9660_FS=m

CONFIG_SERIAL=y
CONFIG_PRINTER=m
CONFIG_RTC=y

-- 
 Doug Ledford  <dledford@dialnet.net>
  Opinions expressed are my own, but
     they should be everybody's.
