Repeatable 'Aiee: scheduling in interrupt' with 2.0.3[3|4-pre2]

William K. Volkman (wkv@rmi.net)
Sun, 22 Feb 1998 15:55:54 -0700


Hello,
I've found a way to repeatably get 2.0.33 and 2.0.34-pre2 to
die with the message "Aiee: scheduling in interrupt 00124691". The
address is in '__wait_on_buffer' (the address is slightly different
for 2.0.34-pre2, 00124b99, but corresponds to the same location):

00124560 T get_empty_filp
00124614 T __wait_on_buffer
001246cc t sync_buffers
0012486c T sync_dev
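
The lookup from the reported address to the enclosing symbol can be done mechanically. What follows is only a minimal sketch in Python, assuming nothing beyond the four System.map lines quoted above; the helper names `parse_system_map` and `resolve` are hypothetical and not part of any kernel tool.

```python
import bisect

# The four System.map entries quoted above (address, type, symbol).
SYSTEM_MAP = """\
00124560 T get_empty_filp
00124614 T __wait_on_buffer
001246cc t sync_buffers
0012486c T sync_dev
"""

def parse_system_map(text):
    """Return a list of (address, symbol) tuples sorted by address."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            entries.append((int(parts[0], 16), parts[2]))
    entries.sort()
    return entries

def resolve(entries, addr):
    """Hypothetical helper: return (symbol, offset) for the nearest symbol
    at or below addr, or None if addr lies below the first entry."""
    addrs = [a for a, _ in entries]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = entries[i]
    return name, addr - base

entries = parse_system_map(SYSTEM_MAP)
symbol, offset = resolve(entries, 0x00124691)
print(f"{symbol}+{offset:#x}")  # prints "__wait_on_buffer+0x7d"
```

Since only a fragment of the map is used here, any address above 0012486c would be attributed to 'sync_dev'; with the full System.map the same lookup applies unchanged.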

I was trying to reproduce the corruption that has been reported on the
linux-kernel list by:

> From: Thomas Schenk <tschenk@dejanews.com>
> Date: Thu, 29 Jan 1998 16:12:33 -0600 (CST)
> Subject: Subtle Bug found in 2.0.x kernels
>
> We have been experiencing a problem with the 2.0.x kernels in
> which TCP transfers between machines were becoming corrupted. This problem
> affected rcp, rdist, ftp and other transfers between systems and exhibited
> the following symptoms.

Interestingly enough, now that I search the archives, a similar
corruption was reported last June. However, 'm.lord' thought it might be
an IDE problem, and I don't have any IDE drives on my system.

> From: QingLong <qinglong@Bolizm.ihep.su>
> Date: Wed, 11 Jun 1997 09:28:30 +0400 (MSD)
> Subject: (cached?) files corruption.

I have 4 linux systems configured in a 'beowulf'-esque fashion (not
running the beowulf software just yet) which I've been using pretty
much without incident with 2.0.30 (the only problem being kernel memory
leaks) for over 6 months. After reading about the 2.0.33 lockup
problems etc. I thought I would try to check some of them out.

I set up one of my systems, clotho, as a 2.0.33 system. My testing so
far consists of running 16 'ttcp' processes as listeners on clotho,
starting 16 'ttcp' sending processes ('./ttcp -t -l8192 -n6553600
-p50nn') on another system, and then starting a set of ftp transfers
on clotho, each md5sum'ing the result of a 4MB file download and
looping back to do it again. With the 'CONFIG_SKB_LARGE' option
set to 'y' I get the 'Aiee' message within a couple of minutes;
without it set I have run the test for more than 8 hours with only one
strange message: 'NFS silly_rename cleanup failed (err = 2)' (the
/home drive is NFS mounted from my control system).
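
The download-and-verify part of the test loop can be sketched as follows. This is an illustration in Python, not the script actually used: the `fetch` callable stands in for the real ftp transfer, and the names `md5_of` and `check_transfers` are hypothetical.

```python
import hashlib

def md5_of(data):
    """Return the hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()

def check_transfers(fetch, expected_md5, rounds):
    """Call fetch() `rounds` times and return the indices of the rounds
    whose MD5 digest differs from expected_md5."""
    bad = []
    for i in range(rounds):
        if md5_of(fetch()) != expected_md5:
            bad.append(i)
    return bad

# Stand-in for the real download: a 4 MB in-memory payload (assumption).
payload = bytes(4 * 1024 * 1024)
reference = md5_of(payload)

mismatches = check_transfers(lambda: payload, reference, rounds=3)
print(mismatches)  # prints "[]" when every round matches
```

In a real run `fetch` would wrap the ftp download of the 4MB file, and any index in the returned list would mark a corrupted transfer.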

Here is the relevant part of the .config which fails:

CONFIG_EXPERIMENTAL=y

CONFIG_MODULES=y
CONFIG_MODVERSIONS=y
CONFIG_KERNELD=y

CONFIG_NET=y
CONFIG_PCI=y
CONFIG_PCI_OPTIMIZE=y
CONFIG_SYSVIPC=y
CONFIG_BINFMT_AOUT=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_JAVA=y
CONFIG_KERNEL_ELF=y
CONFIG_M586=y

CONFIG_BLK_DEV_FD=y
CONFIG_BLK_DEV_IDE=y

CONFIG_BLK_DEV_TRITON=y

CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_RAM=m

CONFIG_FIREWALL=y
CONFIG_NET_ALIAS=y
CONFIG_INET=y
CONFIG_IP_FORWARD=y
CONFIG_IP_MULTICAST=y
CONFIG_SYN_COOKIES=y
CONFIG_RST_COOKIES=y
CONFIG_IP_FIREWALL=y
CONFIG_IP_FIREWALL_VERBOSE=y
CONFIG_IP_MASQUERADE=y

CONFIG_IP_MASQUERADE_IPAUTOFW=y
CONFIG_IP_MASQUERADE_ICMP=y
CONFIG_IP_TRANSPARENT_PROXY=y
CONFIG_IP_ALWAYS_DEFRAG=y
CONFIG_IP_ACCT=y
CONFIG_IP_ROUTER=y
CONFIG_NET_IPIP=m
CONFIG_IP_ALIAS=m

CONFIG_INET_RARP=y
CONFIG_IP_NOSR=y
CONFIG_SKB_LARGE=y

CONFIG_IPX=m
CONFIG_ATALK=m

CONFIG_SCSI=y

CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=y
CONFIG_BLK_DEV_SR=y
CONFIG_CHR_DEV_SG=y

CONFIG_SCSI_CONSTANTS=y

CONFIG_SCSI_7000FASST=m
CONFIG_SCSI_AHA152X=m
CONFIG_SCSI_AHA1542=m
CONFIG_SCSI_AHA1740=m
CONFIG_SCSI_AIC7XXX=y
CONFIG_AIC7XXX_TAGGED_QUEUEING=y
CONFIG_AIC7XXX_PROC_STATS=y
CONFIG_AIC7XXX_RESET_DELAY=15
CONFIG_SCSI_ADVANSYS=m
CONFIG_SCSI_IN2000=m
CONFIG_SCSI_AM53C974=m
CONFIG_SCSI_BUSLOGIC=m
CONFIG_SCSI_GENERIC_NCR5380=m
CONFIG_SCSI_G_NCR5380_PORT=y

CONFIG_NETDEVICES=y
CONFIG_DUMMY=m
CONFIG_PPP=m

CONFIG_SLIP=m
CONFIG_SLIP_COMPRESSED=y
CONFIG_SLIP_SMART=y
CONFIG_NET_ETHERNET=y
CONFIG_NET_EISA=y
CONFIG_APRICOT=m
CONFIG_DE4X5=m
CONFIG_DEC_ELCP=m
CONFIG_DGRS=m
CONFIG_EEXPRESS_PRO100B=m

CONFIG_MINIX_FS=y
CONFIG_EXT2_FS=y
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
CONFIG_PROC_FS=y
CONFIG_NFS_FS=y
CONFIG_SMB_FS=m
CONFIG_ISO9660_FS=y

CONFIG_SERIAL=y
CONFIG_PRINTER=y
CONFIG_MOUSE=y
CONFIG_ATIXL_BUSMOUSE=m
CONFIG_BUSMOUSE=m
CONFIG_MS_BUSMOUSE=m
CONFIG_PSMOUSE=m
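
Since the failing and the working builds differ in 'CONFIG_SKB_LARGE', it can be handy to check a .config for that option before starting a test run. A minimal sketch in Python, assuming the usual 'KEY=value' layout shown above; the helper name `parse_config` is hypothetical.

```python
def parse_config(text):
    """Parse 'KEY=value' lines from a kernel .config into a dict.
    Blank lines and comments (including '# CONFIG_X is not set') are skipped."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            opts[key] = value
    return opts

# Three lines taken from the .config excerpt above.
fragment = """\
CONFIG_INET_RARP=y
CONFIG_IP_NOSR=y
CONFIG_SKB_LARGE=y
"""

opts = parse_config(fragment)
# An option that is absent from the dict is treated as not set ("n").
print(opts.get("CONFIG_SKB_LARGE", "n"))  # prints "y"
```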

Motherboard: Gigabyte GA-586-ATV
CPU: Intel Pentium 200 (non MMX)
Memory: 64MB FPM SIMMS
Ethernet: Intel EtherExpress Pro 10/100 (in 100Mb mode)
SCSI: Adaptec 2940UW BIOS v1.25
scsi0: Scanning channel A for devices.
Vendor: SEAGATE Model: ST19171W Rev: 0024
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
Vendor: PLEXTOR Model: CD-ROM PX-12TS Rev: 1.02
Type: CD-ROM ANSI SCSI revision: 02
Detected scsi CD-ROM sr0 at scsi0, channel 0, id 3, lun 0

Straight RedHat 4.2 install, except for the Linux kernel 2.0.33, which
was obtained from ftp.kernel.org and installed.

This system has been very stable with over 100 days uptime with
kernel version 2.0.30 (last reboot was because of kernel memory
leaks). I took it down to try to reproduce some reported problems
with 2.0.33.

If anyone needs any other information let me know.

Regards,
William.

-- 
"Science has promised man power...But, as so often happens when people 
are seduced by promises of power, the price is servitude and impotence. 
Power is nothing if it is not the power to choose." Joseph Weizenbaum
MIT
What part of "Congress shall make no law..." is unclear?
