Re: [2.4.31 + aic79xx] SCSI error: Infinite interrupt loop, INTSTAT= 0

From: Andreas Haumer
Date: Sat Nov 26 2005 - 10:44:27 EST


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

Yesterday the new LSI Logic hostadapters arrived. I replaced the
Adaptec 29320 controllers with the new LSI boards and did various
tests.

Executive summary: with the LSI controllers everything works fine!

Willy Tarreau schrieb:
> Hi Andreas,
>
> On Thu, Nov 24, 2005 at 02:05:26PM +0100, Andreas Haumer wrote:
>
>
>>>>I've never tried an adaptec U320 yet, only a few 29160 in various servers.
>>>
>>I have tried the external RAID with the following controllers now:
>>
>>Adaptec 29160 - works fine (with the aic7xxx driver)
>>Adaptec 29320ALP - does not work (tried with two different cards)
>>Adaptec 29320A - does not work
>
>
> OK so the problem is only related to 29320 + aic79xx driver.
>
It indeed looks that way.

I just replaced the Adaptec 29320ALP controller with a LSI-22320
board and all my problems are gone. No SCSI errors, no timeouts.

Here's some information about my current setup:

[lspci]
00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07)
00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev 05)
00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03)
00:07.2 SMBus: Advanced Micro Devices [AMD] AMD-8111 SMBus 2.0 (rev 02)
00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-8111 ACPI (rev 05)
00:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:0a.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
00:0b.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:0b.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:03.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
01:03.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
02:06.0 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
02:06.1 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
02:09.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
02:09.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
03:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
03:00.1 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
03:05.0 Mass storage controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
03:06.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
03:08.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 10)

[lsmod]
Module Size Used by Not tainted
nfsd 67824 16 (autoclean)
nfs 74072 1 (autoclean)
lockd 45616 1 (autoclean) [nfsd nfs]
sunrpc 71072 1 (autoclean) [nfsd nfs lockd]
tg3 59308 1 (autoclean)
w83627hf 13144 0 (unused)
eeprom 3500 0 (unused)
lm85 17064 0 (unused)
i2c-isa 808 0 (unused)
i2c-amd756 3082 0 (unused)
i2c-proc 6052 0 [w83627hf eeprom lm85]
i2c-core 15364 0 [w83627hf eeprom lm85 i2c-isa i2c-amd756 i2c-proc]
processor 8528 0 (unused)
button 2572 0 (unused)
keybdev 1796 0 (unused)
mousedev 3960 0 (unused)
hid 20356 0 (unused)
input 3584 0 [keybdev mousedev hid]
usb-ohci 19048 0 (unused)
usbcore 60940 0 [hid usb-ohci]
xfs 481764 6 (autoclean)
ext2 35424 1 (autoclean)
unix 16080 14 (autoclean)
reiserfs 175312 6 (autoclean)
lvm-mod 59544 34 (autoclean)
raid1 14360 2 (autoclean)
md 57856 4 (autoclean) [raid1]
sd_mod 11144 12 (autoclean)
mptscsih 34640 2 (autoclean)
mptbase 31968 3 (autoclean) [mptscsih]
aic79xx 164220 4 (autoclean)
scsi_mod 94932 3 (autoclean) [sd_mod mptscsih aic79xx]

The aic79xx now drives the internal Maxtor SCSI disks, the
mptscsih driver is used for the external RAID subsystem.


[cat /proc/scsi/scsi]
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: MAXTOR Model: ATLAS10K5_73SCA Rev: JNZH
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 05 Lun: 00
Vendor: MAXTOR Model: ATLAS10K5_73SCA Rev: JNZH
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 00 Lun: 00
Vendor: IFT Model: A16U-G2421 Rev: 342A
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 00 Lun: 01
Vendor: IFT Model: A16U-G2421 Rev: 342A
Type: Direct-Access ANSI SCSI revision: 03


[dmesg]
SCSI subsystem driver Revision: 1.00
scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.10
<Adaptec AIC7902 Ultra320 SCSI adapter>
aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs

scsi1 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.10
<Adaptec AIC7902 Ultra320 SCSI adapter>
aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs

blk: queue f7ae9a18, I/O limit 4095Mb (mask 0xffffffff)
(scsi0:A:0): 320.000MB/s transfers (160.000MHz DT|IU|RTI|QAS, 16bit)
(scsi1:A:5): 320.000MB/s transfers (160.000MHz DT|IU|RTI|QAS, 16bit)
Vendor: MAXTOR Model: ATLAS10K5_73SCA Rev: JNZH
Type: Direct-Access ANSI SCSI revision: 03
blk: queue f7ae9818, I/O limit 4095Mb (mask 0xffffffff)
scsi0:A:0:0: Tagged Queuing enabled. Depth 32
Vendor: MAXTOR Model: ATLAS10K5_73SCA Rev: JNZH
Type: Direct-Access ANSI SCSI revision: 03
blk: queue f7ae9618, I/O limit 4095Mb (mask 0xffffffff)
scsi1:A:5:0: Tagged Queuing enabled. Depth 32
Fusion MPT base driver 2.05.16
Copyright (c) 1999-2004 LSI Logic Corporation
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator}
mptbase: Initiating ioc1 bringup
ioc1: 53C1030: Capabilities={Initiator}
mptbase: 2 MPT adapters found, 2 installed.
Fusion MPT SCSI Host driver 2.05.16
scsi2 : ioc0: LSI53C1030, FwRev=01030a00h, Ports=1, MaxQ=222, IRQ=28
scsi3 : ioc1: LSI53C1030, FwRev=01030a00h, Ports=1, MaxQ=222, IRQ=29
Vendor: IFT Model: A16U-G2421 Rev: 342A
Type: Direct-Access ANSI SCSI revision: 03
blk: queue f7ae9018, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
Vendor: IFT Model: A16U-G2421 Rev: 342A
Type: Direct-Access ANSI SCSI revision: 03
blk: queue f723de18, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi disk sdb at scsi1, channel 0, id 5, lun 0
Attached scsi disk sdc at scsi2, channel 0, id 0, lun 0
Attached scsi disk sdd at scsi2, channel 0, id 0, lun 1
SCSI device sda: 143666192 512-byte hdwr sectors (73557 MB)
Partition check:
/dev/scsi/host0/bus0/target0/lun0: p1 p2
SCSI device sdb: 143666192 512-byte hdwr sectors (73557 MB)
/dev/scsi/host1/bus0/target5/lun0: p1 p2
SCSI device sdc: 4096000000 512-byte hdwr sectors (2097152 MB)
/dev/scsi/host2/bus0/target0/lun0: p1
SCSI device sdd: 2734071808 512-byte hdwr sectors (1399845 MB)
/dev/scsi/host2/bus0/target0/lun1: p1


[performance test]
root@setup:~ {504} $ time dd if=/dev/zero of=/platten/gisdat/bigfile bs=1M count=20000
20000+0 records in
20000+0 records out

real 1m27.764s
user 0m0.080s
sys 0m32.500s
root@setup:~ {505} $ umount /platten/gisdat/
root@setup:~ {506} $ mount /platten/gisdat/
root@setup:~ {507} $ time dd if=/platten/gisdat/bigfile of=/dev/null bs=1M
20000+0 records in
20000+0 records out

real 2m29.681s
user 0m0.030s
sys 0m31.510s

This simple performance test gives a rough estimation of
write throughput at 229MB/s and read throughput at 134MB/s

This is not bad IMHO

(Note: I did the umount/mount in order to clear the OS cache
before executing the read throughput test)

[...]
>>
>>A patch was posted by James Bottomley on October 3rd, 2004
>>(See http://marc.theaimsgroup.com/?l=linux-scsi&m=112837144508743&w=2)
>
>
> Interesting, I've archived it. James presented it as a workaround,
> waiting for something cleaner, but I've not seen any followup (may
> be I've not searched well).
>
I haven't found anything, either.

[...]
>>I also have good experience with LSI Logic cards (Fusion MPT driver).
>>Yesterday I ordered several of them, I hope I'll get them soon so I
>>can do further tests.
>
>
> OK, it will definitely rule out bad cables and RAID array.
>
Yes, and indeed it does as everything works fine now with just the
Adaptec controller replaced with the LSI controller. So the problem
is with almost absolute certainty the Adaptec driver (I don't believe
that three out of three new Adaptec controller boards show the same
hardware defect)

[...]
>
>
> Often (in my experience), when different versions of a driver find different
> error conditions, it is caused by timing problems. Perhaps the driver has to
> respect some pauses on the bus that are not quite correctly respected. Have
> you tried to lower the speed to 160 MB/s ?
>
No, I haven't.

But I tried to reduce the TCQ queue depth from 32 (default) to 8
but this did not solve the problem (this was a suggestion by
ari@xxxxxxxxxxxxx Thanks for your mail, Ari, but your mail address
bounces so I couldn't send you a reply!)


I will have this particular hardware setup available for
testing for about two or three days (until end of next Tuesday).
If anyone wants me to try any patches for the aic79xx driver in
this timeframe I'm willing to do so if time permits.
But the system will go into production by the end of next
week and after that there is no way for me to do any further
tests with this hardware.

- - andreas

- --
Andreas Haumer | mailto:andreas@xxxxxxxxx
*x Software + Systeme | http://www.xss.co.at/
Karmarschgasse 51/2/20 | Tel: +43-1-6060114-0
A-1100 Vienna, Austria | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDiILCxJmyeGcXPhERAnJ6AJ9368FfWJanLvzU9Wv3Ts8L5BG2NACghI9O
XYkHANb/iSHAM6Bx6+Oz5Jw=
=B0RM
-----END PGP SIGNATURE-----
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/