scsi abort 0x2002 and eth0: too much work on a dual amd 760mpx system

From: kelley eicher (carde@astro.umn.edu)
Date: Mon Feb 11 2002 - 13:02:27 EST


i'm having some problems with the 2.4.17 linux kernel on a dual athlon
system that i was hoping someone could shed some light on. there seem
to be multiple problems so i'm not quite sure which path to follow at this
point.

the scenario is that i have a system with the following hardware:

# awk '/\(/' /proc/pci
    Host bridge: PCI device 1022:700c (Advanced Micro Devices [AMD]) (rev 17).
    PCI bridge: PCI device 1022:700d (Advanced Micro Devices [AMD]) (rev 0).
    ISA bridge: Advanced Micro Devices [AMD] AMD-768 [??] ISA (rev 4).
    IDE interface: Advanced Micro Devices [AMD] AMD-768 [??] IDE (rev 4).
    Bridge: Advanced Micro Devices [AMD] AMD-768 [??] ACPI (rev 3).
    SCSI storage controller: Adaptec 7892A (rev 2).
    PCI bridge: Advanced Micro Devices [AMD] AMD-768 [??] PCI (rev 4).
    VGA compatible controller: Matrox Graphics, Inc. MGA G400 AGP (rev 133).
    Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (rev 116).

while this machine is at work i see aic7xxx_abort errors in dmesg from time
to time during heavy i/o. interestingly, this does not crash the machine or
the devices in question but recovers with an aic7xxx_dev_reset instruction
after 1-2 minutes of abort attempts.

during the time between the apparent scsi failures i see a few error
messages in the form of 'eth0: Too much work in interrupt, status e401.'

looking at /proc/interrupts i see that indeed, the eth0 device is hard at
work.

# cat /proc/interrupts
           CPU0 CPU1
  0: 33669019 33490567 IO-APIC-edge timer
  1: 19117 19797 IO-APIC-edge keyboard
  2: 0 0 XT-PIC cascade
 10: 32635348 32632811 IO-APIC-level eth0
 11: 381721 381874 IO-APIC-level aic7xxx
 14: 236054 249997 IO-APIC-edge ide0
 15: 0 6 IO-APIC-edge ide1
NMI: 0 0
LOC: 67153036 67153815
ERR: 0
MIS: 32

i have researched the 'eth0: Too much work in interrupt, status e401.' a bit
and found that it is possible to increase the threshold for which these
errors will be printed. i did not attempt this because it does not seem that
it should be a solution to this problem but more of a crutch. i.e. bad things
still happen, you just don't see them.

another reason i refrained from making any adjustments to settings for the
driver is that i have an almost identical system in a very similar load
and role that exhibits *none* of the problems mentioned.

# awk '/\)/' /proc/pci
    Host bridge: PCI device 1022:700c (Advanced Micro Devices [AMD]) (rev 17).
    PCI bridge: PCI device 1022:700d (Advanced Micro Devices [AMD]) (rev 0).
    ISA bridge: Advanced Micro Devices [AMD] AMD-765 [Viper] ISA (rev 2).
    IDE interface: Advanced Micro Devices [AMD] AMD-765 [Viper] IDE (rev 1).
    Bridge: Advanced Micro Devices [AMD] AMD-765 [Viper] ACPI (rev 1).
    USB Controller: Advanced Micro Devices [AMD] AMD-765 [Viper] USB (rev 7).
    SCSI storage controller: Adaptec 7892A (rev 2).
    SCSI storage controller: Adaptec 7892A (#2) (rev 2).
    Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (rev 116).
    VGA compatible controller: nVidia Corporation Riva TnT2 [NV5] (rev 17).

this second machine runs an identical 2.4.17 kernel to that of the first.

the most significant difference i see here is that the chipset is the amd
760mp rather than 760mpx which is purely a supposed improvement to the
south bridge 765->768.

so before i go tearing machines apart in hopes of debugging which piece of
hardware is the cause of this less than optimal behavior, would anyone care
to wager what the cause is?

-kelley



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Feb 15 2002 - 21:00:40 EST