Re: 3c59x NIC overruns with multicast makes networking freeze

From: Arnaud Installe (arnaud@bach.tvd.kotnet.org)
Date: Sat Sep 23 2000 - 06:26:43 EST


The description of the NICs in question as found in
/etc/sysconfig/hwconf
are:

3Com Corporation|3c900B-TPO [Etherlink XL TPO]
3Com Corporation|3c905C-TX [Fast Etherlink]

In one machine it's the first which is getting lots of overruns; in the
other it's the second. Depends on which one is connected to the network
with 60% usage.
 The machines are dual-processor capable, but only a few of them have 2
processors inside. Yes, they're using standard IDE drives. (I've been
telling the people at my company we *need* SCSI drives forever... :-P)
 I don't think the 3c95x driver uses promiscuous mode for multicasting, or
does it ?
 BTW I'm not sure I understand Documentation/networking/multicast.txt.
(Is it still up-to-date ?) I suppose IP-MRoute, and thus AllMulti's only
relevant for routers while IP-Multicast is relevant for nodes transmitting
and receiving multicast. So that means for nodes "hardware filters really
help", so we really shouldn't be using 3c59x-drivers, right ? Am I
correct that Intel EtherExpress Pro 10/100 cards would be a better choice ?

On Sat, Sep 23, 2000 at 03:23:45PM +1100, Andrew Morton wrote:
> Arnaud Installe wrote:
> >
> > Hi,
> >
> > Has anyone else seen a lot of overruns while serving multicast on a pretty
> > loaded (60%) network, with 3c59x cards ?
>
> This is the first I've heard of it. Which sort of 3com card?
>
> > (BTW What exactly are
> > "overruns" ? Are they buffer overflows on the NIC side or buffer
> > overflows on the kernel side, because the software can't follow, or even
> > something completely different ?)
>
> The NIC has an 8 kbyte on-board FIFO, some of which (3-5k) is used for
> buffering incoming data.
>
> An Rx overrun occurs when that on-board FIFO has filled up, and more
> data needs to go into it. It shouldn't fill up. This can be caused by
> excessive PCI traffic, or by the software not being able to feed the NIC
> new Rx buffers fast enough, or by the driver leaving the Rx engine in
> the stalled state for too long.

Do I understand correctly that those Rx buffers are DMA buffers provided
by the kernel ?

> You should try the 2.2.17 driver. There's a copy at
> http://www.uow.edu.au/~andrewm/linux/3c59x-2.2.17+.gz

So it's included in linux-2.2.17.tar.gz, is it ? Is it also included in
any of the 2.4.0-test kernels ?

> If it still happens (and it probably will) there are some things you can
> look at:
>
> - Get a faster computer! How quick is it?

Pentium III 450MHz dual-processor. I thought there might be some locking
problem with the SMP code, so I installed a single-processor kernel.
Problem didn't go away though.

> - Change the driver by doubling the value of RX_RING_SIZE. Keep it a
> power of two. This is a bit desperate, but it will give you some more
> elasticity.

What's the RX_RING_SIZE used for ?

> - Make sure the NIC isn't sharing an interrupt with another device.
> Look at /proc/interrupts and if it's shared, have a poke in the BIOS.

Both NICs have their own IRQ.

> - It's also possible that some other device exists on your system which
> has a slow interrupt service routine, and is interrupting the 3c59x ISR
> for a long time while the Rx engine is stalled. This is a pretty
> unlikely scenario. Disabling interrupts around the call to
> boomerang_rx() would be slightly interesting.

I suppose IDE drives have slow ISRs ? Does unmasking the IRQs for hd's
work reliably with most of today's IDE drives and controllers ? The IDE
controller is an "Intel Corporation|82371AB PIIX4 IDE".

> > After a while the network freezes. I can't ping nor ssh to the machine.
> > Restarting the network scripts solves the problem.
>
> There was a bug in the driver which was fixed 6-8 weeks ago. If the
> driver's Rx path completely runs out of memory and cannot allocate a new
> buffer 16 times in a row, the receive path gets stuck and the interface
> needs to be taken down. Moving to the 2.2.17 driver should eliminate
> this possibility.

Hmmm... I suppose when this happens it's normal a lot of overruns occur ?
Upgrading the driver will make the networking stable, but I suppose that
won't prevent a lot of multicast UDP packets from getting dropped, or will
it ? (Though unmasking the hd IRQ might.)

> You should also try
>
> echo "512 1024 1536" > /proc/sys/vm/freepages
>
> to double the amount of memory which is reserved for drivers and such.
> This change put a big smile on the face of a 2.2.12 user earlier this
> week. (He had 512 megs and a reasonable, but heavy workload. I think
> we're setting this too low).

Ok.

> Please let me know the outcome.

I sure will. Thanks,

                                                        Arnaud

-- 
Arnaud Installe                                             a.installe@ieee.org

Administration: An ingenious abstraction in politics, designed to receive the kicks and cuffs due to the premier or president. -- Ambrose Bierce - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Sep 23 2000 - 21:00:28 EST