Re: Longstanding networking / SMP issue? (duplextest)

From: Andi Kleen (ak@suse.de)
Date: Thu Feb 20 2003 - 02:52:46 EST


Simon Kirby <sim@netnation.com> writes:
>
> Baffled by this, I did a bunch of testing and tried to narrow down which
> servers were experiencing this problem. A sweep of some of our server
> blocks showed that only SMP boxes were having the problem; however, not
> all SMP boxes were affected.

That's probably because of the lazy ICMP socket locking used for the
ICMP socket. When an ICMP is already in process the next ICMP triggered
from a softirq (e.g. ECHO-REQUEST) is dropped
(see net/ipv4/icmp_xmit_lock_bh())

There is also a similar problem with ICMPs getting passed to an TCP socket.
When the socket is already locked it is dropped. You can check for
the later by looking at the LockDroppedIcmps counter in netstat -s

I guess it would be useful to cover the first drop case in that counter
too, try this untested patch and then check the counter in netstat -s.

--- linux-2.4.20/net/ipv4/icmp.c-o 2002-08-03 02:39:46.000000000 +0200
+++ linux-2.4.20/net/ipv4/icmp.c 2003-02-20 08:51:21.000000000 +0100
@@ -196,8 +196,10 @@
 static int icmp_xmit_lock_bh(void)
 {
         if (!spin_trylock(&icmp_socket->sk->lock.slock)) {
- if (icmp_xmit_holder == smp_processor_id())
+ if (icmp_xmit_holder == smp_processor_id()) {
+ NET_INC_STATS_BH(LockDroppedIcmps)
                         return -EAGAIN;
+ }
                 spin_lock(&icmp_socket->sk->lock.slock);
         }
         icmp_xmit_holder = smp_processor_id();

-Andi

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Feb 23 2003 - 22:00:28 EST