Re: BUG: Slowdown on 3000 socket-machines tracked down

From: Ben Greear
Date: Mon Mar 07 2005 - 18:52:26 EST


I started trying to reproduce this, and hit a bug in either
my code or perhaps the tcp stack.

I have a control TCP socket on machine A connected to machine B.

Currently, server A is stuck spinning trying very hard to send commands to
server B. The interesting this is that netstat shows the SendQ to have
data on both machines (they are trying to send to each other on the same
socket connection), but the receive queues are empty on both machines as well:

machine A:
FC3 x86-64, kernel: 2.6.10-1.766_FC3smp, Dual opetron, 2GB RAM, SMP kernel

netstat:
tcp 0 93440 192.168.1.5:57228 192.168.1.165:4002 ESTABLISHED

Strace of this server:
socketcall(0x9, 0xffffb780) = -1 EAGAIN (Resource temporarily unavailable)
nanosleep({42949672960000000, 597879105668495392}, NULL) = 0
gettimeofday({2058282582467209, 597879105668495392}, NULL) = 0
gettimeofday({2058737849000585, 597879101513232728}, NULL) = 0
write(3, "1110237833479: iohandler.cc 383"..., 103) = 103
socketcall(0x9, 0xffffb780) = -1 EAGAIN (Resource temporarily unavailable)
.....



machine B:

2.6.11 + my patches, dual xeon, SMP kernel, 1GB RAM

netstat:
tcp 0 202940 192.168.1.165:4002 192.168.1.5:57228 ESTABLISHED

# Machine B is not trying to send so much stuff to A, so it is not busy-spinning,
# at least it won't untill it finally fills up it's 8MB user-space send buffer.

Any ideas??

Ben

--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/