Re: Slowdown on high-load machines with 3000 sockets

From: Rik van Riel
Date: Sun Feb 27 2005 - 12:20:18 EST


On Sun, 27 Feb 2005, Christian Schmid wrote:

> The problem here is that starting with 3000 sockets, the syswrite locks
> more and more on the sockets although the sockets are non-blocking. This
> just suddenly appears at around 3000 sockets. I have raised
> min_free_kbytes to 1024000 and then it suddenly did not block anymore. I
> changed it down to 16000 again and it instantly locked again. Up to
> 1024000 and no locking. Now it starts blocking again at 4000 sockets
> even with 1024000 min_free_kbytes, slowing everything down.... what
> could this be?

Is it possible to detect when the write system call blocks?

Maybe alt-sysrq-p can be used to find out where the process
is spending its time, there may be some code path left where
the write system call blocks, even with nonblocking writes...
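
One rough way to check this from userspace is to time each write
and flag the slow ones. The sketch below is in Python rather than
the Perl syswrite from the report; timed_send and the 50 ms
threshold are made-up names and values for illustration only. A
non-blocking send should return almost immediately, either with a
byte count or with EAGAIN, so anything slower is worth a look.

```python
import socket
import time


def timed_send(sock, data, threshold=0.05):
    """Send on a non-blocking socket and time the call.

    Returns (bytes_sent, elapsed_seconds, stalled). bytes_sent is 0 when
    the kernel reports EAGAIN, which is the expected fast path when the
    socket buffer is full. threshold is an arbitrary illustrative value.
    """
    start = time.monotonic()
    try:
        sent = sock.send(data)
    except BlockingIOError:
        sent = 0
    elapsed = time.monotonic() - start
    return sent, elapsed, elapsed > threshold


if __name__ == "__main__":
    # Demo on a local socket pair; a real server would wrap its own sockets.
    a, b = socket.socketpair()
    a.setblocking(False)
    sent, elapsed, stalled = timed_send(a, b"x" * 4096)
    print(f"sent={sent} elapsed={elapsed * 1000:.3f} ms stalled={stalled}")
    a.close()
    b.close()
```

Logging the socket count alongside each stalled call would show
whether the stalls really begin near 3000 sockets.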

> It's not a network problem. I have discussed this issue with netdev people
> for 2 weeks. Not a memory problem either, I suppose; it's 8 GB of RAM with a
> 2/2 split...

It could be an interaction between the network subsystem
and the memory management subsystem, e.g. the TCP stack not
allocating more than a certain amount of buffer memory and
stalling until some previously sent data has been received.
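
To see whether TCP buffer memory is near its limit, one option is
to compare the "mem" field of the TCP line in /proc/net/sockstat
with the three values in /proc/sys/net/ipv4/tcp_mem (both counted
in pages). The Python sketch below assumes those files are present
and in the usual format; the function names and the simplified
three-state classification are my own, not kernel terminology.

```python
def parse_tcp_mem(text):
    """Parse the contents of tcp_mem: three page counts (low, pressure, high)."""
    low, pressure, high = (int(v) for v in text.split())
    return {"low": low, "pressure": pressure, "high": high}


def parse_sockstat_tcp_mem(text):
    """Return the 'mem' value (pages) from the TCP line of sockstat output."""
    for line in text.splitlines():
        if line.startswith("TCP:"):
            fields = line.split()[1:]
            # Fields alternate name, value: inuse N orphan N ... mem N
            pairs = dict(zip(fields[0::2], fields[1::2]))
            return int(pairs["mem"])
    raise ValueError("no TCP line found")


def tcp_memory_state(limits, used):
    """Simplified classification of TCP memory use against the tcp_mem limits."""
    if used >= limits["high"]:
        return "at or over high limit"
    if used >= limits["pressure"]:
        return "at or over pressure threshold"
    return "below pressure threshold"


if __name__ == "__main__":
    with open("/proc/sys/net/ipv4/tcp_mem") as f:
        limits = parse_tcp_mem(f.read())
    with open("/proc/net/sockstat") as f:
        used = parse_sockstat_tcp_mem(f.read())
    print(limits, used, tcp_memory_state(limits, used))
```

Sampling this while the socket count climbs past 3000 would show
whether the stalls line up with TCP memory reaching the pressure
or high values.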

Getting backtraces of when the process is "stuck" will be
very helpful.
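
If sysrq output is awkward to capture, a cruder substitute is to
sample /proc/<pid>/wchan, which names the kernel function a
sleeping process is waiting in. It is not a full backtrace, and
the Python sketch below assumes the file is readable for the
target pid; read_wchan and sample_wchan are illustrative names.

```python
import collections
import time


def read_wchan(pid):
    """Return the kernel wait channel name for pid ('0' if not sleeping)."""
    with open(f"/proc/{pid}/wchan") as f:
        return f.read().strip() or "0"


def sample_wchan(pid, samples=100, interval=0.01,
                 reader=read_wchan, sleep=time.sleep):
    """Poll the wait channel repeatedly and count how often each value is seen.

    reader and sleep are injectable so the sampler can be exercised
    without a live process.
    """
    counts = collections.Counter()
    for _ in range(samples):
        counts[reader(pid)] += 1
        sleep(interval)
    return counts


if __name__ == "__main__":
    import sys
    # Usage: python wchan_sample.py <pid>
    for name, n in sample_wchan(int(sys.argv[1])).most_common():
        print(f"{n:5d}  {name}")
```

A wait channel that dominates the samples while the process is
"stuck" would point at the code path to look at first.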

> This problem has been observed on a 2.6.10 kernel.

Did things work correctly in earlier kernels (i.e. is this a
regression)? Or have they always behaved this way?

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan