Re: Network buffer hang was Re: [PATCH] 2.6 workaround for Athlon/Opteron prefetch errata

From: dada1
Date: Thu Sep 11 2003 - 08:19:13 EST


> > This is not a kernel crash. But total freeze as all memory is used by
> > network buffers, in no more than 10 seconds.
>
> Ok, but then you have to diagnose this freeze. I'm not sure why you
> think it must be this prefetch thingy. If the prefetch issue was
> hit then you would just get a normal segfault, not a kernel hang.

Well, the machine is a bi-athlon, and I use prefetchnta... thats all.

>
> e.g. you could write some kind of reduced test case for it and
> post it to the netdev mailing list (netdev@xxxxxxxxxxx)

Thanks very much. I'm resending my original mail (with a small test program
attached), at the end of this one.

>
> I'm cc'ing it for you.
>
> > This application receive smalls TCP messages (about 30 bytes), but the
> > network stacks allocates 4KB buffers to store this little messages.
>
> Most drivers only allocate MTU size in their receive ring
> (normally 1.5K on ethernet). This is rounded to 2K by the memory
allocator.
>
> But most drivers support a rx_copybreak parameter. When the received
> packet is smaller than rx_copybreak it is copied to a freshly allocated
> buffer with the right size.

I'm using e1000 driver , on linux-2.6, this driver doesnt use the
rx_copybreak trick.

>
> In addition the 2.4 stack also supports garbage collection in the TCP
> receive buffers. This means even when a driver doesn't do the rx_copybreak
> trick and the receive queue of a socket fills up it will copy the data
> to fresh, right sized packets by itself.
>
> Another limit for this scenario is that the network stack has internal
> limits that supposed to avoid this. These are: each socket has a
> fixed receive buffer size and when more data arrives (including packet
> metadata and normal wastage) than the receive buffer allows then it is
> still dropped. In addition TCP has a global memory limit that also kicks
> in. And the network stack has a global queue limit that prevents
> too much data to be queued from the driver to the higher level
> parts (/proc/sys/net/core/netdev_max_backlog). Sometimes the queueing
> can also be controlled on the driver level with driver specific
> knobs.
>

cat /proc/sys/net/core/netdev_max_backlog
300

> This all can be tuned by sysctls in /proc/sys. See
Documentation/networking/
> ip-sysctl.txt for more details.
>
> Also the latest 2.6 kernel finally has a writable
/proc/sys/vm/min_free_kbytes
> again. This controls the amount of memory kept free for interrupts.
> Increase that.

Hum I didnt knew this one...

cat /proc/sys/vm/min_free_kbytes
16384

>
> > I posted a test application some days ago about this problem and got no
> > answers/feedback.
>
> Did you post it to netdev? On linux-kernel such things get often
> lost in the noise.
>
> Also I would contact the driver maintainer, it could be really a driver
> Issue.
>
> -Andi

Here is the copy of the mail I sent the Sep 1st on linux-kernel & linux-net
:

Hi all

I have an annoying problem with a network server (TCP sockets)

On some stress situation, LowMemory goes close to 0, and the whole machine
freezes.

When the sockets receive a lot of data, and the server is busy, the TCP
stack just can use too many buffers (in LowMem).

TCP stack uses "size-4096" buffers to store the datas, even if only one byte
is coming from the network.

I tried to change /proc/sys/net/ipv4/tcp_mem, without results.
# echo "1000 10000 15000" >/proc/sys/net/ipv4/tcp_mem

You can reproduce the problem with the test program attached.

# gcc -o crash crash.c
# ulimit -n 20000
# ./crash listen 8888 &
# ./crash call 127.0.0.1:8888 &

grep "size-4096 " /proc/slabinfo
size-4096 40015 40015 4096 1 1 : tunables 24 12 0 : slabdata
40015 40015 0

(thats is 160 Mo, far more than the limit given in
/proc/sys/net/ipv4/tcp_mem)

grep TCP /proc/net/sockstat
TCP: inuse 39996 orphan 0 tw 0 alloc 39997 mem 79986

What is the unit of 'mem' field ? Unless it is 2Ko, the numbers are wrong.

How may I ask the kernel NOT to use more than 'X Mo' to store TCP messages
?

Thanks

Eric Dumazet

/*
* Program to freeze a linux box, by using all the LOWMEM
* A bug on the tcp stack may be the reason
* Use at your own risk !!
*/

/* Principles :
A listener accepts incoming tcp sockets, write 40 bytes, and does nothing
with them (no reading)
A writer establish TCP sockets, sends some data (40 bytes), no more
reading/writing
*/
#include <stdio.h>
# include <sys/socket.h>
# include <netinet/tcp.h>
# include <arpa/inet.h>
# include <netdb.h>
# include <unistd.h>
# include <string.h>

/*
* Usage :
* crash listen port
* crash call IP:port
*/
void usage(int code)
{
fprintf(stderr, "Usages :\n") ;
fprintf(stderr, " crash listen port\n") ;
fprintf(stderr, " crash call IP:port\n") ;
exit(code) ;
}
const char some_data[40] = "some data.... just some data" ;

void do_listener(const char *string)
{
int port = atoi(string) ;
struct sockaddr_in host, from ;
int fdlisten ;
unsigned int total ;
socklen_t fromlen ;
memset(&host,0, sizeof(host));
host.sin_family = AF_INET;
host.sin_port = htons(port);
fdlisten = socket(AF_INET, SOCK_STREAM, 0) ;
if (bind(fdlisten, (struct sockaddr *)&host, sizeof(host)) == -1) {
perror("bind") ;
return ;
}
listen(fdlisten, 10) ;
for (total=0;;total++) {
int nfd ;
fromlen = sizeof(from) ;
nfd = accept(fdlisten, (struct sockaddr *)&from, &fromlen) ;
if (nfd == -1) break ;
write(nfd, some_data, sizeof(some_data)) ;
}
printf("total=%u\n", total) ;
pause() ;
}

void do_caller(const char *string)
{
union {
int i ;
char c[4] ;
} u ;
struct sockaddr_in dest;
int a1, a2, a3, a4, port ;
unsigned int total ;
sscanf(string, "%d.%d.%d.%d:%d", &a1, &a2, &a3, &a4, &port) ;
u.c[0] = a1 ; u.c[1] = a2 ; u.c[2] = a3 ; u.c[3] = a4 ;
for (total=0;;total++) {
int fd ;
memset(&dest, 0, sizeof(dest)) ;
dest.sin_family = AF_INET ;
dest.sin_port = htons(port) ;
dest.sin_addr.s_addr = u.i ;
fd = socket(AF_INET, SOCK_STREAM, 0) ;
if (fd == -1) break ;
if (connect(fd, (struct sockaddr *)&dest, sizeof(dest)) == -1) {
perror("connect") ;
break ;
}
write(fd, some_data, sizeof(some_data)) ;
}
printf("total=%u\n", total) ;
pause() ;
}

int main(int argc, char *argv[])
{
int listener ;
int caller ;
if (argc != 3) {
usage(1);
}
listener = !strcmp(argv[1], "listen") ;
caller = !strcmp(argv[1], "call") ;
if (listener) {
do_listener(argv[2]) ;
}
else if (caller) {
do_caller(argv[2]) ;
}
else usage(2) ;
return 0 ;
}
/********************************************************************/


/*
* Program to freeze a linux box, by using all the LOWMEM
* A bug on the tcp stack may be the reason
* Use at your own risk !!
*/

/* Principles :
A listener accepts incoming tcp sockets, write 40 bytes, and does nothing with them (no reading)
A writer establish TCP sockets, sends some data (40 bytes), no more reading/writing
*/
#include <stdio.h>
# include <sys/socket.h>
# include <netinet/tcp.h>
# include <arpa/inet.h>
# include <netdb.h>
# include <unistd.h>
# include <string.h>

/*
* Usage :
* crash listen port
* crash call IP:port
*/
void usage(int code)
{
fprintf(stderr, "Usages :\n") ;
fprintf(stderr, " crash listen port\n") ;
fprintf(stderr, " crash call IP:port\n") ;
exit(code) ;
}
const char some_data[40] = "some data.... just some data" ;

void do_listener(const char *string)
{
int port = atoi(string) ;
struct sockaddr_in host, from ;
int fdlisten ;
unsigned int total ;
socklen_t fromlen ;
memset(&host,0, sizeof(host));
host.sin_family = AF_INET;
host.sin_port = htons(port);
fdlisten = socket(AF_INET, SOCK_STREAM, 0) ;
if (bind(fdlisten, (struct sockaddr *)&host, sizeof(host)) == -1) {
perror("bind") ;
return ;
}
listen(fdlisten, 10) ;
for (total=0;;total++) {
int nfd ;
fromlen = sizeof(from) ;
nfd = accept(fdlisten, (struct sockaddr *)&from, &fromlen) ;
if (nfd == -1) break ;
write(nfd, some_data, sizeof(some_data)) ;
}
printf("total=%u\n", total) ;
pause() ;
}

void do_caller(const char *string)
{
union {
int i ;
char c[4] ;
} u ;
struct sockaddr_in dest;
int a1, a2, a3, a4, port ;
unsigned int total ;
sscanf(string, "%d.%d.%d.%d:%d", &a1, &a2, &a3, &a4, &port) ;
u.c[0] = a1 ; u.c[1] = a2 ; u.c[2] = a3 ; u.c[3] = a4 ;
for (total=0;;total++) {
int fd ;
memset(&dest, 0, sizeof(dest)) ;
dest.sin_family = AF_INET ;
dest.sin_port = htons(port) ;
dest.sin_addr.s_addr = u.i ;
fd = socket(AF_INET, SOCK_STREAM, 0) ;
if (fd == -1) break ;
if (connect(fd, (struct sockaddr *)&dest, sizeof(dest)) == -1) {
perror("connect") ;
break ;
}
write(fd, some_data, sizeof(some_data)) ;
}
printf("total=%u\n", total) ;
pause() ;
}

int main(int argc, char *argv[])
{
int listener ;
int caller ;
if (argc != 3) {
usage(1);
}
listener = !strcmp(argv[1], "listen") ;
caller = !strcmp(argv[1], "call") ;
if (listener) {
do_listener(argv[2]) ;
}
else if (caller) {
do_caller(argv[2]) ;
}
else usage(2) ;
return 0 ;
}
/********************************************************************/