Stuck TCP sockets in 2.1.1xx SMP

Alex Korobka (korobka@galaxy.ams.sunysb.edu)
Mon, 19 Oct 1998 14:41:55 -0400 (EDT)


We have a few dual PII 400 machines (P6DBE boards, eepro100 NICs)
that we'd like to use in a Beowulf-like cluster. However, all recent
kernels have exhibited the same problem: NPB2.3 MPI benchmarks keep
getting stuck waiting for incoming data. This happens only when
there are 2 MPI processes running on the same machine; there are
no problems with one process per machine. This is the output
of netstat -a -t for a job consisting of 8 MPI processes running
on the star1, star2, star3, and star4 nodes.

star3> netstat -a -t
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 *:sunrpc                *:*                     LISTEN
tcp        0      0 *:ftp                   *:*                     LISTEN
tcp        0      0 *:telnet                *:*                     LISTEN
tcp        0      0 *:gopher                *:*                     LISTEN
tcp        0      0 *:shell                 *:*                     LISTEN
tcp        0      0 *:login                 *:*                     LISTEN
tcp        0      0 *:pop-2                 *:*                     LISTEN
tcp        0      0 *:pop                   *:*                     LISTEN
tcp        0      0 *:imap                  *:*                     LISTEN
tcp        0      0 *:finger                *:*                     LISTEN
tcp        0      0 *:time                  *:*                     LISTEN
tcp        0      0 *:auth                  *:*                     LISTEN
tcp        0      0 *:857                   *:*                     LISTEN
tcp        0      0 *:smtp                  *:*                     LISTEN
tcp        0      0 *:10025                 *:*                     LISTEN
tcp        0      3 star3.messier:login     starzero.messier:1013   ESTABLISHED
tcp        0      0 star3.messier:shell     star1.messier:1019      ESTABLISHED
tcp        0      0 star3.messier:1023      star1.messier:1018      ESTABLISHED
tcp        0      0 star3.messier:1147      star1.messier:1110      ESTABLISHED
tcp        0      0 *:1148                  *:*                     LISTEN
tcp        0      0 star3.messier:shell     star1.messier:1008      ESTABLISHED
tcp        0      0 star3.messier:1022      star1.messier:1005      ESTABLISHED
tcp        0      0 star3.messier:1149      star1.messier:1110      ESTABLISHED
tcp        0      0 *:1150                  *:*                     LISTEN
tcp        0      0 star3.messier:1152      star2.messier:1130      ESTABLISHED
tcp        0      0 star3.messier:1155      star3.messier:1154      ESTABLISHED
tcp        0      0 star3.messier:1154      star3.messier:1155      ESTABLISHED
tcp        0      0 star3.messier:1150      star1.messier:1121      TIME_WAIT
tcp        0      0 star3.messier:1156      star1.messier:1122      ESTABLISHED
tcp        0      0 star3.messier:1148      star4.messier:1134      TIME_WAIT
tcp        0      0 star3.messier:1150      star4.messier:1136      TIME_WAIT
tcp        0      0 star3.messier:1159      star4.messier:1138      ESTABLISHED
tcp        0   7796 star3.messier:1160      star4.messier:1139      ESTABLISHED
tcp        0      0 star3.messier:1148      star1.messier:1125      TIME_WAIT
tcp        0      0 star3.messier:1162      star1.messier:1127      ESTABLISHED

Recv-Q on star4 for "star3.messier:1160 star4.messier:1139" is empty. It
stays this way until the connection times out. This is the tail of the tcpdump
log for this connection. After several spurious duplicate ACKs, the only
traffic left is star3 retransmitting the same segment.

...
13:49:31.298117 star4.messier.1139 > star3.messier.1160: . 1848225:1849673(1448) ack 1851121 win 7240 <nop,nop,timestamp 56303 57114> (DF) [tos 0x18] (ttl 64, id 16102)
13:49:31.298173 star3.messier.1160 > star4.messier.1139: . ack 1849673 win 14480 <nop,nop,timestamp 57115 56303> (DF) [tos 0x18] (ttl 64, id 50303)
13:49:31.298125 star4.messier.1139 > star3.messier.1160: . ack 1854017 win 5792 <nop,nop,timestamp 56303 57114> (DF) [tos 0x18] (ttl 64, id 16103)
13:49:31.298367 star4.messier.1139 > star3.messier.1160: . 1849673:1851121(1448) ack 1855465 win 4344 <nop,nop,timestamp 56303 57115> (DF) [tos 0x18] (ttl 64, id 16107)
13:49:31.299299 star4.messier.1139 > star3.messier.1160: . 1851121:1852569(1448) ack 1855465 win 15928 <nop,nop,timestamp 56303 57115> (DF) [tos 0x18] (ttl 64, id 16112)
13:49:31.299516 star4.messier.1139 > star3.messier.1160: . 1852569:1854017(1448) ack 1855465 win 15928 <nop,nop,timestamp 56303 57115> (DF) [tos 0x18] (ttl 64, id 16113)
13:49:31.299573 star3.messier.1160 > star4.messier.1139: . ack 1854017 win 14480 <nop,nop,timestamp 57115 56303> (DF) [tos 0x18] (ttl 64, id 50314)
13:49:31.299784 star4.messier.1139 > star3.messier.1160: . 1854017:1855465(1448) ack 1855465 win 15928 <nop,nop,timestamp 56303 57115> (DF) [tos 0x18] (ttl 64, id 16114)
13:49:31.300049 star3.messier.1160 > star4.messier.1139: . ack 1856913 win 14480 <nop,nop,timestamp 57115 56303> (DF) [tos 0x18] (ttl 64, id 50316)
13:49:31.300276 star4.messier.1139 > star3.messier.1160: . 1856913:1858361(1448) ack 1855465 win 15928 <nop,nop,timestamp 56303 57115> (DF) [tos 0x18] (ttl 64, id 16117)
13:49:31.300588 star3.messier.1160 > star4.messier.1139: . ack 1859809 win 14480 <nop,nop,timestamp 57115 56303> (DF) [tos 0x18] (ttl 64, id 50318)
13:49:31.301098 star3.messier.1160 > star4.messier.1139: . ack 1862705 win 14480 <nop,nop,timestamp 57115 56303> (DF) [tos 0x18] (ttl 64, id 50320)
13:49:31.301089 star4.messier.1139 > star3.messier.1160: P 1862705:1863193(488) ack 1855465 win 15928 <nop,nop,timestamp 56303 57115> (DF) [tos 0x18] (ttl 64, id 16124)
13:49:31.301839 star4.messier.1139 > star3.messier.1160: . ack 1855465 win 15928 <nop,nop,timestamp 56303 57115,nop,nop,[|tcp]> (DF) [tos 0x18] (ttl 64, id 16130)
13:49:31.301949 star4.messier.1139 > star3.messier.1160: . ack 1855465 win 15928 <nop,nop,timestamp 56303 57115,nop,nop,[|tcp]> (DF) [tos 0x18] (ttl 64, id 16131)
13:49:31.302227 star4.messier.1139 > star3.messier.1160: . ack 1855465 win 15928 <nop,nop,timestamp 56303 57115,nop,nop,[|tcp]> (DF) [tos 0x18] (ttl 64, id 16133)
13:49:31.302517 star4.messier.1139 > star3.messier.1160: . ack 1855465 win 15928 <nop,nop,timestamp 56303 57115,nop,nop,[|tcp]> (DF) [tos 0x18] (ttl 64, id 16134)
13:49:31.302522 star4.messier.1139 > star3.messier.1160: . ack 1855465 win 15928 <nop,nop,timestamp 56303 57115,nop,nop,[|tcp]> (DF) [tos 0x18] (ttl 64, id 16136)
13:49:31.309051 star3.messier.1160 > star4.messier.1139: P 1863193:1863225(32) ack 1863193 win 15928 <nop,nop,timestamp 57116 56303> (DF) [tos 0x18] (ttl 64, id 50382)
13:49:31.309072 star3.messier.1160 > star4.messier.1139: P 1863225:1863261(36) ack 1863193 win 15928 <nop,nop,timestamp 57116 56303> (DF) [tos 0x18] (ttl 64, id 50383)
13:49:31.309248 star4.messier.1139 > star3.messier.1160: . ack 1855465 win 15928 <nop,nop,timestamp 56304 57116,nop,nop,[|tcp]> (DF) [tos 0x18] (ttl 64, id 16149)
13:49:31.309306 star4.messier.1139 > star3.messier.1160: . ack 1855465 win 15928 <nop,nop,timestamp 56304 57116,nop,nop,[|tcp]> (DF) [tos 0x18] (ttl 64, id 16150)
13:49:31.493869 star3.messier.1160 > star4.messier.1139: . 1855465:1856913(1448) ack 1863193 win 15928 <nop,nop,timestamp 57135 56304> (DF) [tos 0x18] (ttl 64, id 50388)
13:49:31.893864 star3.messier.1160 > star4.messier.1139: . 1855465:1856913(1448) ack 1863193 win 15928 <nop,nop,timestamp 57175 56304> (DF) [tos 0x18] (ttl 64, id 50389)
13:49:32.693868 star3.messier.1160 > star4.messier.1139: . 1855465:1856913(1448) ack 1863193 win 15928 <nop,nop,timestamp 57255 56304> (DF) [tos 0x18] (ttl 64, id 50410)
13:49:34.293868 star3.messier.1160 > star4.messier.1139: . 1855465:1856913(1448) ack 1863193 win 15928 <nop,nop,timestamp 57415 56304> (DF) [tos 0x18] (ttl 64, id 50433)
13:49:37.493868 star3.messier.1160 > star4.messier.1139: . 1855465:1856913(1448) ack 1863193 win 15928 <nop,nop,timestamp 57735 56304> (DF) [tos 0x18] (ttl 64, id 50489)
13:49:43.893869 star3.messier.1160 > star4.messier.1139: . 1855465:1856913(1448) ack 1863193 win 15928 <nop,nop,timestamp 58375 56304> (DF) [tos 0x18] (ttl 64, id 50610)
13:49:56.693871 star3.messier.1160 > star4.messier.1139: . 1855465:1856913(1448) ack 1863193 win 15928 <nop,nop,timestamp 59655 56304> (DF) [tos 0x18] (ttl 64, id 50689)
13:50:22.293868 star3.messier.1160 > star4.messier.1139: . 1855465:1856913(1448) ack 1863193 win 15928 <nop,nop,timestamp 62215 56304> (DF) [tos 0x18] (ttl 64, id 50878)
...etc, until timeout
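
The spacing of the retransmissions at the end of the log can be checked with a few lines of Python. This is only a sketch: the timestamps are copied from the eight retransmitted copies of segment 1855465:1856913 above, and the helper names to_seconds and gaps are made up for illustration.

```python
# Capture times of the eight retransmissions of segment 1855465:1856913,
# copied from the tcpdump log above (all taken on the same day).
RETRANSMIT_TIMES = [
    "13:49:31.493869",
    "13:49:31.893864",
    "13:49:32.693868",
    "13:49:34.293868",
    "13:49:37.493868",
    "13:49:43.893869",
    "13:49:56.693871",
    "13:50:22.293868",
]


def to_seconds(stamp):
    """Convert an HH:MM:SS.ffffff tcpdump timestamp to seconds since midnight."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def gaps(stamps):
    """Return the delay in seconds between consecutive timestamps."""
    secs = [to_seconds(s) for s in stamps]
    return [later - earlier for earlier, later in zip(secs, secs[1:])]


if __name__ == "__main__":
    for gap in gaps(RETRANSMIT_TIMES):
        print(f"{gap:.1f} s")
```

The gaps come out at roughly 0.4, 0.8, 1.6, 3.2, 6.4, 12.8, and 25.6 seconds. So star3 doubles its retransmission timeout each time, and no ACK covering the segment shows up in this capture.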

The corresponding /proc/net/tcp records look like this:
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode
0: 0501A8C0:048A 0301A8C0:0467 01 00000000:00000000 00:00000000 00000000 520 0 3122
1: 0501A8C0:0488 0601A8C0:0473 01 00001E74:00000000 01:00000CD0 00000008 520 0 3115
2: 0501A8C0:0487 0601A8C0:0472 01 00000000:00000000 00:00000000 00000000 520 0 3114
...
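
For anyone who does not read the hex fields directly, here is a rough Python sketch that decodes record "1:" from the listing above. It assumes a little-endian (x86) host, where the address field is the network-order IPv4 address printed as a native 32-bit integer. The function names decode_endpoint and decode_record are hypothetical. The meaning attached to timer value 1 (a pending retransmit timer) is what I understand from later kernels, and I am assuming it also holds for 2.1.1xx.

```python
import socket
import struct

# Record "1:" from the /proc/net/tcp listing above, copied verbatim.
RECORD = ("1: 0501A8C0:0488 0601A8C0:0473 01 00001E74:00000000 "
          "01:00000CD0 00000008 520 0 3115")


def decode_endpoint(field):
    """Decode 'HEXADDR:HEXPORT' into (dotted-quad address, port).

    Assumes a little-endian host (x86): the hex string is the
    network-order address read back as a native 32-bit integer.
    """
    addr_hex, port_hex = field.split(":")
    addr = socket.inet_ntoa(struct.pack("<I", int(addr_hex, 16)))
    return addr, int(port_hex, 16)


def decode_record(line):
    """Decode the leading fields of one /proc/net/tcp line."""
    fields = line.split()
    tx_hex, rx_hex = fields[4].split(":")
    timer_hex, when_hex = fields[5].split(":")
    return {
        "local": decode_endpoint(fields[1]),
        "remote": decode_endpoint(fields[2]),
        "state": int(fields[3], 16),       # 1 is TCP_ESTABLISHED
        "tx_queue": int(tx_hex, 16),       # bytes waiting in the send queue
        "rx_queue": int(rx_hex, 16),
        "timer": int(timer_hex, 16),       # assumed: 1 = retransmit timer pending
        "timer_when": int(when_hex, 16),   # raw tm->when value, units not assumed
        "retransmits": int(fields[6], 16),
    }


if __name__ == "__main__":
    for key, value in decode_record(RECORD).items():
        print(key, value)
```

Decoded this way, the record is 192.168.1.5:1160 to 192.168.1.6:1139 in state ESTABLISHED with a tx_queue of 7796 bytes and a retransmit count of 8. The ports and the queue size match the stuck star3.messier:1160 line in the netstat output, and the count matches the eight retransmissions in the tcpdump log.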

Also, I have received one more report of the same problem with MPI on dual
PII 400 systems. The hardware is slightly different (Gigabyte GA-6BXDS boards,
Tulip 21140 NICs) but the symptoms are the same.

Any suggestions? I can offer remote access to the cluster if someone familiar with
the networking code wants to take a closer look at this.

Alex Korobka
