poll(2) semantics changed in 2.4.0-? vs. 2.2.16?

From: Martin Diehl (mdiehlcs@compuserve.de)
Date: Thu Oct 05 2000 - 08:31:49 EST


Hi,
had some long network stalls during initscripts in 2.4.0-test9.
newaliases appeared to block until timeout in do_poll() on udp socket.
I've written a demo program to show what's going on without depending
on initscript and config (DNS/RPC/NIS) issues - attached polltest.c
It sends something to an *unbound* udp port on localhost and polls for
reply. In 2.2.16 it returned immediately with ECONNREFUSED while for
2.4.0-t9 it blocks until timout, despite the ICMP port unreachable packet
returned in both cases - tcpdump -i lo gives:

14:05:13.591422 localhost.localdomain.32768 > localhost.localdomain.12345:
   udp 8 (DF)
14:05:13.591422 localhost.localdomain.32768 > localhost.localdomain.12345:
   udp 8 (DF)
14:05:13.591527 localhost.localdomain > localhost.localdomain: icmp:
   localhost.localdomain udp port 12345 unreachable (DF) [tos 0xc0]
14:05:13.591527 localhost.localdomain > localhost.localdomain: icmp:
   localhost.localdomain udp port 12345 unreachable (DF) [tos 0xc0]

(i believe everything appears twice because tcpdump sees both ends of lo)
So my impression is, the returned ICMP packet is disregarded somehow.
I also include a diff of the straces for polltest on 2.2.16 vs.
2.4.0-t9 (trivial things like getpid, gettimeofday, fstat64 removed):

--- polltrace-2.2.16 Thu Oct 5 13:57:13 2000
+++ polltrace-2.4.0-t9 Thu Oct 5 14:05:23 2000
 socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 3
 bind(3, {sin_family=AF_INET, sin_port=htons(0),
    sin_addr=inet_addr("0.0.0.0")}}, 16) = 0
 sendto(3, "foo bar\0", 8, 0, {sin_family=AF_INET, sin_port=htons(12345),
    sin_addr=inet_addr("127.0.0.1")}}, 16) = 8
-poll([{fd=3, events=POLLIN, revents=POLLERR}], 1, 10000) = 1
-recvfrom(3, 0xbffffcbc, 100, 0, 0xbffffd24, 0xbffffcb8) = -1 ECONNREFUSED
    (Connection refused)
+poll([{fd=3, events=POLLIN}], 1, 10000) = 0
-write(1, "recvfrom: errno=111 - Connection"..., 41) = 41
+write(1, "poll - timeout\n", 15) = 15

To reproduce the problem just start polltest on 2.2.16 (or similar ?) and
recent 2.4 to compare the results. It accepts an optional command line
argument which is the remote port to poll on. This should be an unbound
port and defaults to 12345 if not given.

Is this an intentional change? Is it required by SuS e.g.?
Am I completely wrong when believing this might brake a number of
network programs (traceroute e.g.)?

Haven't tried neither non-loopback connection nor other
tcp/udp/poll/select combinations. But sane semantics should be consistent
IMHO. The testbox was UP in case that matters (lost events at scheduling
points???)

What am I missing?

Regards
Martin



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Oct 07 2000 - 21:00:16 EST