socket problem

thoth@purplefrog.com
Tue, 04 Mar 1997 16:50:50 EST


[We'll see if I successfully subscribed to this list.]

Linux frop 2.0.25 #2 Mon Nov 18 15:32:26 EST 1996 i586

I am the author of the netpipes package (version 3.2 released, 4.0 in
development). I have run into what I think may be a kernel bug.

While developing the ssl-auth encryption/authentication wrapper I have
(intermittently) been getting a "Connection reset by peer" error when
READING from a socket. Since the other end of the connection is not
experiencing any errors, I can only present the following theory:

Some of my utilities (http://www.purplefrog.com/~thoth/netpipes/) have
buffering code in them. Notably, hose, when invoked with the -slave
argument, copies from stdin to the socket and from the socket to stdout.
When input is exhausted on stdin, hose issues a shutdown() system call,
which closes the sending half of the socket but leaves the receiving half
open. This is necessary to prevent deadlock: the remote process needs to
see end-of-file on its input while hose can still read whatever output
remains.
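
A minimal sketch of the half-close pattern described above, written in
Python for brevity. It is not the netpipes code: the function names
(count_until_eof, half_close_demo) and the byte-count reply are made up for
illustration. The peer only answers once it has seen end-of-file, so the
client must call shutdown() on its sending side and then keep reading.

```python
import socket
import threading


def count_until_eof(listener):
    """Hypothetical peer: read until EOF, then reply with the byte count."""
    conn, _ = listener.accept()
    with conn:
        total = 0
        while True:
            chunk = conn.recv(4096)
            if not chunk:            # EOF: the client shut down its sending half
                break
            total += len(chunk)
        conn.sendall(str(total).encode())


def half_close_demo(payload):
    """Send payload, half-close, then read the peer's reply until EOF."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    peer = threading.Thread(target=count_until_eof, args=(listener,))
    peer.start()

    sock = socket.create_connection(("127.0.0.1", port))
    sock.sendall(payload)
    sock.shutdown(socket.SHUT_WR)    # equivalent to shutdown(sock, 1) in C
    reply = b""
    while True:                      # the receiving half is still open
        chunk = sock.recv(4096)
        if not chunk:
            break
        reply += chunk
    sock.close()
    peer.join()
    listener.close()
    return int(reply.decode())
```

If the client skipped the shutdown() call, the peer would wait forever for
more input while the client waited forever for the reply.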

The instance of the error that I have concentrated on occurs when two
processes on the same machine communicate through a TCP socket. If one
process writes a lot of data (64K) and then performs a shutdown(sock,1),
all while the receiving process is blocked, the receiving process has a
small chance (roughly 1 in 5?) of getting the "Connection reset by peer"
error.

frop:107 $ faucet 3000 -vio sh -c "sleep 10; cat"
faucet; of netpipes version 4.0, Copyright (C) 1992-96 Robert Forsman
faucet comes with ABSOLUTELY NO WARRANTY;
This is free software, and you are welcome to redistribute it
under the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or
(at your option) any later version.
faucet: Got connection from 127.0.0.1(localhost) port 1311

frop:32 $ dd if=/vmlinuz bs=4096 count=16 | hose localhost 3000 -v \
-slave > /tmp/shit
hose; of netpipes version 4.0, Copyright (C) 1992-96 Robert Forsman
hose comes with ABSOLUTELY NO WARRANTY;
This is free software, and you are welcome to redistribute it
under the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or
(at your option) any later version.
hose: attempting to connect to 127.0.0.1(localhost) port 3000
16+0 records in
16+0 records out
during copyio() read(2)(0): Connection reset by peer

I think something is going wrong in the buffering of data in the kernel.
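
The scenario can also be approximated without netpipes. The following is a
rough Python sketch, not a faithful port of faucet and hose: the names
(delayed_reader, try_once), the short default delay, and the zero-filled
payload are all assumptions made for illustration. One side writes 64K over
loopback and calls shutdown() while the other side is still sleeping; the
reader then records how many bytes arrived and whether it saw a reset.

```python
import socket
import threading
import time

PAYLOAD_SIZE = 64 * 1024   # 16 blocks of 4096 bytes, as in the dd invocation


def delayed_reader(listener, delay, outcome):
    """Accept one connection, sleep, then read until EOF or reset."""
    conn, _ = listener.accept()
    with conn:
        time.sleep(delay)            # stand-in for "sleep 10" before cat runs
        total = 0
        error = None
        try:
            while True:
                chunk = conn.recv(4096)
                if not chunk:        # clean EOF after the writer's shutdown
                    break
                total += len(chunk)
        except ConnectionResetError as exc:
            error = exc              # the symptom reported in this message
        outcome["bytes"] = total
        outcome["error"] = error


def try_once(delay=0.2):
    """Run one write-then-shutdown attempt and return the reader's outcome."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    outcome = {}
    reader = threading.Thread(target=delayed_reader,
                              args=(listener, delay, outcome))
    reader.start()

    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(b"\0" * PAYLOAD_SIZE)
        sock.shutdown(socket.SHUT_WR)    # shutdown(sock, 1)
        reader.join()                    # keep our end open until the reader is done
    listener.close()
    return outcome
```

On a correctly behaving TCP stack every attempt should deliver all 65536
bytes with no error; calling try_once() in a loop and counting the attempts
where outcome["error"] is set would show whether the reset is reproducible
outside the netpipes tools.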

Pop Quiz:

1) Why is Bob having this problem? [100%]
   a) This is a kernel bug. It is/will be fixed in version __.__
   b) You are using the system calls wrong. You should use the following
      procedure to notify the remote process that input is exhausted on the
      file descriptor: ___________________________________________________

2) [There is no question 2]

For extra credit, if you answer [a] to question 1:
   Provide a patch to fix the kernel bug.

---
Bob Forsman                                   thoth@gainesville.fl.us
           http://www.gainesville.fl.us/~thoth/