Re: 2.0.13 Sockets Stuck on close

Christoph Lameter (clameter@fuller.edu)
Wed, 21 Aug 1996 10:47:19 -0700 (PDT)

Next message: Tom May: "Re: CLONE_FILES problem."
Previous message: Alan Cox: "Re: Linux-Sparc ext2 compatibility?"
In reply to: Michael Shields: "Re: Jiffies Wraparound (was Re: interrupt counts)"
Next in thread: Theodore Y. Ts'o: "Re: Linux-Sparc ext2 compatibility?"
Reply: Theodore Y. Ts'o: "Re: Linux-Sparc ext2 compatibility?"

On Wed, 21 Aug 1996, Eric Schenk wrote:

schenk>>Does anyone know how to resolve these problems?
schenk>
schenk>Not yet, I still haven't got enough information to figure it out,
schenk>and I can't reproduce it yet. If you can come up with a formula
schenk>for me to reproduce this, then maybe I can track it down a little faster.
schenk>Also, if I can get to the point where I can make a guess at what is happening
schenk>I might be able to give you some code to instrument the kernel and
schenk>try to help track down the problem from your end.

The problem is that these things come up sporadically. It's been a couple
of months now. The issue seems to be timing dependent.

But a socket should never actually be STUCK in CLOSE. There should be a
timeout, right?

schenk>>2. TCP sessions stall on busy machines.
schenk>
schenk>I have no outstanding reports of this problem that cannot be
schenk>attributed to MTU mismatches on the endpoints of a point-to-point
schenk>link. However, I may have easily missed a report as I've been quite
schenk>busy with real work recently. Please forward me any detail you have
schenk>about this. tcpdump's of actual stalls are particularly useful.
schenk>Also, when you say "stall" do you mean "freezes, never to recover",
schenk>or do you just mean "gets really really slow"?

As I have reported earlier: telnet sessions get slower and slower until
they come to a standstill. SendQ shows a couple of kilobytes waiting to
be transferred. A ping usually gets the session going again. Also,
another telnet session started to the stalling machine runs at full
speed. This is across a PPP link with a 28.8K modem between
two (or three) machines running 2.0.12/13 with the Debian 1.1
distribution.

schenk>
schenk>Also, slow network connections to the outside world are not news
schenk>unless you can exhibit a faster connection with different software
schenk>in the same environment. The internet at large is suffering from
schenk>incredible congestion these days. [This is not directed
schenk>at Christopher, but rather at the rest of the mailing list.
schenk>Please don't bombard me with reports that your netscape
schenk>connections are crawling unless you can substantiate that
schenk>it is due to a problem in the Linux TCP code. Netscape
schenk>connections crawl on every kind of hardware/software these days.]

Only Linux is involved. I first suspected diald and switched it off to no
avail.

schenk>>3. Signal delivery is still unreliable. I sometimes get
schenk>> pppd's, menu programs stuck just polling for input. If I send
schenk>> them a HUP signal they gladly go away.
schenk>
schenk>What does this have to do with networking? (Assuming that
schenk>signal delivery really is the problem.) If signal delivery isn't
schenk>the problem, then what is?

It also leads to flaky network behaviour because pppd's sometimes just
start looping. They are not part of the kernel network code, true.

I tried to strace the pppd's but I did not get any output. How can I
further observe what is going on?

schenk>[If you are only seeing the problem with pppd, then what version
schenk>are you using? Previous to 2.2.0f there were some problems that
schenk>could have caused it to miss a hangup on the modem line.
schenk>As far as I know this is fixed in 2.2.0f and the 2.0.x kernels.
schenk>In any case, the issue there was not a signal problem, but a
schenk>problem with select(). pppd hangs up when select() returns
schenk>an error code.]

I am running 2.2.0f.
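
To make the select() point above concrete, here is a rough sketch of a
wait loop step that treats an error from select() as a separate outcome
from "data ready" and "timed out". It is written in Python for brevity
and is not pppd's actual code; the function name wait_for_input and the
returned strings are hypothetical, chosen only for illustration.

```python
import os
import select


def wait_for_input(fd, timeout):
    """Wait up to `timeout` seconds for `fd` to become readable.

    Returns "readable", "timeout", or "error: <reason>". A caller such as
    a PPP daemon could treat the error case as a hangup (an assumption
    made for this sketch, following the description in the quoted text).
    """
    try:
        readable, _, _ = select.select([fd], [], [], timeout)
    except OSError as err:
        # select() itself failed, for example because fd is no longer valid.
        return "error: " + os.strerror(err.errno)
    return "readable" if readable else "timeout"
```

A process that loops without ever reaching the error branch would keep
polling, which is one way the "stuck just polling for input" symptom
could arise.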

Could you tell me how to gain more information about the situation if it
happens again? What can I look at besides "netstat -t"?
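
One possible way to capture the socket state and queue sizes the next
time this happens is to read /proc/net/tcp directly and log it. The
sketch below is in Python and assumes the common column layout (local
address, remote address, state, tx_queue:rx_queue) and a little-endian
host such as i386; the layout on a given kernel version should be
checked against its header line first. The helper names are
hypothetical.

```python
import socket
import struct

# Numeric TCP state codes as printed in the "st" column.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def decode_addr(field):
    """Turn 'HEXADDR:HEXPORT' into 'a.b.c.d:port' (little-endian host assumed)."""
    host_hex, port_hex = field.split(":")
    host = socket.inet_ntoa(struct.pack("<I", int(host_hex, 16)))
    return "%s:%d" % (host, int(port_hex, 16))


def parse_tcp_line(line):
    """Parse one data line of /proc/net/tcp into a small dict."""
    parts = line.split()
    tx_hex, rx_hex = parts[4].split(":")
    return {
        "local": decode_addr(parts[1]),
        "remote": decode_addr(parts[2]),
        "state": TCP_STATES.get(parts[3].upper(), parts[3]),
        "send_q": int(tx_hex, 16),  # bytes still queued for sending
        "recv_q": int(rx_hex, 16),  # bytes received but not yet read
    }


def read_tcp_table(path="/proc/net/tcp"):
    """Return parsed entries for every socket listed in the table."""
    with open(path) as table:
        next(table, None)  # skip the header line
        return [parse_tcp_line(line) for line in table if line.strip()]
```

Logging the output of read_tcp_table() every few seconds, together with
a tcpdump of the PPP interface, would show whether a socket really
stays in CLOSE and whether the send queue stops draining during a stall.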

{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}
{} Snail Mail: FTS Box 466, 135 N.Oakland Ave, Pasadena, CA 91182 {}
{} FISH Internet System Administrator at Fuller Theological Seminary {}
{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}
PGP Public Key = FB 9B 31 21 04 1E 3A 33 C7 62 2F C0 CD 81 CA B5
