2.2.5(?) - 2.2.7 Network BUG [Was: Re: 2.2.7 networking anomaly]

Simon Kirby (sim@netnation.com)
Tue, 4 May 1999 19:36:41 -0700 (PDT)


On Tue, 4 May 1999, Chris Evans wrote:
> On Tue, 4 May 1999, Simon Kirby wrote:
> > On Mon, 3 May 1999, Chris Evans wrote:
> >
> > > I've just spotted this in netstat's output
> > >
> > > tcp 0 0 ferret.lmh.ox.ac.u:auth 195.226.66.83:4174 SYN_RECV on1 (42949556.40/6/0)
> > >
> > > That timer value seems a little large, no? :-)
> >
> > Seen this on 2.2.5ac1 also:
>
> Sounds like a real network bug then.
>
> 24 hours later the sockets have not gone away. The behaviour of the timer
> values and retry count is very erratic too. More details on request

I have a strange suspicion that this is related to what is happening on our
nameserver machine -- about 5-6 times per day now, the "named" process
locks up with WCHAN in "tcp_close". Obviously, named then does not
answer any more requests at all. I only recently caught the WCHAN,
because until now I had been attempting to "strace" the process, and it
seems that as soon as the process receives a signal it continues on as normal.

I'm not sure if this really is related or not, but if it's a
wrapping/corrupted timer somewhere, it could well be.
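
As a rough sanity check on the wrapping idea, here is a small sketch of the
arithmetic. It assumes (none of this is confirmed against the kernel source)
that the value netstat prints is an unsigned 32-bit difference of tick
counters divided by HZ, with HZ = 100 as on i386. The function names are
made up for illustration only.

```python
# Sketch only: HZ and the 32-bit wrap are assumptions about an i386 2.2
# kernel, and these helper names are hypothetical, not kernel or netstat code.
HZ = 100          # assumed timer ticks per second
WRAP = 2 ** 32    # assumed width of the unsigned tick counter


def displayed_seconds(expires, now, hz=HZ):
    """Seconds to expiry as an unsigned 32-bit subtraction would report it."""
    return ((expires - now) % WRAP) / hz


def overdue_ticks(displayed, hz=HZ):
    """Ticks in the past an expiry would have to be to display `displayed`."""
    return WRAP - round(displayed * hz)


if __name__ == "__main__":
    ticks = overdue_ticks(42949556.40)
    print(ticks, "ticks overdue, i.e.", ticks / HZ, "seconds")
```

Under those assumptions, 42949556.40 sits just below 2^32 / 100 = 42949672.96,
which is what a timer that expired about 116.56 seconds ago would look like
if the difference were taken unsigned. That is consistent with a wrap, but
it does not prove one.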

I will run a "netstat -a -n -o" next time I catch it happening to see if
there is actually a wrapping timer.
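
To make that check less manual, something like the following could scan
saved netstat output for implausible timers. It is a minimal sketch which
assumes the timer column looks like the "(42949556.40/6/0)" field quoted
above; the function name and the one-day threshold are arbitrary choices
for illustration.

```python
import re

# Assumed shape of the timer field: (seconds/retransmits/probes),
# taken from the single sample quoted earlier in this thread.
TIMER_RE = re.compile(r"\((\d+(?:\.\d+)?)/(\d+)/(\d+)\)")


def suspicious_timers(netstat_text, limit_seconds=86400.0):
    """Return (seconds, line) pairs whose timer exceeds limit_seconds."""
    hits = []
    for line in netstat_text.splitlines():
        match = TIMER_RE.search(line)
        if match and float(match.group(1)) > limit_seconds:
            hits.append((float(match.group(1)), line.strip()))
    return hits


if __name__ == "__main__":
    import sys
    # Example: netstat -a -n -o | python this_script.py
    for seconds, line in suspicious_timers(sys.stdin.read()):
        print(seconds, line)
```

Any socket it reports would be a candidate for the same wrapped or corrupted
timer, though a large value alone says nothing about the cause.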

Simon-

| Simon Kirby | Systems Administration |
| mailto:sim@netnation.com | NetNation Communications |
| http://www.netnation.com/ | Tech: (604) 684-6892 |
