1.3.71 killed by TCP bogons again!

Oliver Xymoron (oxymoron@waste.org)
Sun, 10 Mar 1996 10:48:13 -0600 (CST)


On Wed, 6 Mar 1996, I wrote:

> I remotely installed 1.3.71 on my home machine at about lunch time today.
> Previously I had .57 running with an uptime of about 40 days and
> apparently a couple meg in the send queues of closed TCP connections.
> The .71 kernel ran for several hours with moderate web, ftp, and mail
> activity, and a number of users logged on, when it decided to die.
> Mgettys on the dialins stopped answering, telnet sessions would connect,
> but never respond, etc. The console was filled with a message I didn't
> get to see, allegedly referring to "TCP bogons". Probably from tcp_output.c:
>
> printk("tcp_send_skb: attempt to queue a bogon.\n");
>
> No other signs of difficulty appeared in my log files.
>
> This kernel was built with the following networking options:
> ...
> # Networking options
> #
> CONFIG_FIREWALL=y
> CONFIG_NET_ALIAS=y
> CONFIG_INET=y
> CONFIG_IP_FORWARD=y
> # CONFIG_IP_MULTICAST is not set
> CONFIG_IP_FIREWALL=y
> CONFIG_IP_ACCT=y
> # CONFIG_IP_ROUTER is not set
> # CONFIG_NET_IPIP is not set
> # CONFIG_IP_FIREWALL_VERBOSE is not set
> CONFIG_IP_MASQUERADE=y
> CONFIG_IP_ALIAS=y
>
> #
> # (it is safe to leave these untouched)
> #
> # CONFIG_INET_PCTCP is not set
> CONFIG_INET_RARP=y
> # CONFIG_NO_PATH_MTU_DISCOVERY is not set
> # CONFIG_TCP_NAGLE_OFF is not set
> CONFIG_IP_NOSR=y
> CONFIG_SKB_LARGE=y
> ...
>
> Any ideas? I've moved back to 1.3.58 in the meantime (my .57 kernel got
> overwritten about a month ago)...

Apparently this isn't a fluke. My system died spewing "attempted to queue a
bogon" messages after another 12 hours or so of uptime with .71, requiring me
to come home over lunchtime to resuscitate it. I've since recompiled the
kernel with a software watchdog (hacked to use a ten-minute timer) and
added a little watchdogd to my system. It's not an optimal solution, but it
will keep me from having to come home from work the next time it happens.
Does anyone know what could be causing this, or how to go about diagnosing
a problem that kills the machine and only seems to appear after many hours
of uptime?
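
For anyone wanting to try the same workaround, here is a rough sketch of what
a small watchdog daemon can look like. It is not my actual watchdogd; it
assumes the software watchdog shows up as /dev/watchdog and that any write to
the device resets its timer. The names kick_watchdog and watchdog_loop are
made up for illustration.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write one byte to the watchdog device; each write is assumed to
 * reset the driver's timer. Returns 0 on success, -1 on failure. */
static int kick_watchdog(int fd)
{
    return write(fd, "\0", 1) == 1 ? 0 : -1;
}

/* Open `path` and kick it every `interval` seconds.
 * kicks < 0 means loop forever; otherwise stop after that many kicks
 * (handy for trying the loop against an ordinary file first).
 * Returns 0 on a clean finish, -1 if the open or a write fails. */
int watchdog_loop(const char *path, unsigned interval, long kicks)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0) {
        perror(path);
        return -1;
    }
    while (kicks != 0) {
        if (kick_watchdog(fd) < 0) {
            perror("write");
            close(fd);
            return -1;
        }
        if (kicks > 0)
            kicks--;
        if (kicks != 0)
            sleep(interval);   /* keep this well under the timeout */
    }
    close(fd);
    return 0;
}
```

A daemon built on this would call something like
watchdog_loop("/dev/watchdog", 60, -1) after detaching from the terminal. If
the machine wedges hard enough that the process stops running, the kicks stop
and the kernel timer reboots the box. The 60 second interval is only an
example; anything comfortably shorter than the ten-minute timeout should do.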

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."