Re: The stability crisis

Jes Sorensen (Jes.Sorensen@cern.ch)
02 Jul 1999 13:52:11 +0200


>>>>> "Brian" == brian <brian@worldcontrol.com> writes:

Brian> I'm not the person Linus was addressing, but I've had plenty of
Brian> oopses with 2.2.1 - 2.2.10 and have not sent any in.

No info, no debugging.

Brian> So far as I know there are only two ways to capture the data
Brian> related to an oops. Write it down with a pencil, or capture it
Brian> via a serial port on another machine. The first seems too
Brian> prone to errors, and the second just isn't realistic for me and
Brian> my cluster of machines. Too many serial cables going every
Brian> which way. Or maybe I'm just lazy.

Well we are hooking up our Linux PCs to our serial console system for
this very reason - it just has to be done. It also saves one from all
those silly console switches ;-) (though I think they will be kept
there).

Brian> I have a setup which oopses in 5 minutes to a few days when
Brian> compiled with SMP support. The identical source compiled
Brian> without SMP runs forever as far as I can tell.

Fine, take one box, hook up a serial line and wait. If it is that easy
to get a crash then you don't need to wire them all up.

Brian> Since all things have previously be discussed on this list, I'm
Brian> going to let my linux "newbieism" show by asking for a feature
Brian> which has undoubtably been asked for before and has undoubtable
Brian> been shot down for very legitimate reasons.

I haven't seen it before, but there are some good reasons for it.

Brian> I would like my oops'ing systems to send the oops to another
Brian> system via an ethernet interface. How about a UDP packet?
Brian> Nice connectionless protocol. Compile the MAC/IP address into
Brian> the kernel. Opps occurs, build the UDP packet with the measly
Brian> 2K oops message in it and send.

Before you can do this you need a hardware address for the receiver,
this can of course be done by putting it statically into this special
driver. Alternatively you need a fair amount of a network stack to run
before you can first get the hw address via ARP, and then send out a
message with the oops info. Now this requires a sorta functional
Ethernet driver to get anywhere and often your interrupts are dead.

So in principle one could hack something like this, but it will be
nasty, it will require a special Ethernet driver going in,
initializing the card from scratch (it could be the interface that
crashed) and then send out the packet.

To be quite frank, it is less of a hassle to hook up a serial cable,
I cannot see any reason why people keep refusing to do this.

Jes

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/