Re: The stability crisis

Steve Underwood (steveu@netpage.com.hk)
Sun, 04 Jul 1999 12:40:40 +0000


Hi,

sjw44@eng.cam.ac.uk wrote:

> Mark H. Wood wrote:
> >
> > On 2 Jul 1999, Henning P. Schmiedehausen wrote:
> > [snip]
> > > Actually I think that sending the oops out over the network (as a
> > > compile option, of course) is a nice idea. Maybe I will toy with this
> > > sometime this weekend (don't hold your breath, though :-)
> >
> > There's actually quite a lot of experience with this sort of thing
> > already. DECnet nodes have been upline-dumping over DNA Maintenance
> > Operation Protocol for decades. (Literally. I have a copy of DDCMP
> > Specification 4.0 dated 1978 which talks about MOP, and Stuart Wecker's
> > paper on what became DDCMP is dated 1974.) Even if you are now thinking,
> > "ewww, DECnet" it is worth studying. Alas my DNA documentation is for
> > Phase III so I don't have anything on how MOP is used over Ethernet.
> >
> > We had racks of terminal servers, an InfoServer 150 and a WANrouter 250
> > that never had any trouble loading or dumping over the wire.
> >
>
> Hi,
>
> the specs for MOP are freely available, the URL is
> ftp://gatekeeper.dec.com/pub/DEC/DECnet/PhaseIV/maintop30.txt
>
> MOP seems to consist of three parts: loopback testing, remote dumping/loading of
> systems, and remote console. Somewhere (written for the linux-vax project)
> there are userland tools which perform some of these operations.
>
> I've been asked by someone (at a talk I was giving on Linux DECnet) about
> the possibility of having a MOP remote console client which runs on Linux
> to control a terminal server which requires this, so I'm looking into the
> possibilities of this. Also I have started to put hooks in the raw socket
> layer of the Linux DECnet code for MOP, but much work needs to be done
> before it will be useful.
>
> If anyone is interested in taking this further, please let me know or send
> mail to the linux-decnet list at dreamtime.org. The linux-decnet project
> home page is at http://www.sucs.swan.ac.uk/~rohan/DECnet/index.html
>
> Steve.

I haven't seriously used a VAX for about 12 years (see, life isn't all bad) and my
memory may be blurred. I seem to remember that the MOP facility worked with the remote
diagnostics board installed. The board had its own little micro, with everything in
firmware. If the VAX processor crashed, the remote diagnostics board could still
access all the memory and dump its contents somewhere. Naturally this was a robust
solution. It isn't really a solution for Linux, however. Many times a crash will
occur with the protocol stack and NIC driver intact, so there may be a good
chance of dumping from Linux itself. It isn't very clean, though.
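
To make the "dump from Linux itself" idea a little more concrete, here is a rough
userland sketch in Python (not kernel code) of how a dump might be cut into UDP
datagrams that a receiver can reorder and check. The header layout (sequence number,
total count, CRC32), the 1024-byte payload size, and the names make_datagrams,
parse_datagram and send_dump are all my own assumptions for illustration; nobody in
this thread has proposed a wire format.

```python
import socket
import struct
import zlib
from typing import List, Tuple

# Hypothetical header: sequence number, total datagram count, CRC32 of the payload.
HEADER = struct.Struct("!III")
PAYLOAD_SIZE = 1024  # assumed value, chosen to stay below a typical Ethernet MTU


def make_datagrams(dump: bytes, payload_size: int = PAYLOAD_SIZE) -> List[bytes]:
    """Split a dump into datagrams that can be reordered and verified."""
    chunks = [dump[i:i + payload_size] for i in range(0, len(dump), payload_size)]
    total = len(chunks)
    return [
        HEADER.pack(seq, total, zlib.crc32(chunk)) + chunk
        for seq, chunk in enumerate(chunks)
    ]


def parse_datagram(datagram: bytes) -> Tuple[int, int, bytes]:
    """Return (seq, total, payload); raise ValueError if the CRC does not match."""
    seq, total, crc = HEADER.unpack_from(datagram)
    payload = datagram[HEADER.size:]
    if zlib.crc32(payload) != crc:
        raise ValueError("datagram %d is corrupt" % seq)
    return seq, total, payload


def send_dump(dump: bytes, host: str, port: int) -> int:
    """Fire-and-forget transmission; returns the number of datagrams sent."""
    datagrams = make_datagrams(dump)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for datagram in datagrams:
            sock.sendto(datagram, (host, port))
    return len(datagrams)
```

The per-datagram CRC is there because the sender is, by definition, a machine that
has just crashed: the receiver should be able to tell a damaged chunk from a good
one. There is no acknowledgement or retransmission in this sketch, so lost datagrams
simply leave holes in the dump.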

In another message someone suggested investigating the use of the soft reset
feature in the PC BIOS from the 286 days. I'm not sure if this is still available
in modern BIOSes. Even if it is, I don't see how it would help without something
reliable in ROM you can point at and run. There isn't any totally reliable code in
RAM that you could point to.

I wondered about using the boot ROM socket in the NIC to house an incorruptible
NIC driver, UDP, and tftp dump facility - not much different from the more usual use
of that socket for a tftp boot load facility. If the 286 reset trick does work with modern
machines, perhaps a combination of the two would provide a reliable dump feature.
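
For a feel of how little the tftp side of such a ROM would need to do, here is a
small Python sketch of the packet framing for a tftp write, following RFC 1350: a
write request (opcode 2), then DATA packets (opcode 3) of 512 bytes each, with a
short final block marking the end. A real ROM would be written in C or assembler and
would also have to wait for each ACK before sending the next block; that lock-step
part is left out here. The function names and the "crash.dump" file name are
hypothetical.

```python
import struct
from typing import Iterator

OP_WRQ = 2        # write request opcode (RFC 1350)
OP_DATA = 3       # data packet opcode (RFC 1350)
BLOCK_SIZE = 512  # fixed block size in RFC 1350


def wrq_packet(filename: str, mode: str = "octet") -> bytes:
    """Build a write request: opcode, filename, NUL, mode, NUL."""
    return (struct.pack("!H", OP_WRQ)
            + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


def data_packets(dump: bytes) -> Iterator[bytes]:
    """Yield DATA packets for a dump, with block numbers starting at 1.

    A block shorter than 512 bytes ends the transfer, so a dump whose length
    is an exact multiple of 512 is followed by one empty DATA packet.
    """
    block = 1
    offset = 0
    while True:
        if block > 0xFFFF:
            # The block number is 16 bits, which caps a plain transfer near 32 MB.
            raise ValueError("dump too large for 16-bit block numbers")
        chunk = dump[offset:offset + BLOCK_SIZE]
        yield struct.pack("!HH", OP_DATA, block) + chunk
        if len(chunk) < BLOCK_SIZE:
            return
        block += 1
        offset += BLOCK_SIZE


# Example: the first packet a dump client would send to the tftp server.
request = wrq_packet("crash.dump")
```

The 16-bit block number is worth noting: without extensions, it limits a dump to
roughly 32 MB, so a full memory image of a larger machine would need to be split
across several files or trimmed to the interesting regions.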

Steve
