Re: crashes with no log messages

Klaus Lichtenwalder (Klaus.Lichtenwalder@webforum.de)
Sat, 28 Mar 1998 12:25:29 +0000 (WET)


On Fri, 27 Mar 1998, Gustin Kiffney wrote:

> I have seen a number of messages that go like this: 'Last night my
> 2.0.33 machine, Pentium II, etc etc crashed with no log messages...
> what's wrong?' But none of these posters ever mention if the machine
> was on an uninterruptible power supply and running powerd or something
> else that would make for an orderly shutdown. Really, you should

FWIW, I did say that the machine was on a UPS (didn't I?). It's on a
UPS, and there are a few more machines running through the night. It's
two storeys underground, in a room with air conditioning. That's not
what I'd call a harsh environment.

> mention this, because as Alan Cox pointed out Linux has a fundamental
> bug in that if the computer loses its connection to a working power
> supply it will crash with no logs. More than that, if there's a
> brownout, or an EMF pulse, this can happen too. If you DON'T have your
> system on a UPS, and your system crashes this way there's really no
> point in posting.

Depends. A lot of those machines crashed for the sole reason of having
been upgraded to 2.0.33. OK, if somebody says his machine had an uptime
of, say, more than 20 days and starts crashing without any change in
configuration, your point is valid.

> Perhaps it's my experience with making and troubleshooting electronics
> hardware that makes me amused at people's faith in the mechanical system
> itself. It's amazing what a little extra heat, dust, etc on a
> motherboard or add-in card will do, esp. if that card was designed
> without much 'overhead' for external stress. Electronics can suffer
> from cosmic rays of all things - flip a bit in a DRAM just like that.
> For even more fun, you can get a bus loaded with peripherals in a state
> with a certain combination of bit patterns that turn a bit into
> something that's not a 0 and not a 1 - the chances might be 1 in a
> quintillion or something but when you're talking a 300 MHz machine,
> well you can calculate about how often it'll happen and it's really not
> that unlikely. Really, you should be amazed when things DON'T crash
> with no error logs ....
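
The "you can calculate about how often it'll happen" remark can be made
concrete with a rough sketch. It takes the quoted figures at face value
(odds of 1 in a quintillion, a 300 MHz clock) and assumes, purely for
illustration, one chance per clock cycle. The function name and the
chances_per_cycle parameter are hypothetical, not from the original post.

```python
# Rough estimate only: the odds and clock rate are the guesses quoted
# above; "one chance per clock cycle" is an assumption made here.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def mean_years_between_events(odds_against, clock_hz, chances_per_cycle=1.0):
    """Average time in years between events with odds of 1 in odds_against."""
    events_per_second = clock_hz * chances_per_cycle / odds_against
    return 1.0 / events_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    years = mean_years_between_events(odds_against=1e18, clock_hz=300e6)
    print(f"about {years:.0f} years between events on one machine")
```

Under these assumptions the result is on the order of a century per
machine. It scales linearly with the assumed odds, so the conclusion
depends heavily on numbers nobody here has measured.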

Well, that's all fine. Sure, PC HW is much more crap than anything else,
IBM boxes, Sun boxes, whatever. Still, if you choose to buy selected HW,
the chances of having machines that run for a year without a crash are
good (even without a UPS; maybe the 220 (240 actually) volts in Germany
(Europe) give a less disturbed power supply than 110 does).

>
> About the only kind of hardware that I see that
> doesn't crash like that more than, say, once in 5 years, is a machine
> specifically designed by a capable system designer to be a server with
> great ventilation, lots of fans, well made SCSI cables, redundant power
> supplies, on a good UPS with monitoring, running Netware, OS/2, UNIX or
> AS/400's system. If you have a low-cost desktop machine and get these
> kinds of numbers, you're just lucky. Don't blame Linux, for goodness's
> sake. If you see a crash with no logs running Linux on equipment rated
> for life support systems, by all means, do post.

Still, you can get a server with redundant power supplies, good SCSI
cables, and expensive, well-tested HW, and still use PC architecture. I
don't see the point. You don't know whether people are using cheap
desktop things or "hardened" equipment. These machines still crashed for
the sole reason of being upgraded to a certain release level. And the
probability of flipped bits crashing a program, with that showing up in
the logs, will certainly be higher than that of a total lockup.

Just my EU 0.02...

Klaus
------------------------------------------------------------------------
Klaus Lichtenwalder, Dipl. Inform., PGP Key: email to key@Four11.com
Lichtenwalder@ACM.org http://www.wp.com/Klaus
K.Lichtenwalder@Computer.org fax: +49-89-91072699
Mouse surfaces are mostly furry -- Ricarda
