Re: Ok, I give up...

Wakko Warner (wakko@animx.eu.org)
Mon, 20 Dec 1999 12:18:29 -0500


> Just an aside... if you notice this happening when X is running, are you
> running KDE? Do a 'ps waux | grep kscd' and see if anything pops up. On
> my SCSI CD drives kscd goes postal and causes a sr0 disk change
> notification a couple of times as second... really annoying and I can't
> imagine it does the hardware or driver any good.

Yes, I was in X (I have a terminal watching my logs), no I don't use kde. I
was inserting cd's in my changer, and noticed for each cd I mounted, it went
through the validation again (for the same SCSI ID !). It did the same
thing on my cd recorder and after cdrecord -load, it caused the validation
and the bus froze. HDD light never went off (I don't have IDE drives). Box
is SMP (dual pII 450)

> > I'm going to throw in another. It might be domain validations that fail. I
> > got one that didn't succeed. Took down my entire system (It is SMP with an
> > aha-2940u). Seems like every time I insert a CD in the drive and try to use
> > it another domain validation occures (2.2.13) I don't believe this ever
> > happened with 2.2.7
> >
> > > Just some near random thoughts that might help you here...
> > >
> > > Are you logging the syslog on remote machines (@hostname entries in the
> > > syslogd.conf)? I've found in the past that that helps against the loss of
> > > log file tails.
> > >
> > > Are the boxes SMP, and are the disks maybe in (IDE) PIO mode sometimes? If
> > > that is possible, then you should try the 14pre15, because Alan has
> > > recently fixed a lockup bug for that case.
> > >
> > > Did you use kernels with minimal hardware support compiled-in (Just
> > > the minimum what you need, from the hardware that is present, and without
> > > using kernel modules and kerneld)?
> > >
> > > Did you try shuffling around with the hardware, different network cards,
> > > different mainboard/CPU combinations, other/new DIMMs, different power
> > > supplies (hooked up to different power groups, different/newer UPS-es)),
> > > to see when the problem gets worse or less (might help find the problem).
> > > Even though it used to work perfectly, the PC hardware could have become
> > > flakey since you ran the 2.0.x kernels (CPU fans are not the only things
> > > that can break down), and the power net supply and/or UPS (110V/220V) can
> > > have become worse, which can also have an influence on system stability.
> > >
> > > Also, are you running X11 on the servers (if so, I suggest you stop doing
> > > that, the XFree86 binaries have direct access to all hardware, so a bug in
> > > that binary can wreak havoc, and in addition VGA cards can lockup the PC
> > > hardware in some cases)? Id even suggest running the PCs without video
> > > cards (dont forget to compile a new kernel without console support, or you
> > > will be left with a significant slowdown of your server).
> >
> > > > About 2 months ago, I moved all of my servers ( 15 of them) to the 2.2.x
> > > > kernels. Some were clean installs of RH 6.0, some were upgrades to RH
> > > > 5.2. But all of these servers ran 2.0.35 and 2.0.36 with 100+ days of
> > > > uptime and were rock solid. But since moving to the 2.2 kernels on the
> > > > same hardware, reliability and uptime sucks. Seems like I can rarely get a
> > > > month of uptime with the 2.2 kernels, and I've tried everything from 2.2.5
> > > > to 2.2.14pre13. The few oopses I've had have been traced back to buggy
> > > > hardware that has since been replaced. But in most every case with the 2.2
> > > > kernels, the servers (mainly serving web pages) run for a few days to a
> > > > week and then lock up completely. Then it requires a power cycle to bring
> > > > it back to life.
> > > >
> > > > So I'm stumped. When a system locks up I don't get any console messages or
> > > > anything in syslog. Since all of these servers ran 2.0.x kernels and were
> > > > very stable for months, I'm inclined to think the hardware is fine. So any
> > > > clues as to how I might track this down? I'd hate to say it (flame jacket
> > > > on) but my NT and freeBSD boxes are putting Linux to shame in terms of
> > > > stability....
> > > >
> > > > David
> > > >
> > > > Please read the FAQ at http://www.tux.org/lkml/
> > > >
> > >
> > >
> > > -
> > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > the body of a message to majordomo@vger.rutgers.edu
> > > Please read the FAQ at http://www.tux.org/lkml/
> > --
> > Lab tests show that use of micro$oft causes cancer in lab animals
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.rutgers.edu
> > Please read the FAQ at http://www.tux.org/lkml/

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/