Re: Ok, I give up...

Brian Macy (bmacy@sunshinecomputing.com)
Mon, 20 Dec 1999 09:00:46 -0800


Just an aside... if you notice this happening when X is running, are you
running KDE? Do a 'ps waux | grep kscd' and see if anything pops up. On
my SCSI CD drives kscd goes postal and causes a sr0 disk change
notification a couple of times as second... really annoying and I can't
imagine it does the hardware or driver any good.

Brian Macy

Wakko Warner wrote:
>
> I'm going to throw in another. It might be domain validations that fail. I
> got one that didn't succeed. Took down my entire system (It is SMP with an
> aha-2940u). Seems like every time I insert a CD in the drive and try to use
> it another domain validation occures (2.2.13) I don't believe this ever
> happened with 2.2.7
>
> > Just some near random thoughts that might help you here...
> >
> > Are you logging the syslog on remote machines (@hostname entries in the
> > syslogd.conf)? I've found in the past that that helps against the loss of
> > log file tails.
> >
> > Are the boxes SMP, and are the disks maybe in (IDE) PIO mode sometimes? If
> > that is possible, then you should try the 14pre15, because Alan has
> > recently fixed a lockup bug for that case.
> >
> > Did you use kernels with minimal hardware support compiled-in (Just
> > the minimum what you need, from the hardware that is present, and without
> > using kernel modules and kerneld)?
> >
> > Did you try shuffling around with the hardware, different network cards,
> > different mainboard/CPU combinations, other/new DIMMs, different power
> > supplies (hooked up to different power groups, different/newer UPS-es)),
> > to see when the problem gets worse or less (might help find the problem).
> > Even though it used to work perfectly, the PC hardware could have become
> > flakey since you ran the 2.0.x kernels (CPU fans are not the only things
> > that can break down), and the power net supply and/or UPS (110V/220V) can
> > have become worse, which can also have an influence on system stability.
> >
> > Also, are you running X11 on the servers (if so, I suggest you stop doing
> > that, the XFree86 binaries have direct access to all hardware, so a bug in
> > that binary can wreak havoc, and in addition VGA cards can lockup the PC
> > hardware in some cases)? Id even suggest running the PCs without video
> > cards (dont forget to compile a new kernel without console support, or you
> > will be left with a significant slowdown of your server).
>
> > > About 2 months ago, I moved all of my servers ( 15 of them) to the 2.2.x
> > > kernels. Some were clean installs of RH 6.0, some were upgrades to RH
> > > 5.2. But all of these servers ran 2.0.35 and 2.0.36 with 100+ days of
> > > uptime and were rock solid. But since moving to the 2.2 kernels on the
> > > same hardware, reliability and uptime sucks. Seems like I can rarely get a
> > > month of uptime with the 2.2 kernels, and I've tried everything from 2.2.5
> > > to 2.2.14pre13. The few oopses I've had have been traced back to buggy
> > > hardware that has since been replaced. But in most every case with the 2.2
> > > kernels, the servers (mainly serving web pages) run for a few days to a
> > > week and then lock up completely. Then it requires a power cycle to bring
> > > it back to life.
> > >
> > > So I'm stumped. When a system locks up I don't get any console messages or
> > > anything in syslog. Since all of these servers ran 2.0.x kernels and were
> > > very stable for months, I'm inclined to think the hardware is fine. So any
> > > clues as to how I might track this down? I'd hate to say it (flame jacket
> > > on) but my NT and freeBSD boxes are putting Linux to shame in terms of
> > > stability....
> > >
> > > David
> > >
> > > Please read the FAQ at http://www.tux.org/lkml/
> > >
> >
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.rutgers.edu
> > Please read the FAQ at http://www.tux.org/lkml/
> --
> Lab tests show that use of micro$oft causes cancer in lab animals
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.rutgers.edu
> Please read the FAQ at http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/