Re: SMP lockup

Linus Torvalds (torvalds@transmeta.com)
3 Mar 1997 21:06:37 GMT


In article <Pine.LNX.3.91.970303094044.265A-101000@chaos.analogic.com>,
Richard B. Johnson <root@analogic.com> wrote:
>
>On my Dual Pentium 166 MHz machine running SMP, I have discovered a
>repeatable problem. It is possible to make it just STOP! No errors
>no obvious problems. After it stops, there is no response to the keyboard,
>nothing at all. The CPUs __seem__ to be halted with the interrupts off.
>This is evident by the chip temperature rapidly returning to room temperature.
>There is no bus activity except for an approximate 26 us CAS for refresh.

Heh. You have hardware I haven't got access to, but I've seen the same
thing now that I have access to an SMP box.

The good news is that I _think_ I have found the problem, and I'm
running a personal test-kernel that has so far been up for almost an
hour under a load that usually crashed it within 15 minutes before my
fix.

The bad news is that my current fix is so horribly ugly that I don't
dare show it to anybody but Alan and David (and David hurled chunks when
he saw it ;)

Anyway, I hope to have this fixed in 2.1.28, with the really ugly fix
unless I can make it cleaner. This has been my #1 priority for the last
few days, and I finally feel like I have a good angle on it. Knock
wood.

Linus