Re: BUG: soft lockup detected on Phenom with Debian 2.6.24-4

From: Peter Oruba
Date: Thu Mar 20 2008 - 09:45:29 EST


Laurent,

you may have triggered the L2 eviction bug (E298).

Please try

hexdump -s 0xc0010015 -n 8 -C /dev/cpu/0/msr

The output is little-endian, so the left-most byte is the least significant one. Bit 3 (0x08) must be set in that byte, which means TLB caching is disabled.
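
If you would rather not decode the hexdump by eye, the same check can be sketched in Python. This is only a rough illustration, assuming the msr driver is loaded and that you run it as root. The function and constant names are made up for this example; only the MSR address and the bit number come from the command and explanation above.

```python
import os
import struct

MSR_ADDR = 0xC0010015   # same offset as passed to hexdump -s above
BIT = 3                 # the bit that must be set, per the explanation above


def bit3_is_set(raw: bytes) -> bool:
    """Decode 8 raw MSR bytes as a little-endian 64-bit value and test bit 3."""
    (value,) = struct.unpack("<Q", raw)
    return bool(value & (1 << BIT))


def read_msr(cpu: int, msr: int) -> bytes:
    """Read 8 bytes at offset `msr` from /dev/cpu/<cpu>/msr (needs root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return os.pread(fd, 8, msr)
    finally:
        os.close(fd)


# Hypothetical usage on the affected machine:
#   raw = read_msr(0, MSR_ADDR)
#   print(raw.hex(), bit3_is_set(raw))
```

Because the value is little-endian, the first byte hexdump prints is the low byte, so a leading byte such as 08 or 18 has bit 3 set, while 10 does not.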

-Peter

Laurent GUERBY wrote:
On Sun, 2008-03-16 at 15:43 +0100, Laurent GUERBY wrote:
On Sat, 2008-03-08 at 23:00 +0100, Laurent GUERBY wrote:
Hi,

I have a system with an "AMD64 Phenom 9500" quad core cpu, 4GB RAM,
"ASUS M3A32 MVP Deluxe wifi" motherboard with latest vendor BIOS
(0801).

I tried the stock Debian etch kernel (Debian 2.6.18.dfsg.1-18etch1): the
machine froze with no message. The Debian etch backport kernel did the
same. Then I tried Debian 2.6.24-4 from unstable and got some messages: the machine
is not frozen but some userland processes are (ps says "Dl" state
with child in "Zs" state) and "events/3" is taking 100% cpu
according to top:

18 root 15 -5 0 0 0 R 100 0.0 74:59.46 events/3

Got to the same state with ubuntu hardy 2.6.24-8-server kernel. All
kernels are untainted, no X running anyway.

It takes a few hours of doing some stuff, in my case bootstrapping or
testing GCC at -j 4, and then the problem happens.
On 2.6.24-1-amd64 (Debian 2.6.24-4) I got a slightly different
backtrace in /var/log/messages after a few hours of stressing the
machine with compilations, see below. The given process
was stuck and unkillable.

Any idea on what to do/try?

I changed the motherboard and went for a (way cheaper but older) ASUS M2A-VM,
which is based on the AMD 690G chipset, with the exact same
kernel/phenom/memory/disk/box and installed the latest vendor BIOS
(1604). It took longer (25 hours) to get a stuck and unkillable process,
but it did happen. This time I got nothing in /var/log/kern.log.

In order to rule out a motherboard or memory issue I bought an Athlon X2
4400+ EE and put it in place of the phenom with the exact same
kernel/memory/disk/box, and my stress test has been running for 72 hours
without any issue so far.

So in the end it seems to be a problem specific to phenom 9500 with the
linux kernel.

Did anyone succeed in getting a stable linux box based on a phenom 9500
processor? "Stable" is defined as being able to survive a few days
compiling at -j4 (this is for the GCC compile farm after all :).

If so, I'm interested in the exact motherboard/bios version/kernel
version/distro used.

As proposed in my first email, ssh root access is possible to my machine
(with either motherboard).

Thanks in advance,

Laurent


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



--
          | AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System    | Register Court Dresden: HRA 4896
Research  | General Partner authorized to represent:
Center    | AMD Saxony LLC (Wilmington, Delaware, US)
          | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
