Re: [Bug 11388] New: 2.6.27-rc3 warns about MTRR range; only 3 of 16gb of memory is usable

From: Joshua Hoblitt
Date: Tue Aug 26 2008 - 04:35:47 EST


On Sat, Aug 23, 2008 at 12:43:11PM +0200, Ingo Molnar wrote:
>
> * Yinghai Lu <yhlu.kernel@xxxxxxxxx> wrote:
>
> > On Fri, Aug 22, 2008 at 5:22 PM, Joshua Hoblitt <j_kernel@xxxxxxxxxxx> wrote:
> > > I've confirmed that the boards in these systems are Tyan Tempest
> > > i5400PW (S5397)s. We've discovered a workload that will deadlock
> > > the system under both 2.6.24.2 and -tip kernel with the mtrr masking
> > > patch. The only thing unusual about this workload is that one of
> > > the binaries in it constantly segvs... Is it possible that these
> > > deadlocks (no kernel oops on console) are caused by MSR setup
> > > weirdness or is it likely unrelated?
> >
> > could be other problem.
> >
> > cpu should be smart enough to understand the missing bits in mask.
> > at least amd cpu. remember that we didn't set mask bits to 40bits with
> > opteron with LinuxBIOS, and everything still works well.
>
> yeah. Is the deadlock debuggable? (does nmi_watchdog=1 produce anything
> useful, or does the enabling of CONFIG_PROVE_LOCKING=y show anything
> weird in the syslog during light, non-deadlocking use of this workload?)

Enabling the nmi_watchdog doesn't produce anything at all (I double
checked the .config... it should be working). Rebuilding with
PROVE_LOCKING seems to have prevented the deadlock. It used to take
30-45 mins to lock the system up under heavy load and we're going on 6
hours here with no issues. Absolutely nothing in the dmesg. Ugh. Any
other suggestions? How bad is it to leave PROVE_LOCKING enabled?

-J
