Re: 2.6.25-mm1: not looking good

From: Vegard Nossum
Date: Fri Apr 18 2008 - 10:47:31 EST


On 4/18/08, Jason Wessel <jason.wessel@xxxxxxxxxxxxx> wrote:
> Vegard Nossum wrote:
> > On Fri, Apr 18, 2008 at 3:02 PM, Jason Wessel
> > <jason.wessel@xxxxxxxxxxxxx> wrote:
> >
> >> Vegard Nossum wrote:
> >> > On Fri, Apr 18, 2008 at 2:34 PM, Ingo Molnar <mingo@xxxxxxx> wrote:
> >> >
> >> >> * Vegard Nossum <vegard.nossum@xxxxxxxxx> wrote:
> >> >>
> >> >> > With the patch below, it seems 100% reproducible to me (7 out of 7
> >> >> > bootups hung).
> >> >> >
> >> >> > The number of loops it could do before hanging were, in order: 697,
> >> >> > 898, 237, 55, 45, 92, 59
> >> >>
> >> >> cool! Jason: i think that particular self-test should be repeated 1000
> >> >> times before reporting success ;-)
> >> >>
> >> >
> >> > BTW, I just tested a 32-bit config and it hung after 55 iterations as
> >> > well.
> >> >
> >> > Vegard
> >> >
> >> >
> >> >
> >> I assume this was SMP?
> >>
> >
> > Yes. But now that I realize this, I tried running the same kernel with
> > qemu, using -smp 16, and it seems to be stuck here:
> >
> >
>
> Unless you have a qemu with the NMI patches, kgdb does not work on SMP
> with qemu. The very first test is going to fail because the IPI sent by
> the kernel is not handled in qemu's hardware emulation.

Oops, no, and that makes sense.

I now picked up qemu 0.9.1 and applied the three NMI/SMI patches by Jan Kiszka.

So in qemu it seems to run fine now, except that I need to prod it
sometimes (it gets stuck in cpu_clock() and I have to break/continue
from gdb to make it proceed). Oh, there it made it to 1056, and gdb
can't interrupt anymore. Hmm. This is probably not a very good
testing/debugging environment if the qemu support is that bad. Sorry
:-)

But booting with nosmp on real hardware easily gets above 100,000
iterations of the loop (before I reboot), so the hang does seem to be
SMP-related, anyway.

Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036