Re: 2.6.7-rc1-bk: SMT scheduler bug / crashes on kernel boot

From: Nick Piggin
Date: Wed May 26 2004 - 07:17:07 EST


Anton Altaparmakov wrote:
On Wed, 2004-05-26 at 12:13, Nick Piggin wrote:

Anton Altaparmakov wrote:

Hi,

Kernel 2.6.7-rc1-bk crashes on boot with a NULL pointer dereference. The
kernel is running under VMware, if that matters, but I don't think it
should. It was working fine with 2.6.6-rc3-bk kernels.

I am afraid the only way I could capture the crash was to capture the
VMware screen into a PNG image, which is attached. Maybe I need to set up
some OCR software for the future... (-;

The system running VMware is a P4 2.6GHz with Hyper-Threading enabled and
/proc/cpuinfo shows two CPUs:

OK, thanks for that. It would be quite helpful if you edit
kernel/sched.c and turn the line #undef SCHED_DOMAIN_DEBUG into
#define SCHED_DOMAIN_DEBUG, then compile a kernel with debugging
info enabled.


Looking at kernel/sched.c it already says #define, not #undef!


Oops, yes.

[snip]

So the dereferencing of one of the two fails. Considering the offset is
0x18 in the NULL dereference it must be the (p)->prio that causes the
oops and hence p must be NULL. I will leave you to figure out what that
means...


Nice detective work.

It tried to dereference a NULL idle thread, I'd say,
i.e. the CPU hasn't been set up. Please try Ingo's patch.
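
For anyone following the offset reasoning quoted above: a fault address
of 0x18 on a NULL pointer dereference points at whichever member sits
0x18 bytes into the structure. Below is a minimal sketch of that
arithmetic. The struct is a hypothetical stand-in (mock_task), not the
real struct task_struct, and it assumes a 32-bit layout with six 4-byte
members ahead of prio; the real layout depends on kernel version and
config. The helper names are made up for illustration as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical 32-bit task layout, for illustration only.
 * Assumption: six 4-byte members precede prio, which puts prio at 0x18.
 * Fixed-width types keep the offsets identical on any host.
 */
struct mock_task {
	uint32_t state;
	uint32_t thread_info;	/* pointer slot, modeled as 32 bits */
	uint32_t usage;
	uint32_t flags;
	uint32_t ptrace;
	uint32_t lock_depth;
	int32_t  prio;
	int32_t  static_prio;
};

/* Address that a read of p->prio touches when p points at 'base'. */
static uintptr_t fault_addr_for_prio(uintptr_t base)
{
	return base + offsetof(struct mock_task, prio);
}

/* True if 'fault_addr' is what a read of p->prio hits with p == NULL. */
static int looks_like_null_prio(uintptr_t fault_addr)
{
	return fault_addr == offsetof(struct mock_task, prio);
}

/* Print the offset so it can be compared with the address in an oops. */
static void print_prio_offset(void)
{
	size_t off = offsetof(struct mock_task, prio);

	assert(off < sizeof(struct mock_task));
	printf("prio offset: 0x%zx\n", off);
}
```

With p == NULL the base is 0, so the faulting address equals the member
offset itself. That is why a small fault address such as 0x18 suggests
a NULL struct pointer rather than a wild one.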