Re: A couple of oops.

From: Andrew Morton
Date: Tue May 23 2006 - 20:40:34 EST


Carlos Martín <carlos@xxxxxxxxxx> wrote:
>
> Hi,
>
> I've nailed this down to something that happened in 2.6.17-rc4. The
> system locks up with either a NULL dereference or an unhandable paging
> request. The stack trace shows this:
>
> paging request NULL dereference
>
> _raw_spin_trylock+12 _raw_spin_trylock+20
> __spin_lock+22
> main_timer_handler+22
> timer_interrupt+18
> handle_IRQ_event+41
> __do_IRQ+156
> do_IRQ+51
> default_idle+0
> _spin_unlock_irq+43
> thread_return+187
> generic_unplug_device+0
> default_idle+45
> dev_idle+95 (I can't read the func clearly in this handwriting)
> start_secondary+1129
>
> I'm guessing this is the same problem only that it once manifests itself
> as one and another time as the other. The problem is in the call to
> write_seqlock(&xtime_lock) from main_timer_handler().
>
> I've not been able to determine what patch has caused this to happen,
> but it is between 2.6.17-rc3 and -rc4. I'm bisecting, but if anybody has
> a good candidate, it'd probably be faster than doing a complete bisect.

hm, so an attempt to access xtime_lock.lock results in a null-pointer deref?

x86_64 does novel things, putting xtime_lock into a linker section all of
its own. At a guess I'd say that your compiler/assembler/linker toolchain
got confused and generated the incorrect address for xtime_lock. But if
that was the case you'd get oopses 100% of the time - it wouldn't be
intermittent, as your description seems to imply (although it's quite
unclear?).

You could do:

gdb vmlinux
(gdb) p &xtime_lock
(gdb) x/40i main_timer_handler

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/