Re: A couple of oops.

From: Carlos Martín
Date: Wed May 24 2006 - 10:31:50 EST


On Tue, 2006-05-23 at 17:40 -0700, Andrew Morton wrote:
> Carlos MartÃn <carlos@xxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > I've nailed this down to something that happened in 2.6.17-rc4. The
> > system locks up with either a NULL dereference or an unhandable paging
> > request. The stack trace shows this:
> >
> > paging request NULL dereference
> >
> > _raw_spin_trylock+12 _raw_spin_trylock+20
> > __spin_lock+22
> > main_timer_handler+22
> > timer_interrupt+18
> > handle_IRQ_event+41
> > __do_IRQ+156
> > do_IRQ+51
> > default_idle+0
> > _spin_unlock_irq+43
> > thread_return+187
> > generic_unplug_device+0
> > default_idle+45
> > dev_idle+95 (I can't read the func clearly in this handwriting)
> > start_secondary+1129
> >
> > I'm guessing this is the same problem only that it once manifests itself
> > as one and another time as the other. The problem is in the call to
> > write_seqlock(&xtime_lock) from main_timer_handler().
> >
> > I've not been able to determine what patch has caused this to happen,
> > but it is between 2.6.17-rc3 and -rc4. I'm bisecting, but if anybody has
> > a good candidate, it'd probably be faster than doing a complete bisect.
>
> hm, so an attempt to access xtime_lock.lock results in a null-pointer deref?

It seems to be xtime_lock.sequence:
main_timer_handler:
pushq %r13 #
movq %rdi, %r13 # regs, regs
movq $xtime_lock+8, %rdi #,
pushq %r12 #
pushq %rbp #
pushq %rbx #
pushq %rbp #
call _spin_lock #
OOPS--> incl xtime_lock(%rip) # xtime_lock.sequence
xorl %r12d, %r12d # offset

>
> x86_64 does novel things, putting xtime_lock into a linker section all of
> its own. At a guess I'd say that your compiler/assembler/linker toolchain
> got confused and generated the incorrect address for xtime_lock. But if
> that was the case you'd get oopses 100% of the time - it wouldn't be
> intermittent, as your description seems to imply (although it's quite
> unclear?).

Once I've seen the NULL dereference, the rest are paging request errors.
Most of the time I'm on X, so I don't actually see output, but the ones
I've seen have been like that.

It starts somewhere between rc3 and rc4.

With the bisect, now I'm left with a bunch of SCSI patches which don't
seem to bear any relationship and which I don't use (except for
usb-storage).

>
> You could do:
>
> gdb vmlinux
> (gdb) p &xtime_lock
> (gdb) x/40i main_timer_handler
>
--
Carlos MartÃn Nieto | http://www.cmartin.tk
Hobbyist programmer |

Attachment: signature.asc
Description: This is a digitally signed message part