Re: 2.6.15-mm1 - locks solid when starting KDE (EDAC errors)

From: Andrew Morton
Date: Fri Jan 06 2006 - 22:12:25 EST


Jesper Juhl <jesper.juhl@xxxxxxxxx> wrote:
>
> On 1/7/06, Andrew Morton <akpm@xxxxxxxx> wrote:
> > Dave Jones <davej@xxxxxxxxxx> wrote:
> > >
> > > On Sat, Jan 07, 2006 at 01:25:22AM +0100, Jesper Juhl wrote:
> > > > On 1/6/06, Andrew Morton <akpm@xxxxxxxx> wrote:
> > > > > Jesper Juhl <jesper.juhl@xxxxxxxxx> wrote:
> > > > >
> > > > > > Reverted that one patch, then rebuild/reinstalled the kernel
> > > > > > (with the same .config) and booted it - no change. It still locks up
> > > > > > in the exact same spot.
> > > > > > X starts & runs fine (sort of) since I can play around at the kdm
> > > > > > login screen all I want, it's only once I actually login and KDE
> > > > > > proper starts that it locks up.
> > > > >
> > > > > Oh bugger. No serial console/netconsole or such?
> > > > >
> > > > > Or are you able log in and then quickly do the alt-ctrl-F1 thing, see if we
> > > > > get an oops?
> > > > >
> > > > I switched to tty1 right after logging in, and after a few seconds
> > > > (corresponding pretty well with the time it takes to hit the same spot
> > > > where it crashed all previous times) I got a lot of nice crash info
> > > > scrolling by.
> > > > Actually a *lot* scrolled by, a rough guestimate says some 4-6 (maybe
> > > > more) screens scrolled by, and since the box locks up solid I couldn't
> > > > scroll up to get at the initial parts :( So all I have for you is the
> > > > final block - hand copied from the screen using pen and paper
> > > > ...
> > > > It never makes it to the logs, and as mentioned previously I don't
> > > > have another machine to capture on via netconsole or serial, so if you
> > > > have any good ideas as to how to capture it all, then I'm all ears.
> > >
> > > If only someone did a patch to pause the text output after the first oops..
> > >
> > > Oh wait! Someone did!
> > >
> >
> > umm, it'd be more helpful if you'd actually sent the patch so Jesper could
> > apply it so we can find this bug.
> >
>
> Ok, this is with a pristine 2.6.15-mm1 + Dave's oops-pausing-patch
> Captured by switching to tty1 just after logging in via kdm.
> A *lot* of info still scrolls down when the problem hits before Daves
> patch stops it at a BUG dump, it scrolls by too fast for me to see
> what it is, but I guess it must be warning/error messages other than
> Oops's or BUG()'s ???

There might be something wrong with Dave's patch. Or yes, this might be
the first oops.

> Anyway, here's the entire contents of my screen after Daves patch
> stops the output - again written down by hand and then typed in from
> my handwritten notes, so there may be typos, but I've tried to be
> accurate.
>

Gad. That's a lot of work, thanks. Serial console is easier...

Is your screen set to 50 rows? That helps. As does a digital camera.

>
> 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ------------{ cut here ]------------
> kernel BUG at include/linux/list.h:166!
> invalid opcode: 0000 [#1]
> PREEMPT SMP
> Last sysfs file: /class/vc/vcsa2/dev
> Modules linked in: snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss
> snd_mixer_oss uhci_hcd usbcore snd_emu10k1 snd_rawmidi snd_ac97_codec
> snd_ac97_bus snd_pcm snd_seq_device snd_timer snd_page_alloc
> snd_util_mem snd_hwdep snd agpgart
> CPU: 0
> EIP: 0060:[<c01475f5>] Tainted: G B VLI
> EFLAGS: 00010017 (2.6.15-mm1)
> EIP is at __rmqueue+0xe5/0x100
> eax: c0374dd4 ebx: c16ea018 ecx: 00000007 edx: c16ef018
> esi: c0374e9c edi: c0374d80 ebp: f7ad0dac esp: f7ad0d90
> ds: 007b es: 007b ss: 0068
> Process syslogd (pid: 1038, threadinfo=f7ad0000 task=c21dfa90)
> Stack: <0>00000001 c16ef000 c0374e9c 00000000 00000000 c0374dcc
> 0000001f f7ad0dcc
> <0>c0147642 c0374e40 00000000 c0374d80 c0374d80 c0374d80
> c0374dc0 f7ad0e00
> <0>c0147b1a c0374dcc c0179682 c21ce4d0 f7af1090 00000000
> f7ad0f34 00000256
> Call Trace:
> [<c0103e4a>] show_stack+0x8a/0xa0
> [<c0104060>] show_registers+0x1e0/0x250
> [<c0104272>] die+0x112/0x1a0
> [<c010437f>] do_trap+0x7f/0xc0
> [<c0104693>] do_invalid_op+0xa3/0xb0
> [<c0103b0b>] error_code+0x4f/0x54
> [<c0147642>] rmqueue_bulk+0x32/0x60
> [<c0147b1a>] buffered_rmqueue+0x12a/0x270
> [<c0147db3>] get_page_from_freelist+0xa3/0xd0
> [<c0147e2e>] __alloc_pages+0x4e/0x320
> [<c015e00b>] kmem_getpages+0x3b/0xa0
> [<c015eee2>] cache_grow+0xb2/0x170
> [<c015f1a8>] cache_alloc_refill+0x208/0x250
> [<c015f426>] kmem_cache_alloc+0x66/0x70
> [<c011dc6b>] dup_task_struct+0x3b/0xf0
> [<c011ea22>] copy_process+0x62/0xe70
> [<c011f922>] do_fork+0x62/0x1a0
> [<c0101b5f>] sys_clone+0x2f/0x40
> [<c0103009>] syscall_call+0x7/0xb
> Code: 13 89 5a 04 ff 43 08 89 70 0c 0f ba 28 0b 3b 75 f0 7f d3 8b 45
> e8 83 c4 10 5b 5e 5f c9 c3 0f 0b a5 00 91 bd 32 c0 e9 7a ff ff ff <0f>
> 0b a6 00 91 bd 32 c0 e9 74 ff ff ff 8d b4 26 00 00 00 00 8d

Ugh. Corrupted page allocator free lists. Could be anything.

Can you send the .config?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/