OOPS after upgrading CPUs ... (fwd)

From: Matthias Weidle (matt@box.li)
Date: Fri Oct 06 2000 - 01:42:59 EST


my apologies for the cross-posting, but since this matter is urgent for
me and no one on the smp mailing list has answered so far, i hope that i
will find somebody on this list who can give me advice :)

thanks!

---------- Forwarded Message ----------
Date: 10/05/00 00:19:53 +0200
From: Matthias Weidle <matt@box.li>
To: linux-smp@vger.kernel.org
Subject: OOPS after upgrading CPUs ...

hi there!

something strange is going on here, and after checking every source of
information i could find (without success), i hope that one of you may have
the answer :)

ok, here is the problem:

i'm running an smp server machine (mostly doing file server work) which ran
quite stably with 2 celeron-400 cpus. i got about 60 days of uptime without
problems, even under heavy load! a few weeks ago i decided to upgrade the
celeron cpus to some older p3s (the ones with 512kb cache, not coppermine)
and did not expect any complications from that upgrade. but since then i
can't keep the machine up for more than a couple of days (depending on the
load). sooner or later it locks up with the following kernel oops message:

ksymoops 2.3.4 on i686 2.2.15pre19ext3. Options used
      -v /usr/src/linux/vmlinux (specified)
      -k /proc/ksyms (default)
      -l /proc/modules (default)
      -o /lib/modules/2.2.15pre19ext3/ (default)
      -m /usr/src/linux/System.map (default)

Warning (compare_ksyms_lsmod): module i2c-isa is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module i2c-piix4 is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module nfsd is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module w83781d is in lsmod but not in ksyms, probably no symbols exported

Unable to handle kernel NULL pointer dereference at virtual address 00000013
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0002
CPU: 1
EIP: 0010:[<c01100c5>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010006
eax: 00000013 ebx: 00000260 ecx: cc100480 edx: cbffa000
esi: cbffa000 edi: 00000013 ebp: cbffbf74 esp: cbffbf4c
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, process nr: 1, stackpage=cbffb000)
Stack: 00000013 c01104c1 00000013 cbffbf7c cbffa000 c0226020 c010b850 00000013
       cbffbf7c cbffa000 00000000 c010a328 cbffa000 cbffa000 cbffa000 cbffa000
       c0226020 00000000 00000080 00000018 cbff0018 ffffff13 c0107b15 00000010
Call Trace: [<c01104c1>] [<c010b850>] [<c010a328>] [<c0107b15>] [<c019c875>] [<c01166b7>]
Code: e0 28 21 c0 8b 04 85 e4 28 21 c0 83 f8 ff 74 53 bf 00 e0 ff

>>EIP; c01100c5 <mask_IO_APIC_irq+d/84> <=====
Trace; c01104c1 <do_level_ioapic_IRQ+21/98>
Trace; c010b850 <do_IRQ+38/58>
Trace; c010a328 <common_interrupt+18/20>
Trace; c0107b15 <cpu_idle+3d/50>
Trace; c019c875 <vt_console_print+2fd/314>
Trace; c01166b7 <printk+177/184>
Code; c01100c5 <mask_IO_APIC_irq+d/84>
00000000 <_EIP>:
Code; c01100c5 <mask_IO_APIC_irq+d/84> <=====
    0: e0 28 loopne 2a <_EIP+0x2a> c01100ef <mask_IO_APIC_irq+37/84> <=====
Code; c01100c7 <mask_IO_APIC_irq+f/84>
    2: 21 c0 andl %eax,%eax
Code; c01100c9 <mask_IO_APIC_irq+11/84>
    4: 8b 04 85 e4 28 21 c0 movl 0xc02128e4(,%eax,4),%eax
Code; c01100d0 <mask_IO_APIC_irq+18/84>
    b: 83 f8 ff cmpl $0xffffffff,%eax
Code; c01100d3 <mask_IO_APIC_irq+1b/84>
    e: 74 53 je 63 <_EIP+0x63> c0110128 <mask_IO_APIC_irq+70/84>
Code; c01100d5 <mask_IO_APIC_irq+1d/84>
   10: bf 00 e0 ff 00 movl $0xffe000,%edi

Kernel panic: Attempted to kill the idle task!

4 warnings issued. Results may not be reliable.

there have been 4-5 lockups since the upgrade and it was always the same
oops message.
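for anyone reading the decoded trace: ksymoops annotates each address as
symbol+offset/size, with offset and size in hex. the following is a minimal
python sketch (the helper name decode is hypothetical, not part of ksymoops)
that parses these annotations from the lines above and recovers the start
address of each function:

```python
import re

# ksymoops annotates addresses as "<symbol+offset/size>", offset and size in hex.
ANNOTATION = re.compile(r"([0-9a-f]{8})\s+<(\w+)\+([0-9a-f]+)/([0-9a-f]+)>")


def decode(line):
    """Return (address, symbol, offset, size, start) for an annotated line, else None."""
    m = ANNOTATION.search(line)
    if m is None:
        return None
    address = int(m.group(1), 16)
    symbol = m.group(2)
    offset = int(m.group(3), 16)
    size = int(m.group(4), 16)
    # The function's first byte lies `offset` bytes before the annotated address.
    return address, symbol, offset, size, address - offset


# Lines copied from the decoded oops above.
TRACE = """\
>>EIP; c01100c5 <mask_IO_APIC_irq+d/84> <=====
Trace; c01104c1 <do_level_ioapic_IRQ+21/98>
Trace; c010b850 <do_IRQ+38/58>
Trace; c010a328 <common_interrupt+18/20>
Trace; c0107b15 <cpu_idle+3d/50>
"""

for line in TRACE.splitlines():
    decoded = decode(line)
    if decoded is not None:
        address, symbol, offset, size, start = decoded
        print(f"{symbol}: {address:#010x} is {offset:#x} bytes into "
              f"a {size:#x}-byte function starting at {start:#010x}")
```

for the EIP line this gives a start address of 0xc01100b8 for
mask_IO_APIC_irq, which is the same start implied by the +f, +11, +18 and
+1b annotations on the disassembly lines, so the annotations agree with
each other.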

for the record, here are some additional details about the server box:

soltek sl-68a dual slot1 motherboard (with the latest h4 bios)
2 p3-550 with 512kb cache
promise udma66 controller
intel etherexpress nic
64 + 128 mb ram (pc100)
6 hdds (maxtor and ibm drives)

kernel: 2.2.15pre20 (that's pretty much 2.2.16, i guess)
+ ide patch
+ ext3 patch
+ ppdd patch

if you need any additional data, please don't hesitate to contact me!

is it really possible to destabilize a box simply by upgrading to a better
cpu? my first suspicion was bad ram, because it is now running at 100 mhz
(66 mhz with the celerons). but then i realized that bad ram would be
pretty unlikely to produce the exact same oops message every time.

is there somebody out there who can help me?

best regards,
matt.

---------- End Forwarded Message ----------




This archive was generated by hypermail 2b29 : Sat Oct 07 2000 - 21:00:18 EST