PROBLEM: Oops 2 with 2.2.14/smp on new HP hardware

From: John Shilling (john@jsacomp.com)
Date: Sun Mar 26 2000 - 08:54:23 EST


Hello,

From the PROBLEM: report form:

[1.] An Oops (error code 0002) occurs in the clock thread of a
     multi-threaded emulator application when running on new HP
     hardware. The problem is not seen on older hardware.

[2.] The clock thread generates a periodic interrupt for the emulated
     hardware. The same application, E11 (see www.dbit.com), runs on
     many other machines, but only the new test machines show this
     failure. The hardware in question is the HP Netserver LPr 700. It
     looks the same as the Netserver 550 (we have many of these running
     the same distribution with no errors), but the LPr 700 has
     Coppermine processors and the mainboard is a "new etch", according
     to HP. We speculate that this problem is timing-related (no pun
     intended) or hardware-related. This machine is probably not on the
     general market yet. If need be, we can supply one.

     The most annoying thing is that these machines will run under
     heavy load for days or weeks before this problem occurs, but once
     it happens, E11 is basically fried.
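
     For readers unfamiliar with the pattern, the following is a minimal
     sketch of a clock thread of the kind described in [2.]. It is not
     E11's code: the names run_clock, tick_hz and on_tick are
     hypothetical, and Python's time.time() stands in for the
     gettimeofday() call in which the trace in [5.] faults.

```python
import threading
import time


def run_clock(tick_hz, on_tick, stop_event):
    """Call on_tick() about tick_hz times per second until stop_event is set.

    Deadlines are computed from the previous deadline rather than from
    "now", so a late wake-up does not accumulate as drift.
    """
    period = 1.0 / tick_hz
    next_deadline = time.time() + period
    while not stop_event.is_set():
        delay = next_deadline - time.time()  # re-read the wall clock each pass
        if delay > 0:
            # Sleep until the deadline, waking early if asked to stop.
            stop_event.wait(delay)
        else:
            on_tick()  # deliver the emulated clock interrupt
            next_deadline += period


if __name__ == "__main__":
    stop = threading.Event()
    clock = threading.Thread(
        target=run_clock, args=(60, lambda: print("tick"), stop)
    )
    clock.start()
    time.sleep(0.1)  # let the clock run briefly
    stop.set()
    clock.join()
```

     The point of the sketch is only that such a thread reads the time of
     day on every pass, which would make that call one of the most
     frequently executed system calls in the application.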

[3.] Keywords: kernel; memory management.

[4.] Version: 2.2.14/smp originally from Red Hat 6.1 distribution

[5.] Oops:

-----------------------------------------------
ksymoops output
-----------------------------------------------

bash# ./ksymoops -v ../linux/vmlinux messages
ksymoops 2.3.3 on i686 2.2.14. Options used
     -v ../linux/vmlinux (specified)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.2.14/ (default)
     -m /usr/src/linux/System.map (default)
No modules in ksyms, skipping objects
Warning (read_lsmod): no symbols in lsmod, is /proc/modules a valid lsmod file?

Unable to handle kernel paging request at virtual address 0400e3d0
current->tss.cr3 = 0f2da000, %cr3 = 0f2da000
*pde = 00000000
Oops: 0002
CPU: 1
EIP: 0010:[<c0118d43>]
Using defaults from ksymoops -t elf32-i386 -a i386

EFLAGS: 00010246
eax: 00000000 ebx: 0400e3d8 ecx: 00000002 edx: 00000000
esi: cf03dfb8 edi: 0400e3d0 ebp: 00000000 esp: cf03dfac
ds: 0018 es: 0018 ss: 0018

Process mar15d (pid: 524, process nr: 25, stackpage=cf03d000)

Stack: 0809e270 0805f660 c025ac20 38db2f0a 000b2908 c0107a50 0400e3d0 00000000
       00000000 000091b4 0809e270 0805f660 0000004e 00000057 0000002b 0000004e
       0808b23b 00000023 00000202 080a13f0 0000002b

Call Trace: [<c0107a50>]
Code: f3 a5 8d 76 00 85 c9 75 34 85 ed 74 38 b9 08 00 00 00 b8 00

>>EIP; c0118d43 <sys_gettimeofday+43/94> <=====
Trace; c0107a50 <system_call+34/38>
Code; c0118d43 <sys_gettimeofday+43/94>
00000000 <_EIP>:
Code; c0118d43 <sys_gettimeofday+43/94> <=====
   0: f3 a5 repz movsl %ds:(%esi),%es:(%edi) <=====
Code; c0118d45 <sys_gettimeofday+45/94>
   2: 8d 76 00 leal 0x0(%esi),%esi
Code; c0118d48 <sys_gettimeofday+48/94>
   5: 85 c9 testl %ecx,%ecx
Code; c0118d4a <sys_gettimeofday+4a/94>
   7: 75 34 jne 3d <_EIP+0x3d> c0118d80 <sys_gettimeofday+80/94>
Code; c0118d4c <sys_gettimeofday+4c/94>
   9: 85 ed testl %ebp,%ebp
Code; c0118d4e <sys_gettimeofday+4e/94>
   b: 74 38 je 45 <_EIP+0x45> c0118d88 <sys_gettimeofday+88/94>
Code; c0118d50 <sys_gettimeofday+50/94>
   d: b9 08 00 00 00 movl $0x8,%ecx
Code; c0118d55 <sys_gettimeofday+55/94>
  12: b8 00 00 00 00 movl $0x0,%eax

1 warning issued. Results may not be reliable.

bash# ./ksymoops -v ../linux/vmlinux messages > report
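
As an aid to reading the trace above, this sketch decodes the Oops error
code using the standard i386 page-fault error-code bits (bit 0: protection
violation vs. page not present, bit 1: write vs. read, bit 2: user vs.
kernel mode) and cross-checks a few of the reported values. The helper
name decode_i386_fault_code is made up for illustration; the numbers are
copied from the trace, and the result is a reading of the dump, not a
diagnosis.

```python
def decode_i386_fault_code(code):
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        "cause": "protection violation" if code & 0x1 else "page not present",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "kernel",
    }


# Values copied from the trace in [5.].
fault_address = 0x0400E3D0            # "virtual address 0400e3d0"
registers = {"edi": 0x0400E3D0, "esi": 0xCF03DFB8, "ecx": 0x00000002}
eip = 0xC0118D43                      # EIP at the fault
offset_in_symbol = 0x43               # from "sys_gettimeofday+43/94"

decoded = decode_i386_fault_code(0x0002)
symbol_start = eip - offset_in_symbol  # address where sys_gettimeofday begins
bytes_left = registers["ecx"] * 4      # movsl moves 4 bytes per iteration

print(decoded)
print(hex(symbol_start))
print(registers["edi"] == fault_address, bytes_left)
```

Read this way, Oops 0002 is a kernel-mode write to a page that is not
present, the faulting address equals %edi (the destination of the
repz movsl), and the two remaining 32-bit moves account for 8 bytes.
None of this explains why the destination is unmapped.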

[6.] The system runs an emulator. The emulated code dictates what the
     emulator does. It is not possible to provide a sample that
     triggers the problem, because an affected machine runs for a long
     time before it happens. The clock processing thread itself is
     quite simple.

[7.1] See also: www.dbit.com. The application running on the emulated
     hardware is network-intensive, but it runs DECnet, not TCP/IP. The
     emulator uses Linux's drivers to pass Ethernet frames to and from
     drivers running under the emulated machine's O/S, RSX-11M/PLUS,
     late of DEC and now owned by Mentec (see: www.mentec.com). The
     emulator effectively uses SMP by placing the disk, network, and
     PDP-11 CPU in different threads. We're looking forward to the next
     stable SMP release :-)

[7.2] Processor Info

--------------
bash# cat cpuinfo

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 1
cpu MHz : 698.820797
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 3
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 pn mmx fxsr xmm
bogomips : 697.96

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 1
cpu MHz : 698.820797
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 3
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 pn mmx fxsr xmm
bogomips : 697.96

---------

[7.3] Modules: Supported, none loaded

[7.4] SCSI information: The LPr has a Symbios controller. Our kernel has
      support for Adaptec 78xx and Symbios 53c8xx. Emulated disks are on
      raw ("unformatted") partitions.

[7.5] Other information

      Details about the emulator would come from John Wilson:

                wilson@dbit.dbit.com

      The machines discussed are not running any graphical software.
      The emulated machine's (PDP-11) console runs off serial port A,
      assigned to /dev/ttyS0. The Linux console is idle, with no telnet
      access (the network runs under RSX-11M/Plus's DECnet
      implementation).

      After the failure, two E11 threads remain in the process list
      below (the mar15d entries); the main emulator has QUIT out:

--------------

USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  1104   72 ?        S    Mar17   0:11 init [3]
root         2  0.0  0.0     0    0 ?        SW   Mar17   0:05 [kflushd]
root         3  0.0  0.0     0    0 ?        SW   Mar17   0:49 [kupdate]
root         4  0.0  0.0     0    0 ?        SW   Mar17   0:00 [kpiod]
root         5  0.0  0.0     0    0 ?        SW   Mar17   0:15 [kswapd]
bin        274  0.0  0.0  1196    0 ?        SW   Mar17   0:00 [portmap]
root       329  0.0  0.0  1152   56 ?        S    Mar17   0:01 syslogd -m 0
root       340  0.0  0.0  1396  228 ?        S    Mar17   0:00 klogd
daemon     356  0.0  0.0  1128  104 ?        S    Mar17   0:00 /usr/sbin/atd
root       372  0.0  0.0  1304  108 ?        S    Mar17   0:01 crond
root       392  0.0  0.0  1124    0 ?        SW   Mar17   0:00 [inetd]
root       408  0.0  0.0  1176    0 ?        SW   Mar17   0:00 [lpd]
root       446  0.0  0.1  2104  284 ?        S    Mar17   0:03 sendmail: accepti
root       463  0.0  0.0  1132  136 ?        S    Mar17   0:00 gpm -t ps/2
xfs        480  0.0  0.0  1880   60 ?        S    Mar17   0:00 xfs -droppriv -da
root       514  0.0  0.2  1704  648 tty1     S    Mar17   0:00 /bin/bash --
root       515  0.0  0.0  1696    0 tty2     SW   Mar17   0:00 [bash]
root       516  0.0  0.0  1076    0 tty3     SW   Mar17   0:00 [mingetty]
root       517  0.0  0.0  1076    0 tty4     SW   Mar17   0:00 [mingetty]
root       518  0.0  0.0  1076    0 tty5     SW   Mar17   0:00 [mingetty]
root       519  0.0  0.0  1076    0 tty6     SW   Mar17   0:00 [mingetty]
root       520  0.0  0.1  1672  268 ttyS0    S    Mar17   0:00 bash /etc/starte1
root       526  0.1  1.7  4720 4488 ttyS0    S<   Mar17  18:35 [mar15d]
root       527  0.0  1.7  4720 4488 ttyS0    S<   Mar17   1:06 [mar15d]
root      3917  0.0  0.3  1696  896 ttyS0    S    07:24   0:00 bash
root      3921  0.0  0.3  2672  960 ttyS0    R    07:25   0:00 ps -aux
bash#

[9] Other

    Please let us know whether HP Engineering should be involved.

Thank you,

John Shilling
System Consultant
5404 Knoxville Drive
College Park, MD 20740
(301) 441-2357
