kernel errors/code dumps

Noah L. Meyerhans (frodo@ccs.neu.edu)
Mon, 5 Jan 1998 13:40:13 -0500 (EST)


Hi all! I've got some kernel problems that have been appearing since I
upgraded my computer to an AMD K6/200 on a TX chipset based motherboard.
Similar errors have occurred under kernel versions 2.0.31, 2.0.32, and
2.0.33 (I'm currently using 2.0.33). Here is the kernel oops message that
gets logged:

Jan 4 06:50:19 wintermute kernel: Unable to handle kernel paging request at virtual address d1790443
Jan 4 06:50:19 wintermute kernel: current->tss.cr3 = 00101000, %cr3 = 00101000
Jan 4 06:50:19 wintermute kernel: *pde = 00000000
Jan 4 06:50:19 wintermute kernel: Oops: 0002
Jan 4 06:50:19 wintermute kernel: CPU: 0
Jan 4 06:50:19 wintermute kernel: EIP: 0010:[update_process_times+34/280]
Jan 4 06:50:19 wintermute kernel: EFLAGS: 00010202
Jan 4 06:50:19 wintermute kernel: eax: 00096773 ebx: 001b7c45 ecx: 00000000 edx: 00002710
Jan 4 06:50:19 wintermute kernel: esi: 00000001 edi: 00000100 ebp: 001b73a1 esp: 001b7388
Jan 4 06:50:19 wintermute kernel: ds: 0018 es: 0018 fs: 002b gs: 0018 ss: 0018
Jan 4 06:50:19 wintermute kernel: Process swapper (pid: 256, process nr: 0, stackpage=001b5490)
Jan 4 06:50:19 wintermute kernel: Stack: 00000001 00000001 00000001 0025e218 00000000 001b73a8 001b73c8 001122c5
Jan 4 06:50:19 wintermute kernel: 00000001 00000001 00000001 00000001 ffffffff 00000001 00000001 001b73e4
Jan 4 06:50:19 wintermute kernel: 001dc200 00117d0b 001b73e4 001b746c 00000000 00009000 0010a5db 0030e1de
Jan 4 06:50:19 wintermute kernel: Call Trace: [timer_bh+193/820] [do_bottom_half+59/96] [handle_bottom_half+11/32] [sys_idle+92/112] [system_call+85/128] [init+0/616] [start_kernel+429/440]
Jan 4 06:50:19 wintermute kernel: [it_real_fn+0/72] [schedule+564/652]
Jan 4 06:50:19 wintermute kernel: Code: 08 89 43 04 79 11 c7 43 04 00 00 00 00 c7 05 7c 54 1b 00 01
Jan 4 06:50:19 wintermute kernel: Aiee, killing interrupt handler
Jan 4 06:50:19 wintermute kernel: kfree of non-kmalloced memory: 001b74d8, next= 00000000, order=0
Jan 4 06:50:19 wintermute kernel: kfree of non-kmalloced memory: 001b74c8, next= 00000000, order=0
Jan 4 06:50:19 wintermute kernel: kfree of non-kmalloced memory: 001b79dc, next= 00000000, order=0
Jan 4 06:50:19 wintermute kernel: idle task may not sleep
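
For anyone reading the dump: on i386, the number after "Oops:" in the
page-fault case is the processor's page-fault error code, printed in hex.
The small sketch below decodes its three low bits. The function name
decode_oops_code is made up for illustration, and the decoding only applies
to the "Unable to handle kernel paging request" form of the message, not to
the "general protection" variant.

```python
# Decode the low three bits of an i386 page-fault error code.
# decode_oops_code is a hypothetical helper name, not part of any kernel tool.

def decode_oops_code(code: int) -> dict:
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 2 else "read",
        # bit 2: 0 = fault happened in kernel mode, 1 = in user mode
        "mode": "user" if code & 4 else "kernel",
    }

# The value logged above is "Oops: 0002" (hexadecimal).
info = decode_oops_code(int("0002", 16))
print(info)
```

Read this way, 0002 would mean a kernel-mode write to a page that is not
present, which agrees with the "*pde = 00000000" line in the same dump.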

Some of the details are different from time to time. For example, the
EFLAGS value is sometimes 00010216. Also, the first 4 lines of the error
log are often replaced by "general protection: 0000".
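
To see what actually differs between the two EFLAGS values, they can be
split into named flag bits. This is only a sketch: the table covers the
common x86 EFLAGS bits, and decode_eflags is a name chosen here for
illustration.

```python
# Map of x86 EFLAGS bit positions to flag names (bit 1 is reserved and omitted).
EFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF",
    9: "IF", 10: "DF", 11: "OF", 16: "RF", 17: "VM",
}

def decode_eflags(value: int) -> set:
    """Return the set of flag names whose bits are set in value."""
    return {name for bit, name in EFLAGS_BITS.items() if value & (1 << bit)}

first = decode_eflags(0x00010202)   # value from the dump above
second = decode_eflags(0x00010216)  # value seen in other dumps

print(sorted(first))
print(sorted(second - first))       # flags set only in the second value
```

Under this decoding the two values differ only in the parity and auxiliary
carry flags, which are arithmetic status bits, so the difference is probably
not significant in itself.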

I haven't been able to locate the EIP value, as was suggested in
/usr/src/linux/README, because I don't know where to find the address of
update_process_times. I have found the address mentioned on this line,
found near the top of that code dump:

Jan 4 06:50:19 wintermute kernel: current->tss.cr3 = 00101000

The entry in System.map is:

00101000 T swapper_pg_dir

Here are some of the surrounding lines, in case you want more of the
context:

0010017d t L6
0010017f t check_x87
001001aa t setup_idt
001001c7 t rp_sidt
001001e0 t setup_paging
00101000 T swapper_pg_dir
00102000 T pg0
00103000 T empty_bad_page
00104000 T empty_bad_page_table
00105000 T empty_zero_page
00106000 t stack_start
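
The lookup described above can be sketched in a few lines. This assumes the
"address type name" layout shown in the excerpt and assumes that the
"symbol+offset/size" form in the dump uses decimal offsets; load_map,
resolve and nearest are hypothetical helper names. Note that cr3 holds the
physical address of the page directory, so its matching swapper_pg_dir is
expected and says nothing about where the EIP was.

```python
import bisect

# Excerpt of System.map copied from the listing above.
SYSTEM_MAP = """\
0010017d t L6
0010017f t check_x87
001001aa t setup_idt
001001c7 t rp_sidt
001001e0 t setup_paging
00101000 T swapper_pg_dir
00102000 T pg0
00103000 T empty_bad_page
00104000 T empty_bad_page_table
00105000 T empty_zero_page
00106000 t stack_start
"""

def load_map(text):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            addr, _kind, name = parts
            symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols

def resolve(symbols, ref):
    """Turn 'name+offset/size' into an address, or None if name is unknown."""
    name, _, rest = ref.partition("+")
    offset = int(rest.split("/")[0]) if rest else 0  # offset assumed decimal
    for addr, sym in symbols:
        if sym == name:
            return addr + offset
    return None

def nearest(symbols, address):
    """Find the last symbol at or below address; return (name, offset)."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, address) - 1
    if i < 0:
        return None
    return symbols[i][1], address - symbols[i][0]

symbols = load_map(SYSTEM_MAP)
print(nearest(symbols, 0x00101000))
# The excerpt does not contain update_process_times, so this prints None;
# with the complete System.map it would give the faulting address.
print(resolve(symbols, "update_process_times+34/280"))
```

Feeding the whole System.map file to load_map instead of the excerpt should
make the second lookup return an address that can then be compared against
a disassembly of the kernel image.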

I attempted to use the ksymoops program, but it didn't seem to work right.
Aside from the confirmation of the map file that was being used, it didn't
output anything. I have not tried to debug it.

These errors seem to appear completely randomly, but there are 2 places
where they occur more frequently. These are both at boot time, but at
different stages of the boot process. One is when the hlt instruction is
being checked. The message normally looks like:
Checking 'hlt' instruction...Ok.

However, sometimes, instead of saying Ok, the kernel dumps code. The boot
process does not continue, and the machine needs to be powered down and
rebooted.

Other times, the error appears when the kernel is doing its partition
check on /dev/hdc, an EIDE hard disk and the only one in my system.
Sometimes the kernel will continue with the boot sequence, but not always.

The errors will occasionally occur when the system is up and running. I
have not noticed any way to reproduce the errors on demand, though. For
example, they will often occur overnight, when nothing is running. Cron
is not running anything, and the machine seems completely idle. When this
happens, the /proc filesystem becomes unavailable. I don't know the exact
error message that's generated when I try to access it. I will make a
note of it next time it occurs, though, and post it if necessary.

Do you have any idea what might be causing these problems? If you have
any suggestions for isolating the problem, please let me know. If you
need more info about my configuration, I will send it to you. Thanks for
your help!

Noah

PGP public key available at
http://lynx.dac.neu.edu/home/httpd/n/nmeyerha/mail.html
or by 'finger -l frodo@ccs.neu.edu'