1.3.72 mouse module CRASH (again...) (longish)

Chris Evans (chris@jcr00.lmh.ox.ac.uk)
Sat, 16 Mar 1996 07:11:35 +0000 (GMT)


Right, I've been trying to narrow down an annoying oops I see with the
following combination:

Kernel 1.3.72 (plus a few previous)
mouse.o, psaux.o loaded when needed by kerneld
gpm -t ps2 run at bootup
X running from time to time (gpm still running)

Note that X doesn't seem to care at all that gpm already accesses the
mouse, and gpm runs happily with X going too.

I used the "insmod -m" command to load mouse.o and psaux.o with a dump of
their symbol tables.... but then this happened...

jcr00:~$ uptime
3:59am up 2 days, 5:15, 1 user, load average: 0.08, 0.14, 0.22

As in, a much longer uptime than usually seen with *no* problems, despite
repeated starting/exiting X and gpm.
So I killed X + gpm, unloaded the modules, to let kerneld take care of them
automatically once more. I re-ran gpm, then X, and within minutes.....

Mar 16 03:55:17 jcr00 kernel: PS/2 auxiliary pointing device detected -- driver installed.
Mar 16 03:56:16 jcr00 kernel: Unable to handle kernel paging request at virtual address c100d583
Mar 16 03:56:16 jcr00 kernel: current->tss.cr3 = 0043c000,
Mar 16 03:56:16 jcr00 kernel: *pde = 00001067
Mar 16 03:56:16 jcr00 kernel: *pte = 00000000
Mar 16 03:56:16 jcr00 kernel: Oops: 0000
Mar 16 03:56:16 jcr00 kernel: CPU: 0
Mar 16 03:56:16 jcr00 kernel: EIP: 0010:[<0100d583>]
Mar 16 03:56:16 jcr00 kernel: EFLAGS: 00010246
Mar 16 03:56:16 jcr00 kernel: eax: 00000000 ebx: 00000004 ecx: 00702f24 edx: 00000000
Mar 16 03:56:16 jcr00 kernel: esi: 005af180 edi: 003027c0 ebp: bffffa48 esp: 00702f40
Mar 16 03:56:16 jcr00 kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Mar 16 03:56:16 jcr00 kernel: Process gpm (pid: 18499, process nr: 22, stackpage=00702000)
Mar 16 03:56:16 jcr00 kernel: Stack: 003027c0 0100d025 0100d1e6 00000047 000000f4 0100b040 003027c0 005af180
Mar 16 03:56:16 jcr00 kernel: 00000000 001217dd 003027c0 005af180 005af180 00000000 0031b000 0011fd76
Mar 16 03:56:16 jcr00 kernel: 003027c0 005af180 00000000 00000000 00000002 003027c0 0011fe2e 0031b000
Mar 16 03:56:16 jcr00 kernel: Call Trace: [<0100d025>] [<0100d1e6>]
[<0100b040>] [<001217dd>] [<0011fd76>] [<0011fe2e>] [<0010a469>]
Mar 16 03:56:16 jcr00 kernel: Code: <1>Unable to handle kernel paging request at virtual address c100d583

(code lost, see next exceedingly similar oops)

The call trace is interesting, the modules were loaded by kerneld VERY SOON
after I manually unloaded them. I assumed they *might* have been loaded in
exactly the same place, so maybe my symbol map produced earlier with
"insmod -m" was still valid. Indeed, using it, a *very plausible* call
trace is obtained. (note that if the module were loaded at only a
slightly different address, the call trace would be expected to be
garbage / make no sense)

EIP: 100d583 = poll_aux_status (see assumption above)
100d025 = aux_write_cmd "
100d1e6 = open_aux " }
100b040 = mouse_open " } - These seem highly
plausible
1217dd = chrdev_open }
11fd76 = do_open }
11fe2e = sys_open } Directly from System.map
10a469 = system_call }

Here's another oops but with code intact.... (call trace identical apart
from going through die_if_kernel etc. see my previous posting with
identical oopses.)

Mar 16 03:56:16 jcr00 kernel: current->tss.cr3 = 0043c000,
Mar 16 03:56:16 jcr00 kernel: *pde = 00001067
Mar 16 03:56:16 jcr00 kernel: *pte = 00000000
Mar 16 03:56:16 jcr00 kernel: Oops: 0000
Mar 16 03:56:16 jcr00 kernel: CPU: 0
Mar 16 03:56:16 jcr00 kernel: EIP: 0010:[<0010a9eb>]
Mar 16 03:56:16 jcr00 kernel: EFLAGS: 00010212
Mar 16 03:56:16 jcr00 kernel: eax: 00000010 ebx: 0009002b ecx: 0100d583 edx: 006fa414
Mar 16 03:56:16 jcr00 kernel: esi: 00000000 edi: 00703000 ebp: 00702f04 esp: 00702eac
Mar 16 03:56:16 jcr00 kernel: ds: 0018 es: 0018 fs: 0010 gs: 002b ss: 0018
Mar 16 03:56:16 jcr00 kernel: Process gpm (pid: 18499, process nr: 22, stackpage=00702000)
Mar 16 03:56:16 jcr00 kernel: Stack: 0017002b 00000000 0000d000 00702f04 00530000 01000000 01800000 01000000
Mar 16 03:56:16 jcr00 kernel: 00530018 0010fa13 00175c04 00702f04 00530000 0010f770 005af180 003027c0
Mar 16 03:56:16 jcr00 kernel: bffffa48 00000003 0009e000 0010a61b 00702f04 00530000 00000004 00702f24
Mar 16 03:56:16 jcr00 kernel: Call Trace: [<0017002b>] [<01000000>] [<01800000>] [<01000000>] [<0010fa13>] [<0010f770>] [<0010a61b>]
Mar 16 03:56:16 jcr00 kernel: [<0100d583>] [<0100d025>] [<0100d1e6>] [<0100b040>] [<001217dd>] [<0011fd76>] [<0011fe2e>] [<0010a469>]
Mar 16 03:56:16 jcr00 kernel: Code: 64 8a 04 0e 0f a1 88 c2 81 e2 ff 00 00 00 89 54 24 10 52 68
Mar 16 03:58:00 jcr00 kernel: PS/2 auxiliary pointing device detected -- driver installed.
Mar 16 04:02:04 jcr00 kernel: PS/2 auxiliary pointing device detected -- driver installed.
Mar 16 04:03:04 jcr00 kernel: PS/2 auxiliary pointing device detected -- driver installed.

Other information:

- Once the crash has occured, /dev/mouse reports "device busy" when
trying to unload the module psaux.o. This is despite X and gpm NOT running.

- When it is X that OOPSES, the X server (after generating OOPS) reports
that it cannot open (/dev/mouse): device/resource busy.

- There is a slight degree of repeatability to this; if I mess around for
10 minutes, running gpm/X and killing them at random, the oops can
usually be repeated. But only, it seems, _if kerneld was the mechanism to
load them_ (re: stability with both modules loaded via "insmod" not kerneld)

- In general I can make the oops more likely by letting the module
load/unload a few times (ie wait for kerneld 1 minute timeout after
killing X + gpm)

Here's the symbol map "insmod -m" initially generated, from which I got
my call trace (ok, big assumption I know)

0100b004
0100b004 mouse.c
0100b004 gcc2_compiled.
0100b144 mouse_list
0100b004 mouse_open
0100b158 mouse_fops
0100b18c mouse_syms
0100b13c
0100b1b8
0100b004
0100b004
0100b004
0100b20d
0100b13c kernel_version
0100b058 mouse_register
0100b000 mod_use_count_
0100b094 mouse_deregister
0100b0d8 cleanup_module
0112c5cc unregister_chrdev
0100b0e8 init_module
0112c4c4 register_chrdev
0111cec0 printk
0111f378 register_symtab
0100d004
0100d004 psaux.c
0100d004 gcc2_compiled.
0100d5d8 aux_ready
0100d5dc aux_count
0100d5e0 aux_present
0100d004 aux_write_dev
0100d540 poll_aux_status
0100d020 aux_write_cmd
0100d03c get_from_queue
0100d690 queue
0100d060 aux_interrupt
0100d0e0 release_aux
0100d130 fasync_aux
0100d160 open_aux
0100d210 write_aux
0100d26c read_aux
0100d3fc aux_select
0100d618 psaux_mouse
0100d598 poll_aux_status_nosleep
0100d004
0100d690
0100d5d0
0100d62c
0100d004
0100d67c
0100d5d0 kernel_version
01179858 add_mouse_randomness
011375dc kill_fasync
0111ce90 wake_up_interruptible
01119924 free_irq
0100d000 mod_use_count_
0116e160 fasync_helper
01119718 request_irq
011a6c90 xtime
011a6c98 current_set
0111cb04 schedule
0100d5e4 psaux_fops
0100d484 psaux_init
011a6954 aux_device_present
0111eec0 printk
01195bb8 kbd_read_mask
0201805c mouse_register
01129508 kmalloc
0100d528 init_module
0100d530 cleanup_module
02018098 mouse_deregister
01191500 jiffies

Other things that might be buggy in 1.3.7x:

1) Slight keyboard problem - very occasionally when pressing delete and
holding it, a '#' sign will appear and the keyrepeat stop until I release
the key. This will then persist the next few times I try the delete key,
then start working fine again. I've seen this in both X and at the console.

2) Font problem...? Since 1.3.60ish, I'm _sure_ my fonts are squashed
together on the screen, ie. consecutive lowercase characters "touch". I'll
boot on 1.2.13 in a moment, and check...

That's it folks... hope this long mail will provide more info to that
annoying OOPS.

-- Chris