Probable module bug in linux-2.6.5-1.358

From: Richard B. Johnson
Date: Wed Oct 06 2004 - 17:20:19 EST



The attached script shows that an attempt to open a device
after its module has been removed will seg-fault the kernel.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.5-1.358-noreg on an i686 machine (5537.79 BogoMips).
Note 96.31% of all statistics are fiction.


Kernel 2.6.5-1.358 from Red Hat Fedora has a bug that
causes a kernel oops when attempting to open a device
whose module has been removed.

In the following, the first attempt at a user-mode open()
causes the module to be properly loaded by modprobe, the
module alias having been put into /etc/modprobe.conf.

The module can be removed and inserted many times. Each
time, the resources are deallocated, and the module has no
problem obtaining those resources again when it is reloaded.

However, only the FIRST automatic load works. Any attempt
to access the device (open it) after the module has been
removed does not cause the module to be reloaded. Instead,
the kernel oopses during the call to sys_open().

The virtual address 0x222e7880 is not in the symbol file and
seems to be near the address where the open() in my module
used to be before it was removed.

I can duplicate this, and a printk("%p\n", open); shows
0x222e74a0 just after I boot. Subsequent loads show other
addresses.


Oct 6 17:03:30 chaos kernel: Analogic Corp Datalink Driver : Module removed
Oct 6 17:03:38 chaos kernel: Unable to handle kernel paging request at virtual address 222e7880
Oct 6 17:03:38 chaos kernel: printing eip:
Oct 6 17:03:38 chaos kernel: 021556ad
Oct 6 17:03:38 chaos kernel: *pde = 1f30c067
Oct 6 17:03:38 chaos kernel: *pte = 00000000
Oct 6 17:03:38 chaos kernel: Oops: 0000 [#1]
Oct 6 17:03:38 chaos kernel: SMP
Oct 6 17:03:38 chaos kernel: CPU: 0
Oct 6 17:03:38 chaos kernel: EIP: 0060:[<021556ad>] Not tainted
Oct 6 17:03:38 chaos kernel: EFLAGS: 00010206 (2.6.5-1.358-noreg)
Oct 6 17:03:38 chaos kernel: EIP is at cdev_get+0x14/0x6a
Oct 6 17:03:38 chaos kernel: eax: 20dee000 ebx: 222e7880 ecx: 1f564a80 edx: 13d19910
Oct 6 17:03:38 chaos kernel: esi: 00000000 edi: 1edae208 ebp: 00000000 esp: 20deef20
Oct 6 17:03:38 chaos kernel: ds: 007b es: 007b ss: 0068
Oct 6 17:03:38 chaos kernel: Process ftest (pid: 3728, threadinfo=20dee000 task=1f7a0130)
Oct 6 17:03:38 chaos kernel: Stack: 13d19910 00000000 021554bc 13d19910 1f564a80 1f564a80 1edae208 2138bf80
Oct 6 17:03:38 chaos kernel: 1d49ac8c 0214d443 1edae208 1f564a80 00000002 00000003 1685a000 20dee000
Oct 6 17:03:38 chaos kernel: 0214d364 20e18180 2138bf80 00000002 20e18180 2138bf80 00000ffc 00000001
Oct 6 17:03:38 chaos kernel: Call Trace:
Oct 6 17:03:38 chaos kernel: [<021554bc>] chrdev_open+0xb2/0x18b
Oct 6 17:03:38 chaos kernel: [<0214d443>] dentry_open+0xd7/0x19b
Oct 6 17:03:38 chaos kernel: [<0214d364>] filp_open+0x41/0x49
Oct 6 17:03:38 chaos kernel: [<0214d703>] sys_open+0x31/0x82
Oct 6 17:03:38 chaos kernel:
Oct 6 17:03:38 chaos kernel: Code: 83 3b 02 8b 40 10 74 0e c1 e0 07 8d 04 18 ff 80 00 01 00 00
Oct 6 17:08:11 chaos syslogd 1.4.1: restart.


Script started on Wed 06 Oct 2004 05:09:06 PM EDT
# sync

# ./ftest

Make sure a single board exists in a PCI slot and

channel A output is connected to channel B input.

Hit [Enter] to continue...

Enabling both RX and TX

Waiting for synchronization...

Waiting for synchronization...

Mailbox-A message = 0000000f

Mailbox-B message = 0000000f

Len = 6 Mb, time = 104468.00 us, rate = 64.334954 bytes/usec

[SNIPPED...]
Works...

PCI slot = 516

Driver version = V2.01

FPGA version = 61128V00R000

# ps laxw

F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND

4 0 1 0 16 0 3124 464 - S ? 0:04 init [5]

[SNIPPED...]
4 0 3128 3127 15 0 6080 1276 wait4 S pts/0 0:00 bash -i

1 5418 3137 1 15 0 0 0 - SW ? 0:00 [DLB daemon]


This shows the kernel daemon that manages the link.

4 0 3140 3128 17 0 3824 584 - R pts/0 0:00 ps laxw

# cat /proc/iomem

00000000-0009fbff : System RAM

0009fc00-0009ffff : reserved

000a0000-000bffff : Video RAM area

000c0000-000cfbff : Video ROM

000d0000-000d0bff : Adapter ROM

000d1000-000d57ff : Adapter ROM

000f0000-000fffff : System ROM

00100000-1f3fffff : System RAM

00100000-002a8fff : Kernel code

002a9000-0034fa7f : Kernel data

1f400000-1f4003ff : 0000:00:1f.1

1f401000-22403fff : Analogic Corp Datalink Driver

e4500000-f45fffff : PCI Bus #01

e8000000-efffffff : 0000:01:00.0

f4580000-f45fffff : 0000:01:00.0

f8000000-fbffffff : 0000:00:00.0

fc700000-fe7fffff : PCI Bus #01

fd000000-fdffffff : 0000:01:00.0

fe900000-fe9fffff : 0000:02:04.0

fe900000-fe9fffff : Analogic Corp Datalink Driver

feafc000-feafcfff : 0000:02:08.0

feafc000-feafcfff : e100

feafd000-feafdfff : 0000:02:07.0

feafe800-feafe9ff : 0000:02:04.0

feafe800-feafe9ff : Analogic Corp Datalink Driver

feafec00-feafec7f : 0000:02:01.0

feaff000-feafffff : 0000:02:00.0

feaff000-feafffff : aic7xxx

febff400-febff4ff : 0000:00:1f.5

febff800-febff9ff : 0000:00:1f.5

febffc00-febfffff : 0000:00:1d.7

# sync


This shows resources used.

# rmmod HeavyLink


I remove the module okay.

# cat /proc/iomem

00000000-0009fbff : System RAM

0009fc00-0009ffff : reserved

000a0000-000bffff : Video RAM area

000c0000-000cfbff : Video ROM

000d0000-000d0bff : Adapter ROM

000d1000-000d57ff : Adapter ROM

000f0000-000fffff : System ROM

00100000-1f3fffff : System RAM

00100000-002a8fff : Kernel code

002a9000-0034fa7f : Kernel data

1f400000-1f4003ff : 0000:00:1f.1

e4500000-f45fffff : PCI Bus #01

e8000000-efffffff : 0000:01:00.0

f4580000-f45fffff : 0000:01:00.0

f8000000-fbffffff : 0000:00:00.0

fc700000-fe7fffff : PCI Bus #01

fd000000-fdffffff : 0000:01:00.0

fe900000-fe9fffff : 0000:02:04.0

feafc000-feafcfff : 0000:02:08.0

feafc000-feafcfff : e100

feafd000-feafdfff : 0000:02:07.0

feafe800-feafe9ff : 0000:02:04.0

feafec00-feafec7f : 0000:02:01.0

feaff000-feafffff : 0000:02:00.0

feaff000-feafffff : aic7xxx

febff400-febff4ff : 0000:00:1f.5

febff800-febff9ff : 0000:00:1f.5

febffc00-febfffff : 0000:00:1d.7



The resources are properly freed.

# ps laxw

F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND

4 0 1 0 16 0 3124 464 - S ? 0:04 init [5]

[SNIPPED...]
4 42 3088 2989 16 0 20992 9148 - S ? 0:00 /usr/bin/gdmgreeter

4 0 3089 2624 15 0 4676 1336 wait4 S tty1 0:00 -bash

4 0 3126 3089 15 0 4632 464 - S tty1 0:00 script

5 0 3127 3126 15 0 4636 496 - S tty1 0:00 script

4 0 3128 3127 15 0 6080 1276 wait4 S pts/0 0:00 bash -i

4 0 3154 3128 17 0 2444 588 - R pts/0 0:00 ps laxw


The daemon is gone, properly exited.

# insmod HeavyLink.ko


Insert again. Works.

# ./ftest

Make sure a single board exists in a PCI slot and

channel A output is connected to channel B input.

Hit [Enter] to continue...

Enabling both RX and TX

Waiting for synchronization...

Waiting for synchronization...

Mailbox-A message = 0000000f

Mailbox-B message = 0000000f

Len = 8 Mb, time = 134457.00 us, rate = 64.362763 bytes/usec

Mailbox-B message = 00000000

[SNIPPED...]
Works fine.....


#

# lsmod | grep Heavy

HeavyLink 40456 0

# rmmod HeavyLink

Remove the module as before. Fine.


# sync

# ./ftest


Now, try to open the device without first manually inserting
the module.

Segmentation fault


DEATH with a console screen display that doesn't get saved anywhere.

The last call was sys_open()

# exit


Script done on Wed 06 Oct 2004 05:12:20 PM EDT