Re: OOPS during FC-aware-driver module reload...

From: Russell King
Date: Wed May 10 2006 - 15:04:46 EST


On Wed, May 10, 2006 at 11:31:44AM -0700, Andrew Vasquez wrote:
> Recent linux-2.6.git trees introduced some oddities while performing a
> simple load/unload/load of the qla2xxx (a FC transport aware driver)
> driver.
>
> Basically on module reload, I'd consistently hit the following
>
> # insmod qla2xxx
> # rmmod qla2xxx
> # insmod qla2xxx
>
> BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
> printing eip: 00000000
> *pde = 00000000
> Oops: 0000 [#1]
> SMP
> Modules linked in: qla2xxx scsi_transport_fc
> CPU: 1
> EIP: 0060:[<00000000>] Not tainted VLI
> EFLAGS: 00010246 (2.6.17-rc3 #32)
> EIP is at _stext+0x3feffd68/0x23
> eax: 00000000 ebx: f7c6c448 ecx: f3e92128 edx: 00000000
> esi: f3e92128 edi: c02fe420 ebp: 6b6b6b6b esp: ea290dcc
> ds: 007b es: 007b ss: 0068
> Process insmod (pid: 10001, threadinfo=ea290000 task=f40ada90)
> Stack: <0>c0226794 6b6b6b6b f3e92128 f8b216e0 f3e92128 f3e92000 00000000 000000d0
> c0226cc3 f3e92128 c0226c90 c02598e4 f3e92128 f3e920f1 f3e92000 c02520c7
> f3e92000 ffff5e05 f3e92358 f3e93b14 f3e92000 f8b04d29 f3e92000 c1b2eb5c
> Call Trace:
> <c0226794> attribute_container_add_device+0x4b/0x135 <c0226cc3> transport_setup_device+0xe/0x11
> <c0226c90> transport_setup_classdev+0x0/0x25 <c02598e4> scsi_sysfs_add_host+0x9e/0xac
> <c02520c7> scsi_add_host+0x129/0x179 <f8b04d29> qla2x00_probe_one+0xa33/0xb34 [qla2xxx]
> <c0125eb3> call_usermodehelper_keys+0xf7/0x104 <c0125d78> __call_usermodehelper+0x0/0x44
> <f8b05e1b> qla2xxx_probe_one+0xe/0x11 [qla2xxx] <c01e4ae6> pci_call_probe+0xf/0x12
> <c01e4b1c> __pci_device_probe+0x33/0x47 <c01e4b4f> pci_device_probe+0x1f/0x34
> <c0223ec4> driver_probe_device+0x43/0xa4 <c0223f95> __driver_attach+0x0/0x84
> <c0223fee> __driver_attach+0x59/0x84 <c02235aa> bus_for_each_dev+0x47/0x6d
> <c01dc91e> kobject_add+0xae/0xf7 <c022402d> driver_attach+0x14/0x18
> <c0223f95> __driver_attach+0x0/0x84 <c02239d1> bus_add_driver+0x57/0x8d
> <c02244b6> driver_register+0xb9/0xbe <c01e4d4e> __pci_register_driver+0x85/0x96
> <f883c07b> qla2x00_module_init+0x7b/0xa2 [qla2xxx] <c01308c1> sys_init_module+0x8d/0x171
> <c010266f> sysenter_past_esp+0x54/0x75
> Code: Bad EIP value.
> EIP: [<00000000>] _stext+0x3feffd68/0x23 SS:ESP 0068:ea290dcc
>
> after some churning, git-bisect pointed to commit:
>
> commit 56cf6504fc1c0c221b82cebc16a444b684140fb7
> Author: Russell King <rmk@xxxxxxxxxxxxxxxxxxxxxxx>
> Date: Fri May 5 17:57:52 2006 +0100
>
> [BLOCK] Fix oops on removal of SD/MMC card
>
> The block layer keeps a reference (driverfs_dev) to the struct
> device associated with the block device, and uses it internally
> for generating uevents in block_uevent.
>
> Block device uevents include umounting the partition, which can
> occur after the backing device has been removed.
>
> Unfortunately, this reference is not counted. This means that
> if the struct device is removed from the device tree, the block
> layers reference will become stale.
>
> Guard against this by holding a reference to the struct device
> in add_disk(), and only drop the reference when we're releasing
> the gendisk kobject - in other words when we can be sure that no
> further uevents will be generated for this block device.
>
> Signed-off-by: Russell King <rmk+kernel@xxxxxxxxxxxxxxxx>
> Acked-by: Jens Axboe <axboe@xxxxxxx>
>
> after reverting the commit, all is fine again (no more panics during
> reload). My fear though is that we're simply masking an underlying
> problem which the commit attempted to correct.

Not sure why that would be, but I think your assumption is correct (that
this fix has uncovered something else which is wrong.)

I'm not sure what because I don't do well when trying to decode x86
oopsen. In fact, I'm not sure how it's getting from
attribute_container_add_device() to a NULL pointer.

The obvious thing is this:

if (!cont->match(cont, dev))
continue;

but that means that "cont" must be invalid. I don't know this code.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/