Re: list corruption on ib_srp load in v2.6.24-rc5

From: David Dillow
Date: Thu Dec 27 2007 - 12:53:50 EST



On Thu, 2007-12-27 at 11:58 +0900, FUJITA Tomonori wrote:
> On Wed, 26 Dec 2007 12:14:11 -0500
> David Dillow <dillowda@xxxxxxxx> wrote:
>
> >
> > On Sun, 2007-12-23 at 01:41 +0900, FUJITA Tomonori wrote:
> > > transport_container_unregister(&i->rport_attr_cont) should not fail here.
> > >
> > > It fails because there is still a srp rport.
> > >
> > > I think that as Pete pointed out, srp_remove_one needs to call
> > > srp_remove_host.
> > >
> > > Can you try this?
> >
> > That patch oopsed in scsi_remove_host(), but reversing the order has
> > survived over 500 insert/probe/remove cycles.
>
> Thanks,
>
> Can you post the oops message? The srp class might have bugs related
> to it.

This is the oops generated by doing srp_remove_host() prior to
scsi_remove_host() in 2.6.24-rc5:

Unable to handle kernel NULL pointer dereference at 0000000000000020 RIP:
[<ffffffff811d058d>] klist_del+0xa/0x46
PGD 8450d8067 PUD 843cbd067 PMD 0
Oops: 0000 [1] SMP
CPU 3
Modules linked in: sg sd_mod ib_iser libiscsi scsi_transport_iscsi rdma_ucm ib_ucm rdma_cm iw_cm ib_addr ib_srp scsi_transport_srp scsi_mod ib_cm ib_ipoib ib_sa ib_uverbs ib_umad ib_mthca ib_mad ib_core ehci_hcd ohci_hcd nfs lockd nfs_acl sunrpc unionfs forcedeth
Pid: 2450, comm: rmmod Not tainted 2.6.24-rc5 #2
RIP: 0010:[<ffffffff811d058d>] [<ffffffff811d058d>] klist_del+0xa/0x46
RSP: 0018:ffff81084192bd28 EFLAGS: 00010282
RAX: ffff81084600b000 RBX: 0000000000000000 RCX: ffffe2001ce562c8
RDX: 0000000000000000 RSI: ffff810447c1d000 RDI: ffff81084657f050
RBP: ffff81084657f028 R08: ffff810447c1d000 R09: ffff8108455a1800
R10: ffff8108455a1800 R11: ffff810846730808 R12: ffff81084657f050
R13: ffff810844c4a170 R14: ffff81084657f028 R15: 0000000000000880
FS: 00002afbf1b0b6e0(0000) GS:ffff810846531840(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000020 CR3: 0000000843c56000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmmod (pid: 2450, threadinfo ffff81084192a000, task ffff810844d47620)
Stack: ffff810844c4a000 ffff81084657f028 ffff81084657f000 ffffffff8114cbd6
ffff810846730808 ffff810844c4a000 ffff81084657f028 ffff81084657f000
0000000000000246 ffffffff88118322 ffff8108455a1800 ffff81084657f000
Call Trace:
[<ffffffff8114cbd6>] device_del+0x20/0x2f0
[<ffffffff88118322>] :scsi_mod:scsi_target_reap_usercontext+0x53/0xbd
[<ffffffff810455ce>] execute_in_process_context+0x20/0x47
[<ffffffff8811a4da>] :scsi_mod:scsi_device_dev_release_usercontext+0xd3/0x105
[<ffffffff810455ce>] execute_in_process_context+0x20/0x47
[<ffffffff810ed9b8>] kobject_cleanup+0x2f/0x51
[<ffffffff810ed9da>] kobject_release+0x0/0x9
[<ffffffff810ee692>] kref_put+0x74/0x82
[<ffffffff88119f02>] :scsi_mod:scsi_forget_host+0x53/0x55
[<ffffffff88112018>] :scsi_mod:scsi_remove_host+0x76/0xf7
[<ffffffff8813d161>] :ib_srp:srp_remove_one+0x102/0x19d
[<ffffffff880ac2bc>] :ib_core:ib_unregister_client+0x40/0xb3
[<ffffffff8813d20a>] :ib_srp:srp_cleanup_module+0xe/0x34
[<ffffffff810551f1>] sys_delete_module+0x18d/0x1bc
[<ffffffff811d3879>] error_exit+0x0/0x51
[<ffffffff8100be6e>] system_call+0x7e/0x83


Code: 48 8b 6b 20 48 89 df e8 b7 2f 00 00 4c 89 e7 e8 d2 ff ff ff
RIP [<ffffffff811d058d>] klist_del+0xa/0x46
RSP <ffff81084192bd28>
CR2: 0000000000000020
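
As a reading aid for the dump above, here is a minimal sketch that decodes the first four bytes of the "Code:" line by hand. It assumes those bytes are the faulting instruction at klist_del+0xa; the usual marker around the faulting byte does not appear in this copy of the oops, so that is an assumption, not something the trace states. The helper name decode() and the variable names are illustrative only.

```python
# First four bytes of the "Code:" line from the oops (assumed to be the
# faulting instruction; the dump itself does not mark which byte faulted).
code = bytes.fromhex("488b6b20")
rex, opcode, modrm, disp8 = code

# Register numbering for x86-64 ModRM fields when no REX extension bits are set.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

assert rex == 0x48      # REX.W: 64-bit operand size, no register extensions
assert opcode == 0x8B   # MOV r64, r/m64

mod = modrm >> 6        # 0b01: memory operand with an 8-bit displacement
reg = (modrm >> 3) & 7  # destination register
rm = modrm & 7          # base register (not 0b100, so no SIB byte follows)
assert mod == 0b01

def decode():
    """Return the instruction in AT&T syntax."""
    return f"mov {disp8:#x}(%{REGS[rm]}),%{REGS[reg]}"

# RBX as shown in the register dump of the oops.
rbx = 0x0000000000000000
fault_address = rbx + disp8

if __name__ == "__main__":
    print(decode())
    print(f"fault address: {fault_address:#018x}")
```

Under that assumption the instruction is a load from offset 0x20 of the pointer held in RBX. RBX is 0 in the register dump, so the computed address is 0x20, which agrees with the CR2 value reported in the oops.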

