Re: [PATCH] mptsas: add mptsas_shutdown to call pci_disable_msi

From: Yinghai Lu
Date: Wed May 07 2008 - 19:41:46 EST


On Wed, May 7, 2008 at 4:31 PM, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, 22 Apr 2008 20:12:08 -0700
> "Yinghai Lu" <yhlu.kernel@xxxxxxxxx> wrote:
>
> > On Tue, Apr 22, 2008 at 7:47 PM, Yinghai Lu <yhlu.kernel.send@xxxxxxxxx> wrote:
> > >
> > >
> > > this change
> > >
> > > | commit 23a274c8a5adafc74a66f16988776fc7dd6f6e51
> > > | Author: Prakash, Sathya <sathya.prakash@xxxxxxx>
> > > | Date: Fri Mar 7 15:53:21 2008 +0530
> > > |
> > > | [SCSI] mpt fusion: Enable MSI by default for SAS controllers
> > > |
> > > | This patch modifies the driver to enable MSI by default for all SAS chips.
> > > |
> > > causes kexec of the RHEL 5.1 kernel to fail.
> > >
> > > root cause: the RHEL 5.1 kernel still uses INTx emulation,
> > > and mptscsih_shutdown doesn't call pci_disable_msi to re-enable INTx on the kexec path.
> > >
> > > so try calling mptsas_remove in mptsas_shutdown.
> > > then pci_disable_msi will be called via mptsas_remove==>mptscsih_remove==>
> > > mpt_detach.
> > >
> > > Signed-off-by: Yinghai Lu <yhlu.kernel@xxxxxxxxx>
> > > CC: Prakash, Sathya <sathya.prakash@xxxxxxx>
> > > CC: "Moore, Eric" <Eric.Moore@xxxxxxx>
> > >
> > > Index: linux-2.6/drivers/message/fusion/mptsas.c
> > > ===================================================================
> > > --- linux-2.6.orig/drivers/message/fusion/mptsas.c
> > > +++ linux-2.6/drivers/message/fusion/mptsas.c
> > > @@ -3327,6 +3327,11 @@ static void __devexit mptsas_remove(stru
> > >  	mptscsih_remove(pdev);
> > >  }
> > >
> > > +static void mptsas_shutdown(struct pci_dev *pdev)
> > > +{
> > > +	mptsas_remove(pdev);
> > > +}
> > > +
> > >  static struct pci_device_id mptsas_pci_table[] = {
> > >  	{ PCI_VENDOR_ID_LSI_LOGIC, MPI_MANUFACTPAGE_DEVID_SAS1064,
> > >  		PCI_ANY_ID, PCI_ANY_ID },
> > > @@ -3348,7 +3353,7 @@ static struct pci_driver mptsas_driver =
> > >  	.id_table = mptsas_pci_table,
> > >  	.probe = mptsas_probe,
> > >  	.remove = __devexit_p(mptsas_remove),
> > > -	.shutdown = mptscsih_shutdown,
> > > +	.shutdown = mptsas_shutdown,
> > >  #ifdef CONFIG_PM
> > >  	.suspend = mptscsih_suspend,
> > >  	.resume = mptscsih_resume,
> > > --
> >
> > this fails on one system with a big SAS expander...
> >
> > LBSuse:~ # mkdir /xx
> > LBSuse:~ # mount /dev/sdl1 /xx
> > LBSuse:~ # cd /xx
> > LBSuse:/xx # sh kk_rh_5.1
> > LBSuse:/xx # ./kexec -e
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> > IP: [<ffffffff80337af4>] sysfs_find_dirent+0x1f/0x5f
> > PGD 41f137067 PUD 424482067 PMD 0
> > Oops: 0000 [1] SMP
> > CPU 7
> > Modules linked in:
> > Pid: 7534, comm: kexec Not tainted
>
> I don't understand your email.
>
> Are you saying that this oops is the thing which your patch fixes?
>
> Or are you saying that this oops occurs even after your patch is applied?
> That we have a second regression?
>
>
>
> > 2.6.25-sched-devel.git-x86-latest.git-03823-g1508ed0-dirty #135
> > RIP: 0010:[<ffffffff80337af4>] [<ffffffff80337af4>] sysfs_find_dirent+0x1f/0x5f
> > RSP: 0018:ffff8104238f7708 EFLAGS: 00010246
> > RAX: 0000000000000000 RBX: ffffffff80cd36a5 RCX: 000000008c0f362e
> > RDX: ffffffff80e20ab0 RSI: ffffffff80cd36a5 RDI: 0000000000000000
> > RBP: ffff8104238f7728 R08: 0000000000000000 R09: 000000008c0f362e
> > R10: ffff8104238f77f8 R11: 000000008c0f362e R12: 0000000000000000
> > R13: ffff810223190358 R14: 0000000000000000 R15: 0000000000000000
> > FS: 00007fde8859d6f0(0000) GS:ffff810427039f00(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 0000000000000028 CR3: 00000004238b8000 CR4: 00000000000006e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process kexec (pid: 7534, threadinfo ffff8104238f6000, task ffff8104238e0000)
> > Stack: 0000000000000000 000000008c0f362e ffffffff80cd36a5 0000000000000000
> > ffff8104238f7758 ffffffff80337b70 000000008c0f362e 000000008c0f362e
> > ffff810223190268 ffffffff80e1fb20 ffff8104238f7798 ffffffff8033966e
> > Call Trace:
> > [<ffffffff80337b70>] sysfs_get_dirent+0x3c/0x72
> > [<ffffffff8033966e>] sysfs_remove_group+0x38/0xb4
> > [<ffffffff80633007>] dpm_sysfs_remove+0x2f/0x45
> > [<ffffffff806336d3>] device_pm_remove+0x34/0x85
> > [<ffffffff8062b283>] device_del+0x30/0x1b0
> > [<ffffffff8062b428>] device_unregister+0x25/0x48
> > [<ffffffff80639e73>] enclosure_unregister+0x85/0xcb
> > [<ffffffff807dad42>] ses_intf_remove+0x8b/0xa8
> > [<ffffffff8062b2fb>] device_del+0xa8/0x1b0
> > [<ffffffff8062b428>] device_unregister+0x25/0x48
> > [<ffffffff80700bb4>] __scsi_remove_device+0x4c/0xaf
> > [<ffffffff80700c50>] scsi_remove_device+0x39/0x5c
> > [<ffffffff80700d15>] __scsi_remove_target+0xa2/0xf6
> > [<ffffffff80700de0>] ? __remove_child+0x0/0x4f
> > [<ffffffff80700e12>] __remove_child+0x32/0x4f
> > [<ffffffff8062ab27>] ? next_device+0x21/0x45
> > [<ffffffff8062ac23>] device_for_each_child+0x40/0x84
> > [<ffffffff80713d8e>] ? do_sas_phy_delete+0x0/0x66
> > [<ffffffff80700dbc>] scsi_remove_target+0x53/0x77
> > [<ffffffff807134b0>] sas_rphy_remove+0x42/0x81
> > [<ffffffff80713514>] sas_rphy_delete+0x25/0x48
> > [<ffffffff80713570>] sas_port_delete+0x39/0x147
> > [<ffffffff802259e0>] ? mcount_call+0x5/0x35
> > [<ffffffff80713d8e>] ? do_sas_phy_delete+0x0/0x66
> > [<ffffffff80713dc2>] do_sas_phy_delete+0x34/0x66
> > [<ffffffff8062ac23>] device_for_each_child+0x40/0x84
> > [<ffffffff80713d8e>] ? do_sas_phy_delete+0x0/0x66
> > [<ffffffff8071343f>] sas_remove_children+0x2e/0x5d
> > [<ffffffff807134b7>] sas_rphy_remove+0x49/0x81
> > [<ffffffff80713514>] sas_rphy_delete+0x25/0x48
> > [<ffffffff80713570>] sas_port_delete+0x39/0x147
> > [<ffffffff802259e0>] ? mcount_call+0x5/0x35
> > [<ffffffff80713d8e>] ? do_sas_phy_delete+0x0/0x66
> > [<ffffffff80713dc2>] do_sas_phy_delete+0x34/0x66
> > [<ffffffff8062ac23>] device_for_each_child+0x40/0x84
> > [<ffffffff8071343f>] sas_remove_children+0x2e/0x5d
> > [<ffffffff807136a6>] sas_remove_host+0x28/0x3e
> > [<ffffffff80ab22ab>] mptsas_remove+0x46/0x107
> > [<ffffffff802259e0>] ? mcount_call+0x5/0x35
> > [<ffffffff8080ef6d>] mptsas_shutdown+0x21/0x37
> > [<ffffffff805a6815>] pci_device_shutdown+0x37/0x4d
> > [<ffffffff8062a2ad>] device_shutdown+0x64/0xa0
> > [<ffffffff8027e57f>] ? blocking_notifier_call_chain+0x27/0x3d
> > [<ffffffff8027131e>] kernel_restart_prepare+0x3f/0x5a
> > [<ffffffff802716f7>] sys_reboot+0x172/0x1cb
> > [<ffffffff802e2ac0>] ? __fput+0x158/0x17b
> > [<ffffffff802efc4e>] ? vfs_ioctl+0x3e/0xa2
> > [<ffffffff802e2ef0>] ? fput+0x2c/0x42
> > [<ffffffff802df2d2>] ? filp_close+0x78/0x9a
> > [<ffffffff802df0b4>] ? __put_unused_fd+0x33/0x60
> > [<ffffffff802e0bed>] ? sys_close+0x8c/0xdf
> > [<ffffffff80225b9b>] system_call_after_swapgs+0x7b/0x80
> >
> >
> > Code: e8 37 85 f2 ff 48 83 c4 18 5b c9 c3 55 48 89 e5 41 54 53 48 83
> > ec 10 66 66 90 66 90 65 48 8b 04 25 28 00 00 00 48 89 45 e8 31 c0 <48>
> > 8b 5f 28 49 89 f4 eb 14 48 8b 7b 18 4c 89 e6 e8 25 e8 25 00
> > RIP [<ffffffff80337af4>] sysfs_find_dirent+0x1f/0x5f
> > RSP <ffff8104238f7708>
> > CR2: 0000000000000028
> > ---[ end trace 4ca22418d73866ec ]---
> >
> > may need to create an mptsas_shutdown that only calls pci_disable_msi
> >
>
> It would be strange for an interrupt-disabling problem to cause sysfs to go
> oops?

Andrew, an updated version has been merged...

YH