Re: [PATCH-v2 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage

From: James Bottomley
Date: Fri Sep 11 2015 - 13:51:17 EST


On Thu, 2015-09-10 at 23:55 -0700, Nicholas A. Bellinger wrote:
> On Wed, 2015-09-09 at 15:03 -0700, Nicholas A. Bellinger wrote:
> > On Wed, 2015-09-09 at 19:59 +0530, Chaitra Basappa wrote:
> > > From: Sreekanth Reddy [mailto:sreekanth.reddy@xxxxxxxxxxxxx]
> > > Sent: Tuesday, September 08, 2015 5:26 PM
> > > To: Nicholas A. Bellinger
> > > Cc: linux-scsi; linux-kernel; James Bottomley; Calvin Owens; Christoph
> > > Hellwig; MPT-FusionLinux.pdl; kernel-team; Nicholas Bellinger; Chaitra
> > > Basappa
> > > Subject: Re: [PATCH-v2 1/2] mpt3sas: Refcount sas_device objects and fix
> > > unsafe list usage
> > >
> > > On Sun, Aug 30, 2015 at 1:24 PM, Nicholas A. Bellinger <nab@xxxxxxxxxxxxx>
> > > wrote:
> > > > From: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
> > > >
> > > > These objects can be referenced concurrently throughout the driver, we
> > > > need a way to make sure threads can't delete them out from under each
> > > > other. This patch adds the refcount, and refactors the code to use it.
> > > >
> > > > Additionally, we cannot iterate over the sas_device_list without
> > > > holding the lock, or we risk corrupting random memory if items are
> > > > added or deleted as we iterate. This patch refactors
> > > > _scsih_probe_sas() to use the sas_device_list in a safe way.
> > > >
> > > > This patch is a port of Calvin's PATCH-v4 for mpt2sas code, atop
> > > > mpt3sas changes in scsi.git/for-next.
> > > >
> > > > Cc: Calvin Owens <calvinowens@xxxxxx>
> > > > Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> > > > Cc: Sreekanth Reddy <sreekanth.reddy@xxxxxxxxxxxxx>
> > > > Cc: MPT-FusionLinux.pdl <MPT-FusionLinux.pdl@xxxxxxxxxxxxx>
> > > > Signed-off-by: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
> > > > ---
> > > > drivers/scsi/mpt3sas/mpt3sas_base.h | 25 +-
> > > > drivers/scsi/mpt3sas/mpt3sas_scsih.c | 479
> > > > +++++++++++++++++++++----------
> > > > drivers/scsi/mpt3sas/mpt3sas_transport.c | 18 +-
> > > > 3 files changed, 364 insertions(+), 158 deletions(-)
> > > >
> > > > @@ -2763,7 +2874,7 @@ _scsih_block_io_device(struct MPT3SAS_ADAPTER *ioc,
> > > > u16 handle)
> > > > struct scsi_device *sdev;
> > > > struct _sas_device *sas_device;
> > > >
> > >
> > > [Sreekanth] The sas_device_lock spinlock needs to be acquired here
> > > before calling the __mpt3sas_get_sdev_by_addr() function.
> > >
> > > [Chaitra] Calling "mpt3sas_get_sdev_by_handle()" here instead of
> > > "__mpt3sas_get_sdev_by_handle()" fixes the "invalid page access"
> > > type of kernel panic.
> > >
> > > > - sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> > > > + sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
> > > > if (!sas_device)
> > > > return;
> > > >
> >
> > Whoops, missed this comment in _scsih_block_io_device() from Sreekanth's
> > earlier reply.
> >
> > Here's the updated incremental patch atop target-pending/for-next-merge
> > to use the protected callers for both cases.
> >
> > Please review + ACK ASAP.
>
> The mpt3sas -v2 series + v4.3-rc0 breakage incremental patch here made
> it into linux-next-09102015, and at this point I don't see a scenario
> where keeping around the broken list_head dereferences makes sense.

I already explained the dangers of what the patch does. Separated
lifetime objects need to be treated very carefully. Rushing this in to
-rc1 without an Avago soak test is irresponsible. Two issues have
already turned up in this thanks to inspection and as a bug fix it's not
bound by the merge window anyway so there's no reason to rush it into
-rc1 without the proper testing.
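
To make the lifetime concern concrete, here is a minimal user-space
sketch of the pattern the review comments above are about. It is not the
driver code: struct sas_dev, list_lock, and every dev_*() helper are
hypothetical stand-ins, a pthread mutex stands in for the sas_device_lock
spinlock, and a C11 atomic counter stands in for the kernel's reference
counting. The point it illustrates is that the "_locked" lookup is only
safe when the caller already holds the list lock, while the plain lookup
takes the lock itself and returns a counted reference.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct _sas_device; not the driver's layout. */
struct sas_dev {
	uint16_t handle;
	atomic_int refcount;	/* one reference is owned by the list */
	struct sas_dev *next;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct sas_dev *dev_list;

/* Drop one reference; free the object when the last one goes away. */
void dev_put(struct sas_dev *d)
{
	int old = atomic_fetch_sub(&d->refcount, 1);

	assert(old > 0);
	if (old == 1)
		free(d);
}

/* Unprotected lookup: the caller MUST already hold list_lock. */
struct sas_dev *dev_get_by_handle_locked(uint16_t handle)
{
	struct sas_dev *d;

	for (d = dev_list; d; d = d->next) {
		if (d->handle == handle) {
			atomic_fetch_add(&d->refcount, 1);
			return d;
		}
	}
	return NULL;
}

/* Protected lookup: takes the lock around the list walk. */
struct sas_dev *dev_get_by_handle(uint16_t handle)
{
	struct sas_dev *d;

	pthread_mutex_lock(&list_lock);
	d = dev_get_by_handle_locked(handle);
	pthread_mutex_unlock(&list_lock);
	return d;
}

/* Add a device; the list holds the initial reference. */
int dev_add(uint16_t handle)
{
	struct sas_dev *d = malloc(sizeof(*d));

	if (!d)
		return -1;
	d->handle = handle;
	atomic_init(&d->refcount, 1);
	pthread_mutex_lock(&list_lock);
	d->next = dev_list;
	dev_list = d;
	pthread_mutex_unlock(&list_lock);
	return 0;
}

/* Unlink under the lock, then drop the list's reference. */
void dev_remove(uint16_t handle)
{
	struct sas_dev **pp, *d = NULL;

	pthread_mutex_lock(&list_lock);
	for (pp = &dev_list; *pp; pp = &(*pp)->next) {
		if ((*pp)->handle == handle) {
			d = *pp;
			*pp = d->next;
			d->next = NULL;
			break;
		}
	}
	pthread_mutex_unlock(&list_lock);
	if (d)
		dev_put(d);
}
```

In this sketch a caller that obtained a device through
dev_get_by_handle() can keep using it after another thread calls
dev_remove(); the memory is only freed when that caller does its own
dev_put(). Calling the "_locked" variant without the lock is the kind of
mistake the two review comments flagged.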

The reason for wanting to do this right is to avoid creating a bisection
black hole: if we create an unreliable base storage driver by rushing
this into -rc1 it makes bisection very difficult for people who use mpt3
gear because they won't know if it's the bug they're chasing or the one
we introduced which they can't avoid because they have to use a storage
driver to boot the kernel.


> So that said, I'd like to send a target-pending/for-next-merge PULL
> request out to Linus in the next 48 hours.

How about no: it's not a target patch, it's an initiator patch, which
makes it my decision not yours. The Maintainers are being responsive,
so there's no reason to override their request for a soak test, even if
you are the patch author. It will get pushed once they confirm.

James


