RE: [Linuxarm] Re: [PATCH for-next 00/32] spin lock usage optimization for SCSI drivers

From: Song Bao Hua (Barry Song)
Date: Tue Feb 23 2021 - 00:48:33 EST

> -----Original Message-----
> From: Finn Thain [mailto:fthain@xxxxxxxxxxxxxxxxxxx]
> Sent: Tuesday, February 23, 2021 6:25 PM
> To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>
> Cc: tanxiaofei <tanxiaofei@xxxxxxxxxx>; jejb@xxxxxxxxxxxxx;
> martin.petersen@xxxxxxxxxx; linux-scsi@xxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; linuxarm@xxxxxxxxxxxxx;
> linux-m68k@xxxxxxxxxxxxxxx
> Subject: RE: [Linuxarm] Re: [PATCH for-next 00/32] spin lock usage optimization
> for SCSI drivers
>
> On Mon, 22 Feb 2021, Song Bao Hua (Barry Song) wrote:
>
> > > On Thu, 18 Feb 2021, Xiaofei Tan wrote:
> > >
> > > > On 2021/2/9 13:06, Finn Thain wrote:
> > > > > On Tue, 9 Feb 2021, Song Bao Hua (Barry Song) wrote:
> > > > >
> > > > > > > On Sun, 7 Feb 2021, Xiaofei Tan wrote:
> > > > > > >
> > > > > > > > Replace spin_lock_irqsave with spin_lock in hard IRQ of SCSI
> > > > > > > > drivers. There are no functional changes, but it may speed
> > > > > > > > things up if interrupts happen too often.
> > > > > > >
> > > > > > > This change doesn't necessarily work on platforms that support
> > > > > > > nested interrupts.
> > > > > > >
> > > > > > > Were you able to measure any benefit from this change on some
> > > > > > > other platform?
> > > > > >
> > > > > > I think the code disabling irq in hardIRQ is simply wrong. Since
> > > > > > this commit
> > > > > >
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/
> ?id=e58aa3d2d0cc
> > > > > > genirq: Run irq handlers with interrupts disabled
> > > > > >
> > > > > > interrupt handlers definitely run in an irq-disabled
> > > > > > context unless they explicitly re-enable interrupts in the
> > > > > > handler to permit other interrupts.
> > > > > >
> > > > >
> > > > > Repeating the same claim does not somehow make it true. If you put
> > > > > your claim to the test, you'll see that interrupts are not
> > > > > disabled on m68k when interrupt handlers execute.
> > > > >
> > > > > The Interrupt Priority Level (IPL) can prevent any given irq
> > > > > handler from being re-entered, but an irq with a higher priority
> > > > > level may be handled during execution of a lower priority irq
> > > > > handler.
> > > > >
> > > > > sonic_interrupt() uses an irq lock within an interrupt handler to
> > > > > avoid issues relating to this. This kind of locking may be needed
> > > > > in the drivers you are trying to patch. Or it might not.
> > > > > Apparently, no-one has looked.
> > > > >
> > > >
> > > > According to your discussion with Barry, it seems that m68k is a
> > > > little different from other architectures, and this kind of
> > > > modification cannot be applied to m68k. So, could you help
> > > > point out which drivers in this SCSI patch set belong to the
> > > > m68k architecture? I can remove them.
> > > >
> > >
> > > If you would claim that "there are no function changes" in your
> > > patches (as above) then the onus is on you to support that claim.
> > >
> > > I assume that there are some platforms on which your assumptions hold.
> > >
> > > With regard to drivers for those platforms, you might want to explain
> > > why your patches should be applied there, given that the existing code
> > > is superior for being more portable.
> >
> > I don't think it has anything to do with portability. In the case of
> > sonic_interrupt() you pointed out, on m68k a high-priority
> > interrupt can preempt a low-priority interrupt, and both can access
> > the same critical data. M68k's spin_lock_irqsave() disables the
> > high-priority interrupt and avoids the race on that data. So that
> > case should not be touched. I'd like to accept the reality and leave
> > sonic_interrupt() alone.
> >
> > However, even on m68k, spin_lock_irqsave is not needed in other,
> > ordinary cases. If no other irq handler can come in and access the
> > same critical data, it is pointless for an irq handler to hold a
> > redundant irqsave lock.
> >
> > In thread contexts, we always need irqsave if an irq handler can
> > preempt those threads and access the same data. In hardirq, if a
> > high-priority interrupt can jump in on m68k and access critical data
> > which needs protection, we use spin_lock_irqsave as you have done in
> > sonic_interrupt(). Otherwise, the irqsave is redundant even on m68k.
> >
> > >
> > > > BTW, sonic_interrupt() is from the net driver natsemi, right? It
> > > > would be appreciated if we discussed only SCSI drivers in this
> > > > patch set. Thanks.
> > > >
> > >
> > > The 'net' subsystem does have some different requirements than the
> > > 'scsi' subsystem. But I don't see how that's relevant. Perhaps you can
> > > explain it. Thanks.
> >
> > The difference is whether there are two co-existing interrupts which can
> > access the same critical data on m68k. I don't think net vs. scsi
> > matters; what really matters is the specific driver.
> >
>
> Regarding m68k, your analysis overlooks the timing issue. E.g. patch 11/32
> could be a problem because removing the irqsave would allow PDMA transfers
> to be interrupted. Aside from the timing issues, I agree with your
> analysis above regarding m68k.

You mentioned you need realtime, so you want an interrupt to be able to
preempt another one. Now you say you want an interrupt not to be preempted,
as that would cause a timing issue. If this PDMA transfer has a problem
when it is preempted, I believe we need some enhanced way to handle it;
otherwise, once we enable preempt_rt or threaded irqs, we will hit the same
timing issue. So, if this is the case, it needs a clear comment and
IRQF_NO_THREAD.
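
Something like this minimal sketch (foo_pdma_isr and foo_setup_irq are
made-up names for illustration, not code from patch 11/32; the point is
just where the flag and the comment would go):

#include <linux/interrupt.h>

/*
 * The PDMA transfer is timing-sensitive, so this handler must not
 * be preempted or forced into a threaded irq.
 */
static irqreturn_t foo_pdma_isr(int irq, void *dev_id)
{
	/* ... timing-sensitive PDMA transfer ... */
	return IRQ_HANDLED;
}

static int foo_setup_irq(int irq, void *dev)
{
	/*
	 * IRQF_NO_THREAD: the PDMA timing breaks if this handler
	 * runs in a thread (e.g. under preempt_rt or "threadirqs").
	 */
	return request_irq(irq, foo_pdma_isr, IRQF_NO_THREAD,
			   "foo-pdma", dev);
}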

>
> With regard to other architectures and platforms, in specific cases, e.g.
> where there's never more than one IRQ involved, I could agree that
> your assumptions probably hold and an irqsave would probably be redundant.
>
> When you find a redundant irqsave, to actually patch it would bring a risk
> of regression with little or no reward. It's not my place to veto this
> entire patch series on that basis but IMO this kind of churn is misguided.

Nope.

I would say the real misguidance is code that takes a lock it doesn't
need. It is easy to add redundant locks or to widen a lock's coverage,
but the smarter way is to add locks only where they are really needed,
after considering both concurrency and realtime performance.
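
To illustrate the distinction (bar_isr, bar_submit and struct bar_dev
are made-up names, not from any of the patched drivers):

#include <linux/interrupt.h>
#include <linux/spinlock.h>

struct bar_dev {
	spinlock_t lock;	/* initialised with spin_lock_init() in probe */
	/* ... data shared between hardirq and process context ... */
};

/*
 * Hardirq handler: genirq already runs this with local interrupts
 * disabled, and no second irq handler touches the same data, so a
 * plain spin_lock is enough here.
 */
static irqreturn_t bar_isr(int irq, void *dev_id)
{
	struct bar_dev *bar = dev_id;

	spin_lock(&bar->lock);
	/* ... update the shared state ... */
	spin_unlock(&bar->lock);
	return IRQ_HANDLED;
}

/*
 * Process context: irqsave is genuinely needed here, because
 * bar_isr() can interrupt us while we hold the lock.
 */
static void bar_submit(struct bar_dev *bar)
{
	unsigned long flags;

	spin_lock_irqsave(&bar->lock, flags);
	/* ... update the shared state ... */
	spin_unlock_irqrestore(&bar->lock, flags);
}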

Thanks
Barry