Re: [PATCH] xhci: use iopoll for xhci_handshake

From: Daniel Kurtz
Date: Thu Feb 28 2019 - 11:49:58 EST


On Thu, Feb 28, 2019 at 12:09 AM Greg Kroah-Hartman
<gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, Feb 27, 2019 at 03:19:17PM -0700, Daniel Kurtz wrote:
> > In cases such as xhci_abort_cmd_ring(), xhci_handshake() is called with
> > a spin lock held (and local interrupts disabled) with a huge 5 second
> > timeout. This can translates to 5 million calls to udelay(1). By its
> > very nature, udelay() is not meant to be precise, it only guarantees to
> > delay a minimum of 1 microsecond. Therefore the actual delay of
> > xhci_handshake() can be significantly longer. If the average udelay(1)
> > is greater than 2.2 us, the total time in xhci_handshake() - with
> > interrupts disabled can be > 11 seconds triggering the kernel's soft lockup
> > detector.
> >
> > To avoid this, let's replace the open coded io polling loop with one from
> > iopoll.h that uses a loop timed with the more presumably reliable ktime
> > infrastructure.
> >
> > Signed-off-by: Daniel Kurtz <djkurtz@xxxxxxxxxxxx>
>
> Looks sane to me, nice fixup.
>
> Reviewed-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
>
> Is this causing problems on older kernels/devices today such that we
> should backport this?

We detected that xhci_handshake timing out can lead to softlockup
while debugging a USB issue on a new product. The xhci_handshake
timeout itself is a symptom of another underlying problem causing some
commands to be aborted. I don't know if any such underlying problems
exist on other older devices, but the potential is there so a backport
is reasonable. Although, it may just shift the symptom of an
underlying problem from a softlockup/oops to some other symptom, like
USB just being dead.

-Dan

>
> thanks,
>
> greg k-h