Re: [RFC PATCH] xhci: do not halt the secondary HCD

From: Joel Stanley
Date: Mon Sep 19 2016 - 04:24:38 EST


Hi Mathias,

On Mon, Sep 19, 2016 at 4:33 PM, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Sep 19, 2016 at 04:05:45PM +0930, Joel Stanley wrote:
>> We can't halt the secondary HCD, because it's also the primary HCD,
>> which will cause problems if we have devices attached to the primary
>> HCD, like a keyboard.
>>
>> We've been carrying this in our Linux-as-a-bootloader environment for a little
>> while now. The machines all have the same TI TUSB73x0 part, and when we kexec
>> the devices don't come back until a system power cycle.
>>
>> I'd like some advice on an acceptable way to upstream the fix, so that the xhci
>> device survives kexec.
>
> Any reason you didn't cc: Mathias?

Fat fingers - I missed him when grabbing names from get_maintainer.pl.
Thanks for adding him in.

On Mon, Sep 19, 2016 at 5:11 PM, Mathias Nyman
<mathias.nyman@xxxxxxxxxxxxxxx> wrote:
> What kernel version is this?

This patch is against 4.4.21. I've tested newer releases but haven't
seen any improvement.

> As Greg said there are fixes in this area in the 4.8 latest rc kernel.
>
> If that doesn't work then we need to figure out what the real issue is.

No dice on 4.8-rc7 (without any patches).

Here's 4.8-rc7 loading:

[ 3.699524] xhci_hcd 0021:09:00.0: xHCI Host Controller
[ 3.699556] xhci_hcd 0021:09:00.0: new USB bus registered, assigned bus number 1
[ 3.699640] xhci_hcd 0021:09:00.0: Using 64-bit DMA iommu bypass
[ 3.699697] xhci_hcd 0021:09:00.0: hcc params 0x0270f06d hci version 0x96 quirks 0x00000000
[ 3.700286] hub 1-0:1.0: USB hub found
[ 3.700299] hub 1-0:1.0: 4 ports detected
[ 3.700493] xhci_hcd 0021:09:00.0: xHCI Host Controller
[ 3.700522] xhci_hcd 0021:09:00.0: new USB bus registered, assigned bus number 2
[ 3.700552] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[ 3.700733] hub 2-0:1.0: USB hub found
[ 3.700748] hub 2-0:1.0: 4 ports detected

Then we kexec into the second kernel. Here's what the second kernel
prints when trying to bring the controller up:

[ 1.588272] xhci_hcd 0021:09:00.0: xHCI Host Controller
[ 1.588282] xhci_hcd 0021:09:00.0: new USB bus registered, assigned bus number 1
[ 1.619279] xhci_hcd 0021:09:00.0: Host not halted after 16000 microseconds.
[ 1.619281] xhci_hcd 0021:09:00.0: can't setup: -110
[ 1.619447] xhci_hcd 0021:09:00.0: USB bus 1 deregistered
[ 1.619457] xhci_hcd 0021:09:00.0: init 0021:09:00.0 fail, -110
[ 1.619571] xhci_hcd: probe of 0021:09:00.0 failed with error -110

Note that the second kernel is a distro one (Ubuntu 4.4.0-36-generic).
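
For anyone reading along, the "Host not halted after 16000 microseconds"
line followed by -110 (ETIMEDOUT on Linux) is what a poll-until-halted loop
gives up with. Below is a rough user-space model of that pattern, not the
driver code: fake_xhci, fake_read_status, fake_handshake and fake_halt are
names made up for illustration, and the "register" is a plain struct field
rather than MMIO.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define STS_HALT      (1u << 0)   /* USBSTS bit 0: HCHalted */
#define MAX_HALT_USEC (16 * 1000) /* same 16000 us budget as in the log */

/* Hypothetical stand-in for a controller's status register. */
struct fake_xhci {
	uint32_t status;
	long polls_until_halt;    /* negative: never reports halted */
};

/* Each read moves the fake controller one step closer to halted. */
static uint32_t fake_read_status(struct fake_xhci *x)
{
	if (x->polls_until_halt == 0)
		x->status |= STS_HALT;
	else if (x->polls_until_halt > 0)
		x->polls_until_halt--;
	return x->status;
}

/* Poll once per simulated microsecond until (status & mask) == done. */
static int fake_handshake(struct fake_xhci *x, uint32_t mask,
			  uint32_t done, long usec)
{
	assert(x != NULL);
	do {
		if ((fake_read_status(x) & mask) == done)
			return 0;
		usec--;           /* a real driver would delay 1 us here */
	} while (usec > 0);
	return -ETIMEDOUT;        /* shows up as -110 in the log above */
}

/* Wait for the halted bit, giving up after MAX_HALT_USEC polls. */
static int fake_halt(struct fake_xhci *x)
{
	return fake_handshake(x, STS_HALT, STS_HALT, MAX_HALT_USEC);
}
```

A fake controller with polls_until_halt = 10 returns 0 from fake_halt(),
while one with polls_until_halt = -1 exhausts the budget and returns
-ETIMEDOUT, which is the shape of the failure in the second kernel.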

> xhci hardware is really just one controller. The split into primary and
> secondary HCD is software only. We always load the primary HCD first
> (USB2) and the secondary second (USB3).
> We unload them in reverse order, and need to stop the xhci (halt the hcd)
> as a first step.
>
> load primary
> load secondary (starts the xhci controller)
> ...
> unload secondary (halts the controller)
> unload primary (free memory)

Thanks for the explanation. I wasn't the author of the first hack we
put in our tree, but I have rewritten it as we rebase on the stable
tree regularly.

So the hack as I sent it doesn't halt the controller when the secondary
is unloaded, and lets the primary unload path halt it instead. Any
theory as to why this helps?
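
To make the difference in ordering concrete, here is a toy user-space
model of the two teardown sequences. It is only a sketch under the
assumptions in this thread: one controller shared by two software HCDs,
the secondary load starting it, and the primary unload path halting
whatever is still running. Every name in it (model_xhci, model_halt and
so on) is hypothetical and none of it is the actual driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: one controller, two software HCDs sharing it. */
struct model_xhci {
	bool running;     /* controller run state */
	int halt_calls;   /* how many times something halted it */
};

static void model_halt(struct model_xhci *x)
{
	x->running = false;
	x->halt_calls++;
}

/* Loading the secondary (USB3) HCD is what starts the controller. */
static void load_both(struct model_xhci *x)
{
	x->running = false;   /* load primary: nothing running yet */
	x->halt_calls = 0;
	x->running = true;    /* load secondary: controller started */
}

/* skip_halt = true models the hack: the secondary leaves it running. */
static void unload_secondary(struct model_xhci *x, bool skip_halt)
{
	if (!skip_halt)
		model_halt(x);
}

/* Assumption: the primary path halts the controller if still running. */
static void unload_primary(struct model_xhci *x)
{
	if (x->running)
		model_halt(x);
}

/* Returns true if the controller was running between the two unloads. */
static bool running_between_unloads(struct model_xhci *x, bool skip_halt)
{
	assert(x != NULL);
	load_both(x);
	unload_secondary(x, skip_halt);
	bool still_running = x->running;
	unload_primary(x);
	return still_running;
}
```

In this model both sequences end with the controller halted exactly once.
The only observable difference is that with the hack the controller is
still running while the secondary is torn down, so if the hack helps on
real hardware, that window is where I would look first.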

Cheers,

Joel