xhci_reset_endpoint() doesn't reset endpoint

From: Michal Necasek
Date: Wed Dec 14 2016 - 05:59:01 EST



Hi Mathias,

We have run into a problem with a USB printer which we're quite confident is a bug in the Linux xHCI driver. There is no problem when the same printer is plugged into a port managed by the EHCI driver.

The core problem is that xhci_reset_endpoint() doesn't do anything, and more specifically does not reset the xHC's data toggle/sequence number. That is not normally an issue, because the reset does happen in response to a STALL; in our scenario, however, there is no STALL or any other error. As a result, the data toggle can get out of sync between host and device, and the host drops a packet sent by the device.

Now a detailed problem description. We have a USB printer passed through to a VM. The VM runs Windows 8.1 or 10 (other versions may be affected too), and uses Microsoft's standard usbprint.sys to talk to the printer. The vendor printer driver tries to query the printer's configuration, using the control endpoint, one OUT endpoint, and one IN endpoint. The query always times out/fails when the printer is plugged into a port managed by xHCI, yet works on EHCI ports.

The usbprint.sys driver is a bit funny and in many cases (though not always) queues up URBs on the IN endpoint in advance. Once it decides that it has received the entire response, it cancels the last URB and resets the IN endpoint (issuing ClearFeature(ENDPOINT_HALT)). After much head scratching, we realized, and later confirmed with a USB analyzer, that the next IN packet that the printer sends is not seen by the host's USB stack at all, let alone the guest OS. Other packets arrive just fine, but the guest OS keeps waiting for more data to arrive, eventually loses patience and fails.

We cannot observe the data toggle state of the xHC, but we are fairly certain that things go wrong when the data toggle is set (on both ends) prior to the endpoint reset. ClearFeature(ENDPOINT_HALT) resets the toggle on the device, but not on the host. We do know for a fact that the device sends a packet (with data toggle 0) which the host USB stack never sees, and a data toggle mismatch explains that quite well.
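To make the suspected sequence concrete, here is a small, simplified model of the USB 2.0 bulk data toggle. It is only a sketch and not driver code: the names (toggle_pair, in_transfer, reset_device_only, reset_both) are made up for illustration, and it assumes the usual USB 2.0 rule that a receiver ACKs a packet carrying an unexpected toggle but discards its data as a presumed retry.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the bulk data toggle; hypothetical names, not driver code. */
struct toggle_pair {
    unsigned host;   /* toggle value the host expects next (0 or 1) */
    unsigned device; /* toggle value the device will send next (0 or 1) */
};

/* The device sends one IN packet. Returns true if the host accepts the data. */
static bool in_transfer(struct toggle_pair *t)
{
    assert(t->host <= 1 && t->device <= 1);

    bool accepted = (t->device == t->host);

    /* The host ACKs in both cases, so the device always advances its toggle. */
    t->device ^= 1;
    /* On a mismatch the host assumes a retry, drops the data, keeps its toggle. */
    if (accepted)
        t->host ^= 1;
    return accepted;
}

/* Endpoint reset that reaches only the device (what we believe happens on xHCI). */
static void reset_device_only(struct toggle_pair *t)
{
    t->device = 0;
}

/* Endpoint reset that clears both sides (what we observe with EHCI). */
static void reset_both(struct toggle_pair *t)
{
    t->device = 0;
    t->host = 0;
}
```

Starting from {0, 0}, one successful transfer leaves both toggles at 1. After reset_device_only() the next in_transfer() is dropped and the one after that succeeds again, which matches the single lost packet we see on the analyzer. With reset_both() nothing is lost.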

We are using USBFS to talk to the printer, but that shouldn't matter much. I will note that the available documentation<1> explicitly says that USBDEVFS_RESETEP and USBDEVFS_CLEAR_HALT both reset the data toggle. That is indeed the case for the Linux EHCI driver but not for xHCI. Both of the USBFS ioctls call into xhci_reset_endpoint(), which does nothing.

We believe that xhci_reset_endpoint() needs to reset the data toggle/sequence number to match the documentation and for compatibility with the EHCI driver. We tried but failed to find a workaround which would reset the data toggle without side effects (e.g. USBDEVFS_SETINTERFACE does reset the toggle on the IN endpoint, but also resets it on the OUT endpoint and talks to the device, so that's no good).

The data toggle management is not terribly well documented in the xHCI spec, so we hope you know more about it than we do. Based on our understanding of the xHCI specification, xhci_reset_endpoint() should issue either a Reset Endpoint command with TSP=0 or a dummy Configure Endpoint command dropping/re-adding the specified endpoint (as the xHCI 1.1 spec suggests at the end of 4.6.8). Please confirm whether that would solve the problem.

We don't know how many devices this problem affects. We suspect it affects many USB printers and could in theory affect more or less any device, but few drivers reset endpoints when there are no errors. The problem scenario can probably be artificially reproduced with more or less any USB device (when data toggle is set, issue USBDEVFS_CLEAR_HALT, see if next packet arrives at destination).
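A reproduction attempt along those lines could look roughly like the following sketch. It assumes the caller has already opened the device's usbfs node, claimed the interface, and completed an odd number of IN transfers so that the toggle is set; the helper name clear_halt_then_read and its parameters are made up for illustration. Only the USBDEVFS_CLEAR_HALT and USBDEVFS_BULK ioctls and struct usbdevfs_bulktransfer come from <linux/usbdevice_fs.h>.

```c
#include <linux/usbdevice_fs.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: clear the halt on bulk IN endpoint `ep` (e.g. 0x81)
 * and then try to read the next packet from it.
 * Returns the number of bytes read, or -1 on failure (errno set by ioctl).
 */
static int clear_halt_then_read(int fd, unsigned int ep, void *buf,
                                unsigned int len, unsigned int timeout_ms)
{
    struct usbdevfs_bulktransfer xfer = {
        .ep = ep,
        .len = len,
        .timeout = timeout_ms,
        .data = buf,
    };

    /* Documented to reset the data toggle as well as clear the halt. */
    if (ioctl(fd, USBDEVFS_CLEAR_HALT, &ep) < 0)
        return -1;

    /* If host and device toggles now disagree, this read is expected to
     * time out even though the device has sent a packet. */
    return ioctl(fd, USBDEVFS_BULK, &xfer);
}
```

If our theory is right, the read should time out on an xHCI port while a USB analyzer shows the device's packet on the wire, and the same sequence should succeed on an EHCI port.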


Regards,
Michal


1:
https://www.kernel.org/doc/htmldocs/usb/usbfs-ioctl.html