Re: [PATCH] xhci: fix null pointer deref for xhci_urb_enqueue

From: Kuen-Han Tsai
Date: Sat Nov 18 2023 - 06:26:00 EST


Hi Greg,

On Fri, Nov 17, 2023 at 9:53 PM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, Nov 17, 2023 at 03:21:28PM +0800, Kuen-Han Tsai wrote:
> > The null pointer dereference happens when xhci_free_dev() frees the
> > xhci->devs[slot_id] virtual device while xhci_urb_enqueue() is
> > processing a urb and checking the max packet size.
> >
> > [106913.850735][ T2068] usb 2-1: USB disconnect, device number 2
> > [106913.856999][ T4618] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
> > [106913.857488][ T4618] Call trace:
> > [106913.857491][ T4618] xhci_check_maxpacket+0x30/0x2dc
> > [106913.857494][ T4618] xhci_urb_enqueue+0x24c/0x47c
> > [106913.857498][ T4618] usb_hcd_submit_urb+0x1f4/0xf34
> > [106913.857501][ T4618] usb_submit_urb+0x4b8/0x4fc
> > [106913.857503][ T4618] usb_control_msg+0x144/0x238
> > [106913.857507][ T4618] do_proc_control+0x1f0/0x5bc
> > [106913.857509][ T4618] usbdev_ioctl+0xdd8/0x15a8
> >
> > This patch adds a spinlock to the xhci_urb_enqueue function to make sure
> > xhci_free_dev() and xhci_urb_enqueue() do not race and cause null
> > pointer dereference.
>
> I thought we had a lock for this already, what changed to cause this to
> start triggering now, all these years later?

Right, xhci_urb_enqueue() already takes a lock, but that lock doesn't
cover every code path in the function that uses xhci->devs[slot_id].
I couldn't identify a specific change that introduced this issue. It
is likely a long-standing latent race that is hard to trigger under
normal conditions.
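
To make the idea concrete, here is a minimal user-space sketch of the
pattern, not the driver code. It assumes a pthread mutex in place of
xhci->lock, and every model_* name is hypothetical. The point it shows
is that the NULL check and the use of the slot pointer sit in the same
critical section as the free path:

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MAX_SLOTS 4

/* Hypothetical stand-ins for the driver's virtual device and host state. */
struct model_virt_dev {
	unsigned int ep_state;
};

struct model_host {
	pthread_mutex_t lock;                         /* plays the role of xhci->lock */
	struct model_virt_dev *devs[MODEL_MAX_SLOTS]; /* plays the role of xhci->devs[] */
};

void model_host_init(struct model_host *h)
{
	pthread_mutex_init(&h->lock, NULL);
	for (int i = 0; i < MODEL_MAX_SLOTS; i++)
		h->devs[i] = NULL;
}

/* Attach a device to an empty slot; the caller passes a valid slot_id. */
int model_alloc_dev(struct model_host *h, int slot_id)
{
	struct model_virt_dev *vdev = calloc(1, sizeof(*vdev));

	if (!vdev)
		return -ENOMEM;
	pthread_mutex_lock(&h->lock);
	h->devs[slot_id] = vdev;
	pthread_mutex_unlock(&h->lock);
	return 0;
}

/* Disconnect path: free the slot and clear the pointer under the lock. */
void model_free_dev(struct model_host *h, int slot_id)
{
	pthread_mutex_lock(&h->lock);
	free(h->devs[slot_id]);
	h->devs[slot_id] = NULL;
	pthread_mutex_unlock(&h->lock);
}

/* Enqueue path: the NULL check and the use share one critical section. */
int model_enqueue(struct model_host *h, int slot_id)
{
	int ret = 0;

	pthread_mutex_lock(&h->lock);
	if (!h->devs[slot_id])
		ret = -ENODEV;
	else
		h->devs[slot_id]->ep_state |= 1u;
	pthread_mutex_unlock(&h->lock);
	return ret;
}
```

In this model, calling model_enqueue() after model_free_dev() returns
-ENODEV instead of touching a freed pointer.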

The issue occurs when a user-space program submits a control request
to the peripheral after USB enumeration has completed and the device
is disconnected shortly afterwards. I was able to reproduce it by
adding a 3-second delay inside xhci_check_maxpacket() and
disconnecting the peripheral while the control request was still
being processed there.
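
The interleaving can be replayed deterministically in a small
user-space model. This is only a sketch under assumptions: the
model_* names are hypothetical, a callback stands in for the injected
delay, and -EFAULT marks the spot where the real code would have
dereferenced NULL:

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_SLOTS 4

/* Hypothetical stand-ins; not the driver's real structures. */
struct model_virt_dev {
	int max_packet;
};

struct model_host {
	struct model_virt_dev *devs[MODEL_MAX_SLOTS];
};

/* Callback that stands in for the delay: it runs inside the race window. */
typedef void (*model_window_fn)(struct model_host *h, int slot_id);

/* Simulated disconnect: the slot pointer is cleared during the window. */
void model_disconnect(struct model_host *h, int slot_id)
{
	h->devs[slot_id] = NULL;
}

/* Nothing happens during the window. */
void model_no_event(struct model_host *h, int slot_id)
{
	(void)h;
	(void)slot_id;
}

/*
 * Unprotected sequence: validate, wait, then use.
 * Returns the max packet size, -ENODEV if validation fails, or
 * -EFAULT where the real code would have dereferenced NULL.
 */
int model_check_maxpacket(struct model_host *h, int slot_id,
			  model_window_fn window)
{
	if (!h->devs[slot_id])          /* step 1: argument check */
		return -ENODEV;

	window(h, slot_id);             /* step 2: the injected delay */

	if (!h->devs[slot_id])          /* step 3: the pointer is gone */
		return -EFAULT;
	return h->devs[slot_id]->max_packet;
}
```

Passing model_disconnect as the window callback reproduces the failing
order of events; passing model_no_event gives the normal path.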

>
> >
> > Signed-off-by: Kuen-Han Tsai <khtsai@xxxxxxxxxx>
>
> What commit id does this fix?

Should I include a "Fixes:" tag even if this patch doesn't address a
bug introduced by a specific commit?

>
>
> > ---
> > drivers/usb/host/xhci.c | 38 ++++++++++++++++++++++++--------------
> > 1 file changed, 24 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> > index 884b0898d9c9..e0766ebeff0e 100644
> > --- a/drivers/usb/host/xhci.c
> > +++ b/drivers/usb/host/xhci.c
> > @@ -1522,23 +1522,32 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
> > struct urb_priv *urb_priv;
> > int num_tds;
> >
> > - if (!urb)
> > - return -EINVAL;
> > - ret = xhci_check_args(hcd, urb->dev, urb->ep,
> > - true, true, __func__);
> > - if (ret <= 0)
> > - return ret ? ret : -EINVAL;
> > + spin_lock_irqsave(&xhci->lock, flags);
> > +
> > + if (!urb) {
> > + ret = -EINVAL;
> > + goto done;
> > + }
>
> Why does this have to be inside the lock? The urb can't change here,
> can it?

You're right, there's no need to place those inside the lock. I will
move them outside the locked region.

>
> > +
> > + ret = xhci_check_args(hcd, urb->dev, urb->ep, true, true, __func__);
> > + if (ret <= 0) {
> > + ret = ret ? ret : -EINVAL;
> > + goto done;
> > + }
> >
> > slot_id = urb->dev->slot_id;
> > ep_index = xhci_get_endpoint_index(&urb->ep->desc);
> > ep_state = &xhci->devs[slot_id]->eps[ep_index].ep_state;
> >
> > - if (!HCD_HW_ACCESSIBLE(hcd))
> > - return -ESHUTDOWN;
> > + if (!HCD_HW_ACCESSIBLE(hcd)) {
> > + ret = -ESHUTDOWN;
> > + goto done;
>
> Note, we now have completions, so all of this "goto done" doesn't need
> to happen anymore. Not a complaint, just a suggestion for future
> changes or this one, your choice.
>

I'm not familiar with the concept of 'completions'. Could you please
share some links or an explanation to help me understand it? I used
'goto done' because I was following a pattern seen in many previous
commits. However, I'm willing to change this approach if there's a
more suitable alternative.
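
For reference, this is the shape of the 'goto done' pattern I mean,
written as a minimal user-space sketch. The model_* names are
hypothetical and a pthread mutex stands in for the spinlock. The
argument check that touches no shared state stays outside the lock,
and every path that fails after locking leaves through one unlock:

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical request type and shared state, for illustration only. */
struct model_req {
	int len;
};

static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;
static int model_hw_accessible = 1;

int model_submit(const struct model_req *req)
{
	int ret = 0;

	/* Touches no shared state, so it is checked before locking. */
	if (!req)
		return -EINVAL;

	pthread_mutex_lock(&model_lock);

	if (!model_hw_accessible) {
		ret = -ESHUTDOWN;
		goto done;
	}

	if (req->len <= 0) {
		ret = -EINVAL;
		goto done;
	}

	/* ... the request would be queued here ... */

done:
	/* Single exit point: the lock is released exactly once. */
	pthread_mutex_unlock(&model_lock);
	return ret;
}
```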

Please forgive me if any of my questions seem overly basic. I'm still
in the process of learning how to contribute to the kernel community.

Thanks,
Kuen-Han

> thanks,
>
> greg k-h