Re: [PATCH v2] usb/hcd: Send a uevent signaling that the host controller has died

From: Greg Kroah-Hartman
Date: Tue Apr 16 2019 - 06:38:36 EST


On Thu, Apr 11, 2019 at 12:52:11PM -0600, Raul E Rangel wrote:
> This change will send a CHANGE event to udev with the DEAD environment
> variable set when the HC dies. I chose this instead of any of the other
> udev events because it's representing a state change in the host
> controller. The only other event that might have fit was OFFLINE, but
> that seems to be used for hot-removal and it implies the device could
> come ONLINE again.

Is "DEAD" used by any other uevents?

> By notifying user space the appropriate policies can be applied.
> i.e.,
> * Collect error logs.
> * Notify the user that USB is no longer functional.
> * Perform a graceful reboot.

What userspace code uses this new uevent to do any of this?
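
For reference, a consumer would have to match on the raw uevent
environment. A minimal sketch of that check, assuming the payload arrives
as a run of NUL-separated strings ("change@<devpath>", then KEY=VALUE
pairs) the way the kernel broadcasts uevents over netlink. uevent_has()
is a hypothetical helper name, not an existing API, and the socket
handling is left out:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if the NUL-separated uevent payload in
 * buf[0..len) contains the exact string key_value (e.g. "DEAD=1"),
 * 0 otherwise.
 */
static int uevent_has(const char *buf, size_t len, const char *key_value)
{
	size_t want = strlen(key_value);
	size_t pos = 0;

	while (pos < len) {
		/* Find the end of the current string without running past len. */
		const char *end = memchr(buf + pos, '\0', len - pos);
		size_t n = end ? (size_t)(end - (buf + pos)) : len - pos;

		if (n == want && memcmp(buf + pos, key_value, n) == 0)
			return 1;
		pos += n + 1;	/* skip the string and its terminating NUL */
	}
	return 0;
}
```

A policy daemon would call this on each buffer it reads from the uevent
socket and only then decide whether to collect logs or reboot.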

I think "OFFLINE" is a bit better here; it does not always imply that
the device can come online again.

> Signed-off-by: Raul E Rangel <rrangel@xxxxxxxxxxxx>
> ---
> I wasn't able to find any good examples of other drivers sending a dead
> notification.
>
> Use an EVENT= format
> https://github.com/torvalds/linux/blob/master/drivers/acpi/dock.c#L302
> https://github.com/torvalds/linux/blob/master/drivers/net/wireless/ath/wil6210/interrupt.c#L497
>
> Uses SDEV_MEDIA_CHANGE=
> https://github.com/torvalds/linux/blob/master/drivers/scsi/scsi_lib.c#L2318
>
> Uses ERROR=1.
> https://chromium.googlesource.com/chromiumos/third_party/kernel/+/7f6d8aec5803aac44192f03dce5637b66cda7abf/drivers/input/touchscreen/atmel_mxt_ts.c#1581
> I'm not a fan because it doesn't signal what the error was.
>
> We could change the DEAD=1 event to maybe ERROR=1?

"ERROR=1" is worse than "DEAD=1" :(

> Also where would be a good place to document this?

Documentation/ABI/ is a good start.

> Also thanks for the review. This is my first kernel patch so forgive me
> if I get the workflow wrong.
>
> Changes in v2:
> - Check that the root hub still exists before sending the uevent.
> - Ensure died_work has completed before deallocating.
>
> drivers/usb/core/hcd.c | 32 ++++++++++++++++++++++++++++++++
> include/linux/usb/hcd.h | 1 +
> 2 files changed, 33 insertions(+)
>
> diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
> index 975d7c1288e3..7835f1a3647d 100644
> --- a/drivers/usb/core/hcd.c
> +++ b/drivers/usb/core/hcd.c
> @@ -2343,6 +2343,27 @@ int hcd_bus_resume(struct usb_device *rhdev, pm_message_t msg)
> return status;
> }
>
> +
> +/**
> + * hcd_died_work - Workqueue routine for root-hub has died.
> + * @hcd: primary host controller for this root hub.
> + *
> + * Do not call with the shared_hcd.
> + * */

No need for kerneldoc formatting for a static function.

And your documentation isn't even correct, @hcd is not an argument to
this function :(

> +static void hcd_died_work(struct work_struct *work)
> +{
> + struct usb_hcd *hcd = container_of(work, struct usb_hcd, died_work);
> +
> + mutex_lock(&usb_bus_idr_lock);

Why do you need to lock something that is "dead"? And why is the idr
lock the correct one here?

> +
> + if (hcd->self.root_hub)
> + /* Notify user space that the host controller has died */
> + kobject_uevent_env(&hcd->self.root_hub->dev.kobj, KOBJ_CHANGE,
> + (char *[]){ "DEAD=1", NULL });

Declaring the envp inline in the call is cute, but please don't do that;
make it obvious what is happening here with a real variable.
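
To make the difference concrete, here is a small userspace sketch of
the two forms. fake_uevent_env() is a made-up stand-in for
kobject_uevent_env() so the snippet builds outside the kernel; it is
not a kernel API and only prints and counts what it is handed:

```c
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for kobject_uevent_env(): prints each
 * string of the NULL-terminated envp array and returns how many it saw.
 */
static int fake_uevent_env(const char *action, char *envp[])
{
	int n = 0;

	for (; envp[n]; n++)
		printf("%s: %s\n", action, envp[n]);
	return n;
}

/* The form used in the patch: an anonymous compound literal at the call site. */
static int notify_compound_literal(void)
{
	return fake_uevent_env("change", (char *[]){ "DEAD=1", NULL });
}

/* The form asked for here: a named, NULL-terminated array. */
static int notify_named_envp(void)
{
	char *envp[] = { "DEAD=1", NULL };

	return fake_uevent_env("change", envp);
}
```

Both pass the same one-entry environment; the named array just makes
the NULL termination and the contents visible at a glance.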

> +
> + mutex_unlock(&usb_bus_idr_lock);
> +}
> +
> /* Workqueue routine for root-hub remote wakeup */
> static void hcd_resume_work(struct work_struct *work)
> {
> @@ -2488,6 +2509,13 @@ void usb_hc_died (struct usb_hcd *hcd)
> usb_kick_hub_wq(hcd->self.root_hub);
> }
> }
> +
> + /* Handle the case where this function gets called with a shared HCD */
> + if (usb_hcd_is_primary_hcd(hcd))
> + schedule_work(&hcd->died_work);
> + else
> + schedule_work(&hcd->primary_hcd->died_work);
> +
> spin_unlock_irqrestore (&hcd_root_hub_lock, flags);
> /* Make sure that the other roothub is also deallocated. */
> }
> @@ -2555,6 +2583,8 @@ struct usb_hcd *__usb_create_hcd(const struct hc_driver *driver,
> INIT_WORK(&hcd->wakeup_work, hcd_resume_work);
> #endif
>
> + INIT_WORK(&hcd->died_work, hcd_died_work);
> +
> hcd->driver = driver;
> hcd->speed = driver->flags & HCD_MASK;
> hcd->product_desc = (driver->product_desc) ? driver->product_desc :
> @@ -2908,6 +2938,7 @@ int usb_add_hcd(struct usb_hcd *hcd,
> #ifdef CONFIG_PM
> cancel_work_sync(&hcd->wakeup_work);
> #endif
> + cancel_work_sync(&hcd->died_work);
> mutex_lock(&usb_bus_idr_lock);
> usb_disconnect(&rhdev); /* Sets rhdev to NULL */
> mutex_unlock(&usb_bus_idr_lock);
> @@ -2968,6 +2999,7 @@ void usb_remove_hcd(struct usb_hcd *hcd)
> #ifdef CONFIG_PM
> cancel_work_sync(&hcd->wakeup_work);
> #endif
> + cancel_work_sync(&hcd->died_work);
>
> mutex_lock(&usb_bus_idr_lock);
> usb_disconnect(&rhdev); /* Sets rhdev to NULL */
> diff --git a/include/linux/usb/hcd.h b/include/linux/usb/hcd.h
> index 695931b03684..ae51d5bd1dfc 100644
> --- a/include/linux/usb/hcd.h
> +++ b/include/linux/usb/hcd.h
> @@ -98,6 +98,7 @@ struct usb_hcd {
> #ifdef CONFIG_PM
> struct work_struct wakeup_work; /* for remote wakeup */
> #endif
> + struct work_struct died_work; /* for dying */

"For when the device dies"?

And have you ever hit this in the real world? If so, what can you do
about it?

thanks,

greg k-h