Re: [PATCH 1/2] driver core: Introduce device_link_wait_removal()

From: Nuno Sá
Date: Fri Feb 23 2024 - 05:46:15 EST


On Fri, 2024-02-23 at 10:11 +0100, Herve Codina wrote:
> Hi Saravana,
>
> On Tue, 20 Feb 2024 16:31:13 -0800
> Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
>
> ...
>
> > > +void device_link_wait_removal(void)
> > > +{
> > > +       /*
> > > +        * devlink removal jobs are queued in the dedicated work queue.
> > > +        * To be sure that all removal jobs are terminated, ensure that
> > > any
> > > +        * scheduled work has run to completion.
> > > +        */
> > > +       drain_workqueue(fw_devlink_wq); 
> >
> > Is there a reason this needs to be drain_workqueu() instead of
> > flush_workqueue(). Drain is a stronger guarantee than we need in this
> > case. All we are trying to make sure is that all the device link
> > remove work queued so far have completed.
>
> I used drain_workqueue() because drain_workqueue() allows for jobs already
> present in a workqueue to re-queue a job and drain_workqueue() will wait
> also for this new job completion.
>
> I think flush_workqueue() doesn't wait for this chain queueing.
>
> In our case, my understanding was that device_link_release_fn() calls
> put_device() for the consumer and the supplier.
> If refcounts reaches zero, devlink_dev_release() can be called again
> and re-queue a job.
>

Looks sensible. The only doubt (that Saravana mays know better) is that I'm not
sure put_device() on a supplier or consumer can actually lead to
devlink_dev_release(). AFAIU, a consumer or a supplier should not be a device
from the devlink class. Hence, looking at device_release(), I'm not sure it can
happen unless for some odd reason someone is messing with devlinks in .remove()
or .type->remove().

- Nuno Sá