Re: [PATCH 1/2] driver core: Introduce device_link_wait_removal()

From: Nuno Sá
Date: Fri Feb 23 2024 - 03:53:54 EST


On Fri, 2024-02-23 at 09:46 +0100, Herve Codina wrote:
> Hi,
>
> On Thu, 22 Feb 2024 17:08:28 -0800
> Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
>
> > On Tue, Feb 20, 2024 at 10:56 PM Nuno Sá <noname.nuno@xxxxxxxxx> wrote:
> > >
> > > On Tue, 2024-02-20 at 16:31 -0800, Saravana Kannan wrote: 
> > > > On Thu, Nov 30, 2023 at 9:41 AM Herve Codina <herve.codina@xxxxxxxxxxx>
> > > > wrote: 
> > > > >
> > > > > > The commit 80dd33cf72d1 ("drivers: base: Fix device link removal")
> > > > > > introduces a workqueue to release the consumer and supplier devices
> > > > > > used in the devlink.
> > > > > > In the queued job, devices are released and, in turn, when all the
> > > > > > references to these devices are dropped, the release function of the
> > > > > > device itself is called.
> > > > >
> > > > > Nothing is present to provide some synchronisation with this workqueue
> > > > > in order to ensure that all ongoing releasing operations are done and
> > > > > so, some other operations can be started safely.
> > > > >
> > > > > For instance, in the following sequence:
> > > > >   1) of_platform_depopulate()
> > > > >   2) of_overlay_remove()
> > > > >
> > > > > > During step 1, devices are released and related devlinks are removed
> > > > > > (jobs pushed in the workqueue).
> > > > > > During step 2, OF nodes are destroyed but, without any
> > > > > > synchronisation with devlink removal jobs, of_overlay_remove() can
> > > > > > raise warnings related to missing of_node_put():
> > > > >   ERROR: memory leak, expected refcount 1 instead of 2
> > > > >
> > > > > Indeed, the missing of_node_put() call is going to be done, too late,
> > > > > from the workqueue job execution.
> > > > >
> > > > > Introduce device_link_wait_removal() to offer a way to synchronize
> > > > > operations waiting for the end of devlink removals (i.e. end of
> > > > > workqueue jobs).
> > > > > Also, as a flushing operation is done on the workqueue, the workqueue
> > > > > used is moved from a system-wide workqueue to a local one. 
> > > >
> > > > Thanks for the bug report and fix. Sorry again about the delay in
> > > > reviewing the changes.
> > > >
> > > > Please add Fixes tag for 80dd33cf72d1.
> > > >  
> > > > > Signed-off-by: Herve Codina <herve.codina@xxxxxxxxxxx>
> > > > > ---
> > > > >  drivers/base/core.c    | 26 +++++++++++++++++++++++---
> > > > >  include/linux/device.h |  1 +
> > > > >  2 files changed, 24 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > > > index ac026187ac6a..2e102a77758c 100644
> > > > > --- a/drivers/base/core.c
> > > > > +++ b/drivers/base/core.c
> > > > > @@ -44,6 +44,7 @@ static bool fw_devlink_is_permissive(void);
> > > > >  static void __fw_devlink_link_to_consumers(struct device *dev);
> > > > >  static bool fw_devlink_drv_reg_done;
> > > > >  static bool fw_devlink_best_effort;
> > > > > +static struct workqueue_struct *fw_devlink_wq;
> > > > >
> > > > >  /**
> > > > >   * __fwnode_link_add - Create a link between two fwnode_handles.
> > > > > @@ -530,12 +531,26 @@ static void devlink_dev_release(struct device
> > > > > *dev)
> > > > >         /*
> > > > >          * It may take a while to complete this work because of the
> > > > > SRCU
> > > > >          * synchronization in device_link_release_fn() and if the
> > > > > consumer or
> > > > > -        * supplier devices get deleted when it runs, so put it into
> > > > > the "long"
> > > > > -        * workqueue.
> > > > > +        * supplier devices get deleted when it runs, so put it into
> > > > > the
> > > > > +        * dedicated workqueue.
> > > > >          */
> > > > > -       queue_work(system_long_wq, &link->rm_work);
> > > > > +       queue_work(fw_devlink_wq, &link->rm_work); 
> > > >
> > > > This has nothing to do with fw_devlink. fw_devlink is just triggering
> > > > the issue in device links. You can hit this bug without fw_devlink too.
> > > > So call this device_link_wq since it's consistent with device_link_*
> > > > APIs.
> > > >  
> > >
> > > > I'm not sure if I got this right in my series. I do call my queue
> > > > devlink_release_queue(). But on the Overlay side I use
> > > > fwnode_links_flush_queue() because it looked more sensible from an OF
> > > > point of view. And including (in OF code) linux/fwnode.h instead of
> > > > linux/device.h makes more sense to me.
> > >  
> > > > >  }
> > > > >
> > > > > +/**
> > > > > > + * device_link_wait_removal - Wait for ongoing devlink removal jobs to terminate
> > > > > > + */
> > > > > > +void device_link_wait_removal(void)
> > > > > > +{
> > > > > > +       /*
> > > > > > +        * devlink removal jobs are queued in the dedicated work queue.
> > > > > > +        * To be sure that all removal jobs are terminated, ensure that any
> > > > > > +        * scheduled work has run to completion.
> > > > > +        */
> > > > > +       drain_workqueue(fw_devlink_wq); 
> > > >
> > > > > Is there a reason this needs to be drain_workqueue() instead of
> > > > > flush_workqueue()? Drain is a stronger guarantee than we need in this
> > > > case. All we are trying to make sure is that all the device link
> > > > remove work queued so far have completed.
> > > >  
> > >
> > > Yeah, I'm also using flush_workqueue().
> > >  
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(device_link_wait_removal);
> > > > > +
> > > > >  static struct class devlink_class = {
> > > > >         .name = "devlink",
> > > > >         .dev_groups = devlink_groups,
> > > > > @@ -4085,9 +4100,14 @@ int __init devices_init(void)
> > > > > >         sysfs_dev_char_kobj = kobject_create_and_add("char", dev_kobj);
> > > > >         if (!sysfs_dev_char_kobj)
> > > > >                 goto char_kobj_err;
> > > > > +       fw_devlink_wq = alloc_workqueue("fw_devlink_wq", 0, 0);
> > > > > +       if (!fw_devlink_wq) 
> > > >
> > > > Fix the name appropriately here too please. 
> > >
> > > Hi Saravana,
> > >
> > > > Oh, I was not aware of this series... Please look at my first patch. It
> > > > already has a review tag by Rafael. I think the creation of the queue
> > > > makes more sense to be done in devlink_class_init(). Moreover, Rafael
> > > > complained in my first version that erroring out because we failed to
> > > > create the queue is too harsh since devlinks can still work.
> >
> > I think Rafael can be convinced on this one. Firstly, if we fail to
> > allocate so early, we have bigger problems.
> >
> > > So, what we do is to schedule the work if we have a queue or to call
> > > device_link_release_fn() synchronously if we don't have the queue (note
> > > that failing to allocate the queue is very unlikely anyways).
> >
> > device links don't really work when you synchronously need to delete a
> > link since it always uses SRCUs (it used to have a #ifndef CONFIG_SRCU
> > locking). That's like saying code still works when it doesn't hit a
> > deadlock condition.
> >
> > Let's stick with Herve's patch series since he sent it first and it
> > has fewer things that need to be fixed. If he ignores this thread for
> > too long, you can send a revision of yours again and we can accept
> > that.
>
> I don't ignore the thread :)
>
> I hope I can take some time in the near future to send a v2 of this
> series.

Hi Herve,

Just let me know if you don't see that happening anytime soon :). I'm very
interested in having this applied fairly soon and I think the base idea for the
fix is more or less in place (for both series). So it should be minor details
now :).

- Nuno Sá
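
For readers following the thread, the ordering problem described in the
quoted commit message can be modelled in plain userspace C. This is only a
sketch under stated assumptions: every name below (model_wq, model_flush,
model_remove_sequence, ...) is hypothetical and none of them is a kernel
API. The model is single-threaded, so "flush" simply runs the jobs queued
so far, which is the guarantee the thread suggests device_link_wait_removal()
needs to provide before the overlay code checks node refcounts.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: these types and functions are illustrative,
 * not the driver core implementation. */

struct model_node {
	int refcount;			/* stands in for an OF node refcount */
};

struct model_work {
	struct model_node *node;	/* node whose reference the job drops */
};

#define MODEL_WQ_MAX 16

struct model_wq {
	struct model_work pending[MODEL_WQ_MAX];
	size_t count;
};

/* Stand-in for queueing the devlink release job: the put is deferred. */
static void model_queue_release(struct model_wq *wq, struct model_node *node)
{
	assert(wq->count < MODEL_WQ_MAX);
	wq->pending[wq->count++].node = node;
}

/* Stand-in for flushing: run every job queued so far, then empty the queue. */
static void model_flush(struct model_wq *wq)
{
	for (size_t i = 0; i < wq->count; i++)
		wq->pending[i].node->refcount--;
	wq->count = 0;
}

/* Stand-in for the overlay removal check: exactly one reference expected. */
static int model_overlay_check(const struct model_node *node)
{
	return node->refcount == 1;
}

/*
 * Runs "depopulate, then remove overlay" and returns 1 if the refcount
 * check passes. wait_for_removal selects whether the queue is flushed
 * between the two steps.
 */
static int model_remove_sequence(int wait_for_removal)
{
	struct model_wq wq = { .count = 0 };
	/* One base reference plus one still held through the device link. */
	struct model_node node = { .refcount = 2 };
	int ok;

	model_queue_release(&wq, &node);	/* step 1: release is only queued */

	if (wait_for_removal)
		model_flush(&wq);		/* the synchronisation point */

	ok = model_overlay_check(&node);	/* step 2: refcount check */

	model_flush(&wq);			/* the deferred put runs eventually */
	return ok;
}
```

Calling model_remove_sequence(0) models the reported failure (the check
sees refcount 2 because the put arrives too late), while
model_remove_sequence(1) models the sequence with the wait in place.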