Re: [PATCH v3 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals

From: Rob Herring
Date: Mon Mar 04 2024 - 10:22:15 EST


On Thu, Feb 29, 2024 at 12:18:49PM +0100, Nuno Sá wrote:
> On Thu, 2024-02-29 at 11:52 +0100, Herve Codina wrote:
> > In the following sequence:
> >   1) of_platform_depopulate()
> >   2) of_overlay_remove()
> >
> > During step 1, devices are destroyed and devlinks are removed.
> > During step 2, OF nodes are destroyed but
> > __of_changeset_entry_destroy() can raise warnings related to missing
> > of_node_put():
> >   ERROR: memory leak, expected refcount 1 instead of 2 ...
> >
> > Indeed, during the devlink removals performed at step 1, the removal
> > itself, which releases the device (and the attached of_node), is done
> > by a job queued in a workqueue, so it runs asynchronously with respect
> > to the caller.
> > When the warning is present, of_node_put() is eventually called from
> > the workqueue job, but too late.
> >
> > In order to be sure that any ongoing devlink removals are done before
> > the of_node destruction, synchronize of_overlay_remove() with the
> > devlink removals.
> >
> > Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
> > Cc: stable@xxxxxxxxxxxxxxx
> > Signed-off-by: Herve Codina <herve.codina@xxxxxxxxxxx>
> > ---
> >  drivers/of/overlay.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> > index 2ae7e9d24a64..7a010a62b9d8 100644
> > --- a/drivers/of/overlay.c
> > +++ b/drivers/of/overlay.c
> > @@ -8,6 +8,7 @@
> >  
> >  #define pr_fmt(fmt) "OF: overlay: " fmt
> >  
> > +#include <linux/device.h>
>
> This is clearly up to the DT maintainers to decide but, IMHO, I would very much
> prefer to see fwnode.h included here rather than device.h directly (which,
> yes, means renaming the function to fwnode_*).

IMO, the DT code should know almost nothing about fwnode because that's
the layer above it. But then overlay stuff is kind of a layer above the
core DT code too.

> But yeah, I might be biased by my own series :)
>
> >  #include <linux/kernel.h>
> >  #include <linux/module.h>
> >  #include <linux/of.h>
> > @@ -853,6 +854,14 @@ static void free_overlay_changeset(struct overlay_changeset *ovcs)
> >  {
> >   int i;
> >  
> > + /*
> > + * Wait for any ongoing device link removals before removing some of
> > + * the nodes. Drop the global lock while waiting.
> > + */
> > + mutex_unlock(&of_mutex);
> > + device_link_wait_removal();
> > + mutex_lock(&of_mutex);
>
> I'm still not convinced we need to drop the lock. What happens if someone else
> grabs the lock while we are in device_link_wait_removal()? Can we guarantee that
> we can't screw things up badly?

It is also just ugly because it's the callers of
free_overlay_changeset() that hold the lock and now we're releasing it
behind their back.

As device_link_wait_removal() is called before we touch anything, can't
it be called before we take the lock? And do we need to call it if
applying the overlay fails?

Rob