Re: [PATCH 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals

From: Nuno Sá
Date: Fri Feb 23 2024 - 05:33:08 EST


On Fri, 2024-02-23 at 10:45 +0100, Herve Codina wrote:
> Hi Saravana, Nuno,
>
> On Tue, 20 Feb 2024 16:37:05 -0800
> Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
>
> ...
> > > @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id)
> > >                 goto out;
> > >         }
> > >
> > > +       /*
> > > > +        * Wait for any ongoing device link removals before removing some of
> > > > +        * nodes
> > > +        */
> > > +       device_link_wait_removal();
> > > + 
> >
> > Nuno in his patch[1] had this "wait" happen inside
> > __of_changeset_entry_destroy(). Which seems to be necessary to not hit
> > the issue that Luca reported[2] in this patch series. Is there any
> > problem with doing that?
>
> Is it the right place to wait ?
>
> __of_changeset_entry_destroy() can do some of_node_put() and I am not sure
> that of_node_put() can call device_put() when the of_node refcount reaches
> zero.
>

I don't think of_node_put() can call device_put(). At least by looking at:

https://elixir.bootlin.com/linux/v6.8-rc5/source/drivers/of/dynamic.c#L326

> If of_node_put() cannot call device_put(), I think we can wait in the
> of_changeset_destroy(). I.e. the __of_changeset_entry_destroy() caller.
>   https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/of/dynamic.c#L670
>
> What do you think about this ?
> Does it make sense ?

I think it makes sense from a logical point of view. Like, let's flush the queue
right before checking our assumptions...
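
To make the ordering concrete, here is a small user-space model of "flush first,
then check". It is only a sketch and not kernel code: every mock_* name is
hypothetical, the array stands in for the workqueue that defers device link
releases, and the "refcount must be 1" test is an assumption about what the
per-entry destroy step verifies.

```c
/* User-space model only; all mock_* names are hypothetical. */
#include <assert.h>
#include <stdio.h>

#define MAX_PENDING 8

struct mock_node {
	const char *name;
	int refcount;
};

/* Stand-in for the queue on which device link releases are deferred. */
static struct mock_node *pending[MAX_PENDING];
static int npending;

/* Queue a link release; the node reference it holds is dropped later. */
static int mock_defer_link_release(struct mock_node *np)
{
	if (npending == MAX_PENDING)
		return -1;
	pending[npending++] = np;
	return 0;
}

/* Stand-in for device_link_wait_removal(): run all deferred releases. */
static void mock_wait_removal(void)
{
	for (int i = 0; i < npending; i++) {
		assert(pending[i]->refcount > 0);
		pending[i]->refcount--;
	}
	npending = 0;
}

/* Stand-in for the per-entry check (assumed: refcount must be 1). */
static int mock_entry_destroy(const struct mock_node *np)
{
	if (np->refcount != 1) {
		fprintf(stderr, "leak: %s refcount %d, expected 1\n",
			np->name, np->refcount);
		return 1;
	}
	return 0;
}

/* Stand-in for the caller: flush once, then walk all entries. */
static int mock_changeset_destroy(struct mock_node *nodes, int n,
				  int flush_first)
{
	int leaks = 0;

	if (flush_first)
		mock_wait_removal();
	for (int i = 0; i < n; i++)
		leaks += mock_entry_destroy(&nodes[i]);
	return leaks;
}
```

In this model, a node with a release still queued trips the check when
flush_first is 0 and passes when it is 1. Flushing once in the caller is enough
here because nothing queues new work while the entries are being walked; whether
that also holds in the kernel is exactly the of_node_put()/device_put() question
above.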

In my tests, I did not see any issue (hopefully I was not missing any subtlety).

- Nuno Sá