Re: [PATCH RESEND RFC] driver: core: don't queue device links removal for dt overlays

From: Nuno Sá
Date: Wed Jan 31 2024 - 10:46:29 EST


On Wed, 2024-01-31 at 16:10 +0100, Rafael J. Wysocki wrote:
> On Wed, Jan 31, 2024 at 3:52 PM Nuno Sá <noname.nuno@xxxxxxxxx> wrote:
> >
> > On Wed, 2024-01-31 at 15:28 +0100, Rafael J. Wysocki wrote:
> > > On Wed, Jan 31, 2024 at 3:18 PM Nuno Sá <noname.nuno@xxxxxxxxx> wrote:
> > > >
> > > > On Wed, 2024-01-31 at 14:30 +0100, Rafael J. Wysocki wrote:
> > > > > On Wed, Jan 31, 2024 at 1:20 PM Nuno Sá <noname.nuno@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Tue, 2024-01-23 at 16:40 +0100, Nuno Sa via B4 Relay wrote:
> > > > > > > From: Nuno Sa <nuno.sa@xxxxxxxxxx>
> > > > > > >
> > > > > > > For device links, releasing the supplier/consumer devices
> > > > > > > references
> > > > > > > happens asynchronously in device_link_release_fn(). Hence, the
> > > > > > > possible
> > > > > > > release of an of_node is also asynchronous. If these nodes were
> > > > > > > added
> > > > > > > through overlays we have a problem because this does not respect
> > > > > > > the
> > > > > > > devicetree overlays assumptions that when a changeset is
> > > > > > > being removed in __of_changeset_entry_destroy(), it must hold the
> > > > > > > last
> > > > > > > reference to that node. Due to the async nature of device links
> > > > > > > that
> > > > > > > cannot be guaranteed.
> > > > > > >
> > > > > > > Given the above, in case one of the link consumer/supplier is part
> > > > > > > of
> > > > > > > an overlay node we call directly device_link_release_fn() instead
> > > > > > > of
> > > > > > > queueing it. Yes, it might take some significant time for
> > > > > > > device_link_release_fn() to complete because of synchronize_srcu()
> > > > > > > but
> > > > > > > we would need to, anyways, wait for all OF references to be
> > > > > > > released
> > > > > > > if
> > > > > > > we want to respect overlays assumptions.
> > > > > > >
> > > > > > > Signed-off-by: Nuno Sa <nuno.sa@xxxxxxxxxx>
> > > > > > > ---
> > > > > > > This RFC is a follow up of a previous one that I sent to the
> > > > > > > devicetree
> > > > > > > folks [1]. It got rejected because it was not really fixing the
> > > > > > > root
> > > > > > > cause of the issue (which I do agree). Please see the link where I
> > > > > > > fully explain what the issue is.
> > > > > > >
> > > > > > > I did also some git blaming and did saw that commit
> > > > > > > 80dd33cf72d1 ("drivers: base: Fix device link removal") introduced
> > > > > > > queue_work() as we could be releasing the last device reference
> > > > > > > and
> > > > > > > hence
> > > > > > > sleeping which is against SRCU callback requirements. However,
> > > > > > > that
> > > > > > > same
> > > > > > > commit is now making use of synchronize_srcu() which may take
> > > > > > > significant time (and I think that's the reason for the work
> > > > > > > item?).
> > > > > > >
> > > > > > > However, given the dt overlays requirements, I'm not seeing any
> > > > > > > reason to not be able to run device_link_release_fn()
> > > > > > > synchronously if
> > > > > > > we
> > > > > > > detect an OVERLAY node is being released. I mean, even if we come
> > > > > > > up
> > > > > > > (and I did some experiments in this regard) with some async
> > > > > > > mechanism
> > > > > > > to
> > > > > > > release the OF nodes refcounts, we still need a synchronization
> > > > > > > point
> > > > > > > somewhere.
> > > > > > >
> > > > > > > Anyways, I would like to have some feedback on how acceptable
> > > > > > > would
> > > > > > > this
> > > > > > > be or what else could I do so we can have a "clean" dt overlay
> > > > > > > removal.
> > > > > > >
> > > > > > > I'm also including dt folks so they can give some comments on the
> > > > > > > new
> > > > > > > device_node_overlay_removal() function. My goal is to try to
> > > > > > > detect
> > > > > > > when
> > > > > > > an
> > > > > > > overlay is being removed (maybe we could even have an explicit
> > > > > > > flag
> > > > > > > for
> > > > > > > it?) and only directly call device_link_release_fn() in that case.
> > > > > > >
> > > > > > > [1]:
> > > > > > > https://lore.kernel.org/linux-devicetree/20230511151047.1779841-1-nuno.sa@xxxxxxxxxx/
> > > > > > > ---
> > > > > > >  drivers/base/core.c | 25 ++++++++++++++++++++++++-
> > > > > > >  1 file changed, 24 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > > > > > index 14d46af40f9a..31ea001f6142 100644
> > > > > > > --- a/drivers/base/core.c
> > > > > > > +++ b/drivers/base/core.c
> > > > > > > @@ -497,6 +497,18 @@ static struct attribute *devlink_attrs[] = {
> > > > > > >  };
> > > > > > >  ATTRIBUTE_GROUPS(devlink);
> > > > > > >
> > > > > > > +static bool device_node_overlay_removal(struct device *dev)
> > > > > > > +{
> > > > > > > +     if (!dev_of_node(dev))
> > > > > > > +             return false;
> > > > > > > +     if (!of_node_check_flag(dev->of_node, OF_DETACHED))
> > > > > > > +             return false;
> > > > > > > +     if (!of_node_check_flag(dev->of_node, OF_OVERLAY))
> > > > > > > +             return false;
> > > > > > > +
> > > > > > > +     return true;
> > > > > > > +}
> > > > > > > +
> > > > > > >  static void device_link_release_fn(struct work_struct *work)
> > > > > > >  {
> > > > > > >       struct device_link *link = container_of(work, struct
> > > > > > > device_link,
> > > > > > > rm_work);
> > > > > > > @@ -532,8 +544,19 @@ static void devlink_dev_release(struct device
> > > > > > > *dev)
> > > > > > >        * synchronization in device_link_release_fn() and if the
> > > > > > > consumer
> > > > > > > or
> > > > > > >        * supplier devices get deleted when it runs, so put it into
> > > > > > > the
> > > > > > > "long"
> > > > > > >        * workqueue.
> > > > > > > +      *
> > > > > > > +      * However, if any of the supplier, consumer nodes is being
> > > > > > > removed
> > > > > > > +      * through overlay removal, the expectation in
> > > > > > > +      * __of_changeset_entry_destroy() is for the node 'kref' to
> > > > > > > be 1
> > > > > > > which
> > > > > > > +      * cannot be guaranteed with the async nature of
> > > > > > > +      * device_link_release_fn(). Hence, do it synchronously for
> > > > > > > the
> > > > > > > overlay
> > > > > > > +      * case.
> > > > > > >        */
> > > > > > > -     queue_work(system_long_wq, &link->rm_work);
> > > > > > > +     if (device_node_overlay_removal(link->consumer) ||
> > > > > > > +         device_node_overlay_removal(link->supplier))
> > > > > > > +             device_link_release_fn(&link->rm_work);
> > > > > > > +     else
> > > > > > > +             queue_work(system_long_wq, &link->rm_work);
> > > > > > >  }
> > > > > > >
> > > > > > >  static struct class devlink_class = {
> > > > > > >
> > > > > > > ---
>
> [cut]
>
> > > No, IMV devlink_dev_release() needs to be called via
> > > device_link_put_kref(), but it may run device_link_release_fn()
> > > directly if the link is marked in a special way or something like
> > > this.
> >
> > Sorry, I'm not totally getting this. I'm directly calling
> > device_link_release_fn() from  devlink_dev_release(). We should only get
> > into
> > devlink_dev_release() after all the references are dropped right (being it
> > the
> > release callback for the link class)?
>
> OK, I got confused somehow, sorry.
>
> It should work.
>
> I kind of don't like adding OF-specific code to the driver core, but
> if this is fine with Greg, it can be done.  It should depend on

Not perfect but I'm not seeing any other way. We need to somehow see if the node
is part of an OVERLAY and AFAIK, the only way is looking at the node flags. I'll
wait on Greg's feedback.

> CONFIG_OF_OVERLAY, though.

I guess that should be already indirectly implied. I mean if CONFIG_OF_OVERLAY
is not set, I guess there's not way for
of_node_check_flag(dev->of_node, OF_OVERLAY)) return true. But yeah, I can bail
out right away if IS_ENABLED(CONFIG_OF_OVERLAY) is not set.

> I would like a comment to be added to device_link_release_fn() to
> explain why the overlay case needs synchronous execution in there.

I do have the following comment before checking device_node_overlay_removal():


"* However, if any of the supplier, consumer nodes is being removed
* through overlay removal, the expectation in
* __of_changeset_entry_destroy() is for the node 'kref' to be 1 which
* cannot be guaranteed with the async nature of
* device_link_release_fn(). Hence, do it synchronously for the overlay
* case."

I can elaborate more if you prefer...

>
> > device_node_overlay_removal() is my way to see if the link "is marked in a
> > special way" as you put it. This checks if one of the supplier/consumer is
> > marked as an OVERLAY and if it's being removed (I think that OF_DETACHED
> > tells
> > us that but some feedback from DT guys will be helpful).
> >
> > Alternatively, I could just check the OF_OVERLAY flag when the link is being
> > created and have a new variable in struct device_link to flag the
> > synchronous
> > release. Disadvantage is that in this way even a sysfs unbind or module
> > unload
> > (without necessarily removing the overly) would lead to a synchronous
> > release
> > which can actually make sense (now that I think about it). Because if
> > someone
> > does some crazy thing like "echo device > unbind" and then removes the
> > overlay
> > we could still hit the overly removal path before device_link_release_fn()
> > completed.
>
> This sounds more complicated than the current patch.

Agreed...

- Nuno Sá