Re: [PATCH 1/2] driver core: Introduce device_link_wait_removal()

From: Nuno Sá
Date: Fri Mar 01 2024 - 02:14:15 EST


On Thu, 2024-02-29 at 15:26 -0800, Saravana Kannan wrote:
> On Fri, Feb 23, 2024 at 2:41 AM Nuno Sá <noname.nuno@xxxxxxxxx> wrote:
> >
> > On Fri, 2024-02-23 at 10:11 +0100, Herve Codina wrote:
> > > Hi Saravana,
> > >
> > > On Tue, 20 Feb 2024 16:31:13 -0800
> > > Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > >
> > > ...
> > >
> > > > > +void device_link_wait_removal(void)
> > > > > +{
> > > > > +       /*
> > > > > +        * devlink removal jobs are queued in the dedicated work queue.
> > > > > +        * To be sure that all removal jobs are terminated, ensure that
> > > > > any
> > > > > +        * scheduled work has run to completion.
> > > > > +        */
> > > > > +       drain_workqueue(fw_devlink_wq);
> > > >
> > > > Is there a reason this needs to be drain_workqueue() instead of
> > > > flush_workqueue()? Drain is a stronger guarantee than we need in this
> > > > case. All we are trying to make sure is that all the device link
> > > > removal work queued so far has completed.
> > >
> > > I used drain_workqueue() because it allows jobs already present in the
> > > workqueue to re-queue new jobs, and it will also wait for those new jobs
> > > to complete.
> > >
> > > I think flush_workqueue() doesn't wait for this kind of chained queueing.
> > >
> > > In our case, my understanding was that device_link_release_fn() calls
> > > put_device() for the consumer and the supplier.
> > > If a refcount reaches zero, devlink_dev_release() can be called again
> > > and re-queue a job.
> > >
> >
> > Looks sensible. The only doubt (which Saravana may know better) is that I'm not
> > sure put_device() on a supplier or consumer can actually lead to
> > devlink_dev_release(). AFAIU, a consumer or a supplier should not be a device
> > from the devlink class. Hence, looking at device_release(), I'm not sure it can
> > happen unless, for some odd reason, someone is messing with devlinks in .remove()
> > or .type->remove().
>
> The case we are trying to fix here involves a supplier or a consumer
> device (say Device-A) being removed via device_del(). When that happens,
> all the device links to/from the device are deleted by a call to
> device_links_purge(), since a device link can't exist without both the
> supplier and the consumer existing.
>
> The problem you were hitting is that the device link deletion code
> does the put_device(Device-A) in a workqueue. Your change makes
> sure we wait until that has completed. To do that, you only need to
> wait for the device link deletion work (already queued before
> returning from device_del()) to finish. You don't need to wait for
> anything more.
>
> I read up on drain_workqueue() before I made my comments. The point I
> was trying to make is that there could be some unrelated device link
> deletions that you don't need to wait on.
>
> But taking a closer look[1], it looks like drain_workqueue() might
> actually cause bugs: while a workqueue is being drained, if another
> unrelated device link deletion tries to queue work, that work will
> get ignored.
>

Oh, even worse then... please also take a look at the new v3 Herve sent. Herve is
already convinced about flush_workqueue(). The other open discussion is about
releasing the of_mutex in patch 2. I'm not convinced we need that, but you may know
better.
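
For reference, a minimal sketch of what the flush_workqueue() variant would look
like (assuming the dedicated queue keeps the fw_devlink_wq name used in this
patch; the exact shape in Herve's v3 may of course differ):

void device_link_wait_removal(void)
{
        /*
         * devlink removal jobs are queued in the dedicated work queue.
         * To be sure that all removal jobs queued so far are terminated,
         * ensure that any scheduled work has run to completion.
         */
        flush_workqueue(fw_devlink_wq);
}

Unlike drain_workqueue(), flush_workqueue() lets unrelated devlink removals keep
queueing work while we wait, which avoids the problem you point out above.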

> Reply to rest of the emails in this thread here:
>
> Nuno,
>
> Sorry if I messed up who sent the first patch; I did dig back to
> your v1, but I could be wrong.
>

I did send an RFC [1] first (which should also count :)). It actually took a lot
of "pushing" with resends to get some attention on this. And if you follow the RFC
you'll even see that I first reported the issue back in May or so (but did not
really put too much effort into it at the time).

I have to admit it's a bit frustrating given how much I pushed and insisted on fixing
this (and yet don't have my own patches in :P). But that's life, and at the end of the
day I just care about this being fixed. So, no hard feelings :).

> If devlink_dev_release() could have done the work synchronously, we'd
> have done it that way in the first place. Doing it synchronously would
> actually be a bug because devlink_dev_release() gets called in atomic
> context, but the put_device() on the supplier/consumer can do some
> sleeping work.
>

Not sure I'm following the above. I may be missing something, but looking at the code
paths it actually looks like devlink_dev_release() is always called with the
device_links_lock held. Therefore we must already be in a sleeping context, or we
already have a problem...
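
Just to spell out how I read the code paths (a simplified sketch, with the runtime
PM and SRCU details trimmed, and using the fw_devlink_wq name from this patch):

static void device_link_release_fn(struct work_struct *work)
{
        struct device_link *link = container_of(work, struct device_link, rm_work);

        /* Wait for SRCU readers of the link; this may sleep. */
        device_link_synchronize_removal();

        /* These may sleep too if the last references get dropped here. */
        put_device(link->consumer);
        put_device(link->supplier);
        kfree(link);
}

static void devlink_dev_release(struct device *dev)
{
        struct device_link *link = to_devlink(dev);

        INIT_WORK(&link->rm_work, device_link_release_fn);
        /* All the potentially sleeping work is deferred to the workqueue. */
        queue_work(fw_devlink_wq, &link->rm_work);
}

So the sleeping parts are already deferred; the question is only whether the
context calling devlink_dev_release() itself can ever be atomic.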

Looking at the git history, the problem we had before was that we were using
call_srcu(), and an SRCU callback cannot sleep, which a device release function may
need to do.

Anyway, Rafael already said he's fine with erroring out in case the queue fails to
allocate (as you said, if that happens the system is likely already screwed). My only
complaint now is about the place where we're allocating the queue.
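
To make that concrete, what's being discussed is roughly the following (a sketch
only; whether this belongs in devices_init() or somewhere else in the driver core
init path is exactly the point I'm raising, and the unwinding of the earlier
allocations is omitted):

int __init devices_init(void)
{
        /* ... existing kset/kobject setup ... */

        fw_devlink_wq = alloc_workqueue("fw_devlink_wq", 0, 0);
        if (!fw_devlink_wq) {
                pr_err("Failed to allocate the device link workqueue\n");
                /* error out instead of silently running without the queue */
                return -ENOMEM;
        }

        return 0;
}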

[1]: https://lore.kernel.org/lkml/20231127-fix-device-links-overlays-v1-1-d7438f56d025@xxxxxxxxxx/

- Nuno Sá

> -Saravana
>
> [1] -
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/workqueue.c#n1727