Re: Device links between providers and consumers

From: Bartosz Golaszewski
Date: Tue Nov 21 2023 - 08:35:07 EST


On Tue, Nov 21, 2023 at 11:05 AM Bartosz Golaszewski <brgl@xxxxxxxx> wrote:
>
> On Tue, Nov 21, 2023 at 5:53 AM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> >
> > Hi Bartosz,
> >
> > Adding LKML so that others are aware of what the issue is and it'll be
> > easier when I get to my TODO patches and send them out. I'm hoping
> > that's okay with you because we didn't discuss anything confidential
> > here.
> >
>
> That's alright, thanks for starting the public conversation.
>
> > On Mon, Nov 20, 2023 at 2:21 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Nov 20, 2023 at 12:38 PM Bartosz Golaszewski <brgl@xxxxxxxx> wrote:
> > > >
> > > > Hi Saravana,
> > > >
> > > > As I suspected, I couldn't observe the behavior you described during
> > > > our discussion at the LPC event. I have a DT GPIO provider and a
> > > > consumer referencing it by phandle. I'm unbinding the provider and the
> > > > consumer keeps on living, if it tries to use the GPIO, then it will
> > > > enter the regular code path in GPIO for checking if the provider is
> > > > there or not.
> > > >
> > > > Could you point me in the right direction here?
> > >
> > > Thanks for trying it out! Based on the code it should unbind the
> > > consumers. I haven't ever tried it myself (no need for it).
> >
> > I took a closer look to show you where the consumer unbind is supposed
> > to be done, but in doing so I think I know what issue you are hitting.
> > One of my TODO items for device links should fix your problem.
> >
> > The force unbinding of consumers when the supplier is unbound is
> > supposed to happen here:
> > device_driver_detach()
> > -> device_release_driver_internal()
> > -> __device_release_driver()
> > -> device_links_unbind_consumers()
> > -> for all "active" consumer -> device_release_driver_internal()
> >
> > However the problem is the "if (drv)" check in __device_release_driver().
> >
> > This problem also exists for "class" device suppliers that don't have
> > a drv. Fixing managed device links for "class" suppliers (and now, bus
> > suppliers without drv) has been in my TODO list for a while.
> >
> > The gpio device is one of the cases of a "bus" device probing without
> > a driver. A while ago, I implemented a gpio_bus_match() that'll probe
> > the gpio device (so consumer probing isn't blocked) and I was trying
> > to keep the boilerplate code minimalistic. So, for your test case, a
> > quick test hack would be to implement an actual stub driver instead of
> > using a stub bus match. That should fix your problem with the
> > consumers not unbinding. I'll put up a proper fix sometime soon
> > (hopefully over the holiday lulls).
> >
>
> But I don't even see any code referring to device_link in
> drivers/gpio/. I see that if you get a regulator, there is a link
> created between the regulator device and the consumer device in
> _regulator_get() but nothing like that in GPIO.
>
> > Btw, when we were talking in person at the LPC dinner, you were asking
> > "what would you do if the supplier was an optional supplier but you
> > forcefully unbound the consumer?" I have a nice answer now:
> >
>
> Actually, my question was: "what if a resource is optional and the
> provider of that resource gets unbound". But below you still did
> answer this question. :)
>
> > After a force unbind, we need to add all these consumers to the
> > deferred probe list and trigger another deferred probe attempt. If the
> > supplier was optional, the consumer would probe again. This also has
> > the nice property that the consumer doesn't advertise something to
> > userspace that it can't deliver (because the supplier has gone
> > missing) and it makes the error handling easier for drivers. They
> > don't have to worry about suppliers vanishing in every part of their
> > code. Once they get the supplier and probe successfully, they
> > shouldn't have to worry about it vanishing underneath them.
> >
>
> Let me rephrase it to see if I understand it:
>
> 1. Provider is probed.
> 2. Consumer is probed and gets the resource from provider.
> 3. Provider is unbound.
> 4. As a result, the consumer is unbound.
> 5. Consumer is put into the deferred probe list.
> 6. Consumer binds again to its driver but this time doesn't get the resource.
>
> It makes some sense I guess but then you have to deal with the device
> disappearing for a brief moment in whatever code uses it so it's not
> like it has no price over handling the provider unbind in consumers.
> If you're exposing anything to user-space, you're offloading that
> handling to it.
>
> There are also two approaches to handling the providers unbinding:
> returning an error from API calls vs returning 0 and doing nothing in
> which case most of the consumer code can remain the same. This is what
> GPIO does ATM.
>
> > Cheers,
> > Saravana
> >
>
> Is there a way for a driver to alter the behavior? For instance tell
> the device core that it should not unbind it if the provider is
> detached?
>
> The behavior in general makes sense but it only applies to platform
> devices on DT systems and has some corner-cases that would need to be
> ironed out. What I proposed is more generic as it also covers
> resources exposed to drivers or user-space from discoverable devices.
>

Actually the two are largely orthogonal so can be developed independently.

One more thing: the device link mechanism will never work for
interrupts as the interrupt subsystem doesn't use struct device. Same
for clk but that's less of a problem as they are rarely detachable.

Bart

> Bart
>
> >
> > >
> > > Let's start with making sure the basic functionality is working in your case.
> > >
> > > Can you check /sys/class/devlink to see if you see a folder with the
> > > following name?
> > > <bus:supplier>--<bus:consumer>
> > >
> > > Once you find it, can you cat all the file contents and tell me what
> > > it says before you unbind it?
> > >
> > > The "status" should be "available". And "sync_state_only" should be false.
> > >
> > > Also, how are you unbinding the supplier? And does the board you are
> > > playing with something that's upstream? Should we take this discussion
> > > to LKML?
> > >
> > > -Saravana