Re: Device links between providers and consumers

From: Bartosz Golaszewski
Date: Tue Nov 21 2023 - 05:05:21 EST


On Tue, Nov 21, 2023 at 5:53 AM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
>
> Hi Bartosz,
>
> Adding LKML so that others are aware of what the issue is and it'll be
> easier when I get to my TODO patches and send them out. I'm hoping
> that's okay with you because we didn't discuss anything confidential
> here.
>

That's alright, thanks for starting the public conversation.

> On Mon, Nov 20, 2023 at 2:21 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> >
> > On Mon, Nov 20, 2023 at 12:38 PM Bartosz Golaszewski <brgl@xxxxxxxx> wrote:
> > >
> > > Hi Saravana,
> > >
> > > As I suspected, I couldn't observe the behavior you described during
> > > our discussion at the LPC event. I have a DT GPIO provider and a
> > > consumer referencing it by phandle. I'm unbinding the provider and the
> > > consumer keeps on living, if it tries to use the GPIO, then it will
> > > enter the regular code path in GPIO for checking if the provider is
> > > there or not.
> > >
> > > Could you point me in the right direction here?
> >
> > Thanks for trying it out! Based on the code it should unbind the
> > consumers. I haven't ever tried it myself (no need for it).
>
> I took a closer look to show you where the consumer unbind is supposed
> to be done, but in doing so I think I know what issue you are hitting.
> One of my TODO items for device links should fix your problem.
>
> The force unbinding of consumers when the supplier is unbound is
> supposed to happen here:
> device_driver_detach()
> -> device_release_driver_internal()
> -> __device_release_driver()
> -> device_links_unbind_consumers()
> -> for all "active" consumer -> device_release_driver_internal()
>
> However the problem is the "if (drv)" check in __device_release_driver().
>
> This problem also exists for "class" device suppliers that don't have
> a drv. Fixing managed device links for "class" suppliers (and now, bus
> suppliers without drv) has been in my TODO list for a while.
>
> The gpio device is one of the cases of a "bus" device probing without
> a driver. A while ago, I implemented a gpio_bus_match() that'll probe
> the gpio device (so consumer probing isn't blocked) and I was trying
> to keep the boilerplate code minimalistic. So, for your test case, a
> quick test hack would be to implement an actual stub driver instead of
> using a stub bus match. That should fix your problem with the
> consumers not unbinding. I'll put up a proper fix sometime soon
> (hopefully over the holiday lulls).
>

But I don't even see any code referring to device_link in
drivers/gpio/. I see that if you get a regulator, there is a link
created between the regulator device and the consumer device in
_regulator_get() but nothing like that in GPIO.

> Btw, when we were talking in person at the LPC dinner, you were asking
> "what would you do if the supplier was an optional supplier but you
> forcefully unbound the consumer?" I have a nice answer now:
>

Actually, my question was: "what if a resource is optional and the
provider of that resource gets unbound". But below you still did
answer this question. :)

> After a force unbind, we need to add all these consumers to the
> deferred probe list and trigger another deferred probe attempt. If the
> supplier was optional, the consumer would probe again. This also has
> the nice property that the consumer doesn't advertise something to
> userspace that it can't deliver (because the supplier has gone
> missing) and it makes the error handling easier for drivers. They
> don't have to worry about suppliers vanishing in every part of their
> code. Once they get the supplier and probe successfully, they
> shouldn't have to worry about it vanishing underneath them.
>

Let me rephrase it to see if I understand it:

1. Provider is probed.
2. Consumer is probed and gets the resource from provider.
3. Provider is unbound.
4. As a result, the consumer is unbound.
5. Consumer is put into the deferred probe list.
6. Consumer binds again to its driver but this time doesn't get the resource.

It makes some sense I guess but then you have to deal with the device
disappearing for a brief moment in whatever code uses it so it's not
like it has no price over handling the provider unbind in consumers.
If you're exposing anything to user-space, you're offloading that
handling to it.

There are also two approaches to handling the providers unbinding:
returning an error from API calls vs returning 0 and doing nothing in
which case most of the consumer code can remain the same. This is what
GPIO does ATM.

> Cheers,
> Saravana
>

Is there a way for a driver to alter the behavior? For instance tell
the device core that it should not unbind it if the provider is
detached?

The behavior in general makes sense but it only applies to platform
devices on DT systems and has some corner-cases that would need to be
ironed out. What I proposed is more generic as it also covers
resources exposed to drivers or user-space from discoverable devices.

Bart

>
> >
> > Let's start with making sure the basic functionality is working in your case.
> >
> > Can you check /sys/class/devlink to see if you see a folder with the
> > following name?
> > <bus:supplier>--<bus:consumer>
> >
> > Once you find it, can you cat all the file contents and tell me what
> > it says before you unbind it?
> >
> > The "status" should be "available". And "sync_state_only" should be false.
> >
> > Also, how are you unbinding the supplier? And does the board you are
> > playing with something that's upstream? Should we take this discussion
> > to LKML?
> >
> > -Saravana