Re: [PATCH v1 0/2] Make fw_devlink=on more forgiving

From: Saravana Kannan
Date: Mon Feb 01 2021 - 04:03:55 EST


On Mon, Feb 1, 2021 at 12:05 AM Marek Szyprowski
<m.szyprowski@xxxxxxxxxxx> wrote:
>
> Hi Saravana,
>
> On 30.01.2021 05:08, Saravana Kannan wrote:
> > On Fri, Jan 29, 2021 at 8:03 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> >> This patch series solves two general issues with fw_devlink=on
> >>
> >> Patch 1/2 addresses the issue of firmware nodes that look like they'll
> >> have struct devices created for them, but will never actually have
> >> struct devices added for them. For example, DT nodes with a compatible
> >> property that don't have devices added for them.
> >>
> >> Patch 2/2 addresses (for static kernels) the issue of optional suppliers
> >> that'll never have a driver registered for them. So, if the device could
> >> have probed with fw_devlink=permissive with a static kernel, this patch
> >> should allow those devices to probe with a fw_devlink=on. This doesn't
> >> solve it for the case where modules are enabled because there's no way
> >> to tell if a driver will never be registered or it's just about to be
> >> registered. I have some other ideas for that, but they'll have to come
> >> later, after thinking about it a bit.
> >>
> >> These two patches might remove the need for several other patches that
> >> went in as fixes for commit e590474768f1 ("driver core: Set
> >> fw_devlink=on by default"), but I think all those fixes are good
> >> changes. So I think we should leave those in.
> >>
> >> Marek, Geert,
> >>
> >> Can you try this series on a static kernel with your OF_POPULATED
> >> changes reverted? I just want to make sure these patches can identify
> >> and fix those cases.
> >>
> >> Tudor,
> >>
> >> You should still make the clock driver fix (because it's a bug), but I
> >> think this series will fix your issue too (even without the clock driver
> >> fix). Can you please give this a shot?
> > Marek, Geert, Tudor,
> >
> > Forgot to say that this will probably fix your issues only in a static
> > kernel. So please try this with a static kernel. If you can also try
> > and confirm that this does not fix the issue for a modular kernel,
> > that'd be good too.
>
> I've checked those patches on top of linux next-20210129 with
> c09a3e6c97f0 ("soc: samsung: pm_domains: Convert to regular platform
> driver") commit reverted.

Hi Marek,

Thanks for testing!

> Sadly it doesn't help.

That sucks. I even partly "tested" it out on my platform (that needs
CONFIG_MODULES) by commenting out the CONFIG_MODULES check. And I saw
some device links getting dropped.

> All devices that belong

By belong, I assume you meant "are consumers"?

> to the Exynos power domains are never probed and stay endlessly on the
> deferred devices list. I've used static kernel build - the one from
> exynos_defconfig.

Can you enable the dev_dbg in __device_link_del() (the SRCU variant)?
That should show whether at least some of the device links are getting
dropped.

If the PD device link is not dropped, I wonder why this condition is
not being hit for consumers of the PD.

        if (fw_devlink_def_probe_retry &&
            link->flags & DL_FLAG_INFERRED &&
            !device_links_probe_blocked_by(link->supplier)) {
                device_link_drop_managed(link);
                continue;
        }

Could you try logging dev, link->supplier and the return value of
device_links_probe_blocked_by()? When a consumer is waiting on a PD,
that should tell us why the PD appears to be waiting on something
else. I can't imagine DL_FLAG_INFERRED being cleared (that only
happens when a driver/framework explicitly creates a device link).
Remind me again where the DT for this board is? Does the PD depend on
something else?
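
To make it clearer what I'd like to see in the log, here is a small
userspace model of the condition above. It is only a sketch and not
kernel code: the struct names, the MODEL_DL_FLAG_INFERRED value and the
"blocked" field (standing in for a non-NULL return from
device_links_probe_blocked_by()) are all assumptions made up for
illustration. The point is just that printing each of the three inputs
next to the decision shows which one keeps the link from being dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in flag value for this model only; not the kernel's definition. */
#define MODEL_DL_FLAG_INFERRED (1u << 0)

/* Hypothetical, simplified device: just a name and a "blocked" state. */
struct model_dev {
	const char *name;
	bool blocked;	/* models device_links_probe_blocked_by() != NULL */
};

/* Hypothetical, simplified device link from a consumer to its supplier. */
struct model_link {
	struct model_dev *supplier;
	unsigned int flags;
};

/*
 * Mirrors the three-part condition from the snippet above and logs every
 * input, so a "keep" decision can be traced back to the term that failed.
 * Returns true if the link would be dropped.
 */
static bool model_should_drop(const struct model_dev *consumer,
			      const struct model_link *link,
			      bool def_probe_retry)
{
	bool inferred, drop;

	assert(consumer && link && link->supplier);

	inferred = (link->flags & MODEL_DL_FLAG_INFERRED) != 0;
	drop = def_probe_retry && inferred && !link->supplier->blocked;

	printf("%s: supplier=%s retry=%d inferred=%d supplier_blocked=%d -> %s\n",
	       consumer->name, link->supplier->name, def_probe_retry,
	       inferred, link->supplier->blocked, drop ? "drop" : "keep");

	return drop;
}
```

In the real code the equivalent would be a debug print just before the
if statement; if the PD shows up as blocked there, the next step is to
find out what it is waiting on.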

One other possibility is that some of the consumers of the PD could be
using the *_platform_driver_probe() macro/function that never
reattempts a probe. So even though this patch might drop the device
links, the consumer never tries again.

-Saravana