Re: [PATCH v1 0/2] Make fw_devlink=on more forgiving

From: Saravana Kannan
Date: Tue Feb 02 2021 - 03:36:10 EST


On Tue, Feb 2, 2021 at 12:12 AM Marek Szyprowski
<m.szyprowski@xxxxxxxxxxx> wrote:
>
> Hi Saravana,
>
> On 01.02.2021 10:02, Saravana Kannan wrote:
> > On Mon, Feb 1, 2021 at 12:05 AM Marek Szyprowski
> > <m.szyprowski@xxxxxxxxxxx> wrote:
> >> On 30.01.2021 05:08, Saravana Kannan wrote:
> >>> On Fri, Jan 29, 2021 at 8:03 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> >>>> This patch series solves two general issues with fw_devlink=on
> >>>>
> >>>> Patch 1/2 addresses the issue of firmware nodes that look like they'll
> >>>> have struct devices created for them, but will never actually have
> >>>> struct devices added for them. For example, DT nodes with a compatible
> >>>> property that don't have devices added for them.
> >>>>
> >>>> Patch 2/2 addresses (for static kernels) the issue of optional suppliers
> >>>> that'll never have a driver registered for them. So, if the device could
> >>>> have probed with fw_devlink=permissive with a static kernel, this patch
> >>>> should allow those devices to probe with a fw_devlink=on. This doesn't
> >>>> solve it for the case where modules are enabled because there's no way
> >>>> to tell if a driver will never be registered or it's just about to be
> >>>> registered. I have some other ideas for that, but they'll have to come
> >>>> later, after thinking about it a bit.
> >>>>
> >>>> These two patches might remove the need for several other patches that
> >>>> went in as fixes for commit e590474768f1 ("driver core: Set
> >>>> fw_devlink=on by default"), but I think all those fixes are good
> >>>> changes. So I think we should leave those in.
> >>>>
> >>>> Marek, Geert,
> >>>>
> >>>> Can you try this series on a static kernel with your OF_POPULATED
> >>>> changes reverted? I just want to make sure these patches can identify
> >>>> and fix those cases.
> >>>>
> >>>> Tudor,
> >>>>
> >>>> You should still make the clock driver fix (because it's a bug), but I
> >>>> think this series will fix your issue too (even without the clock driver
> >>>> fix). Can you please give this a shot?
> >>> Marek, Geert, Tudor,
> >>>
> >>> Forgot to say that this will probably fix your issues only in a static
> >>> kernel. So please try this with a static kernel. If you can also try
> >>> and confirm that this does not fix the issue for a modular kernel,
> >>> that'd be good too.
> >> I've checked those patches on top of linux next-20210129 with
> >> c09a3e6c97f0 ("soc: samsung: pm_domains: Convert to regular platform
> >> driver") commit reverted.
> > Hi Marek,
> >
> > Thanks for testing!
> >
> >> Sadly it doesn't help.
> > That sucks. I even partly "tested" it out on my platform (that needs
> > CONFIG_MODULES) by commenting out the CONFIG_MODULES check. And I saw
> > some device links getting dropped.
>
> Well, my fault. I've missed the fact that I have to disable
> CONFIG_MODULES to let it work. This is not really a fix for my case,
> because the exynos_defconfig has modules enabled (mainly for WiFi and
> media drivers). However disabling the CONFIG_MODULES indeed helped a
> bit. Most of the devices got finally probed. There are only 4 left in
> the deferred_devices list:
>
> sound
> 12e20000.sysmmu
> 12d00000.hdmi
> 12c10000.mixer
>
> The last two (12c10000.mixer and 12d00000.hdmi) are consumers of the
> 12e20000.sysmmu, which is a consumer of the 10023c20.power-domain. That
> power domain in turn is a consumer (child) of another power domain
> (10023c80.power-domain):
>
> # dmesg | grep 10023c20.power-domain
> [ 0.354435] platform 10023c20.power-domain: Linked as a consumer to 10023c80.power-domain
> [ 0.489573] platform 12d00000.hdmi: Linked as a consumer to 10023c20.power-domain
> [ 0.497143] platform 12c10000.mixer: Linked as a consumer to 10023c20.power-domain
> [ 0.580874] platform 12e20000.sysmmu: Linked as a consumer to 10023c20.power-domain
> [ 0.601655] platform 12e20000.sysmmu: probe deferral - supplier 10023c20.power-domain not ready
> [ 2.744884] platform 12c10000.mixer: probe deferral - supplier 10023c20.power-domain not ready
> [ 2.766726] platform 12d00000.hdmi: probe deferral - supplier 10023c20.power-domain not ready
>
> ...
>
> So a dependency chain of 2 power domains is still not resolved properly.
>
> I didn't have time to check what's wrong with the sound node. Simply
> grepping the messages for the 'sound' string doesn't give any results.
> The above tests were done on the Odroid U3 board
> (arch/arm/boot/dts/exynos4412-odroidu3.dts).

Thanks for testing again! This actually gave me valuable info. The
problem is that 10023c20.power-domain (let's call it PD-B) never gets
added to the deferred probe list (because that only happens once a
driver is registered). So, it never gets to drop its dependency on
10023c80.power-domain (let's call it PD-A). So, once all drivers are
registered and the sysmmu checks if it needs to drop the device link to
PD-B, it sees that PD-B is still waiting on suppliers. So it could be
that PD-B would probe in the future. But PD-B never probes in the
future.
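
To make the stall concrete, here is a toy Python model of the situation
described above. It is not kernel code: the Device class, relax_links()
and the rule it applies are made up for illustration, and they only
approximate what the series does. The assumption is that only devices on
the deferred probe list get a chance to drop links, and that a link is
kept whenever the supplier has a driver or is itself still waiting on
suppliers.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    has_driver: bool  # True if a driver is registered for this device
    suppliers: set = field(default_factory=set)  # suppliers still linked


def relax_links(devices, deferred):
    """One pass of the toy rule; returns the (consumer, supplier) links dropped."""
    dropped = []
    for name in deferred:
        dev = devices[name]
        for sup_name in sorted(dev.suppliers):
            sup = devices[sup_name]
            # Keep the link if the supplier has a driver, or if it is still
            # waiting on its own suppliers (it might probe later).
            if sup.has_driver or sup.suppliers:
                continue
            dev.suppliers.discard(sup_name)
            dropped.append((name, sup_name))
    return dropped


def build_scenario():
    """PD-A and PD-B have no driver; the sysmmu has one and waits on PD-B."""
    return {
        "PD-A": Device("PD-A", has_driver=False),
        "PD-B": Device("PD-B", has_driver=False, suppliers={"PD-A"}),
        "sysmmu": Device("sysmmu", has_driver=True, suppliers={"PD-B"}),
    }


devices = build_scenario()
# PD-B has no driver, so it never lands on the deferred list.
print(relax_links(devices, ["sysmmu"]))  # prints []
```

Nothing is dropped, because PD-B still lists PD-A as a supplier and
nobody ever visits PD-B to relax that link. If PD-B were on the list
ahead of the sysmmu, the same pass would drop both links.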

Anyway, I was trying to avoid climbing the tree/graph with Patch 2/2.
But I might not be able to avoid it. Maybe all I need to check is
whether it's in the deferred probe list. Let me think about it. And
maybe it will solve Geert's issue too.
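
As a rough sketch of what that check could look like, here is a toy
Python model (again, not kernel code). It reads the idea as: once all
drivers are registered, a supplier that is neither bound nor on the
deferred probe list is assumed never to probe, so a deferred consumer
may drop its link to it without walking further up the graph. The
function name, the "bound" set and the link list are hypothetical, and
this is only one possible reading of the idea.

```python
def droppable_links(links, deferred, bound):
    """Return the (consumer, supplier) links a deferred consumer could drop.

    Toy rule: a supplier that is neither bound to a driver nor on the
    deferred probe list is treated as never going to probe.
    """
    return [
        (consumer, supplier)
        for consumer, supplier in links
        if consumer in deferred
        and supplier not in bound
        and supplier not in deferred
    ]


# Links taken from the report above, written as (consumer, supplier).
links = [
    ("PD-B", "PD-A"),
    ("sysmmu", "PD-B"),
    ("hdmi", "PD-B"),
    ("mixer", "PD-B"),
    ("hdmi", "sysmmu"),
    ("mixer", "sysmmu"),
]
deferred = {"sysmmu", "hdmi", "mixer"}  # devices that have a driver but deferred
bound = set()                           # nothing in this chain has probed yet

print(droppable_links(links, deferred, bound))
```

With this rule the links to PD-B become droppable even though PD-B still
has a link to PD-A, while the hdmi and mixer links to the sysmmu are kept
because the sysmmu is on the deferred list and can still probe.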

-Saravana