Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()

From: Saravana Kannan
Date: Thu Jun 23 2022 - 04:22:27 EST


On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren <tony@xxxxxxxxxxx> wrote:
>
> * Saravana Kannan <saravanak@xxxxxxxxxx> [220622 19:05]:
> > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@xxxxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > * Saravana Kannan <saravanak@xxxxxxxxxx> [220621 19:29]:
> > > > On Tue, Jun 21, 2022 at 12:28 AM Tony Lindgren <tony@xxxxxxxxxxx> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > * Saravana Kannan <saravanak@xxxxxxxxxx> [700101 02:00]:
> > > > > > Now that fw_devlink=on by default and fw_devlink supports
> > > > > > "power-domains" property, the execution will never get to the point
> > > > > > where driver_deferred_probe_check_state() is called before the supplier
> > > > > > has probed successfully or before deferred probe timeout has expired.
> > > > > >
> > > > > > So, delete the call and replace it with -ENODEV.
> > > > >
> > > > > Looks like this causes omaps to not boot in Linux next.
> > > >
> > > > Can you please point me to an example DTS I could use for debugging
> > > > this? I'm assuming you are leaving fw_devlink=on and not turning it
> > > > off or putting it in permissive mode.
> > >
> > > Sure, this seems to happen at least with simple-pm-bus as the top
> > > level interconnect with a configured power-domains property:
> > >
> > > $ git grep -A10 "ocp {" arch/arm/boot/dts/*.dtsi | grep -B3 -A4 simple-pm-bus
> >
> > Thanks for the example. I generally start looking from dts (not dtsi)
> > files in case there are some DT property override/additions after the
> > dtsi files are included in the dts file. But I'll assume for now
> > that's not the case. If there's a specific dts file for a board I can
> > look from that'd be helpful to rule out those kinds of issues.
> >
> > For now, I looked at arch/arm/boot/dts/omap4.dtsi.
>
> OK it should be very similar for all the affected SoCs.
>
> > > This issue is not directly related to fw_devlink. It is a side effect of
> > > removing driver_deferred_probe_check_state(). We no longer return
> > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> >
> > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > was deleted because fw_devlink=on should have short circuited the
> > probe attempt with an -EPROBE_DEFER before reaching the bus/driver
> > probe function and hitting this -ENOENT failure. That's why I was
> > asking the other questions.
>
> OK. So where is the -EPROBE_DEFER supposed to happen without
> driver_deferred_probe_check_state() then?

The device_links_check_suppliers() call inside really_probe() would short
circuit and return -EPROBE_DEFER if the device links were created as
expected.

>
> > > > > On platform_probe() genpd_get_from_provider() returns
> > > > > -ENOENT.
> > > >
> > > > This error is with the series I assume?
> > >
> > > On the first probe genpd_get_from_provider() will return -ENOENT in
> > > both cases. The list is empty on the first probe and there are no
> > > genpd providers at this point.
> > >
> > > Earlier with driver_deferred_probe_check_state(), the initial -ENOENT
> > > ends up getting changed to -EPROBE_DEFER at the end of
> > > driver_deferred_probe_check_state(), we are now missing that.
> >
> > Right, I was aware -ENOENT would be returned if we got this far. But
> > the point of this series is that you shouldn't have gotten that far
> > before your pm domain device is ready. Hence my questions from the
> > earlier reply.
>
> OK
>
> > Can I get answers to rest of my questions in the first reply please?
> > That should help us figure out why fw_devlink let us get this far.
> > Summarize them here to make it easy:
> > * Are you running with fw_devlink=on?
>
> Yes with the default with no specific kernel params so looks like
> FW_DEVLINK_FLAGS_ON.
>
> > * Is the "ti,omap4-prm-inst"/"ti,omap-prm-inst" built-in in this case?
>
> Yes
>
> > * If it's not built-in, can you please try deferred_probe_timeout=0
> > and deferred_probe_timeout=30 and see if either one of them help?
>
> It's built in so I did not try these.
>
> > * Can I get the output of "ls -d supplier:*" and "cat
> > supplier:*/status" output from the sysfs dir for the ocp device
> > without this series where it boots properly.
>
> Hmm so I'm not seeing any supplier for the top level ocp device in
> the booting case without your patches. I see the suppliers for the
> ocp child device instances only.

Hmmm... this is strange (that the device link isn't there), but this
is what I suspected.

Now we need to figure out why it's missing. There are only a few
things that could cause this and I don't see any of those. I already
checked to make sure the power domain in this instance had a proper
driver with a probe() function -- if it didn't, then that's one thing
that could have caused the missing device link. The device does seem
to have a proper driver, so looks like I can rule that out.

Can you point me to the dts file that corresponds to the specific
board you are testing this on? I probably won't find anything, but I
want to rule out some of the possibilities.

All the device link creation logic is inside drivers/base/core.c. So
if you can look at the existing messages or add other stuff to figure
out why the device link isn't getting created, that'd be handy. In
either case, I'll continue staring at the DT and code to see what
might be happening here.

-Saravana