Re: [PATCH] regulator: qcom-rpmh: Revert "regulator: qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS"

From: Doug Anderson
Date: Tue Jun 06 2023 - 19:30:18 EST


Hi,

On Fri, Jun 2, 2023 at 6:13 AM Amit Pundir <amit.pundir@xxxxxxxxxx> wrote:
>
> On Fri, 2 Jun 2023 at 18:07, Mark Brown <broonie@xxxxxxxxxx> wrote:
> >
> > > > If you reorder the nodes in the device tree, I think it'll change the
> > > > probe order. Does that affect anything? I'm wondering if there's some
> > > > sort of delayed reaction from a previous regulator.
> >
> > > Hi, Bumping lvs1 and lvs2 regulators up to the top of the list in the
> > > DTS https://bugs.linaro.org/show_bug.cgi?id=5975#c4 does seem to work.
> > > I can't reproduce the crash in 125 reboots so far, while I'm still
> > > testing with only qcom-rpmh-regulator kernel module. I'll do some more
> > > testing with full system running and send this re-ordering fix I can't
> > > reproduce the crash further.
> >
> > So whatever the issue is here it's a timing/race condition - this seems
> > like a workaround which works just now but it's not getting to whatever
> > the actual issue is and that could come back.
>
> Hi, I'm happy to debug this issue further or test run any
> patches/ideas if that helps.

I guess it's a better workaround than reverting the async patch. ...at
least it has a chance of having a real effect. That being said, it's
still a bit of a hack.

Ideally you'd be able to somehow get information about RPMH's state
when things fail. Maybe this might be possible with ramdumps or maybe
with JTAG. Unfortunately, I'm not super familiar with either of them.
I assume you aren't either or you would have already tried them...

>From a black box perspective, I guess the things I could think of
would be to keep poking around with things that you control. Best
ideas I have:

1. Use "bisect" style techniques to figure out how much you really
need to move the "lvs" regulators. Try moving it halfway up the list.
If that works, move it closer to the bottom. If that doesn't work,
move it closer to the top. Eventually you'd find out which regulator
it's important to be before.

2. Try adding some delays to some of the regulators with
"regulator-enable-ramp-delay" and/or "regulator-settling-time-us".
Without a scope, it'll be tricky to figure out exactly which
regulators might need delays, but you could at least confirm if the
"overkill" approach of having all the regulators have some delay
helps... I guess you could also try putting a big delay for "ldo26".
If that works, you could try moving it up (again using a bisect style
approach) to see where the delay matters?


Currently, I guess my mental model of what might be going wrong is
that regulators are all turning on / adjusting really quickly. Maybe
they aren't switching into "high power mode" quickly enough, maybe
they are busy ramping up or down, or maybe there's simply some other
issue, but I suppose that something happening could be causing the
voltage to droop down (or be too high) and then that's making RPMH
upset. Changing the order could be helping avoid this droop, but the
more proper fix would be to actually account for it with regulator
constraints.

My mental model still doesn't really explain what async probe has to
do with anything, at least if you're loading _just_ rpmh-regulator all
on its own. Assuming you're loading rpmh-regulator all on its own then
async probe should do almost nothing. Unless I'm mistaken, async probe
won't cause all the RPMH regulators to initialize in parallel. They
still initialize synchronously in the rpmh-regulator probe routine.
Async probe would _just_ allow some other driver to run its probe at
the same time. ...but if no other drivers being loaded at the same
time then I'm perplexed about how it could make any difference.


-Doug