Re: [PATCH] arm64: dts: qcom: sdm845-db845c: Move LVS regulator nodes up

From: Amit Pundir
Date: Thu Jun 15 2023 - 12:10:30 EST


On Thu, 15 Jun 2023 at 20:33, Krzysztof Kozlowski
<krzysztof.kozlowski@xxxxxxxxxx> wrote:
>
> On 15/06/2023 15:47, Amit Pundir wrote:
> > On Thu, 15 Jun 2023 at 00:38, Amit Pundir <amit.pundir@xxxxxxxxxx> wrote:
> >>
> >> On Thu, 15 Jun 2023 at 00:17, Krzysztof Kozlowski
> >> <krzysztof.kozlowski@xxxxxxxxxx> wrote:
> >>>
> >>> On 14/06/2023 20:18, Linux regression tracking (Thorsten Leemhuis) wrote:
> >>>> On 02.06.23 18:12, Amit Pundir wrote:
> >>>>> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators
> >>>>> list to workaround a boot regression uncovered by the upstream
> >>>>> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator:
> >>>>> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS"").
> >>>>>
> >>>>> Without this fix, DB845c sometimes fails to boot because either the
> >>>>> lvs1 or the lvs2 regulator fails to turn ON in time.
> >>>>
> >>>> /me waves friendly
> >>>>
> >>>> FWIW, as it's not obvious: this...
> >>>>
> >>>>> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@xxxxxxxxxxxxxx/
> >>>>
> >>>> ...is a report about a regression, one that we could still solve before
> >>>> 6.4 is out, and one I'll likely point Linus to unless a fix comes into
> >>>> sight.
> >>>>
> >>>> When I noticed the reluctant replies to this patch, I asked earlier today
> >>>> in the thread with the report what the plan forward was:
> >>>> https://lore.kernel.org/all/CAD%3DFV%3DV-h4EUKHCM9UivsFHRsJPY5sAiwXV3a1hUX9DUMkkxdg@xxxxxxxxxxxxxx/
> >>>>
> >>>> Doug replied there:
> >>>>
> >>>> ```
> >>>> Of the two proposals made (the revert vs. the reordering of the dts),
> >>>> the reordering of the dts seems better. It only affects the one buggy
> >>>> board (rather than preventing us from moving to async probe for everyone)
> >>>> and it also has a chance of actually fixing something (changing the
> >>>> order that regulators probe in rpmh-regulator might legitimately work
> >>>> around the problem). That being said, just like the revert the dts
> >>>> reordering is still just papering over the problem and is fragile /
> >>>> not guaranteed to work forever.
> >>>> ```
> >>>>
> >>>> Papering over is obviously not good, but does anyone have a better idea
> >>>> for fixing this? Or is "not fixing" for some reason a viable option here?
> >>>>
> >>>
> >>> I understand there is a regression, although the kernel is not mainline
> >>> (hash df7443a96851 is unknown) and the only proposed solutions paper
> >>> over the problem. Reverting the commit is a temporary workaround. Moving
> >>> nodes in the DTS is not acceptable because it hides the actual problem
> >>> and only solves this one particular observed problem, while the actual
> >>> issue is still there. It would be nice to be able to reproduce it on
> >>> real mainline with a normal operating system (not AOSP), with a ramdisk,
> >>> without one, whatever. So far no one has done that, right?
> >>
> >> No, I have not tried a non-AOSP system yet. I'll try it tomorrow, with a
> >> mainline hash, if that helps.
> >
> > Hi, here is the crash report on db845c running vanilla v6.4-rc6 with a
> > Debian build: https://bugs.linaro.org/attachment.cgi?id=1142
> >
> > And fwiw here is the db845c crash log with AOSP running vanilla
> > v6.4-rc6 https://bugs.linaro.org/attachment.cgi?id=1141
> >
> > Regards,
> > Amit Pundir
> >
> > PS: The rootfs in this bug report doesn't matter much, because I'm
> > loading all the kernel modules from a ramdisk, and in the case of a
> > crash UFS doesn't probe anyway.
>
> I just tried current linux-next with defconfig (I could not find your
> config, neither here, nor in your previous mail thread, nor in Bugzilla),
> and also with REGULATOR_QCOM_RPMH as a module.
>
> I also tried v6.4-rc6, again with defconfig, with REGULATOR_QCOM_RPMH
> both at its default setting and as a module.
>
> All the cases work on my RB3 - no warnings reported.
>
> If you do not use defconfig, then in all reports please mention the
> differences (ideally) or at least attach your config.

Argh. Sorry about that, big mistake on my side. I did want to upload
my defconfig but forgot. The defconfig plays a key role because, as I
mentioned in one of my previous emails, this is a timing/race bug: if
I make any significant change to my defconfig (e.g. enabling ftrace),
or even something as small as adding a printk in the
qcom_rpmh_regulator code, I can't reproduce the bug anymore. Needless
to say, I can't reproduce it with the default arm64 defconfig either.

Please find my custom (but upstream) defconfig here:
https://bugs.linaro.org/attachment.cgi?id=1143 and prebuilt binaries
here: https://people.linaro.org/~amit.pundir/db845c-userdebug/rpmh_bug/.
Running "fastboot flash boot ./boot.img-6.4-rc6 reboot", plus a few
(<5) more reboots if needed, should be enough to trigger the crash.

I downloaded the initrd from here:
https://snapshots.linaro.org/96boards/dragonboard845c/linaro/debian/569/initrd.img-5.15.0-qcomlt-arm64
but edited ramdisk/init to run the "load_module" function early in
boot, and changed ramdisk/conf/initramfs.conf to use "MODULES=list"
instead of "MODULES=most", with all the kernel modules listed in
/etc/initramfs-tools/modules.
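For anyone recreating this ramdisk, the initramfs.conf change comes down to a one-line text edit plus a plain module list. What follows is a minimal Python sketch of that edit, assuming the initrd has already been unpacked. The helper names (use_module_list, render_module_list) and the module names in the example are hypothetical and illustrative only; they are not the actual list used above. MODULES=most and MODULES=list are the values documented in initramfs.conf(5).

```python
import re


def use_module_list(conf_text: str) -> str:
    """Switch initramfs.conf text from MODULES=most to MODULES=list.

    Only an uncommented line that is exactly "MODULES=most" (optionally
    followed by spaces/tabs) is rewritten; every other line is preserved.
    """
    return re.sub(r"^MODULES=most[ \t]*$", "MODULES=list",
                  conf_text, flags=re.MULTILINE)


def render_module_list(names: list[str]) -> str:
    """Render a modules file body: one module name per line, blanks dropped."""
    cleaned = [n.strip() for n in names if n.strip()]
    return "".join(f"{n}\n" for n in cleaned)


if __name__ == "__main__":
    conf = "# example\nMODULES=most\nBUSYBOX=auto\n"
    print(use_module_list(conf), end="")
    # Illustrative module names only (hypothetical, not the real list).
    print(render_module_list(["qcom_rpmh_regulator", "ufs_qcom"]), end="")
```

The regex is anchored per line so that a commented-out "#MODULES=most" is left alone.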

Regards,
Amit Pundir