Re: [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430

From: Adam Ford
Date: Fri Jan 08 2021 - 13:32:12 EST


On Fri, Jan 8, 2021 at 7:45 AM Adam Ford <aford173@xxxxxxxxx> wrote:
>
> On Fri, Jan 8, 2021 at 1:22 AM Tony Lindgren <tony@xxxxxxxxxxx> wrote:
> >
> > * H. Nikolaus Schaller <hns@xxxxxxxxxxxxx> [201230 13:29]:
> > > > Am 30.12.2020 um 13:55 schrieb Adam Ford <aford173@xxxxxxxxx>:
> > > > On Wed, Dec 30, 2020 at 2:43 AM Tony Lindgren <tony@xxxxxxxxxxx> wrote:
> > > >>
> > > >> At least for 4430, trying to use the single conversion mode eventually
> > > >> hangs the thermal sensor. This can be quite easily seen with errors:
> > > >>
> > > >> thermal thermal_zone0: failed to read out thermal zone (-5)
> > ...
> >
> > > > I don't have an OMAP4, but if you want, I can test a DM3730.
> > >
> > > Indeed I remember a similar discussion from the DM3730 [1]. temp values were
> > > always those from the last measurement. E.g. the first one was done
> > > during (cold) boot and the first request after 10 minutes did show a
> > > quite cold system... The next one did show a hot system independent
> > > of what had been between (suspend or high activity).
> > >
> > > It seems as if it was even reproducible with a very old kernel on a BeagleBoard.
> > > So it is quite fundamental.
> > >
> > > We tried to fix it but did not come to a solution [2]. So we opened an issue
> > > in our tracker [3] and decided to stay with continuous conversion although this
> > > raises idle mode processor load.
> >
> > Hmm so maybe eocz high always times out in single mode since it also
> > triggers at least on dra7?
> >
> > Yes it would be great if you guys can give the $subject patch a try at
> > least on your omap36xx and omap5 boards and see if you see eocz
> > time out warnings in dmesg.
>
> I should be able to try it on the dm3730 logicpd-torpedo kit this weekend.

I am going to be a bit delayed testing this. I cannot boot omap2plus
using Linux version 5.11.0-rc2.

[ 2.666748] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xbc
[ 2.673309] nand: Micron MT29F4G16ABBDA3W
[ 2.677368] nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
[ 2.685119] nand: using OMAP_ECC_BCH8_CODE_HW_DETECTION_SW
[ 2.693237] Invalid ECC layout
[ 2.696350] omap2-nand 30000000.nand: unable to use BCH library
[ 2.702575] omap2-nand: probe of 30000000.nand failed with error -22
[ 2.716094] 8<--- cut here ---
[ 2.719207] Unable to handle kernel NULL pointer dereference at virtual address 00000018
[ 2.727600] pgd = (ptrval)
...
[ 3.050933] ---[ end trace 59640c7399a80a07 ]---
[ 3.055603] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 3.063323] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Once I get past this, I'll try to test the thermal stuff.

adam

>
> adam
> >
> > Regards,
> >
> > Tony