Re: [PATCH] i8k: Ignore temperature sensors which report invalid values

From: Pali Rohár
Date: Wed Oct 22 2014 - 08:29:17 EST


On Tuesday 21 October 2014 06:27:23 Guenter Roeck wrote:
> On 10/20/2014 09:46 AM, Pali Rohár wrote:
> > Ok, I will describe my problem. Guenter, maybe you can find
> > another solution/fix for it.
> >
> > Calling i8k_get_temp(3) on my laptop without
> > I8K_TEMPERATURE_BUG always returns value 193 (which is
> > above I8K_MAX_TEMP).
> >
> > When I8K_TEMPERATURE_BUG is enabled (the default) then
> > i8k_get_temp(3) returns the value from prev[3] and stores
> > the new value I8K_MAX_TEMP to prev[3]. The value in prev[3]
> > is initialized to 0.
> >
> > What I want to achieve is: when i8k_get_temp() for
> > particular sensor id always returns invalid value (>
> > I8K_MAX_TEMP) then we should totally ignore sensor with
> > that id and do not export it via hwmon.
> >
> > My solution is: initialize prev[id] to I8K_MAX_TEMP, so on
> > invalid data first call to i8k_get_temp(id) returns
> > I8K_MAX_TEMP. Then in i8k_init_hwmon check if value is <
> > I8K_MAX_TEMP and if not ignore sensor id.
> >
> > Guenter, is it clear now? Are you ok that we should ignore
> > a sensor if it always reports a value above I8K_MAX_TEMP?
> > If you do not like my solution/patch for it, can you
> > specify how else it can be fixed?
>
> I still don't see the point in initializing prev[].
>

Currently prev[] is initialized to 0. That means the first call
to i8k_get_temp() (for a sensor id which returns a value >
I8K_MAX_TEMP) returns 0. The second and all later calls return
I8K_MAX_TEMP.

So the point is to return the same value for the first call and
for all later calls.
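
To illustrate the difference, here is a minimal userspace sketch
of the caching described above. It is not the driver code:
fake_raw_temp(), init_prev() and sketch_get_temp() are
hypothetical names, the value 127 for I8K_MAX_TEMP is an
assumption, and sensor 3 is hard-coded to always read 193.

```c
#include <stddef.h>

#define I8K_MAX_TEMP 127	/* assumed limit, not taken from this mail */
#define NSENSORS 4		/* hypothetical number of sensors */

/* Hypothetical stand-in for the SMM call: sensor 3 always reads 193. */
static int fake_raw_temp(int sensor)
{
	return sensor == 3 ? 193 : 45;
}

/* Cache of the previous value per sensor. */
static int prev[NSENSORS];

/* Set every prev[] slot to the chosen initial value. */
static void init_prev(int init)
{
	for (size_t i = 0; i < NSENSORS; i++)
		prev[i] = init;
}

/* Same filtering idea as the I8K_TEMPERATURE_BUG path. */
static int sketch_get_temp(int sensor)
{
	int temp = fake_raw_temp(sensor);

	if (temp > I8K_MAX_TEMP) {
		temp = prev[sensor];		/* return the cached value */
		prev[sensor] = I8K_MAX_TEMP;	/* remember the bad reading */
	} else {
		prev[sensor] = temp;
	}
	return temp;
}
```

After init_prev(0), the first sketch_get_temp(3) gives 0 and every
later call gives I8K_MAX_TEMP. After init_prev(I8K_MAX_TEMP), all
calls give I8K_MAX_TEMP, so a check at init time sees the same
value as later reads.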

> Yes, I am ok with ignoring sensor values if the reported
> temperature is above I8K_MAX_TEMP. I am just not sure if we
> should check against I8K_MAX_TEMP or against, say, 192.
> Reason is that we do know that the sensor can erroneously
> return 0x99 on some systems once in a while. We would not
> want to ignore those sensors just because they happen to
> report 0x99 during initialization.
>
> So maybe make it
> if (err >= 0 && err < 192)
> and add a note before the first if(), explaining that higher
> values suggest that there is no sensor attached.
>
> Thanks,
> Guenter
>

Right, now we need to decide which magic constant to use...
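
For comparison, a small sketch of the two candidate checks,
applied to a raw reading. The function names and
I8K_NO_SENSOR_MIN are hypothetical, and 127 for I8K_MAX_TEMP is
an assumption; only the "err >= 0 && err < 192" condition comes
from the quoted mail.

```c
#include <stdbool.h>

#define I8K_MAX_TEMP 127	/* assumed limit, not taken from this mail */
#define I8K_NO_SENSOR_MIN 192	/* hypothetical name for the "192" above */

/* Check against I8K_MAX_TEMP: a transient 0x99 also hides the sensor. */
static bool probe_strict(int err)
{
	return err >= 0 && err < I8K_MAX_TEMP;
}

/* Check against 192: only higher values are treated as "no sensor". */
static bool probe_relaxed(int err)
{
	return err >= 0 && err < I8K_NO_SENSOR_MIN;
}
```

With these, a reading of 0x99 (153) is rejected by probe_strict()
but accepted by probe_relaxed(), while 193 is rejected by both.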

And now I found another problem :-)

On my laptop i8k_get_temp(3) does not always return 193. It does
so only when the AMD graphics card is turned off. When the card
is on, i8k_get_temp(3) returns the same value as the temperature
hwmon attribute of the radeon DRM driver.

So it looks like on my laptop the i8k sensor with id 3 reports
the GPU temperature.

When the card is turned off, the radeon driver reports -EINVAL
for the temperature hwmon sysfs node.

So now I think i8k should not ignore the sensor totally, as it
can be mapped to some HW which can be dynamically turned on/off
(like my graphics card).

So what do you think about reporting -EINVAL instead of
I8K_MAX_TEMP when the Dell SMM returns a value above
I8K_MAX_TEMP?
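
Roughly something like this, as a sketch only (not a patch).
sketch_check_temp() is a hypothetical helper, 127 for
I8K_MAX_TEMP is an assumption, and the sketch leaves out the
prev[] caching for the transient 0x99 case.

```c
#include <errno.h>

#define I8K_MAX_TEMP 127	/* assumed limit, not taken from this mail */

/* Map an out-of-range raw reading to -EINVAL instead of a capped value. */
static int sketch_check_temp(int raw)
{
	if (raw < 0)
		return raw;	/* pass through an existing error code */
	if (raw > I8K_MAX_TEMP)
		return -EINVAL;	/* sensor currently has no valid value */
	return raw;
}
```

So sketch_check_temp(193) would give -EINVAL while the card is
off, and the real temperature again once the card is back on.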

--
Pali Rohár
pali.rohar@xxxxxxxxx
