Re: [PATCH] i8k: Ignore temperature sensors which report invalid values

From: Guenter Roeck
Date: Wed Oct 22 2014 - 12:20:01 EST


On Wed, Oct 22, 2014 at 02:29:06PM +0200, Pali Rohár wrote:
> On Tuesday 21 October 2014 06:27:23 Guenter Roeck wrote:
> > On 10/20/2014 09:46 AM, Pali Rohár wrote:
> > > Ok, I will describe my problem. Guenter, maybe you can find
> > > another solution/fix for it.
> > >
> > > Calling i8k_get_temp(3) on my laptop without
> > > I8K_TEMPERATURE_BUG always returns value 193 (which is
> > > above I8K_MAX_TEMP).
> > >
> > > When I8K_TEMPERATURE_BUG is enabled (by default) then
> > > i8k_get_temp(3) returns value from prev[3] and store new
> > > value I8K_TEMPERATURE_BUG to prev[3]. Value in prev[3] is
> > > initialized to 0.
> > >
> > > What I want to achieve is: when i8k_get_temp() for
> > > particular sensor id always returns invalid value (>
> > > I8K_MAX_TEMP) then we should totally ignore sensor with
> > > that id and do not export it via hwmon.
> > >
> > > My solution is: initialize prev[id] to I8K_MAX_TEMP, so on
> > > invalid data first call to i8k_get_temp(id) returns
> > > I8K_MAX_TEMP. Then in i8k_init_hwmon check if value is <
> > > I8K_MAX_TEMP and if not ignore sensor id.
> > >
> > > Guenter, it is clear now? Are you ok that we should ignore
> > > sensor if always report value above I8K_MAX_TEMP? If you do
> > > not like my solution/patch for it, can you specify how
> > > other can it be fixed?
> >
> > I still don't see the point in initializing prev[].
> >
>
> Now prev[] is initialized to 0. It means that first call
> i8k_get_temp() (with sensor id which return value > I8K_MAX_TEMP)
> returns 0. Second and other calls returns I8K_MAX_TEMP.
>
> So point is to return same value for first and other calls.
>
Yes, I realized that after I sent my previous mail.

> > Yes, I am ok with ignoring sensor values if the reported
> > temperature is above I8K_MAX_TEMP. I am just not sure if we
> > should check against I8K_MAX_TEMP or against, say, 192.
> > Reason is that we do know that the sensor can erroneously
> > return 0x99 on some systems once in a while. We would not
> > want to ignore those sensors just because they happen to
> > report 0x99 during initialization.
> >
> > So maybe make it
> > if (err >= 0 && err < 192)
> > and add a note before the first if(), explaining that higher
> > values suggest that there is no sensor attached.
> >
> > Thanks,
> > Guenter
> >
>
> Right, now we need to decide which magic constant to use...
>
> And now I found another problem :-)
>
> On my laptop i8k_get_temp(3) not always return value 193. It is
> only when AMD graphics card is turned off. When card is on
> i8k_get_temp(3) returns same value as temperature hwmon part from
> radeon DRM driver.
>
Can you turn the GPU on or off during runtime ?
That would make it really tricky to handle the situation.

> So it looks like that on my laptop i8k sensor with id 3 reports
> GPU temperature.
>
> When card is turned off radeon driver reports -EINVAL for
> temperature hwmon sysnode.
>
> So now I think i8k could not ignore sensor totally as it can be
> mapped to some HW which can be dynamically turned on/off (like my
> graphics card).
>
> So what do you think about reporting -EINVAL instead I8K_MAX_TEMP
> when dell SMM returns value above I8K_MAX_TEMP?
>

-EINVAL is supposed to mean "Invalid Argument", so it really has
ia different semantics. We could use -ENXIO, "No such device or address",
which seems more appropriate.

Overall, I think the entire error handling is broken and should be
replaced. One option would be to explicitly check for 0x99 and, if
detected, go to sleep for, say, 100ms and try again. If it still fails,
and for all other bad values, return -ENXIO. Then the calling code can
either return the error to user space in the show function, or not
install the sensor in the probe function.

Does that make sense ?

Thanks,
Guenter
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/