RE: [PATCH] hwmon: (k10temp) Report negative temperatures

From: Kannan, Baski
Date: Thu Jun 08 2023 - 13:09:31 EST



The patch you mentioned, aef17ca12719, sounds like a workaround for a problem found in some Ryzen Threadripper processors.
If I understand correctly, this workaround (aef17ca12719) has been applied as a blanket fix to all processors.

The industrial processor in question is the Epyc3k i3255. It can be identified by:
  - Family: 17h (boot_cpu_data.x86)
  - Model: 00h - 0fh (boot_cpu_data.x86_model)
  - Model name: contains the string "3255"
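For illustration only, a check built from the three criteria above could look like this. It is a minimal sketch, assuming a hypothetical struct cpu_ident that mirrors the boot_cpu_data fields listed (x86, x86_model and the model-name string); is_epyc_3255() is likewise a hypothetical helper, not existing k10temp code.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the boot_cpu_data fields used for detection. */
struct cpu_ident {
	unsigned int family;    /* corresponds to boot_cpu_data.x86 */
	unsigned int model;     /* corresponds to boot_cpu_data.x86_model */
	const char *model_name; /* CPU model-name string */
};

/*
 * Hypothetical helper: true only for a Family 17h, model 00h-0fh part
 * whose model name contains "3255".
 */
static bool is_epyc_3255(const struct cpu_ident *c)
{
	return c->family == 0x17 &&
	       c->model <= 0x0f &&
	       c->model_name != NULL &&
	       strstr(c->model_name, "3255") != NULL;
}
```

In the driver, such a check would presumably run once at probe time, with the result stored for the temperature read path.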

It supports operating temperatures from -40 degrees Celsius to 105 degrees Celsius.
We have customers with machines running at -20 degrees Celsius, and they need the correct (negative) temperature reported to their tools.
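One possible middle ground, sketched here under assumptions: keep the clamp to 0 by default and skip it only for parts known to operate below 0 degrees Celsius. struct temp_ctx and its allow_negative flag are hypothetical simplifications of the driver's private data, not actual k10temp fields. Values are in millidegrees Celsius, which is the unit hwmon uses for temperatures.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the driver's private data. */
struct temp_ctx {
	long raw_millideg;   /* what get_raw_temp() would return (millidegrees C) */
	long temp_offset;    /* Tctl-to-Tdie offset (millidegrees C) */
	bool allow_negative; /* hypothetical flag, set at probe time for parts
	                        known to operate below 0 degrees C */
};

/*
 * Clamp negative readings to 0, as the current code does for Tctl and
 * Tdie, unless the part is known to support sub-zero operation.
 */
static long report_temp(long val, bool allow_negative)
{
	if (!allow_negative && val < 0)
		return 0;
	return val;
}

/* Tctl: raw value, clamped unless negatives are allowed. */
static long read_tctl(const struct temp_ctx *d)
{
	return report_temp(d->raw_millideg, d->allow_negative);
}

/* Tdie: raw value minus the offset, clamped unless negatives are allowed. */
static long read_tdie(const struct temp_ctx *d)
{
	return report_temp(d->raw_millideg - d->temp_offset, d->allow_negative);
}
```

With this shape, processors affected by the problem fixed in aef17ca12719 keep the existing behavior, and only detected industrial parts report values such as -20000 (-20 degrees Celsius).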

-----Original Message-----
From: Guenter Roeck <groeck7@xxxxxxxxx> On Behalf Of Guenter Roeck
Sent: Thursday, June 8, 2023 8:52 AM
To: Kannan, Baski <Baski.Kannan@xxxxxxx>
Cc: Moger, Babu <Babu.Moger@xxxxxxx>; clemens@xxxxxxxxxx; jdelvare@xxxxxxxx; linux-hwmon@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
Subject: Re: [PATCH] hwmon: (k10temp) Report negative temperatures


On Tue, May 23, 2023 at 02:46:46PM -0700, Guenter Roeck wrote:
> On Tue, May 23, 2023 at 03:49:32PM -0500, Baskaran Kannan wrote:
> > Currently, the tctl and die temperatures are clamped to zero if
> > they are less than 0. There are industrial processors which operate
> > below zero.
>
> This was introduced with commit aef17ca12719 ("hwmon: (k10temp) Only
> apply temperature offset if result is positive"). This patch would
> effectively revert that change. Given the reason for introducing it, I
> am not convinced that it is a good idea to unconditionally revert it.
>

Any comments? I am not inclined to accept this patch as-is. What are the industrial processors? Is there a means to detect them?

Guenter

> Guenter
>
> >
> > To report the correct temperature, remove the clamping.
> >
> > Signed-off-by: Baskaran Kannan <Baski.Kannan@xxxxxxx>
> > ---
> > drivers/hwmon/k10temp.c | 4 ----
> > 1 file changed, 4 deletions(-)
> >
> > diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
> > index 7b177b9fbb09..489ad0b1bc74 100644
> > --- a/drivers/hwmon/k10temp.c
> > +++ b/drivers/hwmon/k10temp.c
> > @@ -204,13 +204,9 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel,
> >  		switch (channel) {
> >  		case 0:		/* Tctl */
> >  			*val = get_raw_temp(data);
> > -			if (*val < 0)
> > -				*val = 0;
> >  			break;
> >  		case 1:		/* Tdie */
> >  			*val = get_raw_temp(data) - data->temp_offset;
> > -			if (*val < 0)
> > -				*val = 0;
> >  			break;
> >  		case 2 ... 13:		/* Tccd{1-12} */
> >  			amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
> > --
> > 2.25.1
> >