Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives

From: Guenter Roeck
Date: Sun Jan 12 2020 - 15:08:24 EST


On 1/12/20 10:37 AM, Gabriel C wrote:
On Sun, Jan 12, 2020 at 16:26, Guenter Roeck <linux@xxxxxxxxxxxx> wrote:

On 1/12/20 5:45 AM, Gabriel C wrote:
On Sun, Jan 12, 2020 at 14:07, Guenter Roeck <linux@xxxxxxxxxxxx> wrote:

On 1/12/20 4:07 AM, Linus Walleij wrote:
On Sun, Jan 12, 2020 at 1:03 PM Gabriel C <nix.or.die@xxxxxxxxx> wrote:
On Sun, Jan 12, 2020 at 12:22, Linus Walleij
<linus.walleij@xxxxxxxxxx> wrote:

On Sun, Jan 12, 2020 at 12:18 PM Gabriel C <nix.or.die@xxxxxxxxx> wrote:

What I've noticed, however, is that the nvme low/high temperature values
on the "Sensor X" entries look strange here.
(...)
Sensor 1: +27.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +29.9°C (low = -273.1°C, high = +65261.8°C)
(...)
Sensor 1: +23.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +25.9°C (low = -273.1°C, high = +65261.8°C)

That doesn't look strange to me. It seems like reasonable defaults
from the firmware if either it doesn't really log the min/max temperatures
or hasn't been through a cycle of updating these yet. Just set both
to absolute min/max temperatures possible.

Ok I'll check that.

Do you mean by setting the temperatures to use a lmsensors config?
Or is there a way to set these with a nvme command?

Not that I know of.

The min/max are the minimum and maximum temperatures the
device has experienced during this power-on cycle.


No, that would be lowest/highest. The above are (or should be) per-sensor
setpoints. The default for those is typically the absolute minimum /
maximum of the supported range.

Some SATA drives report the lowest/highest temperatures experienced
since power cycle, like here.

drivetemp-scsi-5-0
Adapter: SCSI adapter
temp1: +23.0°C (low = +0.0°C, high = +60.0°C)
(crit low = -41.0°C, crit = +85.0°C)
(lowest = +20.0°C, highest = +31.0°C)


The SATA temperatures are fine and reported like this here too, just
the nvme ones are strange.

drivetemp-scsi-4-0
Adapter: SCSI adapter
temp1: +28.0°C (low = +1.0°C, high = +61.0°C)
(crit low = +2.0°C, crit = +60.0°C)
(lowest = +16.0°C, highest = +31.0°C)

drivetemp-scsi-12-0
Adapter: SCSI adapter
temp1: +29.0°C (low = +1.0°C, high = +61.0°C)
(crit low = +2.0°C, crit = +60.0°C)
(lowest = +18.0°C, highest = +32.0°C)

and so on.

Btw, where can I find the code that does these calculations?


Not sure if that is what you are looking for, but the nvme hardware
monitoring driver is at drivers/nvme/host/hwmon.c, the SATA hardware
monitoring driver is at drivers/hwmon/drivetemp.c.


I'll have a look, thanks.

I'm using your v2 patch for the nvme part since you posted it on 5.4 kernels.
This is probably why I find the way the temperatures are now reported
very strange.

The ADATA XPG SX8200 Pro in my laptop seems to work better:

nvme-pci-0200
Adapter: PCI adapter
Composite: +37.9°C (low = -0.1°C, high = +74.8°C)
(crit = +79.8°C)

Low is 0° which is what the spec suggests.

The limits on nvme drives are configurable.

Yes, I found this out already.

root@server:/sys/class/hwmon# sensors nvme-pci-0100
nvme-pci-0100
Adapter: PCI adapter
Composite: +40.9°C (low = -273.1°C, high = +84.8°C)
(crit = +84.8°C)
Sensor 1: +40.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +43.9°C (low = -273.1°C, high = +65261.8°C)

root@server:/sys/class/hwmon# echo 0 > hwmon1/temp2_min
root@server:/sys/class/hwmon# echo 100000 > hwmon1/temp2_max

An lm-sensors configuration will work too.

Sure, the above was just an example.

root@server:/sys/class/hwmon# sensors nvme-pci-0100
nvme-pci-0100
Adapter: PCI adapter
Composite: +38.9°C (low = -273.1°C, high = +84.8°C)
(crit = +84.8°C)
Sensor 1: +38.9°C (low = -0.1°C, high = +99.8°C)
Sensor 2: +42.9°C (low = -273.1°C, high = +65261.8°C)
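
The read-back limits for Sensor 1 (-0.1 and +99.8 rather than 0 and 100) are consistent with the limit being stored in whole Kelvin. A minimal sketch of that round trip, assuming the written millidegree value is rounded to the nearest Kelvin and converted back with a 273.15 offset; the helper names are hypothetical and this is not taken from the driver source:

```python
# Assumption: limits are stored in whole Kelvin, sysfs uses millidegrees Celsius.
KELVIN_OFFSET_MC = 273150  # 273.15 K expressed in millidegrees


def millicelsius_to_kelvin(mc: int) -> int:
    """Round a millidegree-Celsius value to the nearest whole Kelvin."""
    return (mc + KELVIN_OFFSET_MC + 500) // 1000


def kelvin_to_millicelsius(kelvin: int) -> int:
    """Convert whole Kelvin back to millidegrees Celsius."""
    return kelvin * 1000 - KELVIN_OFFSET_MC


# The two values written to temp2_min and temp2_max in the example above.
for written in (0, 100000):
    stored = millicelsius_to_kelvin(written)
    read_back = kelvin_to_millicelsius(stored)
    print(f"wrote {written}, stored {stored} K, read back {read_back / 1000:.2f} C")
```

Under these assumptions 0 is stored as 273 K and reads back as -0.15, and 100000 is stored as 373 K and reads back as 99.85, which fits the one-decimal values sensors prints.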

If you dislike the defaults, just configure whatever you think is
appropriate for your system.

It's not about disliking the values. I want to find out whether these Samsung
models don't support that, or whether it is a bug somewhere in writing/calculating
the values.

No, this is not a bug. It is perfectly valid for individual sensors to have
uninitialized limits. If I recall correctly, the NVME specification
specifically states that the default settings for individual sensors
shall be those values (0 and 65535 Kelvin, specifically).

And, yes, I would agree that it is a bit odd that NVME drives report temperatures
in Kelvin, but such is the world.
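
To illustrate where the odd-looking numbers come from, here is a small sketch that converts the two default limits mentioned above (0 and 65535 Kelvin, the latter being the largest 16-bit value) to Celsius. The function name is made up for illustration:

```python
def kelvin_to_celsius(kelvin: int) -> float:
    """Convert a whole-Kelvin reading to degrees Celsius."""
    return kelvin - 273.15


# Default per-sensor limits as described above: 0 K and 65535 K (0xFFFF).
DEFAULT_LOW_K = 0
DEFAULT_HIGH_K = 0xFFFF

for label, raw in (("low", DEFAULT_LOW_K), ("high", DEFAULT_HIGH_K)):
    print(f"{label}: {raw} K = {kelvin_to_celsius(raw):+.2f} C")
```

That gives -273.15 and +65261.85, which sensors shows with one decimal as low = -273.1 and high = +65261.8.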

In case Samsung and others don't support such a thing, wouldn't it be
better to just ignore
the bogus reading altogether?

Again, you can set whatever limits you like. The default limits on many
hardware sensor chips have odd values. Just looking at my system:

nct6797-isa-0a20
Adapter: ISA adapter
in0: +0.48 V (min = +0.00 V, max = +1.74 V)
in1: +1.02 V (min = +0.00 V, max = +0.00 V) ALARM
in2: +3.39 V (min = +0.00 V, max = +0.00 V) ALARM
in3: +3.31 V (min = +0.00 V, max = +0.00 V) ALARM
in4: +1.00 V (min = +0.00 V, max = +0.00 V) ALARM
in5: +0.14 V (min = +0.00 V, max = +0.00 V) ALARM
in6: +0.82 V (min = +0.00 V, max = +0.00 V) ALARM
in7: +3.38 V (min = +0.00 V, max = +0.00 V) ALARM
in8: +3.26 V (min = +0.00 V, max = +0.00 V) ALARM
in9: +1.82 V (min = +0.00 V, max = +0.00 V) ALARM
in10: +0.00 V (min = +0.00 V, max = +0.00 V)
in11: +0.74 V (min = +0.00 V, max = +0.00 V) ALARM
in12: +1.20 V (min = +0.00 V, max = +0.00 V) ALARM
in13: +0.68 V (min = +0.00 V, max = +0.00 V) ALARM
in14: +1.50 V (min = +0.00 V, max = +0.00 V) ALARM


Are you suggesting that we should not support setting min/max values for
all drivers just because they are often not initialized to reasonable values
by default?

Guenter