Re: [PATCH] nvme: Add hardware monitoring support

From: Guenter Roeck
Date: Mon Oct 28 2019 - 09:27:49 EST


On 10/28/19 1:08 AM, Keith Busch wrote:
On Mon, Oct 28, 2019 at 08:39:53AM +0100, Christoph Hellwig wrote:
On Sun, Oct 27, 2019 at 07:41:56PM -0700, Guenter Roeck wrote:
nvme devices report temperature information in the controller information
(for limits) and in the smart log. Currently, the only means to retrieve
this information is the nvme command line interface, which requires
super-user privileges.

At the same time, it would be desirable to use NVME temperature information
for thermal control.

This patch adds support to read NVME temperatures from the kernel using the
hwmon API and adds temperature zones for NVME drives. The thermal subsystem
can use this information to set thermal policies, and userspace can access
it using libsensors and/or the "sensors" command.

So these reported values seem to generate some interest. Adding Akinobu
Mita who also planned to wire them up to the thermal framework. I don't
really know either upper layer so I'm not sure which is the right one,
but with this just like with the previous series I am quite worried that
we add a lot of kernel boilerplate code for information people can
trivially get using nvme-cli.
> I think it's nvme-cli requires root, where this conveniently doesn't
need those elevated rights.


The other point here is the thermal framework. One can not wire that up
through userspace, and even if it was possible to do it, that would defeat
the idea of having the thermal subsystem in the kernel running on its own,
without requiring userspace attention.

I'm not familiar with either upper level framework either; my only review
comment for this patch is to use devm_kfree() for the error cases.

Makes sense. I'll address that in v2.

Guenter