Re: [PATCH v3 09/15] habanalabs: add sysfs and hwmon support

From: Oded Gabbay
Date: Fri Feb 08 2019 - 16:19:09 EST


On Fri, Feb 8, 2019 at 2:14 PM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, Feb 04, 2019 at 10:32:48PM +0200, Oded Gabbay wrote:
> > This patch add the sysfs and hwmon entries that are exposed by the driver.
> >
> > Goya has several sensors, from various categories such as temperature,
> > voltage, current, etc. The driver exposes those sensors in the standard
> > hwmon mechanism.
> >
> > In addition, the driver exposes a couple of interfaces in sysfs, both for
> > configuration and for providing status of the device or driver.
> >
> > The configuration attributes is for Power Management:
> > - Automatic or manual
> > - Frequency value when moving to high frequency mode
> > - Maximum power the device is allowed to consume
> >
> > The rest of the attributes are read-only and provide the following
> > information:
> > - Versions of the various firmwares running on the device
> > - Contents of the device's EEPROM
> > - The device type (currently only Goya is supported)
> > - PCI address of the device (to allow user-space to connect between
> > /dev/hlX to PCI address)
> > - Status of the device (operational, malfunction, in_reset)
> > - How many processes are open on the device's file
> >
> > Signed-off-by: Oded Gabbay <oded.gabbay@xxxxxxxxx>
> > ---
> > .../ABI/testing/sysfs-driver-habanalabs | 190 ++++++
> > drivers/misc/habanalabs/Makefile | 2 +-
> > drivers/misc/habanalabs/device.c | 146 +++++
> > drivers/misc/habanalabs/goya/Makefile | 2 +-
> > drivers/misc/habanalabs/goya/goya.c | 230 +++++++
> > drivers/misc/habanalabs/goya/goyaP.h | 21 +
> > drivers/misc/habanalabs/goya/goya_hwmgr.c | 306 +++++++++
> > drivers/misc/habanalabs/habanalabs.h | 101 +++
> > drivers/misc/habanalabs/habanalabs_drv.c | 7 +
> > drivers/misc/habanalabs/hwmon.c | 449 +++++++++++++
> > drivers/misc/habanalabs/sysfs.c | 589 ++++++++++++++++++
> > 11 files changed, 2041 insertions(+), 2 deletions(-)
> > create mode 100644 Documentation/ABI/testing/sysfs-driver-habanalabs
> > create mode 100644 drivers/misc/habanalabs/goya/goya_hwmgr.c
> > create mode 100644 drivers/misc/habanalabs/hwmon.c
> > create mode 100644 drivers/misc/habanalabs/sysfs.c
> >
> > diff --git a/Documentation/ABI/testing/sysfs-driver-habanalabs b/Documentation/ABI/testing/sysfs-driver-habanalabs
> > new file mode 100644
> > index 000000000000..19edd4da87c1
> > --- /dev/null
> > +++ b/Documentation/ABI/testing/sysfs-driver-habanalabs
> > @@ -0,0 +1,190 @@
> > +What: /sys/class/habanalabs/hl<n>/armcp_kernel_ver
> > +Date: Jan 2019
> > +KernelVersion: 5.1
> > +Contact: oded.gabbay@xxxxxxxxx
> > +Description: Version of the Linux kernel running on the device's CPU
>
> Hey, nice! We can see how old that kernel gets over time :)
>
> > +What: /sys/class/habanalabs/hl<n>/soft_reset_cnt
> > +Date: Jan 2019
> > +KernelVersion: 5.1
> > +Contact: oded.gabbay@xxxxxxxxx
> > +Description: Displays how many times the device have undergone a soft-reset
> > + operation
>
> "how many times" since when? Power on? Kernel boot? Driver load?
Thanks, fixed that (Driver load).
>
> > +static ssize_t mme_clk_show(struct device *dev, struct device_attribute *attr,
> > + char *buf)
> > +{
> > + struct hl_device *hdev = dev_get_drvdata(dev);
> > + long value;
> > +
> > + if (hdev->disabled)
> > + return -ENODEV;
> > +
> > + value = hl_get_frequency(hdev, MME_PLL, false);
> > +
> > + if (value < 0)
> > + return value;
> > +
> > + return snprintf(buf, PAGE_SIZE, "%lu\n", value);
>
> Meta-comment. You do this in all of your sysfs show functions, and I
> understand the "I want to be safe!" feeling here, but you should just
> use sprintf(). The size of a sysfs buffer is PAGE_SIZE, and you should
> never even get close to it if you are only writing a single numeric
> value.
>
> So no need to do anything fancy, just use sprintf() please.
Got it, fixed.
>
> > +static DEVICE_ATTR_RW(mme_clk);
> > +static DEVICE_ATTR_RW(tpc_clk);
> > +static DEVICE_ATTR_RW(ic_clk);
> > +static DEVICE_ATTR_RO(mme_clk_curr);
> > +static DEVICE_ATTR_RO(tpc_clk_curr);
> > +static DEVICE_ATTR_RO(ic_clk_curr);
>
> Some people like to put the macro right under the show/store functions,
> to keep them all in one place. Makes it easier to change/add things
> later over time.
I really prefer it this way, you get a clear picture of all the ATTR
you expose without the need to go over the entire file.
If you don't mind I want to leave this as it is.

>
> > +int goya_add_device_attr(struct hl_device *hdev)
> > +{
> > + int rc;
> > +
> > + rc = device_create_file(hdev->dev, &dev_attr_mme_clk);
> > + if (rc) {
> > + dev_err(hdev->dev, "failed to create device file mme_clk\n");
> > + return rc;
> > + }
> > +
> > + rc = device_create_file(hdev->dev, &dev_attr_tpc_clk);
> > + if (rc) {
> > + dev_err(hdev->dev, "failed to create device file tpc_clk\n");
> > + goto remove_mme_clk;
> > + }
> > +
> > + rc = device_create_file(hdev->dev, &dev_attr_ic_clk);
> > + if (rc) {
> > + dev_err(hdev->dev, "failed to create device file ic_clk\n");
> > + goto remove_tpc_clk;
> > + }
> > +
> > + rc = device_create_file(hdev->dev, &dev_attr_mme_clk_curr);
> > + if (rc) {
> > + dev_err(hdev->dev,
> > + "failed to create device file mme_clk_curr\n");
> > + goto remove_ic_clk;
> > + }
> > +
> > + rc = device_create_file(hdev->dev, &dev_attr_tpc_clk_curr);
> > + if (rc) {
> > + dev_err(hdev->dev,
> > + "failed to create device file tpc_clk_curr\n");
> > + goto remove_mme_clk_curr;
> > + }
> > +
> > + rc = device_create_file(hdev->dev, &dev_attr_ic_clk_curr);
> > + if (rc) {
> > + dev_err(hdev->dev,
> > + "failed to create device file ic_clk_curr\n");
> > + goto remove_tpc_clk_curr;
> > + }
>
>
> You do know about attribute groups, right? Please use them so you don't
> have to hand-roll the "add a bunch of files" logic here and elsewhere.
> The driver core will just automatically add and remove all of the files
> when the device is bound properly without you having to do anything.
> That saves you a lot of code and debugging logic.
>
> Same for your other lists of attributes, they can all be handled
> automatically, no need for this type of logic.
ah... hmm... oops, didn't know about them. feelsbadman :(
I don't think they are used in drm (or at least not in amd) which is
where most of my kernel knowledge comes from.
Indeed, much more elegant. Fixed the code.

>
> thanks,
>
> greg k-h