Re: [Freedreno] [PATCH v2 0/2] drm: fdinfo memory stats

From: Dmitry Baryshkov
Date: Tue Apr 11 2023 - 18:27:14 EST


On 11/04/2023 21:26, Daniel Vetter wrote:
On Tue, Apr 11, 2023 at 08:35:48PM +0300, Dmitry Baryshkov wrote:
On Tue, 11 Apr 2023 at 20:13, Rob Clark <robdclark@xxxxxxxxx> wrote:

On Tue, Apr 11, 2023 at 9:53 AM Daniel Vetter <daniel@xxxxxxxx> wrote:

On Tue, Apr 11, 2023 at 09:47:32AM -0700, Rob Clark wrote:
On Mon, Apr 10, 2023 at 2:06 PM Rob Clark <robdclark@xxxxxxxxx> wrote:

From: Rob Clark <robdclark@xxxxxxxxxxxx>

Similar motivation to another recent attempt[1], but with an attempt to
have some shared code for this, as well as documentation.

It is probably a bit UMA-centric, I guess devices with VRAM might want
some placement stats as well. But this seems like a reasonable start.

Basic gputop support: https://patchwork.freedesktop.org/series/116236/
And already nvtop support: https://github.com/Syllo/nvtop/pull/204

On a related topic, I'm wondering if it would make sense to report
some more global things (temp, freq, etc) via fdinfo? Some of this,
tools like nvtop could get by trawling sysfs or other driver specific
ways. But maybe it makes sense to have these sort of things reported
in a standardized way (even though they aren't really per-drm_file)

I think that's a bit much layering violation, we'd essentially have to
reinvent the hwmon sysfs uapi in fdinfo. Not really a business I want to
be in :-)

I guess this is true for temp (where there are thermal zones with
potentially multiple temp sensors.. but I'm still digging my way thru
the thermal_cooling_device stuff)

It is slightly ugly. All thermal zones and cooling devices are virtual
devices (so there is not even a connection to the particular tsens device). One
can either enumerate them by checking
/sys/class/thermal/thermal_zoneN/type or enumerate them through
/sys/class/hwmon. For cooling devices again the only enumeration is
through /sys/class/thermal/cooling_deviceN/type.
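
The enumeration described above can be sketched roughly as follows. This is only an illustration, assuming the usual /sys/class/thermal layout; the helper name list_thermal_types is hypothetical and not part of any existing tool.

```python
import os


def list_thermal_types(root="/sys/class/thermal"):
    """Map each thermal_zoneN / cooling_deviceN entry to the text of its 'type' file."""
    result = {}
    for name in sorted(os.listdir(root)):
        if not name.startswith(("thermal_zone", "cooling_device")):
            continue
        type_path = os.path.join(root, name, "type")
        try:
            with open(type_path) as f:
                result[name] = f.read().strip()
        except OSError:
            # Entry without a readable 'type' attribute: skip it.
            continue
    return result
```

Calling list_thermal_types() on a target device returns a dict keyed by entry name. Which type string belongs to the GPU is platform-specific, and the sketch does not try to guess it.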

Probably it should be possible to push cooling devices and thermal
zones under corresponding providers. However I do not know if there is
a good way to correlate cooling device (ideally a part of GPU) to the
thermal_zone (which in our case is provided by tsens / temp_alarm
rather than GPU itself).

There's not even sysfs links to connect the pieces in both ways?

I missed them in the most obvious place:

/sys/class/thermal/thermal_zone1/cdev0 -> ../cooling_device0

So, there is a link from thermal zone to cooling device.
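
A minimal sketch of following those links from userspace, assuming the cdevN entries are symlinks as in the example above. The function name zone_cooling_devices is made up for illustration.

```python
import os


def zone_cooling_devices(zone_dir):
    """Return the names of cooling devices linked from a thermal zone via cdevN symlinks."""
    devices = []
    for name in sorted(os.listdir(zone_dir)):
        # Accept cdev0, cdev1, ... and ignore sibling attribute files
        # such as cdev0_trip_point.
        if not (name.startswith("cdev") and name[4:].isdigit()):
            continue
        path = os.path.join(zone_dir, name)
        if os.path.islink(path):
            devices.append(os.path.basename(os.path.realpath(path)))
    return devices
```

For example, zone_cooling_devices("/sys/class/thermal/thermal_zone1") would list the targets of cdev0, cdev1 and so on. This only covers the zone-to-cooling-device direction; it does not tell which cooling device belongs to the GPU.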


But what about freq? I think, esp for cases where some "fw thing" is
controlling the freq we end up needing to use gpu counters to measure
the freq.

For the freq it is slightly easier: /sys/class/devfreq/*, devices are
registered under proper parent (IOW, GPU). So one can read
/sys/class/devfreq/3d00000.gpu/cur_freq or
/sys/bus/platform/devices/3d00000.gpu/devfreq/3d00000.gpu/cur_freq.
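
Reading these values could look something like the sketch below. It assumes cur_freq holds a single integer and returns it unconverted; read_devfreq_cur_freq is a hypothetical helper name.

```python
import os


def read_devfreq_cur_freq(root="/sys/class/devfreq"):
    """Return {devfreq device name: raw integer from cur_freq} for every readable device."""
    result = {}
    try:
        names = sorted(os.listdir(root))
    except FileNotFoundError:
        # No devfreq class on this system.
        return result
    for name in names:
        try:
            with open(os.path.join(root, name, "cur_freq")) as f:
                result[name] = int(f.read().strip())
        except (OSError, ValueError):
            # Missing, unreadable, or non-numeric cur_freq: skip the device.
            continue
    return result
```

On the board discussed here the result would be expected to contain a 3d00000.gpu key. Note that this reports what devfreq believes the frequency to be, which may differ from the real one when firmware controls the clock.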

However because of the components usage, there is no link from
/sys/class/drm/card0
(/sys/devices/platform/soc@0/ae00000.display-subsystem/ae01000.display-controller/drm/card0)
to /sys/devices/platform/soc@0/3d00000.gpu, the GPU unit.
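
The missing link can be demonstrated with a small check like the one below, a sketch that only compares resolved sysfs paths. The helper name is_ancestor_device is an assumption for illustration.

```python
import os


def is_ancestor_device(card_path, device_path):
    """True if device_path is card_path or one of its parent directories,
    after resolving symlinks on both sides."""
    card = os.path.realpath(card_path)
    dev = os.path.realpath(device_path)
    return os.path.commonpath([card, dev]) == dev
```

With the paths quoted above, is_ancestor_device("/sys/class/drm/card0", "/sys/devices/platform/soc@0/3d00000.gpu") would be False, while the same check against the display-subsystem directory would be True. Walking up from card0 therefore never reaches the GPU node.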

Hm ... do we need to make component more visible in sysfs, with _looooots_
of links? Atm it's just not even there.

Maybe. Or maybe we should use DPU (the component master and a parent of drm/card0) as devfreq parent too.


Getting all these items together in a platform-independent way would
definitely be an important but complex topic.

Yeah this sounds like some work. But also sounds like it's all generic
issues (thermal zones above and component here) that really should be
fixed at that level?

Cheers, Daniel


What might be needed is better glue to go from the fd or fdinfo to the
right hw device and then crawl around the hwmon in sysfs automatically. I
would not be surprised at all if we really suck on this, probably more
likely on SoC than pci gpus where at least everything should be under the
main pci sysfs device.
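
As a rough sketch of the first half of such glue, a tool can parse the "key: value" lines of /proc/<pid>/fdinfo/<fd> and look at the drm-* keys. The function names are hypothetical, and the step from those keys to the right sysfs device is exactly the part that is still missing.

```python
def parse_fdinfo(text):
    """Parse fdinfo text of the form 'key:<whitespace>value' into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def read_fdinfo(fd, pid="self"):
    """Read and parse /proc/<pid>/fdinfo/<fd> (Linux only)."""
    with open(f"/proc/{pid}/fdinfo/{fd}") as f:
        return parse_fdinfo(f.read())
```

For a DRM fd, the resulting dict would be expected to carry entries such as drm-driver and drm-client-id, which a monitoring tool could then try to match against sysfs.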

yeah, I *think* userspace would have to look at /proc/device-tree to
find the cooling device(s) associated with the gpu.. at least I don't
see a straightforward way to figure it out just from sysfs

BR,
-R

-Daniel


BR,
-R


[1] https://patchwork.freedesktop.org/series/112397/

Rob Clark (2):
drm: Add fdinfo memory stats
drm/msm: Add memory stats to fdinfo

Documentation/gpu/drm-usage-stats.rst | 21 +++++++
drivers/gpu/drm/drm_file.c | 79 +++++++++++++++++++++++++++
drivers/gpu/drm/msm/msm_drv.c | 25 ++++++++-
drivers/gpu/drm/msm/msm_gpu.c | 2 -
include/drm/drm_file.h | 10 ++++
5 files changed, 134 insertions(+), 3 deletions(-)

--
2.39.2


--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch



--
With best wishes
Dmitry

