Re: [RFC PATCH v3 0/3] new subsystem for compute accelerator devices

From: Jeffrey Hugo
Date: Mon Nov 07 2022 - 11:09:01 EST


On 11/6/2022 2:02 PM, Oded Gabbay wrote:
This is the third version of the RFC. It addresses the comments given on the
second version but, more importantly, it reflects testing done by the VPU
driver people and myself. We found that there is a circular dependency
between DRM and accel: DRM calls symbols exported by accel during init, and
when accel devices register (all the minor handling), accel calls symbols
exported by DRM. Therefore, if the two components are compiled as separate
modules, each one depends on the other.

To overcome this, I have decided to compile the accel core code as part of
the DRM kernel module (drm.ko). IMO, this is in line with the spirit of the
design choice to have accel reuse the DRM core code and avoid code
duplication.

Another important change is that I have reverted to using IDR instead of
xarray for minor handling. This is because I found that xarray doesn't
handle one scenario well: allocating a NULL entry and then exchanging it
with a real pointer. It appears xarray still considers that entry a "zero"
entry. This is unfortunate because DRM works that way (it first allocates a
NULL entry and then replaces it with a real pointer).
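To illustrate the two-step pattern described above (reserve a minor ID whose entry is NULL, then later replace it with the real pointer), here is a minimal userspace sketch. It is not the kernel IDR or xarray API: `struct minor_table`, `minor_reserve()`, `minor_publish()`, `minor_lookup()` and `minor_release()` are hypothetical names, and the separate `in_use` flag is an assumption of this sketch, used to keep a reserved-but-NULL slot distinct from a free one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-size minor table; NOT the kernel IDR or xarray API. */
#define MAX_MINORS 8

struct minor_table {
	void *ptr[MAX_MINORS];   /* published pointer, or NULL while only reserved */
	bool in_use[MAX_MINORS]; /* true once an ID has been reserved */
};

/* Step 1: reserve the lowest free ID and store NULL in it. Returns -1 if full. */
static int minor_reserve(struct minor_table *t)
{
	for (int id = 0; id < MAX_MINORS; id++) {
		if (!t->in_use[id]) {
			t->in_use[id] = true;
			t->ptr[id] = NULL;
			return id;
		}
	}
	return -1;
}

/* Step 2: replace the reserved NULL entry with the real (non-NULL) pointer. */
static bool minor_publish(struct minor_table *t, int id, void *p)
{
	if (id < 0 || id >= MAX_MINORS || p == NULL)
		return false;
	if (!t->in_use[id] || t->ptr[id] != NULL)
		return false; /* never reserved, or already published */
	t->ptr[id] = p;
	return true;
}

/* Lookup yields NULL both for free IDs and for reserved-but-unpublished IDs. */
static void *minor_lookup(const struct minor_table *t, int id)
{
	if (id < 0 || id >= MAX_MINORS || !t->in_use[id])
		return NULL;
	return t->ptr[id];
}

/* Free the ID so a later minor_reserve() can hand it out again. */
static void minor_release(struct minor_table *t, int id)
{
	if (id >= 0 && id < MAX_MINORS) {
		t->in_use[id] = false;
		t->ptr[id] = NULL;
	}
}
```

A caller reserves an ID early, finishes setting up the device, and only then publishes the pointer. Until then, lookups return NULL, yet the ID still counts as taken and is not handed out to another device.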

I decided to revert to IDR because I don't want to hold up these patches,
as many people are blocked until accel support is merged. The xarray
issue should be fixed in a separate patch, either by fixing the xarray code
or by changing how DRM and accel handle minor IDs.

This sounds sane to me. However, this appears to be something that Matthew Wilcox should be aware of (added for visibility). Perhaps he has a very quick solution. If not, at least he might have ideas on how best to address it in the future.

The patches are in the following repo:
https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/accel.git/log/?h=accel_v3

As in v2, the HEAD of that branch is a commit adding a dummy driver that
registers an accel device using the new framework. It can serve as a
simple reference. I have tested inserting and removing the dummy driver,
and opening and closing /dev/accel/accel0, and nothing broke :)

v1 cover letter:
https://lkml.org/lkml/2022/10/22/544

v2 cover letter:
https://lore.kernel.org/lkml/20221102203405.1797491-1-ogabbay@xxxxxxxxxx/T/

Thanks,
Oded.

Oded Gabbay (3):
drivers/accel: define kconfig and register a new major
accel: add dedicated minor for accelerator devices
drm: initialize accel framework

Documentation/admin-guide/devices.txt | 5 +
MAINTAINERS | 8 +
drivers/Kconfig | 2 +
drivers/accel/Kconfig | 24 ++
drivers/accel/drm_accel.c | 322 ++++++++++++++++++++++++++
drivers/gpu/drm/Makefile | 1 +
drivers/gpu/drm/drm_drv.c | 102 +++++---
drivers/gpu/drm/drm_file.c | 2 +-
drivers/gpu/drm/drm_sysfs.c | 24 +-
include/drm/drm_accel.h | 97 ++++++++
include/drm/drm_device.h | 3 +
include/drm/drm_drv.h | 8 +
include/drm/drm_file.h | 21 +-
13 files changed, 582 insertions(+), 37 deletions(-)
create mode 100644 drivers/accel/Kconfig
create mode 100644 drivers/accel/drm_accel.c
create mode 100644 include/drm/drm_accel.h

--
2.25.1