Re: [PATCH v4 03/17] drm/imagination/uapi: Add PowerVR driver UAPI

From: Frank Binns
Date: Tue Aug 15 2023 - 08:07:57 EST


Hi Michel,

Thank you for the feedback (comments below).

On Tue, 2023-08-08 at 15:46 +0200, Michel Dänzer wrote:
> On 7/14/23 16:25, Sarah Walker wrote:
> > +/**
> > + * DOC: PowerVR IOCTL CREATE_BO interface
> > + */
> > +
> > +/**
> > + * DOC: Flags for CREATE_BO
> > + *
> > + * The &struct drm_pvr_ioctl_create_bo_args.flags field is 64 bits wide and consists
> > + * of three groups of flags: creation, device mapping and CPU mapping.
> > + *
> > + * We use "device" to refer to the GPU here because of the ambiguity between
> > + * CPU and GPU in some fonts.
> > + *
> > + * Creation options
> > + * These use the prefix ``DRM_PVR_BO_CREATE_``.
> > + *
> > + * :ZEROED: Require the allocated buffer to be zeroed before returning. Note
> > + * that this is an active operation, and is never zero cost. Unless it is
> > + * explicitly required, this option should not be set.
>
> Making this optional is kind of problematic from a security standpoint (information leak, at least if the memory was previously used by a different process). See e.g. the discussion starting at https://gitlab.freedesktop.org/mesa/mesa/-/issues/9189#note_1972986 .
>
> AFAICT the approach I suggested there (Clear freed memory in the background, and make it available for allocation again only once the clear has finished) isn't really possible with gem_shmem in its current state though. There seems to be ongoing work to do something like that for __GFP_ZERO in general, maybe gem_shmem could take advantage of that when it lands. I'm afraid this series can't depend on that though.
>

Good point! We'll remove this flag and always zero the memory at allocation
time. We can then look to do something better in the future once the driver is
merged.

>
> > +/**
> > + * DOC: PowerVR IOCTL VM_MAP and VM_UNMAP interfaces
> > + *
> > + * The VM UAPI allows userspace to create buffer object mappings in GPU virtual address space.
> > + *
> > + * The client is responsible for managing GPU address space. It should allocate mappings within
> > + * the heaps returned by %DRM_PVR_DEV_QUERY_HEAP_INFO_GET.
> > + *
> > + * %DRM_IOCTL_PVR_VM_MAP creates a new mapping. The client provides the target virtual address for
> > + * the mapping. Size and offset within the mapped buffer object can be specified, so the client can
> > + * partially map a buffer.
> > + *
> > + * %DRM_IOCTL_PVR_VM_UNMAP removes a mapping. The entire mapping will be removed from GPU address
> > + * space. For this reason only the start address is provided by the client.
> > + */
>
> FWIW, the amdgpu driver uses a single ioctl for VM map & unmap (plus two additional operations for partial residency). Maybe this would make sense for the PowerVR driver as well, in particular if it might support partial residency in the future.
>
> (amdgpu also uses similar multiplexer ioctls for other things such as context create/destroy/...)
>
> Just an idea, feel free to ignore.
>

We originally had a single ioctl for VM map/unmap, which was inspired by amdgpu,
but we were told to split this into separate ioctls:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15243#note_1283760

>
> > +/**
> > + * DOC: Flags for SUBMIT_JOB ioctl geometry command.
> > + *
> > + * .. c:macro:: DRM_PVR_SUBMIT_JOB_GEOM_CMD_FIRST
> > + *
> > + * Indicates if this the first command to be issued for a render.
> > + *
> > + * .. c:macro:: DRM_PVR_SUBMIT_JOB_GEOM_CMD_LAST
>
> Does user space really need to pass in the FIRST/LAST flags, can't the kernel driver determine this implicitly? What happens if user space sets these incorrectly?
>

I don't think there's a way for the kernel driver to determine when the LAST
flag should be set. The obvious time to do this would be when it first
encounters a fragment job and there are geometry jobs proceeding it. However,
there is a least one case (transform feedback) when we want to submit a geometry
job without a fragment job and we need to set the LAST flag. I can't think of an
obvious way to detect this in the kernel driver.

If the flags aren't set correctly then it can cause the hardware to lockup.

>
> > + * .. c:macro:: DRM_PVR_SUBMIT_JOB_FRAG_CMD_PREVENT_CDM_OVERLAP
> > + *
> > + * Disallow compute overlapped with this render.
>
> Does this affect only compute from the same context, or also from other contexts?
>

Fragment and compute jobs are submitted on separate contexts. This flag is set
on fragment jobs and will prevent the job in question overlapping with compute
jobs across all compute contexts. This is necessary in some cases to avoid
hardware lockups on those GPUs that support compute overlapping with other job
types.

> (Similar question for DRM_PVR_SUBMIT_JOB_COMPUTE_CMD_PREVENT_ALL_OVERLAP)
>

Likewise, this will prevent compute jobs with this flag set from overlapping
with all other jobs across all contexts. Again, this is to avoid hardware
lockups on GPUs supporting compute overlap.

>
> P.S. I mostly just skimmed the other patches of the series, but my impression is that the patches and code are cleanly structured and well-documented.
>

Thank you :)

Frank