[RFC 0/8] drm: explicit fencing support

From: Gustavo Padovan
Date: Thu Apr 14 2016 - 21:29:50 EST


From: Gustavo Padovan <gustavo.padovan@xxxxxxxxxxxxxxx>

Hi,

Currently the Linux kernel has only an implicit fencing mechanism, where
fences are attached directly to buffers and userspace is unaware of what is
happening. Explicit fencing, which Linux does not support yet, instead exposes
fences to userspace so that it can handle fencing between producer and
consumer explicitly.

For that we use the Android Sync Framework[1], an explicit fencing mechanism
that helps userspace handle fences directly. It has the concept of a
sync_file (called sync_fence in Android) that exposes the driver's fences to
userspace via file descriptors. File descriptors are useful because we can pass
them around between processes.
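
To illustrate the point about passing fds between processes, here is a
minimal userspace sketch that sends one file descriptor over a Unix domain
socket with SCM_RIGHTS. It is not part of this series; send_fd() and recv_fd()
are hypothetical helper names, and any fd (for example a sync_file fd) could
be passed the same way.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one fd over a connected Unix domain socket.
 * Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
	char dummy = 'F';	/* one byte of payload must accompany the fd */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment of buf */
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(fd >= 0);
	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd sent by send_fd().
 * Returns the new fd number in this process, or -1 on error. */
int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The receiver gets its own fd number referring to the same open file, so it
keeps working even after the sender closes its copy.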

The Sync Framework is currently in the staging tree and in the process of
being de-staged[2].

With explicit fencing we have a global mechanism that optimizes the flow of
buffers between consumers and producers and avoids a lot of waiting. Instead
of waiting for a buffer to be processed by the GPU before sending it to DRM
in an atomic IOCTL, we can get a sync_file fd from the GPU driver at the moment
we submit the buffer for processing. The compositor then passes these fds to
DRM in an atomic commit request, and the buffers will not be displayed until
the fences signal, i.e., until the GPU has finished processing the buffer and
it is ready to display. In DRM, the fences we wait on before displaying a
buffer are called in-fences.
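
For contrast, this is roughly the userspace wait that in-fences make
unnecessary. It is only a sketch and it assumes that a fence fd reports POLLIN
once all of its fences have signaled, which is how the Android sync fds
behave; wait_fence_fd() is a hypothetical helper name, not something this
series adds.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: block until the fd becomes readable.
 * Assumption: a fence fd reports POLLIN once its fences have signaled.
 * Returns 1 if signaled, 0 on timeout, -1 on error. */
int wait_fence_fd(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	assert(fd >= 0);

	/* Retry if a signal handler interrupted us. Note that this restarts
	 * the full timeout, so the total wait may exceed timeout_ms. */
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret <= 0)
		return ret;	/* 0: timed out, -1: poll() failed */

	/* POLLERR, POLLHUP or POLLNVAL without POLLIN is treated as an error */
	return (pfd.revents & POLLIN) ? 1 : -1;
}
```

With in-fences the compositor skips this step and hands the fd to the atomic
commit, so the wait happens in the kernel instead.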

Conversely, we have out-fences, which synchronize the return of buffers to the
GPU (producer) to be processed again. When DRM receives an atomic request with
a special flag set, it generates one fence per crtc and attaches it to a
per-crtc sync_file. It then returns the array of sync_file fds to userspace as
an atomic_ioctl out arg. Userspace can forward these fences to the GPU, which
will wait for each fence to signal before starting to process the buffer
again.

Explicit fencing with the Sync Framework allows buffer suballocation. Userspace
gets a large buffer, divides it into smaller ones and submits requests to
process them; each sub-buffer gets a sync_file fd and can be processed in
parallel. This is not possible with implicit fencing.

While these are out-fences in DRM (the consumer) they become in-fences once
they get to the GPU (the producer).

DRM explicit fences are opt-in, as the default will still be implicit fencing.
To enable explicit in-fences one just needs to pass a sync_file fd in the
FENCE_FD plane property. *In-fences are per-plane*, i.e., per framebuffer.

For out-fences, just setting the DRM_MODE_ATOMIC_OUT_FENCE flag is enough.
*Out-fences are per-crtc*.

In-fences
---------

In the first discussions on #dri-devel on IRC we decided to hide the Sync
Framework from DRM drivers to reduce complexity, so as soon as we get the fd
via the FENCE_FD plane property we convert the sync_file fd to a struct fence.
However, a sync_file might contain more than one fence, so we created the
fence_collection concept. struct fence_collection is a subclass of struct
fence and stores a group of fences that need to be waited on together, in
other words, all the fences in the sync_file.
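
The idea can be modelled in a few lines of standalone C. This is only a toy
sketch of the concept, not the code from the patch: the toy_* types and
functions are made up for illustration, and the real struct fence carries much
more state (refcount, ops, callbacks).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct fence (hypothetical, illustration only) */
struct toy_fence {
	bool signaled;
};

/* Toy stand-in for struct fence_collection: a "subclass" that embeds the
 * base fence and tracks every fence taken from one sync_file. */
struct toy_fence_collection {
	struct toy_fence base;
	struct toy_fence **fences;
	size_t num_fences;
};

void toy_fence_signal(struct toy_fence *f)
{
	assert(f != NULL);
	f->signaled = true;
}

/* The collection counts as signaled only once every member has signaled.
 * When that happens the embedded base fence is marked signaled too. */
bool toy_collection_is_signaled(struct toy_fence_collection *c)
{
	size_t i;

	assert(c != NULL);
	for (i = 0; i < c->num_fences; i++)
		if (!c->fences[i]->signaled)
			return false;

	c->base.signaled = true;
	return true;
}
```

Because the collection embeds a base fence, code that only knows how to wait
on a single fence can wait on the whole group without changes.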

Then we just use the already in place fence support to wait on those fences.
Once the producer has called fence_signal() for all the fences we are waiting
on, we can proceed with the atomic commit and display the framebuffers. DRM
drivers only need to be converted to struct fence to make use of this feature.

Out-fences
----------

Passing the DRM_MODE_ATOMIC_OUT_FENCE flag to an atomic request enables
out-fences. For each crtc, the kernel then creates a fence, attaches it to a
sync_file and installs this file on an unused fd. Userspace gets the fences
back as an array of per-crtc sync_file fds.

DRM core uses the already in place drm_event infrastructure to help signal
fences; we've added a fence pointer to struct drm_pending_event. If the atomic
update requested a PAGE_FLIP_EVENT we just use the same drm_pending_event and
set our fence there; otherwise we create an event with a NULL file_priv to
hold our fence. On vblank we just call fence_signal() to signal that the
buffer related to this fence is *now* on the screen.
Note that this is exactly the opposite behaviour from Android, where the fences
are signaled when the buffers are no longer on display and so are free to be
reused.
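
The two cases can be modelled with a small standalone sketch. It is a toy
model, not the patch code: the toy_* names are made up, and it assumes that an
event with a NULL file_priv carries only the fence and delivers nothing to
userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct fence (hypothetical, illustration only) */
struct toy_fence {
	bool signaled;
};

/* Toy stand-in for struct drm_pending_event with the added fence pointer */
struct toy_pending_event {
	const void *file_priv;	/* NULL: fence-only event (assumption) */
	struct toy_fence *fence;	/* out-fence to signal, may be NULL */
	bool delivered;		/* models the event reaching userspace */
};

/* Models the vblank path: signal the out-fence because the new buffer is
 * now on screen, and deliver the page flip event only if one was requested. */
void toy_handle_vblank(struct toy_pending_event *e)
{
	assert(e != NULL);

	if (e->fence)
		e->fence->signaled = true;

	if (e->file_priv)
		e->delivered = true;
}
```

Either way the fence signals at the same point, when the new buffer reaches
the screen; only the delivery of the page flip event differs.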

No changes are required to DRM drivers to have out-fences support, apart from
atomic support of course.

Open question
-------------

Should we use sync_timeline for out-fences? My feeling is that sync_timeline
does a lot more than we need for DRM. Should we go for a small drm_timeline?
Any thoughts on this?


Kernel tree
-----------

For those who want them, all patches in this RFC are in my tree. The tree
includes all the Sync Framework patches needed at the moment:

https://git.kernel.org/cgit/linux/kernel/git/padovan/linux.git/log/?h=fences

I also hacked some poor fake-fence support into modetest here:

https://git.collabora.com/cgit/user/padovan/libdrm.git/log/?h=atomic


Regards,

Gustavo
---

[1] https://source.android.com/devices/graphics/implement.html#vsync
[2] https://git.kernel.org/cgit/linux/kernel/git/padovan/linux.git/log/?h=sync

Gustavo Padovan (8):
dma-buf/fence: add fence_collection fences
dma-buf/sync_file: add sync_file_fences_get()
drm/fence: allow fence waiting to be interrupted by userspace
drm/fence: add in-fences support
drm/fence: add fence to drm_pending_event
drm/fence: create DRM_MODE_ATOMIC_OUT_FENCE flag
drm/fence: create per-crtc sync_timeline
drm/fence: add out-fences support

drivers/dma-buf/Makefile | 2 +-
drivers/dma-buf/fence-collection.c | 138 ++++++++++++++++++++++++++++++++++++
drivers/dma-buf/fence.c | 2 +-
drivers/dma-buf/sync_file.c | 37 ++++++++++
drivers/gpu/drm/Kconfig | 1 +
drivers/gpu/drm/drm_atomic.c | 136 +++++++++++++++++++++++++++++++++--
drivers/gpu/drm/drm_atomic_helper.c | 8 ++-
drivers/gpu/drm/drm_crtc.c | 16 +++++
drivers/gpu/drm/drm_fops.c | 5 +-
drivers/gpu/drm/drm_irq.c | 7 ++
include/drm/drmP.h | 1 +
include/drm/drm_crtc.h | 8 +++
include/linux/fence-collection.h | 56 +++++++++++++++
include/linux/fence.h | 2 +
include/linux/sync_file.h | 10 +++
include/uapi/drm/drm_mode.h | 11 ++-
16 files changed, 427 insertions(+), 13 deletions(-)
create mode 100644 drivers/dma-buf/fence-collection.c
create mode 100644 include/linux/fence-collection.h

--
2.5.5