Introduce fences for N:M completion variables

From: Chris Wilson
Date: Fri Jun 24 2016 - 05:09:48 EST


struct completion allows for multiple waiters on a single event.
However, frequently we want to wait on multiple events. For example, in
job processing, we need to wait for all prerequisite tasks to complete
before proceeding. Such dependency tracking is common to many situations.
In dma-buf, we already have a mechanism in place for tracking
dependencies between tasks and across drivers: the fence. Each fence is
a fixed point on a timeline that the hardware is processing (though the
hardware may be executing from multiple timelines concurrently). Each
fence may wait on any other fence. For native fences the wait may be
executed on the device; otherwise the drivers themselves provide the
signaling and forward progress of the inter-fence serialisation.
The added complexity of hardware interaction makes the dma-buf fence
unwieldy as a drop-in extension of struct completion. Enter kfence.

The kfence is intended to be as easy to use as a struct completion in
order to provide barriers in a DAG of tasks. It can provide
serialisation with other software events just as easily as it can mix in
dma-fences and be used to construct an event-driven state machine.
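
To illustrate the N:M idea only, here is a minimal userspace sketch of a
counting fence that signals once every event it waits on has completed.
It is not the kfence API from this series: the sketch_fence names, the
fixed-size waiter table and the single-threaded, lock-free bookkeeping
are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_WAITERS 8	/* arbitrary limit for the sketch */

/* Hypothetical stand-in for a fence; not the structure from the patch. */
struct sketch_fence {
	unsigned int pending;	/* outstanding events, plus the owner's hold */
	bool signaled;
	struct sketch_fence *waiters[SKETCH_MAX_WAITERS];
	size_t nr_waiters;
};

static void sketch_fence_init(struct sketch_fence *f)
{
	f->pending = 1;		/* owner's hold, dropped by sketch_fence_commit() */
	f->signaled = false;
	f->nr_waiters = 0;
}

/* Drop one pending event; on reaching zero, signal and notify dependents. */
static void sketch_fence_complete(struct sketch_fence *f)
{
	size_t i;

	assert(f->pending > 0);
	if (--f->pending)
		return;

	f->signaled = true;
	for (i = 0; i < f->nr_waiters; i++)
		sketch_fence_complete(f->waiters[i]);
}

/* Make @f wait on @dep. Returns false if @dep's waiter table is full. */
static bool sketch_fence_await(struct sketch_fence *f, struct sketch_fence *dep)
{
	if (dep->signaled)
		return true;	/* already done, nothing to wait for */
	if (dep->nr_waiters == SKETCH_MAX_WAITERS)
		return false;

	dep->waiters[dep->nr_waiters++] = f;
	f->pending++;
	return true;
}

/* The owner has finished adding dependencies; drop its initial hold. */
static void sketch_fence_commit(struct sketch_fence *f)
{
	sketch_fence_complete(f);
}
```

A job that depends on tasks A and B would await both fences and then
commit. It only becomes signaled after A and B have each been completed,
whatever the order. The owner's initial hold keeps the job from signaling
while dependencies are still being added.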

The tasks I have applied kfence to are:

* providing fine-grained dependency and concurrent execution for the
global initcalls. Drivers are currently creatively using the fixed
initcall phases to solve dependency problems. Knowing which initcalls
can be executed in parallel helps speed up the boot process. Though
not as much as removing the barrier after initramfs!

* providing fine-grained dependency and concurrent execution for
load/resume within a module (within the overall global async
execution). Trying to parallelise a driver between discovery and
hardware setup is hard to retrofit and will be challenging to
maintain without a mechanism by which we can describe the dependencies
of each phase upon each other (and hw state) and then let the
hardware resolve the order in which to execute the phases. We want a
declarative syntax?

* providing asynchronous execution of GPU rendering (for a mix of
inter-device rendering and inter-engine without hardware scheduling).
This mixes dma-fences with an event-driven state machine. Here, the
kfence primarily serves as a collection of dma-fences.

* providing asynchronous execution of atomic modesetting,
mixing the current usage of struct completion with dma-fences into
one consistent framework.