Re: [PATCH drm-misc-next 1/3] drm/sched: implement dynamic job flow control

From: Boris Brezillon
Date: Tue Sep 26 2023 - 03:11:42 EST


On Mon, 25 Sep 2023 19:55:21 +0200
Christian König <christian.koenig@xxxxxxx> wrote:

> On 25.09.23 at 14:55, Boris Brezillon wrote:
> > +The Imagination team, who are probably interested too.
> >
> > On Mon, 25 Sep 2023 00:43:06 +0200
> > Danilo Krummrich <dakr@xxxxxxxxxx> wrote:
> >
> >> Currently, job flow control is implemented simply by limiting the amount
> >> of jobs in flight. Therefore, a scheduler is initialized with a
> >> submission limit that corresponds to a certain amount of jobs.
> >>
> >> This implies that for each job drivers need to account for the maximum
> >> job size possible in order to not overflow the ring buffer.
> >>
> >> However, there are drivers, such as Nouveau, where the job size has a
> >> rather large range. For such drivers it can easily happen that job
> >> submissions not even filling the ring by 1% can block subsequent
> >> submissions, which, in the worst case, can lead to the ring running dry.
> >>
> >> In order to overcome this issue, allow for tracking the actual job size
> >> instead of the amount of jobs. Therefore, add a field to track a job's
> >> submission units, which represents the amount of units a job contributes
> >> to the scheduler's submission limit.
> > As mentioned earlier, this might allow some simplifications in the
> > PowerVR driver where we do flow-control using a dma_fence returned
> > through ->prepare_job(). The only thing that'd be missing is a way to
> > dynamically query the size of a job (a new hook?), instead of having the
> > size fixed at creation time, because PVR jobs embed native fence waits,
> > and the number of native fences will decrease if some of these fences
> > are signalled before ->run_job() is called, thus reducing the job size.
>
> Exactly that is a little bit questionable since it allows for the device
> to postpone jobs infinitely.
>
> It would be good if the scheduler is able to validate if it's ever able
> to run the job when it is pushed into the entity.

Yes, we do that already. We check that the immutable part of the job
(everything that's not a native fence wait) fits in the ringbuf.
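
To make the two checks concrete, here is a minimal sketch in plain C. It
is not the PowerVR driver code and not the drm_sched API: struct ring,
struct job and every helper name below are hypothetical, and the cost
model (a fixed part plus a constant number of units per pending native
fence wait) is an assumption made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring model: capacity and usage counted in abstract units. */
struct ring {
	uint32_t capacity;	/* total units the ring can hold */
	uint32_t in_flight;	/* units held by jobs currently on the ring */
};

/* Hypothetical job model: immutable part plus a cost per native fence wait. */
struct job {
	uint32_t fixed_units;		/* known and constant from creation time */
	uint32_t units_per_wait;	/* assumed cost of one native fence wait */
	uint32_t pending_waits;		/* native fences not yet signalled */
};

/*
 * Push-time validation: only the immutable part is checked, because the
 * native fence waits can all disappear before the job is run.
 */
static bool job_can_ever_run(const struct ring *r, const struct job *j)
{
	return j->fixed_units <= r->capacity;
}

/* Current size: shrinks as native fences signal before the job is run. */
static uint64_t job_current_units(const struct job *j)
{
	return (uint64_t)j->fixed_units +
	       (uint64_t)j->units_per_wait * j->pending_waits;
}

/* Run-time flow control: does the job fit in the free space right now? */
static bool job_fits_now(const struct ring *r, const struct job *j)
{
	assert(r->in_flight <= r->capacity);
	return job_current_units(j) <= (uint64_t)(r->capacity - r->in_flight);
}
```

With a ring of 100 units of which 90 are in flight, a job with 8 fixed
units and two pending waits of 4 units each passes the push-time check
but does not fit yet; once both fences have signalled its size drops to
8 units and it fits. A job whose fixed part alone exceeds the ring
capacity is rejected at push time, which is the case that would
otherwise be postponed forever.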