Re: [PATCH 1/4] drm/v3d: Delay the scheduler timeout if we're still making progress.

From: Lucas Stach
Date: Fri Jul 06 2018 - 06:06:19 EST


On Tuesday, 2018-07-03 at 10:05 -0700, Eric Anholt wrote:
> GTF-GLES2.gtf.GL.acos.acos_float_vert_xvary submits jobs that take 4
> seconds at maximum resolution, but we still want to reset quickly if a
> job is really hung.  Sample the CL's current address and the return
> address (since we call into tile lists repeatedly) and if either has
> changed then assume we've made progress.
>
> Signed-off-by: Eric Anholt <eric@xxxxxxxxxx>
> Cc: Lucas Stach <l.stach@xxxxxxxxxxxxxx>

Reviewed-by: Lucas Stach <l.stach@xxxxxxxxxxxxxx>

> ---
>  drivers/gpu/drm/v3d/v3d_drv.h   |  2 ++
>  drivers/gpu/drm/v3d/v3d_regs.h  |  1 +
>  drivers/gpu/drm/v3d/v3d_sched.c | 18 ++++++++++++++++++
>  3 files changed, 21 insertions(+)
>
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
> index f546e0ab9562..a5d96d823416 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -189,6 +189,8 @@ struct v3d_job {
>  
>  	/* GPU virtual addresses of the start/end of the CL job. */
>  	u32 start, end;
> +
> +	u32 timedout_ctca, timedout_ctra;
>  };
>  
>  struct v3d_exec_info {
> diff --git a/drivers/gpu/drm/v3d/v3d_regs.h b/drivers/gpu/drm/v3d/v3d_regs.h
> index fc13282dfc2f..854046565989 100644
> --- a/drivers/gpu/drm/v3d/v3d_regs.h
> +++ b/drivers/gpu/drm/v3d/v3d_regs.h
> @@ -222,6 +222,7 @@
>  #define V3D_CLE_CTNCA(n) (V3D_CLE_CT0CA + 4 * n)
>  #define V3D_CLE_CT0RA                                  0x00118
>  #define V3D_CLE_CT1RA                                  0x0011c
> +#define V3D_CLE_CTNRA(n) (V3D_CLE_CT0RA + 4 * n)
>  #define V3D_CLE_CT0LC                                  0x00120
>  #define V3D_CLE_CT1LC                                  0x00124
>  #define V3D_CLE_CT0PC                                  0x00128
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> index 808bc901f567..00667c733dca 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -153,7 +153,25 @@ v3d_job_timedout(struct drm_sched_job *sched_job)
>  	struct v3d_job *job = to_v3d_job(sched_job);
>  	struct v3d_exec_info *exec = job->exec;
>  	struct v3d_dev *v3d = exec->v3d;
> +	enum v3d_queue job_q = job == &exec->bin ? V3D_BIN : V3D_RENDER;
>  	enum v3d_queue q;
> +	u32 ctca = V3D_CORE_READ(0, V3D_CLE_CTNCA(job_q));
> +	u32 ctra = V3D_CORE_READ(0, V3D_CLE_CTNRA(job_q));
> +
> +	/* If the current address or return address have changed, then
> +	 * the GPU has probably made progress and we should delay the
> +	 * reset.  This could fail if the GPU got in an infinite loop
> +	 * in the CL, but that is pretty unlikely outside of an i-g-t
> +	 * testcase.
> +	 */
> +	if (job->timedout_ctca != ctca || job->timedout_ctra != ctra) {
> +		job->timedout_ctca = ctca;
> +		job->timedout_ctra = ctra;
> +
> +		schedule_delayed_work(&job->base.work_tdr,
> +				      job->base.sched->timeout);
> +		return;
> +	}
>  
>  	mutex_lock(&v3d->reset_lock);
>  
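
For anyone following the thread, the progress check in the quoted
v3d_job_timedout() hunk can be sketched in isolation.  This is only an
illustration, not driver code: struct progress_snapshot and
cl_made_progress() are hypothetical names, the two register reads are
replaced by plain function arguments, and the snapshot is assumed to
start out zeroed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields the patch adds to struct v3d_job. */
struct progress_snapshot {
	uint32_t ctca;	/* CL current address seen at the previous timeout */
	uint32_t ctra;	/* CL return address seen at the previous timeout */
};

/*
 * Compare freshly sampled addresses against the snapshot.  If either one
 * differs, record the new values and report progress; in the patch this
 * is the path that re-arms the timeout and returns.  If both match, report
 * no progress; in the patch this falls through to the reset.
 */
static bool cl_made_progress(struct progress_snapshot *snap,
			     uint32_t ctca, uint32_t ctra)
{
	if (snap->ctca != ctca || snap->ctra != ctra) {
		snap->ctca = ctca;
		snap->ctra = ctra;
		return true;
	}

	return false;
}
```

Under these assumptions, a first timeout with nonzero addresses always
counts as progress, so a job that is truly hung would only be reset on
the second consecutive timeout that samples the same pair of addresses.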