Re: [PATCH 1/4] drm/v3d: Delay the scheduler timeout if we're still making progress.

From: Eric Anholt
Date: Thu Jul 05 2018 - 12:59:17 EST


Lucas Stach <l.stach@xxxxxxxxxxxxxx> writes:

> On Tuesday, 03.07.2018 at 10:05 -0700, Eric Anholt wrote:
>> GTF-GLES2.gtf.GL.acos.acos_float_vert_xvary submits jobs that take 4
>> seconds at maximum resolution, but we still want to reset quickly if a
>> job is really hung.  Sample the CL's current address and the return
>> address (since we call into tile lists repeatedly) and if either has
>> changed then assume we've made progress.
>
> So this means you are doubling your timeout? AFAICS for the first time
> you hit the timeout handler the cached ctca and ctra values will
> probably always differ from the current values. Maybe this warrants a
> mention in the commit message, as it's changing the behavior of the
> scheduler timeout.

I suppose that doubles the minimum timeout, but I don't think there's
any principled choice behind that value.
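
As a rough illustration of the check under discussion, here is a
standalone userspace sketch, not the patch itself. The struct and
function names are hypothetical; only ctca and ctra come from the
thread. It assumes the cached values start out as zero, which is why
the first expiry will almost always see a difference and grant one
extra timeout period.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-job state; names are illustrative, not the driver's. */
struct job_state {
	uint32_t timedout_ctca; /* CL current address seen at the last timeout */
	uint32_t timedout_ctra; /* CL return address seen at the last timeout */
};

/*
 * Called when the scheduler timeout fires.  Returns true if the job
 * should get another timeout period, false if it looks hung.
 */
static bool job_made_progress(struct job_state *job,
			      uint32_t ctca, uint32_t ctra)
{
	assert(job);

	if (job->timedout_ctca != ctca || job->timedout_ctra != ctra) {
		/* Either address moved: remember the new sample and wait again. */
		job->timedout_ctca = ctca;
		job->timedout_ctra = ctra;
		return true;
	}

	/* Same addresses as at the previous timeout: treat the job as hung. */
	return false;
}
```

With a zero-initialized job_state, the first call with any nonzero
address returns true, and a second call with the same addresses
returns false, so a truly hung job is reset after two periods.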

> Also how easy is it for userspace to construct such an infinite loop in
> the CL? Thinking about a rogue client DoSing the GPU while exploiting
> this check in the timeout handler to stay under the radar...

You'd need to have a big enough CL that you don't sample the same
location twice in a row, but otherwise it's trivial and equivalent to a
V3D33 igt case I wrote. I don't think we as the kernel particularly
care to protect against that case, though -- it's mainly "does a broken
WebGL shader take down your desktop?" which we will still be protecting
against. If you wanted to protect against a general userspace attacker,
you could have a maximum 1 minute timeout or something, but I'm not sure
your life is actually much better when you let an arbitrary number of
clients submit many jobs to round-robin through, each of which has a
long timeout like that.
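
A minimal sketch of what such a ceiling could look like, again as
standalone userspace C rather than driver code. Everything here is an
assumption for illustration: the names, the 500 ms period, and the
60 s cap (taken from the "1 minute or something" figure above).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TIMEOUT_PERIOD_MS  500u   /* assumed length of one timeout period */
#define MAX_JOB_RUNTIME_MS 60000u /* assumed hard ceiling per job */

/* Hypothetical per-job state for a capped progress check. */
struct capped_job {
	uint32_t last_ctca;  /* CL current address at the last timeout */
	uint32_t last_ctra;  /* CL return address at the last timeout */
	uint32_t elapsed_ms; /* total time charged to this job so far */
};

/*
 * Called once per expired timeout period.  Returns true to extend the
 * job by another period, false to reset it.
 */
static bool capped_job_extend(struct capped_job *job,
			      uint32_t ctca, uint32_t ctra)
{
	bool progressed;

	assert(job);

	job->elapsed_ms += TIMEOUT_PERIOD_MS;
	progressed = job->last_ctca != ctca || job->last_ctra != ctra;
	job->last_ctca = ctca;
	job->last_ctra = ctra;

	/* Progress only buys more time while the job is under the ceiling. */
	return progressed && job->elapsed_ms < MAX_JOB_RUNTIME_MS;
}
```

Note that this bounds a single job only. It does nothing about the
case described above, where many clients each queue many jobs that
individually stay under the ceiling.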
