[PATCH 1/3] drm/i915: Fix timeout handling when retiring requests

From: Janusz Krzysztofik
Date: Wed Nov 09 2022 - 14:10:35 EST


I believe that intel_gt_retire_requests_timeout() should return -ETIME if
all time designated by the timeout argument has been consumed while waiting
for fences to be signaled, the remaining time if there are requests still
not retired, or 0 otherwise. In the last case, the remaining time should be
passed back via the remaining_timeout argument.
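
The return convention described above can be sketched in plain user-space C
as follows. This is only an illustration of how a caller might read the
result: the enum and classify_retire_result() are hypothetical names and
are not part of the i915 driver.

```c
#include <errno.h>

/* Hypothetical names, for illustration only; not part of the i915 driver. */
enum retire_status {
	RETIRE_DONE,		/* 0: nothing left to retire */
	RETIRE_PENDING,		/* > 0: requests still pending, value is time left */
	RETIRE_TIMED_OUT,	/* -ETIME: the whole timeout was consumed */
	RETIRE_ERROR,		/* any other negative value */
};

/* Map a return value following the convention above to a status. */
static enum retire_status classify_retire_result(long ret)
{
	if (ret == 0)
		return RETIRE_DONE;
	if (ret > 0)
		return RETIRE_PENDING;
	if (ret == -ETIME)
		return RETIRE_TIMED_OUT;
	return RETIRE_ERROR;
}
```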

The remaining time is updated with the return value of each consecutive call
to dma_fence_wait_timeout(). If an error code is returned instead of the
remaining time, a few potentially unexpected side effects occur:
- we no longer wait for the last request fences of subsequent timelines to
  be signaled before we try to retire requests from those timelines --
  while expected in case of -ETIME, that's probably not intended in case
  of other errors that dma_fence_wait_timeout() can return,
- the error code (a negative value) is passed back as the remaining time,
  and if no more requests happen to be left pending despite the error, a
  user may pass that value forward as a remaining timeout -- that can
  potentially trigger a WARN or BUG,
- a potentially unexpected error code is returned to the user when a
  non-critical error that probably shouldn't stop the user from retrying
  occurs while active requests are still pending.
Moreover, should dma_fence_wait_timeout() ever return 0 (which should mean
timeout expiration) while we are processing requests, and should there
still be pending requests when we are about to return, that 0 value is
returned to the user as if all requests had been successfully retired.

Ignore error codes from dma_fence_wait_timeout() other than -ETIME and
don't overwrite the remaining time with those error codes. Also, convert a
0 value returned by dma_fence_wait_timeout() to -ETIME.
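
As a rough user-space model of the update rule described above (not the
driver code itself), the mapping can be sketched as below.
fold_wait_result() is a hypothetical helper, time_left stands in for the
value returned by dma_fence_wait_timeout(), and the assertion encodes the
assumption that a wait is only attempted while some time remains.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper modelling the update rule in user space.
 * 'timeout' is the time budget before the wait; 'time_left' stands in
 * for the value returned by dma_fence_wait_timeout().
 */
static long fold_wait_result(long timeout, long time_left)
{
	/* Assumption: the wait is only attempted while time remains. */
	assert(timeout > 0);

	if (time_left == -ETIME || time_left == 0)
		return -ETIME;		/* timeout expired */
	if (time_left > 0)
		return time_left;	/* carry the remaining time forward */
	return timeout;			/* other errors: assume no time consumed */
}
```

For example, with a budget of 100, a wait that reports 40 leaves 40, a
wait that reports 0 yields -ETIME, and a wait that fails with some other
error (-EINTR is used here as a user-space stand-in) leaves the budget at
100.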

Fixes: f33a8a51602c ("drm/i915: Merge wait_for_timelines with retire_request")
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@xxxxxxxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx # v5.5+
---
drivers/gpu/drm/i915/gt/intel_gt_requests.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index edb881d756309..6c3b8ac3055c3 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -156,11 +156,22 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,

 			fence = i915_active_fence_get(&tl->last_request);
 			if (fence) {
+				signed long time_left;
+
 				mutex_unlock(&tl->mutex);
 
-				timeout = dma_fence_wait_timeout(fence,
-								 true,
-								 timeout);
+				time_left = dma_fence_wait_timeout(fence,
+								   true,
+								   timeout);
+				/*
+				 * 0 or -ETIME: timeout expired
+				 * other errors: ignore, assume no time consumed
+				 */
+				if (time_left == -ETIME || time_left == 0)
+					timeout = -ETIME;
+				else if (time_left > 0)
+					timeout = time_left;
+
 				dma_fence_put(fence);
 
 				/* Retirement is best effort */
--
2.25.1