Re: [RFC] Revert "drm/sched: Split free_job into own work item"

From: Mario Limonciello
Date: Wed Jan 24 2024 - 11:39:56 EST


On 1/24/2024 10:26, Vlastimil Babka wrote:
> On 1/23/24 03:11, Mario Limonciello wrote:
>> commit f7fe64ad0f22 ("drm/sched: Split free_job into own work item")
>> causes graphics hangs at GDM or right after logging in on a
>> Framework 13 AMD laptop (containing a Phoenix APU).
>>
>> This reverts commit f7fe64ad0f22ff034f8ebcfbd7299ee9cc9b57d7.
>>
>> Fixes: f7fe64ad0f22 ("drm/sched: Split free_job into own work item")
>> Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
>> ---
>> This is a regression introduced in 6.8-rc1, bisected from 6.7.
>> This revert done on top of 6.8-rc1 fixes the issue.
>
> Applying this revert on 6.8-rc1 fixed my issues reported here:
> https://lore.kernel.org/all/2faccc1a-7fdd-499b-aa0a-bd54f4068f3e@xxxxxxx/
>
> Let me know if there's another fix instead of revert so I can test.


There isn't another fix at the moment, but Matthew has posted a patch to this bug report that lets ftrace capture more data from the gpu_scheduler trace events:

https://gitlab.freedesktop.org/drm/amd/-/issues/3124

I have already captured a trace on a local machine and attached it to that bug report.

> Thanks,
> Vlastimil


>> I'm happy to gather any data to use to properly debug if that is
>> preferable to a revert.
>> ---
>>  drivers/gpu/drm/scheduler/sched_main.c | 133 +++++++++----------------
>>  include/drm/gpu_scheduler.h            |   4 +-
>>  2 files changed, 48 insertions(+), 89 deletions(-)