Re: [PATCH v1 1/6] drm/lima: fix devfreq refcount imbalance for job timeouts

From: Qiang Yu
Date: Thu Jan 18 2024 - 20:51:04 EST


On Thu, Jan 18, 2024 at 7:14 PM Erico Nunes <nunes.erico@xxxxxxxxx> wrote:
>
> On Thu, Jan 18, 2024 at 2:36 AM Qiang Yu <yuq825@xxxxxxxxx> wrote:
> >
> > So this is caused by the same job triggering both the done and the
> > timeout handling? I think a better way to solve this is to make sure
> > only one handler (done or timeout) processes the job, instead of just
> > making lima_pm_idle() unique.
>
> It's not very clear to me how best to ensure that, with the drm_sched
> software timeout and the irq potentially happening at the same time.
This could be done by stopping the scheduler from running more jobs and
disabling the GP/PP interrupts. Then, after synchronizing the irq, no new
irq can come in while we are handling the timeout.
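Roughly, the ordering I mean, written as a small userspace model rather
than driver code. struct model, irq_handler(), timeout_handler() and
run_race() are made-up names, not lima or drm_sched API. The mutex stands
in for the irq core: the handler body runs under it, so taking it once
after masking the interrupt gives the same guarantee synchronize_irq()
gives in the kernel. The stop step corresponds loosely to drm_sched_stop().

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model state; none of these names exist in lima. */
struct model {
	pthread_mutex_t irq_lock; /* models the irq core serializing the handler */
	bool irq_enabled;         /* models the GP/PP interrupt mask */
	bool sched_running;       /* whether the scheduler may push new jobs */
	bool job_done;            /* set by whichever path finished the job */
	int pm_idle_calls;        /* should end at exactly 1 for one job */
};

static void model_init(struct model *m)
{
	pthread_mutex_init(&m->irq_lock, NULL);
	m->irq_enabled = true;
	m->sched_running = true;
	m->job_done = false;
	m->pm_idle_calls = 0;
}

/* "Done" path: only does work while the interrupt is still enabled. */
static void *irq_handler(void *arg)
{
	struct model *m = arg;

	pthread_mutex_lock(&m->irq_lock);
	if (m->irq_enabled) {
		m->job_done = true;
		m->pm_idle_calls++; /* stands in for lima_pm_idle() */
	}
	pthread_mutex_unlock(&m->irq_lock);
	return NULL;
}

/* Timeout path: stop the scheduler, mask the irq, wait for any handler
 * already running, and only then decide whether the job is still ours. */
static void *timeout_handler(void *arg)
{
	struct model *m = arg;

	m->sched_running = false;           /* ~ drm_sched_stop() */
	pthread_mutex_lock(&m->irq_lock);
	m->irq_enabled = false;             /* mask GP/PP interrupts */
	pthread_mutex_unlock(&m->irq_lock); /* ~ synchronize_irq() */

	/* From here on the handler can no longer touch job_done. */
	if (!m->job_done) {
		m->job_done = true;
		m->pm_idle_calls++;
	}
	return NULL;
}

/* Run both paths concurrently; return how many times "pm_idle" ran. */
static int run_race(void)
{
	struct model m;
	pthread_t irq, timeout;

	model_init(&m);
	pthread_create(&irq, NULL, irq_handler, &m);
	pthread_create(&timeout, NULL, timeout_handler, &m);
	pthread_join(irq, NULL);
	pthread_join(timeout, NULL);
	pthread_mutex_destroy(&m.irq_lock);
	return m.pm_idle_calls;
}
```

Whichever thread wins, pm_idle_calls ends at 1, so the idle accounting
stays balanced without making lima_pm_idle() itself idempotent. The model
leaves out a job that finishes on the hardware after the interrupt has
been masked; the real timeout path would presumably need to check the
GP/PP status for that case.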

> I think patch 4 in this series describes and covers the most common
> case in which this would be hit. So maybe this patch could now be
> dropped in favour of just that one.
Yes.

> But since this was a bit hard to reproduce and I'm not sure the issue
> is entirely covered by that, I decided to keep this small change
> anyway, as it prevented all the stack-trace reproducers I was able to
> come up with.
>
> Erico