Re: [PATCH] drm/msm: Fix hw_fence error path cleanup

From: Rob Clark
Date: Thu Jul 13 2023 - 16:28:28 EST


On Thu, Jul 13, 2023 at 1:03 PM Dmitry Baryshkov
<dmitry.baryshkov@xxxxxxxxxx> wrote:
>
> On 13/07/2023 01:25, Rob Clark wrote:
> > From: Rob Clark <robdclark@xxxxxxxxxxxx>
> >
> > In an error path where the submit is free'd without the job being run,
> > the hw_fence pointer is simply a kzalloc'd block of memory. In this
> > case we should just kfree() it, rather than trying to decrement it's
> > reference count. Fortunately we can tell that this is the case by
> > checking for a zero refcount, since if the job was run, the submit would
> > be holding a reference to the hw_fence.
> >
> > Fixes: f94e6a51e17c ("drm/msm: Pre-allocate hw_fence")
> > Signed-off-by: Rob Clark <robdclark@xxxxxxxxxxxx>
> > ---
> > drivers/gpu/drm/msm/msm_fence.c | 6 ++++++
> > drivers/gpu/drm/msm/msm_gem_submit.c | 14 +++++++++++++-
> > 2 files changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
> > index 96599ec3eb78..1a5d4f1c8b42 100644
> > --- a/drivers/gpu/drm/msm/msm_fence.c
> > +++ b/drivers/gpu/drm/msm/msm_fence.c
> > @@ -191,6 +191,12 @@ msm_fence_init(struct dma_fence *fence, struct msm_fence_context *fctx)
> >
> > f->fctx = fctx;
> >
> > + /*
> > + * Until this point, the fence was just some pre-allocated memory,
> > + * no-one should have taken a reference to it yet.
> > + */
> > + WARN_ON(kref_read(&fence->refcount));
>
> It this really correct to return a refcounted object with 0 refcount
> (I'm looking at submit_create() / msm_fence_alloc() )? Maybe it would be
> better to move dma_fence_get() to msm_fence_alloc() ? But don't
> immediately see, which one should be moved.

The issue is that we can't really initialize the fence until
msm_job_run(), when it is known what order the fence would be
signaled. But we don't want to do any allocations in msm_job_run()
because that could trigger the shrinker, which could need to wait
until jobs complete to release memory, forming a deadlock.

BR,
-R

> > +
> > dma_fence_init(&f->base, &msm_fence_ops, &fctx->spinlock,
> > fctx->context, ++fctx->last_fence);
> > }
> > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> > index 3f1aa4de3b87..9d66498cdc04 100644
> > --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> > @@ -86,7 +86,19 @@ void __msm_gem_submit_destroy(struct kref *kref)
> > }
> >
> > dma_fence_put(submit->user_fence);
> > - dma_fence_put(submit->hw_fence);
> > +
> > + /*
> > + * If the submit is freed before msm_job_run(), then hw_fence is
> > + * just some pre-allocated memory, not a reference counted fence.
> > + * Once the job runs and the hw_fence is initialized, it will
> > + * have a refcount of at least one, since the submit holds a ref
> > + * to the hw_fence.
> > + */
> > + if (kref_read(&submit->hw_fence->refcount) == 0) {
> > + kfree(submit->hw_fence);
> > + } else {
> > + dma_fence_put(submit->hw_fence);
> > + }
> >
> > put_pid(submit->pid);
> > msm_submitqueue_put(submit->queue);
>
> --
> With best wishes
> Dmitry
>