Re: [PATCH v2] drm/scheduler: set entity to NULL in drm_sched_entity_pop_job()

From: Steven Price
Date: Wed Apr 19 2023 - 06:05:55 EST


On 19/04/2023 10:53, Steven Price wrote:
> On 19/04/2023 10:44, Lucas Stach wrote:
>> Hi Steven,
>>
>> Am Mittwoch, dem 19.04.2023 um 10:39 +0100 schrieb Steven Price:
>>> On 18/04/2023 11:04, Danilo Krummrich wrote:
>>>> It already happend a few times that patches slipped through which
>>>> implemented access to an entity through a job that was already removed
>>>> from the entities queue. Since jobs and entities might have different
>>>> lifecycles, this can potentially cause UAF bugs.
>>>>
>>>> In order to make it obvious that a jobs entity pointer shouldn't be
>>>> accessed after drm_sched_entity_pop_job() was called successfully, set
>>>> the jobs entity pointer to NULL once the job is removed from the entity
>>>> queue.
>>>>
>>>> Moreover, debugging a potential NULL pointer dereference is way easier
>>>> than potentially corrupted memory through a UAF.
>>>>
>>>> Signed-off-by: Danilo Krummrich <dakr@xxxxxxxxxx>
>>>
>>> This triggers a splat for me (with Panfrost driver), the cause of which
>>> is the following code in drm_sched_get_cleanup_job():
>>>
>>> if (job) {
>>> job->entity->elapsed_ns += ktime_to_ns(
>>> ktime_sub(job->s_fence->finished.timestamp,
>>> job->s_fence->scheduled.timestamp));
>>> }
>>>
>>> which indeed is accessing entity after the job has been returned from
>>> drm_sched_entity_pop_job(). And obviously job->entity is a NULL pointer
>>> with this patch.
>>>
>>> I'm afraid I don't fully understand the lifecycle so I'm not sure if
>>> this is simply exposing an existing bug in drm_sched_get_cleanup_job()
>>> or if this commit needs to be reverted.
>>>
>> Not sure which tree you are on. The offending commit has been reverted
>> in 6.3-rc5.
>
> This is in drm-misc-next - I'm not sure which "offending commit" you are
> referring to. I'm referring to:
>
> 96c7c2f4d5bd ("drm/scheduler: set entity to NULL in
> drm_sched_entity_pop_job()")
>
> which was merged yesterday to drm-misc-next (and is currently the top
> commit).
>
> Is there another commit which has been reverted elsewhere which is
> conflicting?

Answering my own question, the conflicting commit is:

baad10973fdb ("Revert "drm/scheduler: track GPU active time per entity"")

But that commit isn't (yet) in drm-misc-next. Which unfortunately means
drm-misc-next is broken until a back-merge happens.

Steve