Re: [PATCH v6 14/22] dma-buf: Introduce new locking convention

From: Thomas Hellström (Intel)
Date: Tue Jun 28 2022 - 17:32:43 EST



On 5/30/22 15:57, Dmitry Osipenko wrote:
On 5/30/22 16:41, Christian König wrote:
Hi Dmitry,

Am 30.05.22 um 15:26 schrieb Dmitry Osipenko:
Hello Christian,

On 5/30/22 09:50, Christian König wrote:
Hi Dmitry,

First of all please separate out this patch from the rest of the series,
since this is a complex separate structural change.
I assume all the patches will go via the DRM tree in the end since the
rest of the DRM patches in this series depend on this dma-buf change.
But I see that separation may ease reviewing of the dma-buf changes, so
let's try it.
That sounds like you are underestimating a bit how much trouble this
will be.

I have tried this before and failed because catching all the locks in
the right code paths are very tricky. So expect some fallout from this
and make sure the kernel test robot and CI systems are clean.
Sure, I'll fix up all the reported things in the next iteration.

BTW, have you ever posted yours version of the patch? Will be great if
we could compare the changed code paths.
No, I never even finished creating it after realizing how much work it
would be.

This patch introduces new locking convention for dma-buf users. From
now
on all dma-buf importers are responsible for holding dma-buf
reservation
lock around operations performed over dma-bufs.

This patch implements the new dma-buf locking convention by:

    1. Making dma-buf API functions to take the reservation lock.

    2. Adding new locked variants of the dma-buf API functions for
drivers
       that need to manage imported dma-bufs under the held lock.
Instead of adding new locked variants please mark all variants which
expect to be called without a lock with an _unlocked postfix.

This should make it easier to remove those in a follow up patch set and
then fully move the locking into the importer.
Do we really want to move all the locks to the importers? Seems the
majority of drivers should be happy with the dma-buf helpers handling
the locking for them.
Yes, I clearly think so.

    3. Converting all drivers to the new locking scheme.
I have strong doubts that you got all of them. At least radeon and
nouveau should grab the reservation lock in their ->attach callbacks
somehow.
Radeon and Nouveau use gem_prime_import_sg_table() and they take resv
lock already, seems they should be okay (?)
You are looking at the wrong side. You need to fix the export code path,
not the import ones.

See for example attach on radeon works like this
drm_gem_map_attach->drm_gem_pin->radeon_gem_prime_pin->radeon_bo_reserve->ttm_bo_reserve->dma_resv_lock.
Yeah, I was looking at the both sides, but missed this one.

Also i915 will run into trouble with attach. In particular since i915 starts a full ww transaction in its attach callback to be able to lock other objects if migration is needed. I think i915 CI would catch this in a selftest.

Perhaps it's worthwile to take a step back and figure out, if the importer is required to lock, which callbacks might need a ww acquire context?

(And off-topic, Since we do a lot of fancy stuff under dma-resv locks including waiting for fences and other locks, IMO taking these locks uninterruptible should ring a warning bell)

/Thomas


Same for nouveau and probably a few other exporters as well. That will
certainly cause a deadlock if you don't fix it.

I strongly suggest to do this step by step, first attach/detach and then
the rest.
Thank you very much for the suggestions. I'll implement them in the next
version.