Re: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects

From: Boris Brezillon
Date: Tue Oct 03 2023 - 06:06:05 EST


Hello Thomas,

On Tue, 3 Oct 2023 10:36:10 +0200
Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx> wrote:

> > +/**
> > + * get_next_vm_bo_from_list() - get the next vm_bo element
> > + * @__gpuvm: The GPU VM
> > + * @__list_name: The name of the list we're iterating on
> > + * @__local_list: A pointer to the local list used to store already iterated items
> > + * @__prev_vm_bo: The previous element we got from get_next_vm_bo_from_list()
> > + *
> > + * This helper is here to provide lockless list iteration. Lockless as in, the
> > + * iterator releases the lock immediately after picking the first element from
> > + * the list, so list insertion and deletion can happen concurrently.
> > + *
> > + * Elements popped from the original list are kept in a local list, so removal
> > + * and is_empty checks can still happen while we're iterating the list.
> > + */
> > +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
> > +	({ \
> > +		struct drm_gpuvm_bo *__vm_bo = NULL; \
> > + \
> > +		drm_gpuvm_bo_put(__prev_vm_bo); \
> > + \
> > +		spin_lock(&(__gpuvm)->__list_name.lock); \
>
> Here we unconditionally take the spinlocks while iterating, and the main
> point of DRM_GPUVM_RESV_PROTECTED was really to avoid that?
>
>
> > +		if (!(__gpuvm)->__list_name.local_list) \
> > +			(__gpuvm)->__list_name.local_list = __local_list; \
> > +		else \
> > +			WARN_ON((__gpuvm)->__list_name.local_list != __local_list); \
> > + \
> > +		while (!list_empty(&(__gpuvm)->__list_name.list)) { \
> > +			__vm_bo = list_first_entry(&(__gpuvm)->__list_name.list, \
> > +						   struct drm_gpuvm_bo, \
> > +						   list.entry.__list_name); \
> > +			if (kref_get_unless_zero(&__vm_bo->kref)) {
> And unnecessarily grab a reference in the RESV_PROTECTED case.
> > \
> > +				list_move_tail(&(__vm_bo)->list.entry.__list_name, \
> > +					       __local_list); \
> > +				break; \
> > +			} else { \
> > +				list_del_init(&(__vm_bo)->list.entry.__list_name); \
> > +				__vm_bo = NULL; \
> > +			} \
> > +		} \
> > +		spin_unlock(&(__gpuvm)->__list_name.lock); \
> > + \
> > +		__vm_bo; \
> > +	})
>
> IMHO this lockless list iteration looks very complex and should be
> pretty difficult to maintain moving forward. Also, since it pulls
> the gpuvm_bos off the list, list iteration needs to be protected by an
> outer lock anyway.

As someone partly responsible for this convoluted list iterator, I must
say I agree with you. There are so many ways this can go wrong if the
user doesn't call it the right way, or doesn't protect concurrent list
iterations with a separate lock (luckily, this is a private iterator). I
mean, it works, so there's certainly a way to get it right, but gosh,
this is so far from the simple API I had hoped for.
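
For the record, the kind of simple iteration I had hoped we would end
up with for the RESV_PROTECTED case is something like this (just a
sketch; drm_gpuvm_resv_assert_held() and validate() are names I'm
making up here):

	/* Caller holds the VM resv: no refcount games, no local list,
	 * just a plain walk of the extobj list.
	 */
	drm_gpuvm_resv_assert_held(gpuvm);
	list_for_each_entry(vm_bo, &gpuvm->extobj.list, list.entry.extobj) {
		ret = validate(vm_bo);
		if (ret)
			return ret;
	}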

> Also from what I understand from Boris, the extobj
> list would typically not need the fine-grained locking; only the evict
> list?

Right, I'm adding the gpuvm_bo to the extobj list in the ioctl path,
when the GEM and VM resvs are held, and I'm deferring the
drm_gpuvm_bo_put() call to a work item that doesn't run in the
dma-signalling path. This being said, I'm still not comfortable with the
I'm still not comfortable with the

	struct drm_gem_object *obj = vm_bo->obj;

	/* Hold a GEM ref so the resv can't go away under us. */
	drm_gem_object_get(obj);
	dma_resv_lock(obj->resv, NULL);
	drm_gpuvm_bo_put(vm_bo); /* might drop the last vm_bo ref */
	dma_resv_unlock(obj->resv);
	drm_gem_object_put(obj);

dance that's needed to avoid a UAF when the gpuvm_bo is the last GEM
owner, not to mention that drm_gpuva_unlink() calls drm_gpuvm_bo_put()
after making sure the GEM gpuvm_list lock is held, but this lock might
differ from the resv lock (custom locking so we can call
drm_gpuva_unlink() in the dma-signalling path). So we now have paths
where drm_gpuvm_bo_put() is called with the resv lock held, and others
where it is not, and that only works because we're relying on the fact
those drm_gpuvm_bo_put() calls won't make the refcount drop to zero:
the deferred vm_bo_put() work still owns a vm_bo ref.
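
To make this more concrete, here's roughly what the deferred put looks
like on my end (all names made up, and heavily simplified):

	struct my_deferred_put {
		struct drm_gpuvm_bo *vm_bo; /* owns a vm_bo ref */
		struct llist_node node;
	};

	struct my_vm {
		struct drm_gpuvm base;
		struct llist_head deferred_puts;
		struct work_struct put_work;
	};

	static void vm_bo_put_work(struct work_struct *work)
	{
		struct my_vm *vm = container_of(work, struct my_vm, put_work);
		struct llist_node *freed = llist_del_all(&vm->deferred_puts);
		struct my_deferred_put *dp, *tmp;

		/* Each entry owns a vm_bo ref, which is what keeps the
		 * inline drm_gpuvm_bo_put() calls from ever dropping the
		 * refcount to zero. The last ref is dropped here, with
		 * the resv held and outside the dma-signalling path.
		 */
		llist_for_each_entry_safe(dp, tmp, freed, node) {
			struct drm_gem_object *obj = dp->vm_bo->obj;

			drm_gem_object_get(obj);
			dma_resv_lock(obj->resv, NULL);
			drm_gpuvm_bo_put(dp->vm_bo);
			dma_resv_unlock(obj->resv);
			drm_gem_object_put(obj);
			kfree(dp);
		}
	}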

All these tiny details add to the overall complexity of this common
layer, and to me, that's not any better than the
get_next_vm_bo_from_list() complexity you were complaining about (it
might even be worse, because this sort of thing leaks to users).

Having an internal lock partly solves that, in that the locking of the
extobj list is now entirely orthogonal to the GEM that's being removed
from this list, and we can lock/unlock internally without forcing the
caller to take weird actions to make sure things don't explode. Don't
get me wrong, I get that this locking overhead is not acceptable for
Xe, but I feel like we're turning drm_gpuvm into a white elephant that
only a few people will get right.

This is just my personal view on this, and I certainly don't want to
block or delay the merging of this patchset, but I thought I'd share my
concerns. As someone who's been following the evolution of this
drm_gpuva/vm series for weeks, and who's still sometimes getting lost,
I can't imagine how new drm_gpuvm users would feel...

> Also it seems that if we are to maintain two modes here, for
> reasonably clean code we'd need two separate instances of
> get_next_vm_bo_from_list().
>
> For the !RESV_PROTECTED case, perhaps one would want to consider the
> solution used currently in xe, where the VM maintains two evict lists.
> One protected by a spinlock and one protected by the VM resv. When the
> VM resv is locked to begin list traversal, the spinlock is locked *once*
> and the spinlock-protected list is looped over and copied into the resv
> protected one. For traversal, the resv protected one is used.

Oh, so you do have the same sort of trick where you move the entire
list to another list, such that you can let other paths update the list
while you're iterating your own snapshot. That's interesting...
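
If I got it right, that would look something like this (made-up names,
just to check my understanding):

	/* Producers (e.g. the eviction path) only touch the staging
	 * list, under the spinlock.
	 */
	spin_lock(&vm->evict.lock);
	list_add_tail(&vm_bo->list.entry.evict, &vm->evict.staged);
	spin_unlock(&vm->evict.lock);

	/* The consumer, called with the VM resv held, splices the
	 * staging list into the resv-protected list *once*, then
	 * iterates the latter without touching the spinlock again.
	 */
	spin_lock(&vm->evict.lock);
	list_splice_tail_init(&vm->evict.staged, &vm->evict.list);
	spin_unlock(&vm->evict.lock);

	list_for_each_entry(vm_bo, &vm->evict.list, list.entry.evict)
		validate(vm_bo);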

Regards,

Boris