Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation

From: Danilo Krummrich
Date: Mon Sep 18 2023 - 07:23:11 EST

Next message: Paul Durrant: "[PATCH v2 01/12] KVM: pfncache: add a map helper function"
Previous message: Ron Economos: "Re: [PATCH 6.5 000/285] 6.5.4-rc1 review"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 9/14/23 19:15, Danilo Krummrich wrote:

On 9/14/23 19:13, Thomas Hellström wrote:

On Thu, 2023-09-14 at 17:27 +0200, Danilo Krummrich wrote:

On 9/14/23 13:32, Thomas Hellström wrote:

On 9/14/23 12:57, Danilo Krummrich wrote:

On 9/13/23 14:16, Danilo Krummrich wrote:

<snip>

And validate() can remove it while still holding all dma-
resv locks,
neat!
However, what if two tasks are trying to lock the VA space
concurrently? What
do we do when the drm_gpuvm_bo's refcount drops to zero in
drm_gpuva_unlink()?
Are we guaranteed that at this point of time the
drm_gpuvm_bo is not
on the
evicted list? Because otherwise we would call
drm_gpuvm_bo_destroy()
with the
dma-resv lock held, which wouldn't be allowed, since
drm_gpuvm_bo_destroy()
might drop the last reference to the drm_gem_object and
hence we'd
potentially
free the dma-resv lock while holding it, at least if it's
an external
object.

Easiest way in this scheme is to think of the lists as being
protected
by the vm's resv lock. That means anybody calling unlink()
must also
hold the vm's resv lock. (Which is OK from an UAF point of
view, but
perhaps not from a locking inversion POW from an async list
update).

This would mean that on unlink() we'd need to hold the VM's
resv lock and the
corresponding GEM's resv lock (in case they're not the same
anyways) because the
VM's resv lock would protect the external / evicted object
lists and the GEM
objects resv lock protects the GEM's list of drm_gpuvm_bos and
the
drm_gpuvm_bo's list of drm_gpuvas.

As mentioned below the same applies for drm_gpuvm_bo_put() since
it might
destroy the vm_bo, which includes removing the vm_bo from
external / evicted
object lists and the GEMs list of vm_bos.

As mentioned, if the GEM's dma-resv is different from the VM's
dma-resv we need
to take both locks. Ultimately, this would mean we need a
drm_exec loop, because
we can't know the order in which to take these locks. Doing a
full drm_exec loop
just to put() a vm_bo doesn't sound reasonable to me.

Can we instead just have an internal mutex for locking the lists
such that we
avoid taking and dropping the spinlocks, which we use currently,
in a loop?

You'd have the same locking inversion problem with a mutex, right?
Since in the eviction path you have resv->mutex, from exec you have
resv->mutex->resv because validate would attempt to grab resv.

Both lists, evict and extobj, would need to have a separate mutex,
not a common one.
We'd also need a dedicated GEM gpuva lock. Then the only rule would
be that you can't
hold the dma-resv lock when calling put(). Which I admit is not that
nice.

With the current spinlock solution drivers wouldn't need to worry
about anything locking
related though. So maybe I come back to your proposal of having a
switch for external
locking with dma-resv locks entirely. Such that with external dma-
resv locking I skip
all the spinlocks and add lockdep checks instead.

I think that makes the most sense in terms of taking advantage of
external dma-resv locking
where possible and on the other hand having a self-contained solution
if not. This should
get all concerns out of the way, yours, Christian's and Boris'.

If we need additional locks yes, I'd prefer the opt-in/opt-out spinlock
solution, and check back after a while to see if we can remove either
option once most pitfalls are hit.

Sounds good, I'll prepare this for a V4.

I was considering getting rid of the spinlocks using srcu for both external and
evicted objects instead. This would get us rid of taking/dropping the spinlock in
every iteration step of the lists, limiting it to a single srcu_read_{lock,unlock}
call per list walk. Plus, obviously the list_add_rcu() and list_del_rcu() variants
as accessors. The accessors, would probably still need a spinlock to protect against
concurrent list_add_rcu()/list_del_rcu() calls, but I think those are not a concern.

Any concerns from your side with variant?

- Danilo

Thanks,
/Thomas

That said, xe currently indeed does the vm+bo exec dance on vma
put.

One reason why that seemingly horrible construct is good, is that
when evicting an extobj and you need to access individual vmas to
Zap page table entries or TLB flush, those VMAs are not allowed to
go away (we're not refcounting them). Holding the bo resv on gpuva
put prevents that from happening. Possibly one could use another
mutex to protect the gem->vm_bo list to achieve the same, but we'd
need to hold it on gpuva put.

/Thomas

- Danilo

For extobjs an outer lock would be enough in case of
Xe, but I
really would not
like to add even more complexity just to get the
spinlock out of
the way in case
the driver already has an outer lock protecting this
path.

I must disagree here. These spinlocks and atomic
operations are
pretty
costly and as discussed earlier this type of locking was
the reason
(at
least according to the commit message) that made
Christian drop the
XArray
use in drm_exec for the same set of objects: "The locking
overhead
is
unecessary and measurable". IMHO the spinlock is the
added
complexity and a
single wide lock following the drm locking guidelines set
out by
Daniel and
David should really be the default choice with an opt-in
for a
spinlock if
needed for async and pushing out to a wq is not an
option.

For the external object list an outer lock would work as
long as it's
not the
dma-resv lock of the corresponding GEM object, since here
we actually
need to
remove the list entry from the external object list on
drm_gpuvm_bo_destroy().
It's just a bit weird design wise that drivers would need
to take
this outer
lock on:

- drm_gpuvm_bo_extobj_add()
- drm_gpuvm_bo_destroy()        (and hence also
drm_gpuvm_bo_put())
- drm_gpuva_unlink()            (because it needs to call
drm_gpuvm_bo_put())
- drm_gpuvm_exec_lock()
- drm_gpuvm_exec_lock_array()
- drm_gpuvm_prepare_range()

Given that it seems reasonable to do all the required
locking
internally.

  From a design POW, there has been a clear direction in XE to
make
things similar to mmap() / munmap(), so this outer lock,
which in Xe is
an rwsem, is used in a similar way as the mmap_lock. It's
protecting
the page-table structures and vma rb tree, the userptr
structures and
the extobj list. Basically it's taken early in the exec
IOCTL, the
VM_BIND ioctl, the compute rebind worker and the pagefault
handler, so
all of the above are just asserting that it is taken in the
correct
mode.

But strictly with this scheme one could also use the vm's
dma_resv for
the extobj list since with drm_exec, it's locked before
traversing the
list.

The whole point of this scheme is to rely on locks that you
already are
supposed to be holding for various reasons and is simple to
comprehend.

I don't agree that we're supposed to hold the VM's resv lock
anyways for
functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but
I'm fine using it
for that purpose nevertheless.

In order to at least place lockdep checks, the driver would
need to
supply the
corresponding lock's lockdep_map, because the GPUVM
otherwise doesn't
know about
the lock.

Yes, that sounds reasonable. One lockdep map per list.

I'd really like to avoid that, especially now that everything
got simpler. We
should define the actual locks to take instead.

Out of curiosity, what is the overhead of a spin_lock()
that doesn't
need to
spin?

I guess it's hard to tell exactly, but it is much lower on
modern x86
than what it used to be. Not sure about ARM, which is the
other
architecture important to us. I figure if there is little
cache-line
bouncing the main overhead comes from the implied barriers.

A pretty simple way that would not add much code would be

static void gpuvm_cond_spin_lock(const struct drm_gpuvm
*gpuvm,
spinlock_t
*lock)

{

      if (!gpuvm->resv_protected_lists)
          spin_lock(lock);

}

For such drivers, that would require anybody calling
unlink to
hold the vm's
resv, though.

In V4 I want to go back to having a dedicated lock for
the GEMs
gpuva list (or
VM_BO list to be more precise). We can't just use the
dma-resv
lock for that
with VM_BO abstractions, because on destruction of a
VM_BO we
otherwise wouldn't
be allowed to already hold the dma-resv lock. That's
the fix I
was referring to
earlier.

Yeah, I can see the need for a dedicated lock for the
GEM's gpuva
list, but
holding the vm's dma-resv lock across the unlink
shouldn't be a
problem. We
may free the object and a pointer to the vm's resv during
unlink
but we
don't free the vm's resv. It'd be a matter of ensuring
that any
calls to
unlink from *within* drm_gpuvm allows it to be held.

Drivers calling unlink() from the fence signaling path
can't use the
VM's
dma-resv lock.

Yes, that made me a bit curious because in the current
version the code
required the object's dma_resv for unlink() which can't be
grabbed
either from the fence signaling path. So are there any
drivers actually
wanting to do that? If so, they will either need to resort to
the
current spinlock solution or they will need to call unlink
from a
workqueue item.

As Boris already mentioned we have the dma-resv lock by default
or a driver
specific GEM gpuva lock as opt-in. Now, we can get rid of the
latter.

Also, what if the object is an external object? We can't
use the VM's
dma-resv
lock here.

Why? Typically (sync) unlink is only ever called from an
unbind-like
operation where it should be trivial to grab the vm's resv.
Or, for
that matter any outer lock protecting the extobj list. Rule
would be
the drm_gpuvm_bo::entry::extobj and
drm_gpuvm_bo::entry::evict would
be protected by either the vm's dma_resv (or possibly an
outer lock in
the case of the extobj list).

Outer lock wouldn't have been working for updates in the async
path, but
shouldn't be relevant anymore. We could use the VM's resv for
that.

   And we can't have the GEM objs dma-resv lock held when
calling
unlink(), since unlink() calls drm_gpuvm_bo_put(), which if
the
refcount drops
to zero calls drm_gpuvm_bo_destroy() and
drm_gpuvm_bo_destroy() might
drop the
last reference of the GEM object.

Yes, but this is a different problem as to what exactly
protects
drm_gpuvm_bo::entry::gem. Either as you suggest an internal
per bo list
lock, or if we want to keep the bo's dma_resv we need to
ensure that
the caller of dma_resv_unlock(obj->resv) actually refcounts
its obj
pointer, and doesn't implicitly rely on the gpuvm_bo's
refcount (I know
Boris didn't like that, but requiring an explicit refcount
for a
pointer you dereference unless you're under a lock that
ensures keeping
the object alive is pretty much required?) But anyway for the
drm_gpuvm_bo::entry::gem list protection (bo resv or internal
spinlock)
I don't have a strong preference.

We can keep the GEM objects dma-resv lock, however as mentioned
above
drm_gpuva_unlink() and drm_gpuvm_bo_put() then requires both
the VM's resv lock
and the GEM's resv lock in case they differ.

Next message: Paul Durrant: "[PATCH v2 01/12] KVM: pfncache: add a map helper function"
Previous message: Ron Economos: "Re: [PATCH 6.5 000/285] 6.5.4-rc1 review"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]