Re: [PATCH] drm/amdgpu: Fix a potential sdma invalid access

From: Christian König
Date: Fri Apr 02 2021 - 12:25:55 EST


Hi Qu,

On 02.04.21 at 05:18, Qu Huang wrote:
> Before dma_resv_lock(bo->base.resv, NULL) in amdgpu_bo_release_notify(),
> the bo->base.resv lock may be held by ttm_mem_evict_first(),

That can't happen: by the time bo_release_notify is called, the BO has no more references and is therefore deleted.

And we never evict a deleted BO; we just wait for it to become idle.

Regards,
Christian.
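Christian's objection relies on a reference-counting invariant: once the last reference is gone, nothing can pick the BO up again. Below is a generic C11 sketch of that "get unless zero" idiom. It is not TTM code. `struct obj`, `obj_get_unless_zero()` and `obj_put()` are hypothetical names, and the sketch assumes that an evictor must first obtain a reference before acting on an object. It does not model how TTM handles already-deleted BOs, which the reply says are waited on rather than evicted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object holding only a reference count (not TTM's BO struct). */
struct obj {
	atomic_int refcount;
};

/*
 * Take a reference only if the object is still alive. Once the count has
 * reached zero the release path owns the object, so a scanner (standing in
 * for an evictor) must skip it instead of resurrecting it.
 */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcount, 1);

	assert(old > 0);	/* putting an already-dead object is a bug */
	return old == 1;
}
```

A scanner that gets `false` from `obj_get_unless_zero()` simply moves on, so it never operates on an object whose release has already begun.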

> and the VRAM memory will be evicted, i.e. the mem region is replaced
> by a GTT mem region. amdgpu_bo_release_notify() will then
> hold the bo->base.resv lock, and SDMA will get an invalid
> address in amdgpu_fill_buffer(), resulting in a VMFAULT
> or memory corruption.
>
> To avoid it, we have to take the bo->base.resv lock first and
> then check whether mem.mem_type is TTM_PL_VRAM.

> Signed-off-by: Qu Huang <jinsdb@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 4b29b82..8018574 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -1300,12 +1300,16 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
>  	if (bo->base.resv == &bo->base._resv)
>  		amdgpu_amdkfd_remove_fence_on_pt_pd_bos(abo);
>  
> -	if (bo->mem.mem_type != TTM_PL_VRAM || !bo->mem.mm_node ||
> -	    !(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE))
> +	if (!(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE))
>  		return;
>  
>  	dma_resv_lock(bo->base.resv, NULL);
>  
> +	if (bo->mem.mem_type != TTM_PL_VRAM || !bo->mem.mm_node) {
> +		dma_resv_unlock(bo->base.resv);
> +		return;
> +	}
> +
>  	r = amdgpu_fill_buffer(abo, AMDGPU_POISON, bo->base.resv, &fence);
>  	if (!WARN_ON(r)) {
>  		amdgpu_bo_fence(abo, fence, false);
> --
> 1.8.3.1
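The patch moves the placement check from before `dma_resv_lock()` to after it, so the check and the fill happen in the same critical section. Below is a generic pthreads sketch of that check-under-lock pattern, with a plain mutex standing in for the reservation lock. `struct bo`, `evict()`, `release_wipe()` and the `PL_*` placements are hypothetical and are not amdgpu or TTM code. Per Christian's reply, the race this guards against may not be reachable in the driver at all.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical placements, standing in for TTM_PL_VRAM and TTM_PL_TT. */
enum placement { PL_VRAM, PL_GTT };

/* Hypothetical object: a mutex guarding its current placement. */
struct bo {
	pthread_mutex_t lock;
	enum placement where;
	bool wiped;
};

/* Evictor stand-in: moves the object out of VRAM while holding the lock. */
static void evict(struct bo *bo)
{
	pthread_mutex_lock(&bo->lock);
	bo->where = PL_GTT;
	pthread_mutex_unlock(&bo->lock);
}

/*
 * Wipe a VRAM-backed object on release. The placement is read only while
 * the lock is held, so it cannot change between the check and the wipe.
 */
static bool release_wipe(struct bo *bo)
{
	bool did_wipe = false;

	pthread_mutex_lock(&bo->lock);
	if (bo->where == PL_VRAM) {
		bo->wiped = true;	/* stands in for amdgpu_fill_buffer() */
		did_wipe = true;
	}
	pthread_mutex_unlock(&bo->lock);
	return did_wipe;
}
```

Had `release_wipe()` read `bo->where` before locking, an `evict()` running in between could move the object after the check. That is the window the patch description is concerned about.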