Re: [Nouveau] [PATCH 5.10 32/77] drm/ttm: fix memleak in ttm_transfered_destroy

From: Christian König
Date: Thu Nov 04 2021 - 03:39:32 EST


Am 03.11.21 um 22:25 schrieb Karol Herbst:
On Wed, Nov 3, 2021 at 9:47 PM Sven Joachim <svenjoac@xxxxxx> wrote:
On 2021-11-03 21:32 +0100, Karol Herbst wrote:

On Wed, Nov 3, 2021 at 9:29 PM Karol Herbst <kherbst@xxxxxxxxxx> wrote:
On Wed, Nov 3, 2021 at 8:52 PM Sven Joachim <svenjoac@xxxxxx> wrote:
On 2021-11-01 10:17 +0100, Greg Kroah-Hartman wrote:

From: Christian König <christian.koenig@xxxxxxx>

commit 0db55f9a1bafbe3dac750ea669de9134922389b5 upstream.

We need to cleanup the fences for ghost objects as well.

Signed-off-by: Christian König <christian.koenig@xxxxxxx>
Reported-by: Erhard F. <erhard_f@xxxxxxxxxxx>
Tested-by: Erhard F. <erhard_f@xxxxxxxxxxx>
Reviewed-by: Huang Rui <ray.huang@xxxxxxx>
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214029
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214447
CC: <stable@xxxxxxxxxxxxxxx>
Link: https://patchwork.freedesktop.org/patch/msgid/20211020173211.2247-1-christian.koenig@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
---
drivers/gpu/drm/ttm/ttm_bo_util.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -322,6 +322,7 @@ static void ttm_transfered_destroy(struc
 	struct ttm_transfer_obj *fbo;

 	fbo = container_of(bo, struct ttm_transfer_obj, base);
+	dma_resv_fini(&fbo->base.base._resv);
 	ttm_bo_put(fbo->bo);
 	kfree(fbo);
 }
Alas, this innocuous-looking commit causes one of my systems to lock up
as soon as I run startx. This happens with the nouveau driver; two other
systems with radeon and intel graphics are not affected. Also I only
noticed it in 5.10.77. Kernels 5.15 and 5.14.16 are not affected, and I
do not use 5.4 anymore.

I am not familiar with nouveau's ttm management and what has changed
there between 5.10 and 5.14, but maybe one of their developers can shed
some light on this.

Cheers,
Sven

It could be related to 265ec0dd1a0d18f4114f62c0d4a794bb4e729bc1,
maybe not... but I do remember there being a few ttm-related patches
which only hurt nouveau :/ I guess one could do a git bisect to
figure out what change "fixes" it.
Maybe, but since the memory leaks reported by Erhard only started to
show up in 5.14 (if I read the bugzilla reports correctly), perhaps the
patch should simply be reverted on earlier kernels?

Yeah, I think this is probably the right approach.

I agree. The problem is that this memory leak could potentially happen with 5.10 as well, just much less likely.

But my guess is that 5.10 is so buggy that when the leak does NOT happen we double free, which obviously causes a crash.
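
To make the two failure modes concrete, here is a minimal user-space sketch. It is not TTM code: toy_resv, toy_resv_init, toy_resv_fini and toy_destroy are hypothetical names, and the "other path" that already cleans up the reservation object on 5.10 is only a guess, not something confirmed in this thread. A second teardown is counted rather than executed, so the sketch itself never frees twice.

```c
#include <stdlib.h>

/* Toy stand-in for a reservation object holding one fence allocation.
 * All names here are hypothetical and only model the discussion. */
struct toy_resv {
	int *fence;     /* models the fence storage */
	int fini_calls; /* how many times teardown ran on this object */
};

static int live_fences; /* allocations not yet released */

static void toy_resv_init(struct toy_resv *r)
{
	r->fence = malloc(sizeof(*r->fence));
	r->fini_calls = 0;
	if (r->fence)
		live_fences++;
}

/* Releases the fence on the first call. Later calls are only counted;
 * in real code a second teardown would be a double free. */
static void toy_resv_fini(struct toy_resv *r)
{
	if (r->fini_calls++ == 0 && r->fence) {
		free(r->fence);
		r->fence = NULL;
		live_fences--;
	}
}

/* Destroy path for a ghost object.
 * other_path_cleans: assumed pre-existing cleanup somewhere else.
 * destroy_cleans:    the cleanup added by the patch.
 * Returns the number of teardown calls: 0 = leak, 1 = ok, 2 = double free. */
static int toy_destroy(int other_path_cleans, int destroy_cleans)
{
	struct toy_resv r;

	toy_resv_init(&r);
	if (other_path_cleans)
		toy_resv_fini(&r);
	if (destroy_cleans)
		toy_resv_fini(&r);
	return r.fini_calls;
}
```

In this model toy_destroy(0, 0) is the leak the patch fixes, toy_destroy(0, 1) is the intended result, and toy_destroy(1, 1) is the suspected 5.10 situation where the added call tears the object down a second time.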

So for the sake of stability please don't apply this patch to 5.10. I'm going to comment on the original bug report as well.

Thanks,
Christian.


On which GPU do you see this problem?
On an old GeForce 8500 GT, the whole PC is rather ancient.

Cheers,
Sven