Re: [PATCH] drm/atmel-hlcdc: Release CRTC commit when destroying plane state

From: Pierre-Louis Dourneau
Date: Tue Mar 12 2024 - 04:47:55 EST


Hey Ludovic,

On 3/8/24, Ludovic.Desroches@xxxxxxxxxxxxx <Ludovic.Desroches@xxxxxxxxxxxxx> wrote:
> On 3/6/24 20:49, Pierre-Louis Dourneau wrote:
> >
> > From: Arnaud Lahache <a.lahache@xxxxxxxxxx>
> >
> > Fixes a memory leak occurring on each modeset update.
> >
> > Running a program such as libdrm's modetest[0] with this driver exhausts
> > all available memory after a few minutes. Enabling kmemleak yields a series
> > of such leak reports:
> >
> > unreferenced object 0xc21acf40 (size 64):
> > comm "modetest", pid 210, jiffies 4294942460 (age 331.240s)
> > hex dump (first 32 bytes):
> > 00 a0 a1 c1 01 00 00 00 ff ff ff ff 4c cf 1a c2 ............L...
> > 4c cf 1a c2 ff ff ff ff 58 cf 1a c2 58 cf 1a c2 L.......X...X...
> > backtrace:
> > [<d68b3e09>] kmalloc_trace+0x18/0x24
> > [<f858a020>] drm_atomic_helper_setup_commit+0x1e0/0x7e0
> > [<26e8ab04>] drm_atomic_helper_commit+0x40/0x160
> > [<49708b0c>] drm_atomic_commit+0xa8/0xf0
> > [<e58c2942>] drm_mode_obj_set_property_ioctl+0x154/0x3d8
> > [<5e97e57d>] drm_ioctl+0x200/0x3c4
> > [<ed514ba1>] sys_ioctl+0x240/0xb48
> > [<26aab344>] ret_fast_syscall+0x0/0x44
> >
> > drm_atomic_helper_setup_commit() acquires a reference to a drm_crtc_commit
> > for each CRTC and associated connectors and planes involved in a modeset.
> > 64-byte leaks map well to the size of a drm_crtc_commit on 32-bit
> > architectures, and the second 4-byte chunk in the hex dump above awfully
> > looks like a reference count.
> >
> > We tracked this missing reference decrement down to the driver's
> > atmel_hlcdc_plane_atomic_destroy_state() callback. Its CRTC counterpart,
> > atmel_hlcdc_crtc_destroy_state(), calls into the drm_atomic helpers and
> > properly releases its references to the commit. Planes didn't. Using the
> > default helper for that purpose, __drm_atomic_helper_plane_destroy_state(),
> > fixes the leak and avoids reimplementing the same logic in the driver.
> >
> > [0]: https://gitlab.freedesktop.org/mesa/drm/-/tree/main/tests/modetest
> > Invoke with `modetest -M atmel-hlcdc -s 32:#0 -v`, assuming 32 is the
> > ID of a connector.
> >
> > Signed-off-by: Arnaud Lahache <a.lahache@xxxxxxxxxx>
> > Co-developed-by: Pierre-Louis Dourneau <pl.dourneau@xxxxxxxxxx>
> > Signed-off-by: Pierre-Louis Dourneau <pl.dourneau@xxxxxxxxxx>
> > Co-developed-by: Benoît Alcaina <b.alcaina@xxxxxxxxxx>
> > Signed-off-by: Benoît Alcaina <b.alcaina@xxxxxxxxxx>
> > ---
> > As far as our testing goes, we've been running 6 of our production units
> > with this patch for more than 2 weeks as per the date this patch is sent
> > out. We can report stable memory usage.
>
> Hello Arnaud,
>
> This patch fixes the memory leak but introduces a crash on my side when
> exiting a graphics app using the Microchip graphics library.
>
>
> ------------[ cut here ]------------
>
>
>
> WARNING: CPU: 0 PID: 201 at lib/refcount.c:28
> refcount_warn_saturate+0x58/0x130
>
>
>
> refcount_t: underflow; use-after-free.
>
>
>
> Modules linked in:
>
>
>
> CPU: 0 PID: 201 Comm: basic Not tainted 6.1.55-linux4microchip-2023.10+
> #65
>
>
> Hardware name: Microchip SAM9X60
>
>
>
> unwind_backtrace from show_stack+0x10/0x18
>
>
>
> show_stack from dump_stack_lvl+0x28/0x34
>
>
>
> dump_stack_lvl from __warn+0x70/0xb8
>
>
>
> __warn from warn_slowpath_fmt+0x78/0xac
>
>
>
> warn_slowpath_fmt from refcount_warn_saturate+0x58/0x130
>
>
>
> refcount_warn_saturate from kref_put+0x54/0x5c
>
>
>
> kref_put from drm_fb_release+0x100/0x11c
>
>
>
> drm_fb_release from drm_file_free+0xcc/0x1bc
>
>
>
> drm_file_free from drm_release+0x44/0x94
>
>
>
> drm_release from __fput+0xf0/0x20c
>
>
>
> __fput from task_work_run+0x8c/0xb0
>
>
>
> task_work_run from do_exit+0x310/0x760
>
>
>
> do_exit from sys_exit_group+0x0/0x14
>
>
>
> sys_exit_group from get_signal+0x524/0x64c
>
>
>
> get_signal from do_work_pending+0xf4/0x398
>
>
>
> do_work_pending from slow_work_pending+0xc/0x24
>
>
>
> Exception stack(0xc9991fb0 to 0xc9991ff8)
>
>
>
> 1fa0: 0054dd48 00000000 005331f8
> 00000001
>
>
> 1fc0: 00533270 00000002 00000000 404c8000 beb54170 0051b448 43ab8000
> 0051b424
>
>
> 1fe0: b6cc1d98 beb54040 b6c224fc b68357dc 20000010 ffffffff
>
>
>
> ---[ end trace 0000000000000000 ]---
>
>
>
> 8<--- cut here ---
>
>
>
> Unable to handle kernel NULL pointer dereference at virtual address
> 00000004
>
>
> [00000004] *pgd=00000000
>
>
>
> Internal error: Oops: 805 [#1] ARM
>
>
>
> Modules linked in:
>
>
>
> CPU: 0 PID: 201 Comm: basic Tainted: G W
> 6.1.55-linux4microchip-2023.10+ #65
>
>
> Hardware name: Microchip SAM9X60
>
>
>
> PC is at drm_fb_release+0xc0/0x11c
>
>
>
> LR is at drm_fb_release+0x100/0x11c
>
>
>
> pc : [<c0329b10>] lr : [<c0329b50>] psr: 80000013
>
>
>
> sp : c9991e48 ip : 00000000 fp : 0051b424
>
>
>
> r10: c174a5f0 r9 : 00000000 r8 : c2188074
>
>
>
> r7 : 60000013 r6 : c9991e5c r5 : ffffff8c r4 : c2188048
>
>
>
> r3 : c1590994 r2 : c2188048 r1 : 00000000 r0 : c1590920
>
>
>
> Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
>
>
>
> Control: 0005317f Table: 2384c000 DAC: 00000051
>
>
>
> Register r0 information: slab kmalloc-192 start c1590920 pointer offset
> 0 size 192
>
>
> Register r1 information: NULL pointer
>
>
>
> Register r2 information: slab kmalloc-192 start c2188000 pointer offset
> 72 size 192
>
>
> Register r3 information: slab kmalloc-192 start c1590920 pointer offset
> 116 size 192
>
>
> Register r4 information: slab kmalloc-192 start c2188000 pointer offset
> 72 size 192
>
>
> Register r5 information: non-paged memory
>
>
>
> Register r6 information: 2-page vmalloc region starting at 0xc9990000
> allocated at kernel_clone+0xb4/0x204
>
>
> Register r7 information: non-paged memory
>
>
>
> Register r8 information: slab kmalloc-192 start c2188000 pointer offset
> 116 size 192
>
>
> Register r9 information: NULL pointer
>
>
>
> Register r10 information: slab task_struct start c174a140 pointer offset
> 1200
>
>
> Register r11 information: non-paged memory
>
>
>
> Register r12 information: NULL pointer
>
>
>
> Process basic (pid: 201, stack limit = 0x41541c7b)
>
>
>
> Stack: (0xc9991e48 to 0xc9992000)
> 1e40: 00000001 00000000 00000000 00000000 00000000
> c9991e5c
>
>
> 1e60: c9991e5c 5c44a9dd c2188000 c1621800 c2188060 c03146e4 000000c9
> 0000e200
>
>
> 1e80: 00000001 00000000 00000000 c1621800 c38f2cc0 c0fb63e0 c0c1ce70
> c0f99560
>
>
> 1ea0: 00000000 c0314a5c c38f2cc0 c10ee6e8 000a201f c00c66a0 c38f2cc0
> 00000001
>
>
> 1ec0: c09f1a4c c00c67fc c38f2e00 c174a140 c0a20b1c c2232a50 c9991ef4
> c002f000
>
>
> 1ee0: c174a140 c2232a20 c174a140 c001a094 00000002 00000008 c383c880
> 5c44a9dd
>
>
> 1f00: 00000002 c383c880 0051b424 c001a6a8 00000009 c0024af0 beb53c50
> 00000000
>
>
> 1f20: 00000000 5c44a9dd 00000000 00000000 c9991fb0 c174a140 00000000
> 00000000
>
>
> 1f40: 00000000 00000000 0051b424 c000c244 00000000 00000000 00000000
> 00000000
>
>
> 1f60: 00000000 00000000 00000009 00000000 00000000 00000000 00000000
> 00000000
>
>
> 1f80: 00000000 00000000 00000000 5c44a9dd b68357dc 20000010 ffffffff
> 0005317f
>
>
> 1fa0: 00000000 c174a140 43ab8000 c00084dc 0054dd48 00000000 005331f8
> 00000001
>
>
> 1fc0: 00533270 00000002 00000000 404c8000 beb54170 0051b448 43ab8000
> 0051b424
>
>
> 1fe0: b6cc1d98 beb54040 b6c224fc b68357dc 20000010 ffffffff 00000000
> 00000000
>
>
> drm_fb_release from drm_file_free+0xcc/0x1bc
>
>
>
> drm_file_free from drm_release+0x44/0x94
>
>
>
> drm_release from __fput+0xf0/0x20c
>
>
>
> __fput from task_work_run+0x8c/0xb0
>
>
>
> task_work_run from do_exit+0x310/0x760
>
>
>
> do_exit from sys_exit_group+0x0/0x14
>
>
>
> sys_exit_group from get_signal+0x524/0x64c
>
>
>
> get_signal from do_work_pending+0xf4/0x398
>
>
>
> do_work_pending from slow_work_pending+0xc/0x24
>
>
>
> Exception stack(0xc9991fb0 to 0xc9991ff8)
>
>
>
> 1fa0: 0054dd48 00000000 005331f8
> 00000001
>
>
> 1fc0: 00533270 00000002 00000000 404c8000 beb54170 0051b448 43ab8000
> 0051b424
>
>
> 1fe0: b6cc1d98 beb54040 b6c224fc b68357dc 20000010 ffffffff
>
>
>
> Code: e590c018 e5902078 e5901074 e35c0001 (e5812004)
>
>
>
> ---[ end trace 00000000c0a20288 ]---
>
>
>
> Fixing recursive fault but reboot is needed!

We've tried to reproduce your crash with 6.1.22-linux4microchip-2023.04,
to no avail. We'll try to upgrade to 6.1.55-linux4microchip-2023.10 (your
version) and test again.

Is there a particular EGT program you recommend to trigger the crash? Or
any should do? We've limited ourselves to a few of the samples provided
with the library, namely egt_basic, egt_water, egt_charts, and egt_i18n.

> The memory leak had been introduced with this commit:
>
>
> commit eec44d44a3d2d00c8f663f13555d3dd280b1ea3f
> Author: Daniel Vetter <daniel.vetter@xxxxxxxx>
> Date: Thu Jan 21 16:29:54 2021 +0100
>
>
>
> drm/atmel: Use drm_atomic_helper_commit
>
> One of these drivers that predates the nonblocking support in helpers,
> and hand-rolled its own thing. Entirely not anything specific here, we
> can just delete it all and replace it with the helper version.
>
> Could also perhaps use the drm_mode_config_helper_suspend/resume
> stuff, for another few lines deleted. But I'm not looking at that
> stuff, I'm just going through all the atomic commit functions and make
> sure they have properly annotated dma-fence critical sections
> everywhere.
>
>
> I think this move to the drm atomic helper should have gone with an
> update on the release side as well. There is probably something wrong
> with the atomic_destroy_state callbacks provided by the atmel-hlcdc driver.
>
> Regards,
> Ludovic
>
> >
> > Admittedly, our usage of the DRM uAPI is rather simple: create 2 dumb
> > buffers, do an initial MODE_SETCRTC, and then MODE_PAGE_FLIP between the
> > two dumb buffers at the rate of once per second on average. We haven't
> > evaluated more complex workloads.
> >
> > drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c | 3 +--
> > 1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c
> > index daa508504f47..390c4fc62af7 100644
> > --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c
> > +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c
> > @@ -934,8 +934,7 @@ static void atmel_hlcdc_plane_atomic_destroy_state(struct drm_plane *p,
> > state->dscrs[i]->self);
> > }
> >
> > - if (s->fb)
> > - drm_framebuffer_put(s->fb);
> > + __drm_atomic_helper_plane_destroy_state(s);
> >
> > kfree(state);
> > }
> >
> > base-commit: 9dfc46c87cdc8f5a42a71de247a744a6b8188980
> > --
> > 2.34.1
> >

Pierre-Louis