Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

From: Mikhail Gavrilov
Date: Thu Apr 20 2023 - 17:24:37 EST


On Thu, Apr 20, 2023 at 2:59 PM Christian König
<christian.koenig@xxxxxxx> wrote:
> Could you try drm-misc-next as well?

Assuming I cloned the right repo:
$ git clone -b drm-misc-next git://anongit.freedesktop.org/drm/drm-misc linux-drm-misc-next
the latest commit on this branch turned out to be completely non-working on my hardware.
Instead of the GDM login screen I see a black screen and hear the GPU fans howling.

In the kernel log I see a general protection fault:
general protection fault, probably for non-canonical address 0xdffffc000000002b: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000158-0x000000000000015f]
CPU: 0 PID: 749 Comm: sdma0 Tainted: G W L 6.3.0-rc4-misc-next-91c249b2b9f6a80c744387b6713adf275ffd296b+ #1
Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 4601 02/02/2023
RIP: 0010:drm_sched_get_cleanup_job+0x41b/0x5c0 [gpu_sched]
Code: fa 48 c1 ea 03 80 3c 02 00 75 5c 49 8b 9f 80 00 00 00 48 b8 00
00 00 00 00 fc ff df 48 8d bb 58 01 00 00 48 89 fa 48 c1 ea 03 <80> 3c
02 00 75 55 48 01 ab 58 01 00 00 e9 0c fd ff ff 48 89 ef e8
RSP: 0018:ffffc9000548fdb8 EFLAGS: 00010216
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 000000000000002b RSI: 0000000000000004 RDI: 0000000000000158
RBP: 000000000000085c R08: 0000000000000000 R09: ffff888170711783
R10: ffffed102e0e22f0 R11: ffffffff8da81678 R12: ffff8881707116b0
R13: ffff888170711780 R14: ffff888266f89820 R15: ffff888266f89808
FS: 0000000000000000(0000) GS:ffff888fa2000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000560cea4a8000 CR3: 0000000191602000 CR4: 0000000000350ef0
Call Trace:
<TASK>
drm_sched_main+0xc3/0x930 [gpu_sched]
? __pfx_drm_sched_main+0x10/0x10 [gpu_sched]
? __pfx_autoremove_wake_function+0x10/0x10
? __kthread_parkme+0xc1/0x1f0
? __pfx_drm_sched_main+0x10/0x10 [gpu_sched]
kthread+0x2a2/0x340
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2c/0x50
</TASK>
Modules linked in: amdgpu(+) drm_ttm_helper ttm video crct10dif_pclmul
drm_suballoc_helper crc32_pclmul iommu_v2 crc32c_intel drm_buddy
polyval_clmulni gpu_sched polyval_generic ucsi_ccg drm_display_helper
typec_ucsi nvme ghash_clmulni_intel igb typec ccp sha512_ssse3 cec
nvme_core sp5100_tco dca i2c_algo_bit nvme_common wmi ip6_tables
ip_tables fuse
---[ end trace 0000000000000000 ]---
RIP: 0010:drm_sched_get_cleanup_job+0x41b/0x5c0 [gpu_sched]
Code: fa 48 c1 ea 03 80 3c 02 00 75 5c 49 8b 9f 80 00 00 00 48 b8 00
00 00 00 00 fc ff df 48 8d bb 58 01 00 00 48 89 fa 48 c1 ea 03 <80> 3c
02 00 75 55 48 01 ab 58 01 00 00 e9 0c fd ff ff 48 89 ef e8
RSP: 0018:ffffc9000548fdb8 EFLAGS: 00010216
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 000000000000002b RSI: 0000000000000004 RDI: 0000000000000158
RBP: 000000000000085c R08: 0000000000000000 R09: ffff888170711783
R10: ffffed102e0e22f0 R11: ffffffff8da81678 R12: ffff8881707116b0
R13: ffff888170711780 R14: ffff888266f89820 R15: ffff888266f89808
FS: 0000000000000000(0000) GS:ffff888fa2000000(0000) knlGS:0000000000000000
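
For reference, the "non-canonical address" and the KASAN range in the trace
above appear to be two views of the same access: the faulting address looks
like the KASAN shadow address of a small offset from a NULL pointer (RBX is 0,
RDI is 0x158, RDX is 0x2b, RAX holds the shadow offset). A minimal sketch of
that arithmetic follows. It assumes the usual x86_64 KASAN shadow offset
(the value seen in RAX) and a scale shift of 3; the function name is made up
for illustration and is not a kernel API.

```python
# Assumed constants: the x86_64 KASAN shadow offset (matches RAX in the trace)
# and the generic KASAN scale shift (one shadow byte covers 8 bytes of memory).
KASAN_SHADOW_OFFSET = 0xDFFFFC0000000000
KASAN_SHADOW_SCALE_SHIFT = 3
MASK_64 = 0xFFFFFFFFFFFFFFFF


def shadow_to_range(shadow_addr):
    """Hypothetical helper: map a shadow address back to the 8-byte
    range of real memory that the shadow byte describes."""
    first = ((shadow_addr - KASAN_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT) & MASK_64
    last = first + (1 << KASAN_SHADOW_SCALE_SHIFT) - 1
    return first, last


if __name__ == "__main__":
    # Faulting address taken from the "general protection fault" line.
    lo, hi = shadow_to_range(0xDFFFFC000000002B)
    print(f"[{lo:#018x}-{hi:#018x}]")
```

Under these assumptions the result is the range 0x158-0x15f, the same range
KASAN reports, which would be consistent with a write to a field at offset
0x158 of a NULL structure pointer in drm_sched_get_cleanup_job().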


I also attached a full system log.

--
Best Regards,
Mike Gavrilov.

Attachment: system-log.tar.xz
Description: application/xz