RE: BUG: kernel NULL pointer dereference, address: 0000000000000026 after switching to 5.7 kernel

From: Russell, Kent
Date: Mon Apr 13 2020 - 07:36:56 EST


[AMD Official Use Only - Internal Distribution Only]

You can add my Reviewed-by and Tested-by to this patch, unless you would rather I send it out myself, in which case you can review it.

Kent

> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of
> Christian König
> Sent: Saturday, April 11, 2020 5:57 AM
> To: Mikhail Gavrilov <mikhail.v.gavrilov@xxxxxxxxx>; amd-gfx list
> <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>; dri-devel
> <dri-devel@xxxxxxxxxxxxxxxxxxxxx>; Linux List Kernel Mailing
> <linux-kernel@xxxxxxxxxxxxxxx>; Grodzovsky, Andrey
> <Andrey.Grodzovsky@xxxxxxx>; Russell, Kent <Kent.Russell@xxxxxxx>
> Subject: Re: BUG: kernel NULL pointer dereference, address:
> 0000000000000026 after switching to 5.7 kernel
>
> Yeah, that is a known issue.
>
> You could try the attached patch, but please be aware that it is not
> even compile tested because of the Easter holidays here.
>
> Thanks,
> Christian.
>
> Am 10.04.20 um 21:51 schrieb Mikhail Gavrilov:
> > Hi folks.
> > After upgrading the kernel to 5.7, I see the following error
> > messages in the kernel log on every boot:
> >
> > [ 2.569513] [drm] Found UVD firmware ENC: 1.2 DEC: .43 Family ID: 19
> > [ 2.569538] [drm] PSP loading UVD firmware
> > [ 2.570038] BUG: kernel NULL pointer dereference, address: 0000000000000026
> > [ 2.570045] #PF: supervisor read access in kernel mode
> > [ 2.570050] #PF: error_code(0x0000) - not-present page
> > [ 2.570055] PGD 0 P4D 0
> > [ 2.570060] Oops: 0000 [#1] SMP NOPTI
> > [ 2.570065] CPU: 5 PID: 667 Comm: uvd_enc_1.1 Not tainted 5.7.0-0.rc0.git6.1.2.fc33.x86_64 #1
> > [ 2.570072] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 1405 11/19/2019
> > [ 2.570085] RIP: 0010:__kthread_should_park+0x5/0x30
> > [ 2.570090] Code: 00 e9 fe fe ff ff e8 ca 3a 08 00 e9 49 fe ff ff 48 89 df e8 dd 38 08 00 84 c0 0f 84 6a ff ff ff e9 a6 fe ff ff 0f 1f 44 00 00 <f6> 47 26 20 74 12 48 8b 87 88 09 00 00 48 8b 00 48 c1 e8 02 83 e0
> > [ 2.570103] RSP: 0018:ffffad8141723e50 EFLAGS: 00010246
> > [ 2.570107] RAX: 7fffffffffffffff RBX: ffff8a8d1d116ed8 RCX: 0000000000000000
> > [ 2.570112] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000000
> > [ 2.570116] RBP: ffff8a8d28c11300 R08: 0000000000000000 R09: 0000000000000000
> > [ 2.570120] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a8d1d152e40
> > [ 2.570125] R13: ffff8a8d1d117280 R14: ffff8a8d1d116ed8 R15: ffff8a8d1ca68000
> > [ 2.570131] FS: 0000000000000000(0000) GS:ffff8a8d3aa00000(0000) knlGS:0000000000000000
> > [ 2.570137] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 2.570142] CR2: 0000000000000026 CR3: 00000007e3dc6000 CR4: 00000000003406e0
> > [ 2.570147] Call Trace:
> > [ 2.570157] drm_sched_get_cleanup_job+0x42/0x130 [gpu_sched]
> > [ 2.570166] drm_sched_main+0x6f/0x530 [gpu_sched]
> > [ 2.570173] ? lockdep_hardirqs_on+0x11e/0x1b0
> > [ 2.570179] ? drm_sched_get_cleanup_job+0x130/0x130 [gpu_sched]
> > [ 2.570185] kthread+0x131/0x150
> > [ 2.570189] ? __kthread_bind_mask+0x60/0x60
> > [ 2.570196] ret_from_fork+0x27/0x50
> > [ 2.570203] Modules linked in: fjes(-) amdgpu(+) amd_iommu_v2 gpu_sched ttm drm_kms_helper drm crc32c_intel igb nvme nvme_core dca i2c_algo_bit wmi pinctrl_amd br_netfilter bridge stp llc fuse
> > [ 2.570223] CR2: 0000000000000026
> > [ 2.570228] ---[ end trace 80c25d326e1e0d7c ]---
> > [ 2.570233] RIP: 0010:__kthread_should_park+0x5/0x30
> > [ 2.570238] Code: 00 e9 fe fe ff ff e8 ca 3a 08 00 e9 49 fe ff ff 48 89 df e8 dd 38 08 00 84 c0 0f 84 6a ff ff ff e9 a6 fe ff ff 0f 1f 44 00 00 <f6> 47 26 20 74 12 48 8b 87 88 09 00 00 48 8b 00 48 c1 e8 02 83 e0
> > [ 2.570250] RSP: 0018:ffffad8141723e50 EFLAGS: 00010246
> > [ 2.570255] RAX: 7fffffffffffffff RBX: ffff8a8d1d116ed8 RCX: 0000000000000000
> > [ 2.570260] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000000
> > [ 2.570265] RBP: ffff8a8d28c11300 R08: 0000000000000000 R09: 0000000000000000
> > [ 2.570271] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a8d1d152e40
> > [ 2.570276] R13: ffff8a8d1d117280 R14: ffff8a8d1d116ed8 R15: ffff8a8d1ca68000
> > [ 2.570281] FS: 0000000000000000(0000) GS:ffff8a8d3aa00000(0000) knlGS:0000000000000000
> > [ 2.570287] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 2.570292] CR2: 0000000000000026 CR3: 00000007e3dc6000 CR4: 00000000003406e0
> > [ 2.570299] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49
> > [ 2.570306] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 667, name: uvd_enc_1.1
> > [ 2.570311] INFO: lockdep is turned off.
> > [ 2.570315] irq event stamp: 14
> > [ 2.570319] hardirqs last enabled at (13): [<ffffffffb1b8c976>] _raw_spin_unlock_irqrestore+0x46/0x60
> > [ 2.570330] hardirqs last disabled at (14): [<ffffffffb1004932>] trace_hardirqs_off_thunk+0x1a/0x1c
> > [ 2.570338] softirqs last enabled at (0): [<ffffffffb10e04f6>] copy_process+0x706/0x1bc0
> > [ 2.570345] softirqs last disabled at (0): [<0000000000000000>] 0x0
> > [ 2.570351] CPU: 5 PID: 667 Comm: uvd_enc_1.1 Tainted: G D 5.7.0-0.rc0.git6.1.2.fc33.x86_64 #1
> > [ 2.570358] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 1405 11/19/2019
> > [ 2.570365] Call Trace:
> > [ 2.570373] dump_stack+0x8b/0xc8
> > [ 2.570380] ___might_sleep.cold+0xb6/0xc6
> > [ 2.570385] exit_signals+0x1c/0x2d0
> > [ 2.570390] do_exit+0xb1/0xc30
> > [ 2.570395] ? kthread+0x131/0x150
> > [ 2.570400] rewind_stack_do_exit+0x17/0x20
> > [ 2.570559] [drm] Found VCE firmware Version: 57.6 Binary ID: 4
> > [ 2.570572] [drm] PSP loading VCE firmware
> > [ 3.146462] [drm] reserve 0x400000 from 0x83fe800000 for PSP TMR
> >
> > $ /usr/src/kernels/`uname -r`/scripts/faddr2line /lib/debug/lib/modules/`uname -r`/vmlinux __kthread_should_park+0x5
> > __kthread_should_park+0x5/0x30:
> > to_kthread at kernel/kthread.c:75
> > (inlined by) __kthread_should_park at kernel/kthread.c:109
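
As a cross-check of the faddr2line result above, here is a rough Python sketch that decodes the four bytes at the faulting RIP (the `<f6> 47 26 20` in the Code: line) and compares the resulting address with CR2. The helper name `decode_test_imm8` is made up for illustration, and the decoder only handles this one instruction form. The `PF_KTHREAD` value is the one from include/linux/sched.h; that the field at offset 0x24 is `task_struct.flags` is an assumption about this kernel's struct layout and is not confirmed anywhere in this thread.

```python
# Bytes at the faulting RIP, copied from the "Code:" line of the oops.
# The byte in <...> is the first byte of the instruction that faulted.
code = bytes.fromhex("f6 47 26 20")

# 64-bit register names indexed by the ModRM r/m field (no REX prefix here).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_test_imm8(insn):
    """Decode `F6 /0 ib` (TEST r/m8, imm8) with a [reg + disp8] operand.

    Hypothetical helper: it rejects every other encoding instead of
    trying to be a general x86 decoder.
    """
    opcode, modrm, disp8, imm8 = insn[0], insn[1], insn[2], insn[3]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # mod == 01 means an 8-bit displacement follows; rm == 100 would need SIB.
    if opcode != 0xF6 or reg != 0 or mod != 0b01 or rm == 0b100:
        raise ValueError("not TEST r/m8, imm8 with a [reg + disp8] operand")
    if disp8 >= 0x80:  # disp8 is sign-extended
        disp8 -= 0x100
    return REG64[rm], disp8, imm8


base, disp, mask = decode_test_imm8(code)

# RDI as shown in the register dump of the oops.
registers = {"rdi": 0x0000000000000000}
fault_addr = registers[base] + disp

# The byte test maps onto a bit of the 32-bit word that contains it
# (little-endian): word offset 0x24, byte 2 within it, so shift by 16.
word_offset = disp & ~3
flag_bit = mask << (8 * (disp & 3))

PF_KTHREAD = 0x00200000  # include/linux/sched.h

print(f"testb ${mask:#x}, {disp:#x}(%{base})")
print(f"fault address: {fault_addr:#018x}")
print(f"tested bit {flag_bit:#x} in 32-bit word at offset {word_offset:#x}")
```

With RDI equal to 0, the computed address is 0x26, the same value as CR2. Under the layout assumption stated above, the tested bit equals `PF_KTHREAD`, which would fit a flags check on a NULL task pointer in `to_kthread`.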
> >
> > I think this issue is related to the amdgpu driver.
> > Can anyone look into it?
> >
> > Thanks.
> >
> > Full kernel log here:
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpastebin.com%2FRrSp6KYL&data=02%7C01%7Ckent.russell%40amd.com%7Ce652bbd5df6544c62c4d08d7ddfea8ac%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637221958269343622&sdata=4IqY73hY%2BBOECvA7qiYAj7zIGPbwHeIv4xoZuJZWXjg%3D&reserved=0
> >
> > --
> > Best Regards,
> > Mike Gavrilov.
> > _______________________________________________
> > amd-gfx mailing list
> > amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7Ckent.russell%40amd.com%7Ce652bbd5df6544c62c4d08d7ddfea8ac%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637221958269343622&sdata=U00ELBRmZd5%2By4PnJqe%2B6FmQDro6luZdxiy1uGIDWOs%3D&reserved=0