Re: [PATCH AUTOSEL 5.10 13/22] drm/amdgpu: install stub fence into potential unused fence pointers

From: Christian König
Date: Thu Aug 31 2023 - 06:27:43 EST


Am 30.08.23 um 20:53 schrieb Chia-I Wu:
On Sun, Jul 23, 2023 at 6:24 PM Sasha Levin <sashal@xxxxxxxxxx> wrote:
From: Lang Yu <Lang.Yu@xxxxxxx>

[ Upstream commit 187916e6ed9d0c3b3abc27429f7a5f8c936bd1f0 ]

When using cpu to update page tables, vm update fences are unused.
Install stub fence into these fence pointers instead of NULL
to avoid NULL dereference when calling dma_fence_wait() on them.

Suggested-by: Christian König <christian.koenig@xxxxxxx>
Signed-off-by: Lang Yu <Lang.Yu@xxxxxxx>
Reviewed-by: Christian König <christian.koenig@xxxxxxx>
Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
We start getting this warning spew on chromeos

Yeah because the older kernels still kept track of the last VM fence in the syncobj.

This patch here should probably not have been back ported.

Why was that done anyway? The upstream commit doesn't have a CC stable and this is only a bug fix for a new feature not present on older kernels.

Regards,
Christian.


, likely from
dma_fence_is_later because the stub fence is on a different timeline:

[ 273.334767] WARNING: CPU: 1 PID: 13383 at
include/linux/dma-fence.h:478 amdgpu_sync_keep_later+0x95/0xbd
[ 273.334769] Modules linked in: snd_seq_dummy snd_seq snd_seq_device
bridge stp llc tun vhost_vsock vhost vhost_iotlb
vmw_vsock_virtio_transport_common vsock 8021q veth lzo_rle
lzo_compress zram uinput snd_acp_sof_mach snd_acp_mach snd_soc_dmic
xt_cgroup rfcomm xt_MASQUERADE cmac algif_hash algif_skcipher af_alg
btusb btrtl btintel btbcm rtw89_8852ae rtw89_pci rtw89_8852a
rtw89_core snd_sof_amd_renoir snd_sof_xtensa_dsp snd_sof_amd_acp
snd_acp_pci snd_acp_config snd_soc_acpi snd_pci_acp3x snd_sof_pci
snd_sof snd_hda_codec_hdmi snd_sof_utils snd_hda_intel mac80211
snd_intel_dspcfg snd_hda_codec cros_ec_typec snd_hwdep roles
snd_hda_core typec snd_soc_rt5682s snd_soc_rt1019 snd_soc_rl6231
ip6table_nat i2c_piix4 fuse bluetooth ecdh_generic ecc cfg80211
iio_trig_sysfs cros_ec_lid_angle cros_ec_sensors cros_ec_sensors_core
industrialio_triggered_buffer kfifo_buf industrialio cros_ec_sensorhub
r8153_ecm cdc_ether usbnet r8152 mii uvcvideo videobuf2_vmalloc
videobuf2_memops videobuf2_v4l2
[ 273.334795] videobuf2_common joydev
[ 273.334799] CPU: 1 PID: 13383 Comm: chrome:cs0 Tainted: G W
5.10.192-23384-g3d3f0f0c5e4f #1
fe1e7e3b7510aa7b8e01701478119255f825a36f
[ 273.334800] Hardware name: Google Dewatt/Dewatt, BIOS
Google_Dewatt.14500.347.0 03/30/2023
[ 273.334802] RIP: 0010:amdgpu_sync_keep_later+0x95/0xbd
[ 273.334804] Code: 00 00 b8 01 00 00 00 f0 0f c1 43 38 85 c0 74 26
8d 48 01 09 c1 78 24 49 89 1e 5b 41 5e 5d c3 cc cc cc cc e8 4a 94 ac
ff eb ce <0f> 0b 49 8b 06 48 85 c0 75 af eb c2 be 02 00 00 00 48 8d 7b
38 e8
[ 273.334805] RSP: 0018:ffffb222c1817b50 EFLAGS: 00010293
[ 273.334807] RAX: ffffffff89bfc838 RBX: ffff8aa425e9ed00 RCX: 0000000000000000
[ 273.334808] RDX: ffff8aa426156a98 RSI: ffff8aa425e9ed00 RDI: ffff8aa432518918
[ 273.334810] RBP: ffffb222c1817b60 R08: ffff8aa43ca6c0a0 R09: ffff8aa33af3c9a0
[ 273.334811] R10: fffffcf8c5986600 R11: ffffffff87a00fce R12: 0000000000000098
[ 273.334812] R13: 00000000005e2a00 R14: ffff8aa432518918 R15: 0000000000000000
[ 273.334814] FS: 00007e70f8694640(0000) GS:ffff8aa4e6080000(0000)
knlGS:0000000000000000
[ 273.334816] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 273.334817] CR2: 00007e70ea049020 CR3: 0000000178e6e000 CR4: 0000000000750ee0
[ 273.334818] PKRU: 55555554
[ 273.334819] Call Trace:
[ 273.334822] ? __warn+0xa3/0x131
[ 273.334824] ? amdgpu_sync_keep_later+0x95/0xbd
[ 273.334826] ? report_bug+0x97/0xfa
[ 273.334829] ? handle_bug+0x41/0x66
[ 273.334832] ? exc_invalid_op+0x1b/0x72
[ 273.334835] ? asm_exc_invalid_op+0x12/0x20
[ 273.334837] ? native_sched_clock+0x9a/0x9a
[ 273.334840] ? amdgpu_sync_keep_later+0x95/0xbd
[ 273.334843] amdgpu_sync_vm_fence+0x23/0x39
[ 273.334846] amdgpu_cs_ioctl+0x1782/0x1e56
[ 273.334851] ? amdgpu_cs_report_moved_bytes+0x5f/0x5f
[ 273.334854] drm_ioctl_kernel+0xdf/0x150
[ 273.334858] drm_ioctl+0x1f5/0x3d2
[ 273.334928] ? amdgpu_cs_report_moved_bytes+0x5f/0x5f
[ 273.334932] amdgpu_drm_ioctl+0x49/0x81
[ 273.334935] __x64_sys_ioctl+0x7d/0xc8
[ 273.334937] do_syscall_64+0x42/0x54
[ 273.334939] entry_SYSCALL_64_after_hwframe+0x4a/0xaf
[ 273.334941] RIP: 0033:0x7e70ff797649
[ 273.334943] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10
c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00
00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1d 48 8b 45 c8 64 48 2b 04 25 28
00 00
[ 273.334945] RSP: 002b:00007e70f8693170 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ 273.334947] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007e70ff797649
[ 273.334948] RDX: 00007e70f8693248 RSI: 00000000c0186444 RDI: 0000000000000013
[ 273.334950] RBP: 00007e70f86931c0 R08: 00007e70f8693350 R09: 00007e70f8693340
[ 273.334951] R10: 000000000000000a R11: 0000000000000246 R12: 00000000c0186444
[ 273.334952] R13: 00007e70f8693380 R14: 00007e70f8693248 R15: 0000000000000013
[ 273.334954] ---[ end trace fc066a0fcea39e8c ]---