Re: Running ttm_device_test leads to list_add corruption. prev->next should be next (ffffffffc05cd428), but was 6b6b6b6b6b6b6b6b. (prev=ffffa0b1a5c034f0) (kernel 6.7.5)

From: Erhard Furtner
Date: Tue Feb 20 2024 - 07:46:30 EST


On Tue, 20 Feb 2024 16:12:44 +0700
Bagas Sanjaya <bagasdotme@xxxxxxxxx> wrote:

> > [ 0.000000] Linux version 6.7.5-Zen3 (root@supah) (gcc (Gentoo 13.2.1_p20240113-r1 p12) 13.2.1 20240113, GNU ld (Gentoo 2.41 p5) 2.41.0) #1 SMP Mon Feb 19 12:44:46 -00 2024
>
> Is it vanilla kernel (i.e. no patches applied)? Can you also check current
> mainline (v6.8-rc5)?
>
> Confused...

Yes, this kernel was built from the upstream stable git sources, with no additional patches.

I only attached my kernel .config because it is a custom one. The kernel should run in qemu too.

Also the issue is reproducible on v6.8-rc5 (dmesg attached).
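
For reference, the 6b6b6b6b6b6b6b6b value in the subject line matches POISON_FREE (0x6b), the byte the slab allocator writes into freed objects when poisoning is enabled, so prev most likely pointed at a list entry that had already been freed. A minimal user-space sketch of that situation follows. It is not kernel code, and poison_object(), next_is_poisoned(), list_add_valid() and stale_entry_demo() are hypothetical names made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define POISON_FREE 0x6b /* same value as in include/linux/poison.h */

struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-in for a poisoning free: fill the object with 0x6b. */
static void poison_object(void *obj, size_t size)
{
	memset(obj, POISON_FREE, size);
}

/* True if every byte of node->next is the poison byte. */
static bool next_is_poisoned(const struct list_head *node)
{
	unsigned char bytes[sizeof(node->next)];

	memcpy(bytes, &node->next, sizeof(bytes));
	for (size_t i = 0; i < sizeof(bytes); i++)
		if (bytes[i] != POISON_FREE)
			return false;
	return true;
}

/* Simplified form of the check behind the report: prev->next must be next. */
static bool list_add_valid(const struct list_head *prev,
			   const struct list_head *next)
{
	return memcmp(&prev->next, &next, sizeof(next)) == 0;
}

/* Returns 1 if the stale entry is caught by the check, as in the report. */
int stale_entry_demo(void)
{
	struct list_head head, stale;

	/* Two-entry circular list: head <-> stale. */
	head.next = head.prev = &stale;
	stale.next = stale.prev = &head;
	assert(list_add_valid(&stale, &head));

	/* "Free" the entry without unlinking it first. */
	poison_object(&stale, sizeof(stale));

	return next_is_poisoned(&stale) && !list_add_valid(&stale, &head);
}
```

Calling stale_entry_demo() returns 1: once the entry is poisoned while still linked, the next insertion after it sees a next pointer made of 0x6b bytes instead of the expected neighbour.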

Additionally, I tried 'modprobe -v ttm-device-test' on v6.8-rc5 with KASAN enabled instead of KFENCE, with the kernel .config otherwise unchanged. With KASAN I get a different dmesg: the test run completes, with one failure, and I don't seem to get memory corruption afterwards:

[...]
KTAP version 1
1..1
KTAP version 1
# Subtest: ttm_device
# module: ttm_device_test
1..5
ok 1 ttm_device_init_basic
# ttm_device_init_multiple: ASSERTION FAILED at drivers/gpu/drm/ttm/tests/ttm_device_test.c:68
Expected list_count_nodes(&ttm_devs[0].device_list) == num_dev, but
list_count_nodes(&ttm_devs[0].device_list) == 4 (0x4)
num_dev == 3 (0x3)
not ok 2 ttm_device_init_multiple
ok 3 ttm_device_fini_basic
------------[ cut here ]------------
WARNING: CPU: 5 PID: 2146 at drivers/gpu/drm/ttm/ttm_device.c:206 ttm_device_init+0x23/0x281 [ttm]
Modules linked in: ttm_device_test ttm_kunit_helpers drm_kunit_helpers kunit rfkill dm_crypt nhpoly1305_avx2 nhpoly1305 chacha_generic chacha_x86_64 libchacha adiantum libpoly1305 algif_skcipher amdgpu wmi_bmof amd64_edac edac_mce_amd snd_hda_codec_hdmi input_leds snd_hda_intel amdxcp snd_intel_dspcfg kvm_amd snd_hda_codec snd_hwdep snd_hda_core mfd_core snd_pcm gpu_sched snd_timer video drm_suballoc_helper snd i2c_algo_bit drm_ttm_helper gpio_amdpt soundcore ttm drm_exec button drm_display_helper rapl gpio_generic wmi drm_buddy k10temp evdev joydev lz4 lz4_compress lz4_decompress sg zram nct6775 nct6775_core hwmon_vid hwmon loop configfs hid_generic usbhid hid sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 sha1_generic aesni_intel xhci_pci libaes xhci_hcd crypto_simd ccp cryptd usbcore usb_common sunrpc dm_mod pkcs8_key_parser efivarfs
CPU: 5 PID: 2146 Comm: kunit_try_catch Tainted: G B N 6.8.0-rc5-Zen3 #3
Hardware name: To Be Filled By O.E.M. B550M Pro4/B550M Pro4, BIOS P3.40 01/18/2024
RIP: 0010:ttm_device_init+0x23/0x281 [ttm]
Code: 31 ff e9 fa e4 d5 e6 f3 0f 1e fa 41 57 41 56 41 55 41 54 55 53 48 83 ec 18 8b 44 24 50 48 89 14 24 89 44 24 0c 4d 85 c0 75 0c <0f> 0b bd ea ff ff ff e9 2f 02 00 00 48 89 fb 49 89 f7 49 89 ce 4d
RSP: 0018:ffffc9000611fcf8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888190184000 RCX: ffff888100651b18
RDX: ffff88817d4a6400 RSI: ffffffffc2033d40 RDI: ffff888106abc000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888106abc000 R14: 0000000000000000 R15: ffff888100651b18
FS: 0000000000000000(0000) GS:ffff8887de880000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007feb67e03b20 CR3: 00000001608ac000 CR4: 0000000000b50ef0
Call Trace:
<TASK>
? __warn+0x113/0x14c
? ttm_device_init+0x23/0x281 [ttm]
? report_bug+0x1b3/0x229
? ttm_device_init+0x23/0x281 [ttm]
? handle_bug+0x3c/0x7c
? exc_invalid_op+0x17/0x46
? asm_exc_invalid_op+0x1a/0x20
? ttm_device_init+0x23/0x281 [ttm]
? local_clock_noinstr+0xc/0xa8
ttm_device_kunit_init+0xf1/0x10f [ttm_kunit_helpers]
ttm_device_init_no_vma_man+0x145/0x1e7 [ttm_device_test]
? ttm_device_init_pools+0x61e/0x61e [ttm_device_test]
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? timekeeping_get_ns+0x60/0xf8
? srso_alias_return_thunk+0x5/0xfbef5
? ktime_get_ts64+0x68/0x109
kunit_try_run_case+0x269/0x3cc [kunit]
? kunit_try_run_case_cleanup+0xc2/0xc2 [kunit]
? srso_alias_return_thunk+0x5/0xfbef5
? do_raw_spin_unlock+0x5d/0x1b6
? srso_alias_return_thunk+0x5/0xfbef5
? kunit_try_catch_throw+0x6a/0x6a [kunit]
? kunit_try_run_case_cleanup+0xc2/0xc2 [kunit]
kunit_generic_run_threadfn_adapter+0x54/0x86 [kunit]
kthread+0x25e/0x26d
? kthread_complete_and_exit+0x1f/0x1f
ret_from_fork+0x23/0x54
? kthread_complete_and_exit+0x1f/0x1f
ret_from_fork_asm+0x11/0x20
</TASK>
---[ end trace 0000000000000000 ]---
ok 4 ttm_device_init_no_vma_man
KTAP version 1
# Subtest: ttm_device_init_pools
ok 1 No DMA allocations, no DMA32 required
ok 2 DMA allocations, DMA32 required
ok 3 No DMA allocations, DMA32 required
ok 4 DMA allocations, no DMA32 required
# ttm_device_init_pools: pass:4 fail:0 skip:0 total:4
ok 5 ttm_device_init_pools
# ttm_device: pass:4 fail:1 skip:0 total:5
# Totals: pass:7 fail:1 skip:0 total:8
not ok 1 ttm_device
[...]
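
To make the failed assertion easier to read: list_count_nodes() counts every entry reachable from the node it is given, excluding that node itself. Called on ttm_devs[0].device_list, it therefore counts the list's real head plus every other device on the same list. The user-space sketch below assumes the devices are chained on one shared list with a separate head; global_head, others[] and the helper names are hypothetical. Under that assumption, three test devices alone give 3, and a single extra entry gives the observed 4. Where such an extra entry would come from is not established here.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

#define NUM_TEST_DEVS  3 /* num_dev in the failing test */
#define MAX_OTHER_DEVS 4 /* arbitrary cap for this sketch */

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Link entry in front of head, i.e. at the tail of the circular list. */
static void add_tail(struct list_head *entry, struct list_head *head)
{
	struct list_head *prev;

	assert(entry && head);
	prev = head->prev;
	entry->next = head;
	entry->prev = prev;
	prev->next = entry;
	head->prev = entry;
}

/* Same idea as list_count_nodes(): count entries other than head. */
static size_t count_nodes(const struct list_head *head)
{
	size_t count = 0;

	for (const struct list_head *pos = head->next; pos != head; pos = pos->next)
		count++;
	return count;
}

/*
 * Build a shared list holding other_devs unrelated entries plus the
 * three test devices, then count from the first test device's node.
 */
size_t count_from_first_test_dev(size_t other_devs)
{
	struct list_head global_head;
	struct list_head others[MAX_OTHER_DEVS];
	struct list_head test_devs[NUM_TEST_DEVS];
	size_t i;

	if (other_devs > MAX_OTHER_DEVS)
		other_devs = MAX_OTHER_DEVS;

	init_list_head(&global_head);
	for (i = 0; i < other_devs; i++)
		add_tail(&others[i], &global_head);
	for (i = 0; i < NUM_TEST_DEVS; i++)
		add_tail(&test_devs[i], &global_head);

	return count_nodes(&test_devs[0]);
}
```

With no other entries, count_from_first_test_dev(0) is 3 (two sibling devices plus the list head), which is the value the test expects; count_from_first_test_dev(1) is 4, the value reported in the log.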


Regards,
Erhard

Attachment: dmesg_68-rc5_zen3
Description: Binary data