[BUG] Unloading the mt7921e module causes a use-after-free

From: Mikhail Gavrilov
Date: Wed Jan 10 2024 - 10:16:03 EST


Greetings,
To reproduce the bug, unload the module as root:
# rmmod mt7921e
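
A minimal sketch for confirming the result after the unload, assuming a
KASAN-enabled kernel (such as the +debug build shown in the trace) and that
the report is still in the kernel ring buffer. The helper name kasan_headers
is hypothetical and not part of any tool; it only filters a log read from
standard input.

```shell
# Hypothetical helper (an illustration, not an existing tool):
# read a kernel log on stdin and print only the KASAN report header lines.
kasan_headers() {
    # -F matches the fixed string, so dmesg timestamps in front of it are fine.
    # '|| true' keeps the exit status at 0 when the log contains no report.
    grep -F 'BUG: KASAN:' || true
}

# Intended use on the affected machine (needs root):
#   rmmod mt7921e
#   dmesg | kasan_headers
```

If the helper prints nothing, either the bug did not trigger or the running
kernel was built without KASAN, so an empty result alone does not show that
the unload was clean.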

Backtrace:
BUG: KASAN: use-after-free in tasklet_action_common.isra.0+0x6a4/0x7a0
Read of size 8 at addr ffff888146806748 by task ksoftirqd/5/48
CPU: 5 PID: 48 Comm: ksoftirqd/5 Tainted: G W L ------- --- 6.8.0-0.rc0.20240109git9f8413c4a66f.1.fc40.x86_64+debug #1
Hardware name: Micro-Star International Co., Ltd. MS-7D73/MPG B650I EDGE WIFI (MS-7D73), BIOS 1.81 01/05/2024
Call Trace:
<TASK>
dump_stack_lvl+0x76/0xd0
print_report+0xcf/0x670
? tasklet_action_common.isra.0+0x6a4/0x7a0
kasan_report+0xa6/0xe0
? tasklet_action_common.isra.0+0x6a4/0x7a0
tasklet_action_common.isra.0+0x6a4/0x7a0
__do_softirq+0x215/0x8b9
? __pfx___do_softirq+0x10/0x10
? run_ksoftirqd+0x73/0x80
? __pfx_run_ksoftirqd+0x10/0x10
run_ksoftirqd+0x4b/0x80
smpboot_thread_fn+0x56d/0x900
? __kthread_parkme+0xbd/0x1f0
? __pfx_smpboot_thread_fn+0x10/0x10
kthread+0x2f2/0x3d0
? _raw_spin_unlock_irq+0x28/0x60
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x70
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
The buggy address belongs to the physical page:
page:0000000021f6fa86 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x146806
flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
page_type: 0xffffffff()
raw: 0017ffffc0000000 0000000000000000 dead000000000122 0000000000000000
raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888146806600: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff888146806680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>ffff888146806700: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                              ^
ffff888146806780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff888146806800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

Demonstration: https://youtu.be/4dSuQp0aPkQ

I probably would not have noticed this, because in normal use I never
need to unload the mt7921e module.
But since commit 9270270d62191b7549296721e8d5f3dc0df01563 I see a
use-after-free on every system shutdown and reboot.

mikhail@secondary-ws ~/p/g/linux ((fcc51acf)|BISECTING)> git bisect good
9270270d62191b7549296721e8d5f3dc0df01563 is the first bad commit
commit 9270270d62191b7549296721e8d5f3dc0df01563
Author: Deren Wu <deren.wu@xxxxxxxxxxxx>
Date: Tue Feb 14 10:49:57 2023 +0800

wifi: mt76: mt7921: fix PCI DMA hang after reboot

mt7921 just stop some workers and clean up chip status before reboot.
In stress test, there are working activities still running at the period
of .shutdown callback and that would cause some hosts cannot recover
DMA after reboot. To avoid the floating state in reboot, we use
mt7921_pci_remove() to fully deinit all resources.

Fixes: f23a0cea8bd6 ("wifi: mt76: mt7921e: add pci .shutdown() support")
Signed-off-by: Deren Wu <deren.wu@xxxxxxxxxxxx>
Reviewed-by: AngeloGioacchino Del Regno
<angelogioacchino.delregno@xxxxxxxxxxxxx>
Signed-off-by: Felix Fietkau <nbd@xxxxxxxx>

drivers/net/wireless/mediatek/mt76/mt7921/pci.c | 12 +-----------
1 file changed, 1 insertion(+), 11 deletions(-)

The oldest kernel I could build is 5.17, and on that kernel the
use-after-free has a different backtrace:
BUG: KASAN: use-after-free in mt7921_irq_handler+0xd8/0x100 [mt7921e]
Read of size 8 at addr ffff88824a7d3b78 by task rmmod/11115
CPU: 28 PID: 11115 Comm: rmmod Tainted: G W L 5.17.0 #10
Hardware name: Micro-Star International Co., Ltd. MS-7D73/MPG B650I EDGE WIFI (MS-7D73), BIOS 1.81 01/05/2024
Call Trace:
<TASK>
dump_stack_lvl+0x6f/0xa0
print_address_description.constprop.0+0x1f/0x190
? mt7921_irq_handler+0xd8/0x100 [mt7921e]
? mt7921_irq_handler+0xd8/0x100 [mt7921e]
kasan_report.cold+0x7f/0x11b
? mt7921_irq_handler+0xd8/0x100 [mt7921e]
mt7921_irq_handler+0xd8/0x100 [mt7921e]
free_irq+0x627/0xaa0
devm_free_irq+0x94/0xd0
? devm_request_any_context_irq+0x160/0x160
? kobject_put+0x18d/0x4a0
mt7921_pci_remove+0x153/0x190 [mt7921e]
pci_device_remove+0xa2/0x1d0
__device_release_driver+0x346/0x6e0
driver_detach+0x1ef/0x2c0
bus_remove_driver+0xe7/0x2d0
? __check_object_size+0x57/0x310
pci_unregister_driver+0x26/0x250
__do_sys_delete_module+0x307/0x510
? free_module+0x6a0/0x6a0
? fpregs_assert_state_consistent+0x4b/0xb0
? rcu_read_lock_sched_held+0x10/0x70
? syscall_enter_from_user_mode+0x20/0x70
? trace_hardirqs_on+0x1c/0x130
do_syscall_64+0x5c/0x80
? trace_hardirqs_on_prepare+0x72/0x160
? do_syscall_64+0x68/0x80
? trace_hardirqs_on_prepare+0x72/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fc83aad105b
Code: 73 01 c3 48 8b 0d bd 8d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8d 8d 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc384c28c8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 0000560eec64a750 RCX: 00007fc83aad105b
RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000560eec64a7b8
RBP: 00007ffc384c28f0 R08: 1999999999999999 R09: 0000000000000000
R10: 00007fc83ab49ac0 R11: 0000000000000206 R12: 0000000000000000
R13: 00007ffc384c2b60 R14: 0000560eec64a750 R15: 0000000000000000
</TASK>
The buggy address belongs to the page:
page:00000000f94118a1 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a7d3
flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
raw: 0017ffffc0000000 0000000000000000 ffffea000929f488 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88824a7d3a00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff88824a7d3a80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>ffff88824a7d3b00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                                                ^
ffff88824a7d3b80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff88824a7d3c00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

All kernel logs and the .config are attached to this message.
What do you think?

--
Best Regards,
Mike Gavrilov.

Attachment: dmesg-6.8.zip
Description: Zip archive

Attachment: dmesg-5.17.0.zip
Description: Zip archive

Attachment: .config.zip
Description: Zip archive

Attachment: build-error-5.16.zip
Description: Zip archive