deadlock in drm_fb_helper_damage_work

From: Dmitry Vyukov
Date: Sun Nov 13 2022 - 15:43:07 EST


Hi,

I am getting the following deadlock report on reservation_ww_class_mutex
while trying to boot the next-20221111 kernel:

============================================
WARNING: possible recursive locking detected
6.1.0-rc4-next-20221111 #193 Not tainted
--------------------------------------------
kworker/4:1/81 is trying to acquire lock:
ffff88812ebe89a8 (reservation_ww_class_mutex){+.+.}-{3:3}, at:
dma_resv_lock_interruptible include/linux/dma-resv.h:372 [inline]
ffff88812ebe89a8 (reservation_ww_class_mutex){+.+.}-{3:3}, at:
ttm_bo_reserve include/drm/ttm/ttm_bo_driver.h:121 [inline]
ffff88812ebe89a8 (reservation_ww_class_mutex){+.+.}-{3:3}, at:
drm_gem_vram_vmap+0xa4/0x590 drivers/gpu/drm/drm_gem_vram_helper.c:436

but task is already holding lock:
ffff88812ebe89a8 (reservation_ww_class_mutex){+.+.}-{3:3}, at:
dma_resv_lock include/linux/dma-resv.h:345 [inline]
ffff88812ebe89a8 (reservation_ww_class_mutex){+.+.}-{3:3}, at:
drm_gem_vmap_unlocked+0x3f/0xa0 drivers/gpu/drm/drm_gem.c:1195

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(reservation_ww_class_mutex);
lock(reservation_ww_class_mutex);

*** DEADLOCK ***

May be due to missing lock nesting notation

4 locks held by kworker/4:1/81:
#0: ffff888100078d38 ((wq_completion)events){+.+.}-{0:0}, at:
arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff888100078d38 ((wq_completion)events){+.+.}-{0:0}, at:
arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
#0: ffff888100078d38 ((wq_completion)events){+.+.}-{0:0}, at:
atomic_long_set include/linux/atomic/atomic-instrumented.h:1280 [inline]
#0: ffff888100078d38 ((wq_completion)events){+.+.}-{0:0}, at:
set_work_data kernel/workqueue.c:636 [inline]
#0: ffff888100078d38 ((wq_completion)events){+.+.}-{0:0}, at:
set_work_pool_and_clear_pending kernel/workqueue.c:663 [inline]
#0: ffff888100078d38 ((wq_completion)events){+.+.}-{0:0}, at:
process_one_work+0x8e4/0x1720 kernel/workqueue.c:2260
#1: ffffc9000694fda0 ((work_completion)(&helper->damage_work)){+.+.}-{0:0}, at:
process_one_work+0x918/0x1720 kernel/workqueue.c:2264
#2: ffff88812ebe8278 (&helper->lock){+.+.}-{3:3}, at:
drm_fbdev_damage_blit drivers/gpu/drm/drm_fbdev_generic.c:312 [inline]
#2: ffff88812ebe8278 (&helper->lock){+.+.}-{3:3}, at:
drm_fbdev_fb_dirty+0x30e/0xcd0 drivers/gpu/drm/drm_fbdev_generic.c:342
#3: ffff88812ebe89a8 (reservation_ww_class_mutex){+.+.}-{3:3}, at:
dma_resv_lock include/linux/dma-resv.h:345 [inline]
#3: ffff88812ebe89a8 (reservation_ww_class_mutex){+.+.}-{3:3}, at:
drm_gem_vmap_unlocked+0x3f/0xa0 drivers/gpu/drm/drm_gem.c:1195

stack backtrace:
CPU: 4 PID: 81 Comm: kworker/4:1 Not tainted 6.1.0-rc4-next-20221111 #193
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Workqueue: events drm_fb_helper_damage_work
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x100/0x178 lib/dump_stack.c:106
print_deadlock_bug kernel/locking/lockdep.c:2990 [inline]
check_deadlock kernel/locking/lockdep.c:3033 [inline]
validate_chain kernel/locking/lockdep.c:3818 [inline]
__lock_acquire.cold+0x119/0x3b9 kernel/locking/lockdep.c:5055
lock_acquire kernel/locking/lockdep.c:5668 [inline]
lock_acquire+0x1e0/0x610 kernel/locking/lockdep.c:5633
__mutex_lock_common kernel/locking/mutex.c:603 [inline]
__ww_mutex_lock.constprop.0+0x1ba/0x2ee0 kernel/locking/mutex.c:754
ww_mutex_lock_interruptible+0x37/0x140 kernel/locking/mutex.c:886
dma_resv_lock_interruptible include/linux/dma-resv.h:372 [inline]
ttm_bo_reserve include/drm/ttm/ttm_bo_driver.h:121 [inline]
drm_gem_vram_vmap+0xa4/0x590 drivers/gpu/drm/drm_gem_vram_helper.c:436
drm_gem_vmap+0xc5/0x1b0 drivers/gpu/drm/drm_gem.c:1166
drm_gem_vmap_unlocked+0x4a/0xa0 drivers/gpu/drm/drm_gem.c:1196
drm_client_buffer_vmap+0x45/0xd0 drivers/gpu/drm/drm_client.c:326
drm_fbdev_damage_blit drivers/gpu/drm/drm_fbdev_generic.c:314 [inline]
drm_fbdev_fb_dirty+0x31e/0xcd0 drivers/gpu/drm/drm_fbdev_generic.c:342
drm_fb_helper_damage_work+0x27a/0x5d0 drivers/gpu/drm/drm_fb_helper.c:388
process_one_work+0xa33/0x1720 kernel/workqueue.c:2289
worker_thread+0x67d/0x10e0 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>

The config is:
https://gist.githubusercontent.com/dvyukov/2897b21db809075a22db0370c495ed2d/raw/9b2535b2ba77bb57e4f1ba2b909ad4075b6e2c6a/gistfile1.txt

QEMU command line:
qemu-system-x86_64 -enable-kvm -machine q35,nvdimm -cpu max,migratable=off -smp 18 \
-m 72G -hda buildroot-amd64-2021.08 -kernel arch/x86/boot/bzImage -nographic \
-net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic,model=virtio-net-pci \
-append "console=ttyS0 root=/dev/sda1 earlyprintk=serial rodata=n \
oops=panic panic_on_warn=1 panic=86400 coredump_filter=0xffff"