Re: [PATCH] kthread: finer-grained lockdep/cross-release completion

From: Peter Zijlstra
Date: Thu Dec 07 2017 - 14:57:28 EST


On Thu, Dec 07, 2017 at 03:58:28PM +0100, Daniel Vetter wrote:

> [ 85.069417] gem_exec_captur/2810 is trying to acquire lock:
> [ 85.069419] ((completion)&self->parked){+.+.}, at: [<ffffffff8109d69d>] kthread_park+0x3d/0x50
> [ 85.069426]
> but task is already holding lock:
> [ 85.069428] (&dev->struct_mutex){+.+.}, at: [<ffffffffa039b13d>] i915_reset_device+0x1bd/0x230 [i915]
> [ 85.069448]
> which lock already depends on the new lock.
>
> [ 85.069451]
> the existing dependency chain (in reverse order) is:
> [ 85.069454]
> -> #3 (&dev->struct_mutex){+.+.}:
> [ 85.069460] __mutex_lock+0x81/0x9b0
> [ 85.069481] i915_mutex_lock_interruptible+0x47/0x130 [i915]
> [ 85.069502] i915_gem_fault+0x201/0x760 [i915]
> [ 85.069507] __do_fault+0x15/0x70
> [ 85.069509] __handle_mm_fault+0x7bf/0xda0
> [ 85.069512] handle_mm_fault+0x14f/0x2f0
> [ 85.069515] __do_page_fault+0x2d1/0x560
> [ 85.069518] page_fault+0x22/0x30
> [ 85.069520]
> -> #2 (&mm->mmap_sem){++++}:
> [ 85.069525] __might_fault+0x63/0x90
> [ 85.069529] _copy_to_user+0x1e/0x70
> [ 85.069532] perf_read+0x21d/0x290
> [ 85.069534] __vfs_read+0x1e/0x120
> [ 85.069536] vfs_read+0xa1/0x150
> [ 85.069539] SyS_read+0x40/0xa0
> [ 85.069541] entry_SYSCALL_64_fastpath+0x1c/0x89

> -> #0 ((completion)&self->parked){+.+.}:

> [ 85.069692] Chain exists of:
> (completion)&self->parked --> &mm->mmap_sem --> &dev->struct_mutex

> [ 85.069718] 3 locks held by gem_exec_captur/2810:

> [ 85.069732] #2: (&dev->struct_mutex){+.+.}, at: [<ffffffffa039b13d>] i915_reset_device+0x1bd/0x230 [i915]

> stack backtrace:

> [ 85.069788] lock_acquire+0xaf/0x200
> [ 85.069793] wait_for_common+0x54/0x210
> [ 85.069807] kthread_park+0x3d/0x50
> [ 85.069827] i915_gem_reset_prepare_engine+0x1d/0x90 [i915]
> [ 85.069849] i915_gem_reset_prepare+0x2c/0x60 [i915]
> [ 85.069865] i915_reset+0x66/0x230 [i915]
> [ 85.069881] i915_reset_device+0x1cb/0x230 [i915]
> [ 85.069919] i915_handle_error+0x2d3/0x430 [i915]
> [ 85.069951] i915_wedged_set+0x79/0xc0 [i915]
> [ 85.069955] simple_attr_write+0xab/0xc0
> [ 85.069959] full_proxy_write+0x4b/0x70
> [ 85.069961] __vfs_write+0x1e/0x130
> [ 85.069976] vfs_write+0xc0/0x1b0
> [ 85.069979] SyS_write+0x40/0xa0
> [ 85.069982] entry_SYSCALL_64_fastpath+0x1c/0x89


Thread-A k-Thread

i915_reset_device
#3 mutex_lock(&dev->struct_mutex)
i915_reset()
i915_gem_reset_prepare()
i915_gem_reset_prepare_engine()
kthread_park()

__do_page_fault()
#2 down_read(&mm->mmap_sem);
handle_mm_fault()
__handle_mm_fault()
__do_fault()
i915_gem_fault()
i915_mutex_lock_interruptible()
#3 mutex_lock(&dev->struct_mutex)

/* twiddles thumbs forever more */

#0 wait_for_common()

#0 complete()


Is what it says I suppose. Now I don't know enough about that i915 code
to say if that breadcrumbs_signal thread can ever trigger a fault or
not. I got properly lost in that dma_fence callback maze.

You're saying not?


(also, that comment near need_resched() doesn't make sense to me)