Re: [PATCH] fs/pstore: Perform erase from a worker

From: Chris Wilson
Date: Tue Mar 21 2017 - 13:19:41 EST


On Tue, Mar 21, 2017 at 02:58:48PM +0900, Namhyung Kim wrote:
> Hello,
>
> On Mon, Mar 20, 2017 at 10:49:16AM -0700, Kees Cook wrote:
> > On Fri, Mar 17, 2017 at 2:52 AM, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
> > > In order to prevent a cyclic recursion between psi->read_mutex and the
> > > inode_lock, we need to move the pse->erase to a worker.
> > >
> > > [ 605.374955] ======================================================
> > > [ 605.381281] [ INFO: possible circular locking dependency detected ]
> > > [ 605.387679] 4.11.0-rc2-CI-CI_DRM_2352+ #1 Not tainted
> > > [ 605.392826] -------------------------------------------------------
> > > [ 605.399196] rm/7298 is trying to acquire lock:
> > > [ 605.403720] (&psinfo->read_mutex){+.+.+.}, at: [<ffffffff813e183f>] pstore_unlink+0x3f/0xa0
> > > [ 605.412300]
> > > [ 605.412300] but task is already holding lock:
> > > [ 605.418237] (&sb->s_type->i_mutex_key#14){++++++}, at: [<ffffffff812157ec>] vfs_unlink+0x4c/0x190
> > > [ 605.427397]
> > > [ 605.427397] which lock already depends on the new lock.
> > > [ 605.427397]
> > > [ 605.435770]
> > > [ 605.435770] the existing dependency chain (in reverse order) is:
> > > [ 605.443396]
> > > [ 605.443396] -> #1 (&sb->s_type->i_mutex_key#14){++++++}:
> > > [ 605.450347] lock_acquire+0xc9/0x220
> > > [ 605.454551] down_write+0x3f/0x70
> > > [ 605.458484] pstore_mkfile+0x1f4/0x460
> > > [ 605.462835] pstore_get_records+0x17a/0x320
> > > [ 605.467664] pstore_fill_super+0xa4/0xc0
> > > [ 605.472205] mount_single+0x89/0xb0
> > > [ 605.476314] pstore_mount+0x13/0x20
> > > [ 605.480411] mount_fs+0xf/0x90
> > > [ 605.484122] vfs_kern_mount+0x66/0x170
> > > [ 605.488464] do_mount+0x190/0xd50
> > > [ 605.492397] SyS_mount+0x90/0xd0
> > > [ 605.496212] entry_SYSCALL_64_fastpath+0x1c/0xb1
> > > [ 605.501496]
> > > [ 605.501496] -> #0 (&psinfo->read_mutex){+.+.+.}:
> > > [ 605.507747] __lock_acquire+0x1ac0/0x1bb0
> > > [ 605.512401] lock_acquire+0xc9/0x220
> > > [ 605.516594] __mutex_lock+0x6e/0x990
> > > [ 605.520755] mutex_lock_nested+0x16/0x20
> > > [ 605.525279] pstore_unlink+0x3f/0xa0
> > > [ 605.529465] vfs_unlink+0xb5/0x190
> > > [ 605.533477] do_unlinkat+0x24c/0x2a0
> > > [ 605.537672] SyS_unlinkat+0x16/0x30
> > > [ 605.541781] entry_SYSCALL_64_fastpath+0x1c/0xb1
> >
> > If I'm reading this right it's a race between mount and unlink...
> > that's quite a corner case. :)
> >
> > > [ 605.547067]
> > > [ 605.547067] other info that might help us debug this:
> > > [ 605.547067]
> > > [ 605.555221] Possible unsafe locking scenario:
> > > [ 605.555221]
> > > [ 605.561280]        CPU0                    CPU1
> > > [ 605.565883]        ----                    ----
> > > [ 605.570502]   lock(&sb->s_type->i_mutex_key#14);
> > > [ 605.575217]                                lock(&psinfo->read_mutex);
> > > [ 605.581803]                                lock(&sb->s_type->i_mutex_key#14);
> > > [ 605.589159]   lock(&psinfo->read_mutex);
> >
> > I haven't had time to dig much yet, but I wonder if the locking order
> > on unlink could just be reversed, and the deadlock would go away?
>
> IIUC, the unlink path locks a file in the root directory, while the
> mount path locks the root directory. Maybe we can use a subclass?
> (not tested)

More puzzling, or just my confusion: reports from our CI farm say that
this patch breaks removing objects from pstore. :|

I look forward to better suggestions on how to avoid the lockdep warning.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre