Re: possible deadlock in ovl_write_iter

From: Amir Goldstein
Date: Wed Sep 26 2018 - 23:48:29 EST


On Wed, Sep 26, 2018 at 11:44 PM syzbot
<syzbot+695726bc473f9c36a4b6@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit: a38523185b40 erge tag 'libnvdimm-fixes-4.19-rc6' of git://..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=178f63fa400000
> kernel config: https://syzkaller.appspot.com/x/.config?x=dfb440e26f0a6f6f
> dashboard link: https://syzkaller.appspot.com/bug?extid=695726bc473f9c36a4b6
> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+695726bc473f9c36a4b6@xxxxxxxxxxxxxxxxxxxxxxxxx
>
> Process accounting resumed
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.19.0-rc5+ #32 Not tainted
> ------------------------------------------------------
> overlayfs: option "workdir=./file1\" is useless in a non-upper mount, ignore
> syz-executor1/7265 is trying to acquire lock:
> 00000000fec87ddb (&ovl_i_mutex_key[depth]){+.+.}, at: inode_lock
> include/linux/fs.h:738 [inline]
> 00000000fec87ddb (&ovl_i_mutex_key[depth]){+.+.}, at:
> ovl_write_iter+0x151/0xb00 fs/overlayfs/file.c:231
>
> but task is already holding lock:
> 00000000998db2f0 (&acct->lock#2){+.+.}, at: acct_get kernel/acct.c:161
> [inline]
> 00000000998db2f0 (&acct->lock#2){+.+.}, at: slow_acct_process
> kernel/acct.c:577 [inline]
> 00000000998db2f0 (&acct->lock#2){+.+.}, at: acct_process+0x48b/0x875
> kernel/acct.c:605
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (&acct->lock#2){+.+.}:
> overlayfs: at least 2 lowerdir are needed while upperdir nonexistent
> __mutex_lock_common kernel/locking/mutex.c:925 [inline]
> __mutex_lock+0x166/0x1700 kernel/locking/mutex.c:1072
> kobject: 'kvm' (00000000bb4f2ec2): kobject_uevent_env
> mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1087
> acct_pin_kill+0x26/0x100 kernel/acct.c:173
> pin_kill+0x29d/0xab0 fs/fs_pin.c:50
> kobject: 'kvm' (00000000bb4f2ec2): fill_kobj_path: path
> = '/devices/virtual/misc/kvm'
> acct_on+0x64a/0x8c0 kernel/acct.c:254
> __do_sys_acct kernel/acct.c:286 [inline]
> __se_sys_acct kernel/acct.c:273 [inline]
> __x64_sys_acct+0xc2/0x1f0 kernel/acct.c:273
> do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #1 (sb_writers#3){.+.+}:
> percpu_down_read_preempt_disable include/linux/percpu-rwsem.h:36
> [inline]
> percpu_down_read include/linux/percpu-rwsem.h:59 [inline]
> __sb_start_write+0x214/0x370 fs/super.c:1387
> sb_start_write include/linux/fs.h:1566 [inline]
> mnt_want_write+0x3f/0xc0 fs/namespace.c:360
> ovl_want_write+0x76/0xa0 fs/overlayfs/util.c:24
> ovl_setattr+0x10b/0xaf0 fs/overlayfs/inode.c:30
> notify_change+0xbde/0x1110 fs/attr.c:334
> do_truncate+0x1bd/0x2d0 fs/open.c:63
> handle_truncate fs/namei.c:3008 [inline]
> do_last fs/namei.c:3424 [inline]
> path_openat+0x3762/0x5160 fs/namei.c:3534
> do_filp_open+0x255/0x380 fs/namei.c:3564
> kobject: 'loop4' (0000000088f052bf): kobject_uevent_env
> do_sys_open+0x568/0x700 fs/open.c:1063
> ksys_open include/linux/syscalls.h:1276 [inline]
> __do_sys_creat fs/open.c:1121 [inline]
> __se_sys_creat fs/open.c:1119 [inline]
> __x64_sys_creat+0x61/0x80 fs/open.c:1119
> do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> kobject: 'loop4' (0000000088f052bf): fill_kobj_path: path
> = '/devices/virtual/block/loop4'
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #0
> kobject: 'kvm' (00000000bb4f2ec2): kobject_uevent_env
> (&ovl_i_mutex_key[depth]){+.+.}:
> lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3900
> down_write+0x8a/0x130 kernel/locking/rwsem.c:70
> inode_lock include/linux/fs.h:738 [inline]
> ovl_write_iter+0x151/0xb00 fs/overlayfs/file.c:231
> kobject: 'kvm' (00000000bb4f2ec2): fill_kobj_path: path
> = '/devices/virtual/misc/kvm'
> call_write_iter include/linux/fs.h:1808 [inline]
> new_sync_write fs/read_write.c:474 [inline]
> __vfs_write+0x6b8/0x9f0 fs/read_write.c:487
> __kernel_write+0x10c/0x370 fs/read_write.c:506
> do_acct_process+0x1144/0x1660 kernel/acct.c:520
> slow_acct_process kernel/acct.c:579 [inline]
> acct_process+0x6b1/0x875 kernel/acct.c:605
> do_exit+0x1aaf/0x2610 kernel/exit.c:857
> do_group_exit+0x177/0x440 kernel/exit.c:970
> get_signal+0x8b0/0x1980 kernel/signal.c:2513
> do_signal+0x9c/0x21e0 arch/x86/kernel/signal.c:816
> exit_to_usermode_loop+0x2e5/0x380 arch/x86/entry/common.c:162
> prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
> syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
> do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> other info that might help us debug this:
>
> Chain exists of:
> &ovl_i_mutex_key[depth] --> sb_writers#3 --> &acct->lock#2
>

I am not sure this is the root cause, but there seems to be a locking order
violation in code that replaces accounting file.
If new file is on the same fs as old file, acct_pin_kill(old) will skip writing
the file, because sb_writers is still already taken by acct_on().
If new file is not on same fs as old file, this ordering violation creates
an unneeded dependency between new_sb_writers and old_sb_writers,
which may be later reported as possible deadlock.

This could result in an actual deadlock if accounting file is replaced
from file in overlayfs over "real fs" to file in "real fs".
acct_on() takes freeze protection on "real fs" and tries to write to
overlayfs file. overlayfs is not freeze protected so do_acct_process()
can carry on with __kernel_write() to overlayfs file, which SHOULD
take "real fs" freeze protection and deadlock.
Ironically (or not) it doesn't deadlock because of a bug fixed with
898cc19d8af2 ovl: fix freeze protection bypass in ovl_write_iter()
which is not upstream yet, so wasn't in the kernel that syzbot tested.

Following untested patch should fix the alleged deadlock.

Miklos,

If you feel confident about this patch I can re-post it for you to add
to next. Otherwise, as I didn't see Al around to comment on the patch,
I will try to reproduce the deadlock.

Thanks,
Amir.
---

diff --git a/kernel/acct.c b/kernel/acct.c
index addf7732fb56..c09355a7ae46 100644
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -251,8 +251,8 @@ static int acct_on(struct filename *pathname)
rcu_read_lock();
old = xchg(&ns->bacct, &acct->pin);
mutex_unlock(&acct->lock);
- pin_kill(old);
mnt_drop_write(mnt);
+ pin_kill(old);
mntput(mnt);
return 0;
}