[PATCH RFC 0/1] mount: universally disallow mounting over symlinks

From: Aleksa Sarai
Date: Mon Dec 30 2019 - 00:21:35 EST


An undocumented feature of the mount interface was that it was possible
to mount over a symlink (even with the old mount API) by mounting over
/proc/self/fd/$n, where the corresponding file descriptor was opened
with (O_PATH|O_NOFOLLOW). This didn't work with traditional "new" mounts
(for a variety of reasons), but MS_BIND worked without issue. With the
new mount API it was even easier.
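A minimal sketch of the old-API (mount(2) with MS_BIND) variant of this trick, assuming Linux with glibc. The helper names `open_symlink_opath()`, `fd_is_symlink()`, `fd_proc_path()` and `bind_over_symlink()` are hypothetical and are not part of the patch. Opening with (O_PATH|O_NOFOLLOW) yields an fd for the symlink inode itself, and using /proc/self/fd/$n as the mount target resolves to that symlink rather than to its target. Actually calling `bind_over_symlink()` needs CAP_SYS_ADMIN and should only be tried in a throwaway mount namespace; with this patch applied the mount is expected to be refused.

```c
#define _GNU_SOURCE /* for O_PATH */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef O_PATH
#define O_PATH 010000000 /* asm-generic value, fallback only */
#endif

/* Open the symlink itself (not its target) as a path-only fd. */
int open_symlink_opath(const char *linkpath)
{
    return open(linkpath, O_PATH | O_NOFOLLOW);
}

/* 1 if fd refers to a symlink inode, 0 if not, -1 on error.
 * fstat() on an O_PATH fd is allowed since Linux 3.6. */
int fd_is_symlink(int fd)
{
    struct stat st;
    if (fstat(fd, &st) < 0)
        return -1;
    return S_ISLNK(st.st_mode) ? 1 : 0;
}

/* Build the "/proc/self/fd/$n" magic-link path for fd; -1 if truncated. */
int fd_proc_path(int fd, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/proc/self/fd/%d", fd);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Bind-mount src over the symlink referred to by linkfd (old mount API).
 * Requires CAP_SYS_ADMIN; intended for a disposable mount namespace. */
int bind_over_symlink(const char *src, int linkfd)
{
    char target[64];
    if (fd_proc_path(linkfd, target, sizeof(target)) < 0)
        return -1;
    if (mount(src, target, NULL, MS_BIND, NULL) < 0) {
        fprintf(stderr, "mount(%s -> %s): %s\n", src, target, strerror(errno));
        return -1;
    }
    return 0;
}
```

A caller would open the link with `open_symlink_opath()`, optionally confirm `fd_is_symlink(fd) == 1`, and then call `bind_over_symlink()` with any source directory; both paths are placeholders.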

A reasonably detailed explanation of the issues is provided in the patch
itself, but the full traces produced by both the oopses and the deadlocks
are included below. (It makes little sense to include them in the commit
message, since the patch disables this feature rather than directly fixing
the underlying bugs.)

I've posted this as an RFC to ask whether this feature should be allowed
at all (and whether anyone knows of legitimate uses for it), or whether we
should instead work on fixing the other kernel bugs it exposes.

Oops on NULL dereference:
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 8000000181b1f067 P4D 8000000181b1f067 PUD 24829c067 PMD 0
Oops: 0010 [#1] SMP PTI
CPU: 6 PID: 20796 Comm: mount_to_symlin Tainted: G OE 5.5.0-rc1+openat2~v18+ #123
Hardware name: LENOVO 20KHCTO1WW/20KHCTO1WW, BIOS N23ET55W (1.30 ) 08/31/2018
RIP: 0010:0x0
Code: Bad RIP value.
RSP: 0018:ffffbc7d87e1bcb0 EFLAGS: 00010206
RAX: 0000000000000000 RBX: ffffa0c28cb633c0 RCX: 000000000000ae5a
RDX: 0000000000000089 RSI: ffffa0c0eece8840 RDI: ffffa0c0eb8843b0
RBP: ffffa0c0eb8843b0 R08: ffffdc7d7fbbb770 R09: ffffa0c0ca333000
R10: 0000000000000000 R11: 808080807fffffff R12: ffffa0c0eece8840
R13: 0000000000000089 R14: ffffbc7d87e1bdb0 R15: 0000000000000080
FS: 00007fd921508540(0000) GS:ffffa0c3cf580000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 000000018878a003 CR4: 00000000003606e0
Call Trace:
__lookup_slow+0x94/0x160
lookup_slow+0x36/0x50
path_mountpoint+0x1be/0x350
filename_mountpoint+0xa5/0x150
? __lookup_hash+0xa0/0xa0
ksys_umount+0x78/0x490
__x64_sys_umount+0x12/0x20
do_syscall_64+0x64/0x240
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fd92143f4e7
Code: 09 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 69 09 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe98c89cc8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd92143f4e7
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 000000000167a330
RBP: 00007ffe98c89da0 R08: 0000000000000000 R09: 000000000000000f
R10: 00000000004004c6 R11: 0000000000000202 R12: 00000000004010c0
R13: 00007ffe98c89e80 R14: 0000000000000000 R15: 0000000000000000
CR2: 0000000000000000

Oops on kernel address:
BUG: unable to handle page fault for address: ffffbc7d87e1bcc0
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 107d4a067 P4D 107d4a067 PUD 107d4b067 PMD 46d753067 PTE 0
Oops: 0002 [#2] SMP PTI
CPU: 4 PID: 20975 Comm: mount_to_symlin Tainted: G D OE 5.5.0-rc1+openat2~v18+ #123
Hardware name: LENOVO 20KHCTO1WW/20KHCTO1WW, BIOS N23ET55W (1.30 ) 08/31/2018
RIP: 0010:_raw_spin_lock_irqsave+0x28/0x50
Code: 00 00 0f 1f 44 00 00 41 54 53 48 89 fb 9c 58 0f 1f 44 00 00 49 89 c4 fa 66 0f 1f 44 00 00 e8 3f 55 82 ff 31 c0 ba 01 00 00 00 <f0> 0f b1 13 75 07 4c 89 e0 5b 41 5c c3 89 c6 48 89 df e8 01 52 77
RSP: 0018:ffffbc7d90067bd8 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffffbc7d87e1bcc0 RCX: 0000000200000000
RDX: 0000000000000001 RSI: ffffbc7d90067c50 RDI: ffffbc7d87e1bcc0
RBP: ffffbc7d87e1bcc0 R08: 0000000000000001 R09: 0000000000000003
R10: 0000000000000000 R11: 808080807fffffff R12: 0000000000000246
R13: ffffa0c28cb633c0 R14: ffffbc7d90067db0 R15: ffffa0c0eece8898
FS: 00007f4b80214540(0000) GS:ffffa0c3cf500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffbc7d87e1bcc0 CR3: 000000026d4d0002 CR4: 00000000003606e0
Call Trace:
add_wait_queue+0x15/0x40
d_alloc_parallel+0x36d/0x480
? get_acl+0x1a/0x160
? wake_up_q+0xa0/0xa0
__lookup_slow+0x6b/0x160
lookup_slow+0x36/0x50
path_mountpoint+0x1be/0x350
filename_mountpoint+0xa5/0x150
? __lookup_hash+0xa0/0xa0
ksys_umount+0x78/0x490
__x64_sys_umount+0x12/0x20
do_syscall_64+0x64/0x240
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f4b8014b4e7
Code: 09 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 69 09 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffee8041b28 EFLAGS: 00000206 ORIG_RAX: 00000000000000a6
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4b8014b4e7
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 00000000019c8330
RBP: 00007ffee8041c00 R08: 0000000000000000 R09: 000000000000000f
R10: 00000000004004c6 R11: 0000000000000206 R12: 00000000004010c0
R13: 00007ffee8041ce0 R14: 0000000000000000 R15: 0000000000000000
CR2: ffffbc7d87e1bcc0

Apparent deadlock in d_alloc_parallel:
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [mount_to_symlin:21285]
CPU: 0 PID: 21285 Comm: mount_to_symlin Tainted: G D OE 5.5.0-rc1+openat2~v18+ #123
Hardware name: LENOVO 20KHCTO1WW/20KHCTO1WW, BIOS N23ET55W (1.30 ) 08/31/2018
RIP: 0010:native_queued_spin_lock_slowpath+0x5b/0x1d0
Code: 6d f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 47 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 8b 37 81 fe 00 01 00
RSP: 0018:ffffbc7d90547be8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000101 RBX: ffffffffbac7ac60 RCX: 0000000000000018
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa0c0eece8898
RBP: ffffa0c0eece8898 R08: 00000000006f6f66 R09: 0000000000000003
R10: 0000000000000000 R11: 808080807fffffff R12: 00000000e25b3c73
R13: ffffa0c28cb633c0 R14: ffffbc7d90547db0 R15: ffffa0c0eece8898
FS: 00007fbb1fd30540(0000) GS:ffffa0c3cf400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fbb1fbd25a0 CR3: 0000000181ace005 CR4: 00000000003606f0
Call Trace:
_raw_spin_lock+0x1a/0x20
lockref_get_not_dead+0x4f/0x90
d_alloc_parallel+0x1a8/0x480
? get_acl+0x1a/0x160
__lookup_slow+0x6b/0x160
lookup_slow+0x36/0x50
path_mountpoint+0x1be/0x350
filename_mountpoint+0xa5/0x150
? __lookup_hash+0xa0/0xa0
ksys_umount+0x78/0x490
__x64_sys_umount+0x12/0x20
do_syscall_64+0x64/0x240
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fbb1fc674e7
Code: 09 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 69 09 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffd75fcb858 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbb1fc674e7
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000f6c330
RBP: 00007ffd75fcb930 R08: 0000000000000000 R09: 000000000000000f
R10: 00000000004004a6 R11: 0000000000000202 R12: 00000000004010b0
R13: 00007ffd75fcba10 R14: 0000000000000000 R15: 0000000000000000

RCU stall when trying to grab /proc/$pid/stack for the stuck process:
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 0-....: (15000 ticks this GP) idle=2c6/1/0x4000000000000002 softirq=1172554/1172554 fqs=6849
(t=15001 jiffies g=1935177 q=25734)
NMI backtrace for cpu 0
CPU: 0 PID: 21285 Comm: mount_to_symlin Tainted: G D OEL 5.5.0-rc1+openat2~v18+ #123
Hardware name: LENOVO 20KHCTO1WW/20KHCTO1WW, BIOS N23ET55W (1.30 ) 08/31/2018
Call Trace:
<IRQ>
dump_stack+0x8f/0xd0
? lapic_can_unplug_cpu.cold+0x3e/0x3e
nmi_cpu_backtrace.cold+0x14/0x52
nmi_trigger_cpumask_backtrace+0xf6/0xf8
rcu_dump_cpu_stacks+0x8f/0xbd
rcu_sched_clock_irq.cold+0x1b2/0x39f
update_process_times+0x24/0x50
tick_sched_handle+0x22/0x60
tick_sched_timer+0x38/0x80
? tick_sched_do_timer+0x60/0x60
__hrtimer_run_queues+0xf6/0x270
hrtimer_interrupt+0x10e/0x240
smp_apic_timer_interrupt+0x6c/0x130
apic_timer_interrupt+0xf/0x20
</IRQ>
RIP: 0010:native_queued_spin_lock_slowpath+0x5b/0x1d0
Code: 6d f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 47 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 8b 37 81 fe 00 01 00
RSP: 0018:ffffbc7d90547be8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000101 RBX: ffffffffbac7ac60 RCX: 0000000000000018
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa0c0eece8898
RBP: ffffa0c0eece8898 R08: 00000000006f6f66 R09: 0000000000000003
R10: 0000000000000000 R11: 808080807fffffff R12: 00000000e25b3c73
R13: ffffa0c28cb633c0 R14: ffffbc7d90547db0 R15: ffffa0c0eece8898
_raw_spin_lock+0x1a/0x20
lockref_get_not_dead+0x4f/0x90
d_alloc_parallel+0x1a8/0x480
? get_acl+0x1a/0x160
__lookup_slow+0x6b/0x160
lookup_slow+0x36/0x50
path_mountpoint+0x1be/0x350
filename_mountpoint+0xa5/0x150
? __lookup_hash+0xa0/0xa0
ksys_umount+0x78/0x490
__x64_sys_umount+0x12/0x20
do_syscall_64+0x64/0x240
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fbb1fc674e7
Code: 09 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 69 09 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffd75fcb858 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbb1fc674e7
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000f6c330
RBP: 00007ffd75fcb930 R08: 0000000000000000 R09: 000000000000000f
R10: 00000000004004a6 R11: 0000000000000202 R12: 00000000004010b0
R13: 00007ffd75fcba10 R14: 0000000000000000 R15: 0000000000000000
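Every trace above enters the kernel through __x64_sys_umount() and ksys_umount(). Read with the x86_64 syscall ABI, the user-space register dumps show ORIG_RAX 0xa6 (166), which is umount2(), with RSI (the second argument, the flags) equal to 2, which is MNT_DETACH. In the lock_mount trace that follows, ORIG_RAX 0xa5 (165) is mount() and R10 (the fourth argument, mountflags) is 0x1000, i.e. MS_BIND. A minimal sketch that decodes those values and issues the matching lazy unmount; `decode_orig_rax()` and `detach_via_fd()` are hypothetical helpers, and unmounting through the /proc/self/fd/$n path is an assumption, since the traces only show that a user-space string pointer was passed in RDI.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* x86_64 syscall numbers seen in ORIG_RAX in the register dumps. */
#define X86_64_NR_MOUNT   0xa5 /* 165: mount(2)   */
#define X86_64_NR_UMOUNT2 0xa6 /* 166: umount2(2) */

/* Map an x86_64 ORIG_RAX value from the dumps to a syscall name. */
const char *decode_orig_rax(unsigned long nr)
{
    switch (nr) {
    case X86_64_NR_MOUNT:   return "mount";
    case X86_64_NR_UMOUNT2: return "umount2";
    default:                return "other";
    }
}

/* Lazily detach whatever is mounted on fd's magic-link path.
 * ASSUMPTION: the unmount goes through /proc/self/fd/$n. */
int detach_via_fd(int fd)
{
    char target[64];
    int n = snprintf(target, sizeof(target), "/proc/self/fd/%d", fd);
    if (n < 0 || (size_t)n >= sizeof(target))
        return -1;
    if (umount2(target, MNT_DETACH) < 0) { /* flags == 2, as in RSI */
        fprintf(stderr, "umount2(%s): %s\n", target, strerror(errno));
        return -1;
    }
    return 0;
}
```

Without CAP_SYS_ADMIN, or for a path that is not a mountpoint, `detach_via_fd()` simply reports the error and returns -1.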

Deadlock on lock_mount after a successful umount(). The watchdog does trigger,
but the only record of this stall I could find in my logs was from an attempt
to suspend the system:
Freezing of tasks failed after 20.010 seconds (2 tasks refusing to freeze, wq_busy=0):
mount_to_symlin D 0 5850 5849 0x00000004
Call Trace:
? __schedule+0x2dd/0x770
schedule+0x4a/0xb0
rwsem_down_write_slowpath+0x256/0x500
lock_mount+0x22/0xf0
do_mount+0x4b7/0x9f0
ksys_mount+0x7e/0xc0
__x64_sys_mount+0x21/0x30
do_syscall_64+0x64/0x240
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f86e6355fda
Code: Bad RIP value.
RSP: 002b:00007ffc36f952d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f86e6355fda
RDX: 0000000000402099 RSI: 00000000019a5310 RDI: 00007ffc36f96ee1
RBP: 00007ffc36f953b0 R08: 0000000000402099 R09: 000000000000000f
R10: 0000000000001000 R11: 0000000000000206 R12: 00000000004010c0
R13: 00007ffc36f95490 R14: 0000000000000000 R15: 0000000000000000

Cc: stable@xxxxxxxxxxxxxxx # pre-git
Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: David Howells <dhowells@xxxxxxxxxx>
Cc: Eric Biederman <ebiederm@xxxxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Aleksa Sarai <cyphar@xxxxxxxxxx>

Aleksa Sarai (1):
mount: universally disallow mounting over symlinks

fs/namespace.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)


base-commit: fd6988496e79a6a4bdb514a4655d2920209eb85d
--
2.24.1