Re: [PATCH v2 3/5] nvme-fabrics: introduce ref counting for nvmf_ctrl_options

From: Daniel Wagner
Date: Mon Mar 11 2024 - 13:36:15 EST


On Thu, Mar 07, 2024 at 12:27:43PM +0200, Sagi Grimberg wrote:
> Why do we need a refcount for an object that has the same exact lifetime
> as the ctrl itself? It just feels like unneeded complication.

My claim that the UAF is also possible with the current code turned out
not to be correct, or at least it is not easy to reproduce: I re-tested
a lot and could not trigger it.

However, the UAF is very easy to reproduce with the sync connect patch
("nvme-fc: wait for initial connect attempt to finish") applied together
with Hannes' patch ("nvme: authentication error are always
non-retryable"):

In this case, the initial connect attempt fails and the resources are
torn down while we are still waiting in

+	if (!opts->connect_async) {
+		enum nvme_ctrl_state state;
+
+		wait_for_completion(&ctrl->connect_completion);
+		state = nvme_ctrl_state(&ctrl->ctrl);
+		nvme_fc_ctrl_put(ctrl);
+
+		if (state != NVME_CTRL_LIVE) {
+			/* Cleanup is handled by the connect state machine */
+			return ERR_PTR(-EIO);
+		}
+	}

This opens up the race window: while we are waiting here for the
completion, the ctrl entry in sysfs is still reachable. Unfortunately,
we also fire a uevent, which starts another instance of nvme-cli, and
that new instance iterates over sysfs and reads the already freed
options object.
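Since the sysfs entry (and with it nvme_class_uevent()) stays reachable
after nvmf_dev_write()'s error path has already freed the options, the
options object needs its own lifetime tracking. A minimal sketch of the
kref shape this series is aiming for (helper names nvmf_alloc_options(),
nvmf_get_options() and nvmf_put_options() are illustrative, not
necessarily what the patches use; freeing of the option strings is
elided):

#include <linux/container_of.h>
#include <linux/kref.h>
#include <linux/slab.h>

struct nvmf_ctrl_options {
	struct kref	ref;
	/* existing fields: transport, traddr, subsysnqn, ... */
};

/* nvmf_dev_write() would do this once while parsing; refcount starts at 1 */
static struct nvmf_ctrl_options *nvmf_alloc_options(void)
{
	struct nvmf_ctrl_options *opts;

	opts = kzalloc(sizeof(*opts), GFP_KERNEL);
	if (opts)
		kref_init(&opts->ref);
	return opts;
}

static void nvmf_options_release(struct kref *ref)
{
	struct nvmf_ctrl_options *opts =
		container_of(ref, struct nvmf_ctrl_options, ref);

	kfree(opts);
}

static struct nvmf_ctrl_options *
nvmf_get_options(struct nvmf_ctrl_options *opts)
{
	kref_get(&opts->ref);
	return opts;
}

static void nvmf_put_options(struct nvmf_ctrl_options *opts)
{
	kref_put(&opts->ref, nvmf_options_release);
}

The splat from the reproducer: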

run blktests nvme/041 at 2024-03-11 18:13:38
nvmet: adding nsid 1 to subsystem blktests-subsystem-1
nvme nvme0: NVME-FC{0}: create association : host wwpn 0x20001100aa000002 rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
(NULL device *): {0:0} Association created
[8167] nvmet: ctrl 1 start keep-alive timer for 5 secs
[8167] nvmet: check nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[8167] nvmet: nvmet_setup_dhgroup: ctrl 1 selecting dhgroup 0
[8167] nvmet: nvmet_setup_auth: using hash none key fb 28 d3 79 af 04 ba 36 95 3b e5 89 6c bf 42 90 4a dd dd 1b d4 e8 ba ce b2 7c 16 d4 01 7d 4f 20
nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 with DH-HMAC-CHAP.
nvme nvme0: qid 0: no key
nvme nvme0: qid 0: authentication setup failed
nvme nvme0: NVME-FC{0}: create_assoc failed, assoc_id f2139b60a42c0000 ret 16785
nvme nvme0: NVME-FC{0}: reset: Reconnect attempt failed (16785)
nvme nvme0: NVME-FC{0}: reconnect failure
nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"
==================================================================
BUG: KASAN: slab-use-after-free in nvme_class_uevent+0xb9/0x1a0 [nvme_core]
Read of size 8 at addr ffff888107229698 by task systemd-journal/578

CPU: 1 PID: 578 Comm: systemd-journal Tainted: G W 6.8.0-rc6+ #43 106200e85ab1e5c3399a68beb80cc63ca4823f3a
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
Call Trace:
<TASK>
dump_stack_lvl+0x5b/0x80
print_report+0x163/0x800
? __virt_addr_valid+0x2f3/0x340
? nvme_class_uevent+0xb9/0x1a0 [nvme_core a5a8fc3d48e3ec2a76ff6521d70aebe532cfd700]
kasan_report+0xd0/0x110
? nvme_class_uevent+0xb9/0x1a0 [nvme_core a5a8fc3d48e3ec2a76ff6521d70aebe532cfd700]
nvme_class_uevent+0xb9/0x1a0 [nvme_core a5a8fc3d48e3ec2a76ff6521d70aebe532cfd700]
dev_uevent+0x374/0x640
uevent_show+0x187/0x2a0
dev_attr_show+0x5f/0xb0
sysfs_kf_seq_show+0x2a8/0x3f0
? __cfi_dev_attr_show+0x10/0x10
seq_read_iter+0x3f1/0xc00
vfs_read+0x6cf/0x960
ksys_read+0xd7/0x1a0
do_syscall_64+0xb1/0x180
? do_syscall_64+0xc0/0x180
entry_SYSCALL_64_after_hwframe+0x6e/0x76
RIP: 0033:0x7f4297b0a3dc
Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 97 18 f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 48 89 44 24 08 e8 fd

RSP: 002b:00007ffd945ec430 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000564732197e60 RCX: 00007f4297b0a3dc
RDX: 0000000000001008 RSI: 0000564732197e60 RDI: 000000000000001a
RBP: 000000000000001a R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000001007
R13: 0000000000001008 R14: ffffffffffffffff R15: 0000000000000002
</TASK>

Allocated by task 31249 on cpu 0 at 5508.645525s:
kasan_save_track+0x2c/0x90
__kasan_kmalloc+0x89/0xa0
kmalloc_trace+0x1f3/0x3c0
nvmf_dev_write+0x15c/0x2990 [nvme_fabrics]
vfs_write+0x1cd/0xb60
ksys_write+0xd7/0x1a0
do_syscall_64+0xb1/0x180
entry_SYSCALL_64_after_hwframe+0x6e/0x76

Freed by task 31249 on cpu 2 at 5508.686805s:
kasan_save_track+0x2c/0x90
kasan_save_free_info+0x4a/0x60
poison_slab_object+0x108/0x180
__kasan_slab_free+0x33/0x80
kfree+0x119/0x310
nvmf_dev_write+0x23e0/0x2990 [nvme_fabrics]
vfs_write+0x1cd/0xb60
ksys_write+0xd7/0x1a0
do_syscall_64+0xb1/0x180
entry_SYSCALL_64_after_hwframe+0x6e/0x76

The buggy address belongs to the object at ffff888107229680
which belongs to the cache kmalloc-192 of size 192
The buggy address is located 24 bytes inside of
freed 192-byte region [ffff888107229680, ffff888107229740)

The buggy address belongs to the physical page:
page:0000000070cf556f refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x107228
head:0000000070cf556f order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x17ffffc0000840(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
page_type: 0xffffffff()
raw: 0017ffffc0000840 ffff888100042c00 ffffea0004363500 dead000000000004
raw: 0000000000000000 00000000001c001c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888107229580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc
ffff888107229600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888107229680: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888107229700: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff888107229780: fc fc fc fc fa fb fb fb fb fb fb fb fb fb fb fb
==================================================================
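So the options do not strictly share the ctrl's lifetime: the sysfs
entry is still reachable (and emits uevents) while the failed connect
path has already freed them. With the refcounting, the ownership would
roughly look like this (call sites hypothetical, helper names as in the
sketch above):

	/* nvmf_dev_write(): drop the parsing reference on every exit
	 * path instead of kfree()ing the options directly */
	nvmf_put_options(opts);

	/* transport create_ctrl(): the controller pins the options for
	 * its whole lifetime */
	ctrl->opts = nvmf_get_options(opts);

	/* ctrl's final release, i.e. after the sysfs entry and uevents
	 * are gone: drop the controller's reference; the last put frees
	 * the options */
	nvmf_put_options(ctrl->opts);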