Re: [PATCH 2/5] blktrace: fix debugfs use after free

From: Greg KH
Date: Tue Apr 14 2020 - 03:37:32 EST


On Tue, Apr 14, 2020 at 04:18:59AM +0000, Luis Chamberlain wrote:
> In commit 6ac93117ab00 ("blktrace: use existing disk debugfs directory"),
> merged in v4.12, Omar fixed the original blktrace code for request-based
> drivers (multiqueue). This, however, left in place a possible crash if you
> happen to abuse blktrace in a way it was not intended to be used.
>
> Namely, if you loop adding a device, set up blktrace with BLKTRACESETUP,
> forget to BLKTRACETEARDOWN, and then just remove the device, you end up
> with a panic:
>
> [ 107.193134] debugfs: Directory 'loop0' with parent 'block' already present!
> [ 107.254615] BUG: kernel NULL pointer dereference, address: 00000000000000a0
> [ 107.258785] #PF: supervisor write access in kernel mode
> [ 107.262035] #PF: error_code(0x0002) - not-present page
> [ 107.264106] PGD 0 P4D 0
> [ 107.264404] Oops: 0002 [#1] SMP NOPTI
> [ 107.264803] CPU: 8 PID: 674 Comm: kworker/8:2 Tainted: G E 5.6.0-rc7-next-20200327 #1
> [ 107.265712] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> [ 107.266553] Workqueue: events __blk_release_queue
> [ 107.267051] RIP: 0010:down_write+0x15/0x40
> [ 107.267488] Code: eb ca e8 ee a5 8d ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 55 00 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d
> [ 107.269300] RSP: 0018:ffff9927c06efda8 EFLAGS: 00010246
> [ 107.269841] RAX: 0000000000000000 RBX: ffff8be7e73b0600 RCX: ffffff8100000000
> [ 107.270559] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0
> [ 107.271281] RBP: 00000000000000a0 R08: ffff8be7ebc80fa8 R09: ffff8be7ebc80fa8
> [ 107.272001] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 107.272722] R13: ffff8be7efc30400 R14: ffff8be7e0571200 R15: 00000000000000a0
> [ 107.273475] FS: 0000000000000000(0000) GS:ffff8be7efc00000(0000) knlGS:0000000000000000
> [ 107.274346] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 107.274968] CR2: 00000000000000a0 CR3: 000000042abee003 CR4: 0000000000360ee0
> [ 107.275710] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 107.276465] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 107.277214] Call Trace:
> [ 107.277532] simple_recursive_removal+0x4e/0x2e0
> [ 107.278049] ? debugfs_remove+0x60/0x60
> [ 107.278493] debugfs_remove+0x40/0x60
> [ 107.278922] blk_trace_free+0xd/0x50
> [ 107.279339] __blk_trace_remove+0x27/0x40
> [ 107.279797] blk_trace_shutdown+0x30/0x40
> [ 107.280256] __blk_release_queue+0xab/0x110
> [ 107.280734] process_one_work+0x1b4/0x380
> [ 107.281194] worker_thread+0x50/0x3c0
> [ 107.281622] kthread+0xf9/0x130
> [ 107.281994] ? process_one_work+0x380/0x380
> [ 107.282467] ? kthread_park+0x90/0x90
> [ 107.282895] ret_from_fork+0x1f/0x40
> [ 107.283316] Modules linked in: loop(E) <etc>
> [ 107.288562] CR2: 00000000000000a0
> [ 107.288957] ---[ end trace b885d243d441bbce ]---
>
> This splat happens to be very similar to the one reported via
> kernel.org korg#205713 [0], except that korg#205713 was for v4.19.83
> and the above now includes the simple_recursive_removal() introduced
> via commit a3d1e7eb5abe ("simple_recursive_removal(): kernel-side rm
> -rf for ramfs-style filesystems") merged in v5.6.
>
> korg#205713 was then used to create CVE-2019-19770 [1], which claims
> that the bug is a use-after-free in the debugfs core code. The
> implications of this being a generic UAF in debugfs would be
> much more severe, as it would imply that parent dentries can sometimes
> not be positive, which we hold is by design just not possible.
>
> Below is the splat explained in a bit more detail, showing what is
> happening in userspace and in the kernel, with each debug print
> including the CPU the code runs on:
>
> load loopback module
> [ 13.603371] == blk_mq_debugfs_register(12) start
> [ 13.604040] == blk_mq_debugfs_register(12) q->debugfs_dir created
> [ 13.604934] == blk_mq_debugfs_register(12) end
> [ 13.627382] == blk_mq_debugfs_register(12) start
> [ 13.628041] == blk_mq_debugfs_register(12) q->debugfs_dir created
> [ 13.629240] == blk_mq_debugfs_register(12) end
> [ 13.651667] == blk_mq_debugfs_register(12) start
> [ 13.652836] == blk_mq_debugfs_register(12) q->debugfs_dir created
> [ 13.655107] == blk_mq_debugfs_register(12) end
> [ 13.684917] == blk_mq_debugfs_register(12) start
> [ 13.687876] == blk_mq_debugfs_register(12) q->debugfs_dir created
> [ 13.691588] == blk_mq_debugfs_register(13) end
> [ 13.707320] == blk_mq_debugfs_register(13) start
> [ 13.707863] == blk_mq_debugfs_register(13) q->debugfs_dir created
> [ 13.708856] == blk_mq_debugfs_register(13) end
> [ 13.735623] == blk_mq_debugfs_register(13) start
> [ 13.736656] == blk_mq_debugfs_register(13) q->debugfs_dir created
> [ 13.738411] == blk_mq_debugfs_register(13) end
> [ 13.763326] == blk_mq_debugfs_register(13) start
> [ 13.763972] == blk_mq_debugfs_register(13) q->debugfs_dir created
> [ 13.765167] == blk_mq_debugfs_register(13) end
> [ 13.779510] == blk_mq_debugfs_register(13) start
> [ 13.780522] == blk_mq_debugfs_register(13) q->debugfs_dir created
> [ 13.782338] == blk_mq_debugfs_register(13) end
> [ 13.783521] loop: module loaded
>
> LOOP_CTL_DEL(loop0) #1
> [ 13.803550] = __blk_release_queue(4) start
> [ 13.807772] == blk_trace_shutdown(4) start
> [ 13.810749] == blk_trace_shutdown(4) end
> [ 13.813437] = __blk_release_queue(4) calling blk_mq_debugfs_unregister()
> [ 13.817593] ==== blk_mq_debugfs_unregister(4) begin
> [ 13.817621] ==== blk_mq_debugfs_unregister(4) debugfs_remove_recursive(q->debugfs_dir)
> [ 13.821203] ==== blk_mq_debugfs_unregister(4) end q->debugfs_dir is NULL
> [ 13.826166] = __blk_release_queue(4) blk_mq_debugfs_unregister() end
> [ 13.832992] = __blk_release_queue(4) end
>
> LOOP_CTL_ADD(loop0) #1
> [ 13.843742] == blk_mq_debugfs_register(7) start
> [ 13.845569] == blk_mq_debugfs_register(7) q->debugfs_dir created
> [ 13.848628] == blk_mq_debugfs_register(7) end
>
> BLKTRACE_SETUP(loop0) #1
> [ 13.850924] == blk_trace_ioctl(7, BLKTRACESETUP) start
> [ 13.852852] === do_blk_trace_setup(7) start
> [ 13.854580] === do_blk_trace_setup(7) creating directory
> [ 13.856620] === do_blk_trace_setup(7) using what debugfs_lookup() gave
> [ 13.860635] === do_blk_trace_setup(7) end with ret: 0
> [ 13.862615] == blk_trace_ioctl(7, BLKTRACESETUP) end
>
> LOOP_CTL_DEL(loop0) #2
> [ 13.883304] = __blk_release_queue(7) start
> [ 13.885324] == blk_trace_shutdown(7) start
> [ 13.887197] == blk_trace_shutdown(7) calling __blk_trace_remove()
> [ 13.889807] == __blk_trace_remove(7) start
> [ 13.891669] === blk_trace_cleanup(7) start
> [ 13.911656] ====== blk_trace_free(7) start
>
> LOOP_CTL_ADD(loop0) #2
> [ 13.912709] == blk_mq_debugfs_register(2) start
>
> ---> From LOOP_CTL_DEL(loop0) #2
> [ 13.915887] ====== blk_trace_free(7) end
>
> ---> From LOOP_CTL_ADD(loop0) #2
> [ 13.918359] debugfs: Directory 'loop0' with parent 'block' already present!
> [ 13.926433] == blk_mq_debugfs_register(2) q->debugfs_dir created
> [ 13.930373] == blk_mq_debugfs_register(2) end
>
> BLKTRACE_SETUP(loop0) #2
> [ 13.933961] == blk_trace_ioctl(2, BLKTRACESETUP) start
> [ 13.936758] === do_blk_trace_setup(2) start
> [ 13.938944] === do_blk_trace_setup(2) creating directory
> [ 13.941029] === do_blk_trace_setup(2) using what debugfs_lookup() gave
>
> ---> From LOOP_CTL_DEL(loop0) #2
> [ 13.971046] === blk_trace_cleanup(7) end
> [ 13.973175] == __blk_trace_remove(7) end
> [ 13.975352] == blk_trace_shutdown(7) end
> [ 13.977415] = __blk_release_queue(7) calling blk_mq_debugfs_unregister()
> [ 13.980645] ==== blk_mq_debugfs_unregister(7) begin
> [ 13.980696] ==== blk_mq_debugfs_unregister(7) debugfs_remove_recursive(q->debugfs_dir)
> [ 13.983118] ==== blk_mq_debugfs_unregister(7) end q->debugfs_dir is NULL
> [ 13.986945] = __blk_release_queue(7) blk_mq_debugfs_unregister() end
> [ 13.993155] = __blk_release_queue(7) end
>
> ---> From BLKTRACE_SETUP(loop0) #2
> [ 13.995928] === do_blk_trace_setup(2) end with ret: 0
> [ 13.997623] == blk_trace_ioctl(2, BLKTRACESETUP) end
>
> LOOP_CTL_DEL(loop0) #3
> [ 14.035119] = __blk_release_queue(2) start
> [ 14.036925] == blk_trace_shutdown(2) start
> [ 14.038518] == blk_trace_shutdown(2) calling __blk_trace_remove()
> [ 14.040829] == __blk_trace_remove(2) start
> [ 14.042413] === blk_trace_cleanup(2) start
>
> LOOP_CTL_ADD(loop0) #3
> [ 14.072522] == blk_mq_debugfs_register(6) start
>
> ---> From LOOP_CTL_DEL(loop0) #3
> [ 14.075151] ====== blk_trace_free(2) start
>
> ---> From LOOP_CTL_ADD(loop0) #3
> [ 14.075882] == blk_mq_debugfs_register(6) q->debugfs_dir created
>
> ---> From LOOP_CTL_DEL(loop0) #3
> [ 14.078624] BUG: kernel NULL pointer dereference, address: 00000000000000a0
> [ 14.084332] == blk_mq_debugfs_register(6) end
> [ 14.086971] #PF: supervisor write access in kernel mode
> [ 14.086974] #PF: error_code(0x0002) - not-present page
> [ 14.086977] PGD 0 P4D 0
> [ 14.086984] Oops: 0002 [#1] SMP NOPTI
> [ 14.086990] CPU: 2 PID: 287 Comm: kworker/2:2 Tainted: G E 5.6.0-next-20200403+ #54
> [ 14.086991] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> [ 14.087002] Workqueue: events __blk_release_queue
> [ 14.087011] RIP: 0010:down_write+0x15/0x40
> [ 14.090300] == blk_trace_ioctl(6, BLKTRACESETUP) start
> [ 14.093277] Code: eb ca e8 3e 34 8d ff cc cc cc cc cc cc cc cc cc cc
> cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00
> 00 <f0> 48 0f b1 55 00 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d
> [ 14.093280] RSP: 0018:ffffc28a00533da8 EFLAGS: 00010246
> [ 14.093284] RAX: 0000000000000000 RBX: ffff9f7a24d07980 RCX: ffffff8100000000
> [ 14.093286] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0
> [ 14.093287] RBP: 00000000000000a0 R08: 0000000000000000 R09: 0000000000000019
> [ 14.093289] R10: 0000000000000774 R11: 0000000000000000 R12: 0000000000000000
> [ 14.093291] R13: ffff9f7a2fab0400 R14: ffff9f7a21dd1140 R15: 00000000000000a0
> [ 14.093294] FS: 0000000000000000(0000) GS:ffff9f7a2fa80000(0000) knlGS:0000000000000000
> [ 14.093296] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 14.093298] CR2: 00000000000000a0 CR3: 00000004293d2003 CR4: 0000000000360ee0
> [ 14.093307] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 14.093308] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 14.093310] Call Trace:
> [ 14.093324] simple_recursive_removal+0x4e/0x2e0
> [ 14.093330] ? debugfs_remove+0x60/0x60
> [ 14.093334] debugfs_remove+0x40/0x60
> [ 14.093339] blk_trace_free+0x20/0x70
> [ 14.093346] __blk_trace_remove+0x54/0x90
> [ 14.096704] === do_blk_trace_setup(6) start
> [ 14.098534] blk_trace_shutdown+0x74/0x80
> [ 14.100958] === do_blk_trace_setup(6) creating directory
> [ 14.104575] __blk_release_queue+0xbe/0x160
> [ 14.104580] process_one_work+0x1b4/0x380
> [ 14.104585] worker_thread+0x50/0x3c0
> [ 14.104589] kthread+0xf9/0x130
> [ 14.104593] ? process_one_work+0x380/0x380
> [ 14.104596] ? kthread_park+0x90/0x90
> [ 14.104599] ret_from_fork+0x1f/0x40
> [ 14.104603] Modules linked in: loop(E) xfs(E) libcrc32c(E)
> crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) joydev(E)
> serio_raw(E) aesni_intel(E) glue_helper(E) virtio_balloon(E) evdev(E)
> crypto_simd(E) pcspkr(E) cryptd(E) i6300esb(E) button(E) ip_tables(E)
> x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E)
> jbd2(E) virtio_net(E) net_failover(E) failover(E) virtio_blk(E)
> ata_generic(E) uhci_hcd(E) ata_piix(E) ehci_hcd(E) nvme(E) libata(E)
> crc32c_intel(E) usbcore(E) psmouse(E) nvme_core(E) virtio_pci(E)
> scsi_mod(E) virtio_ring(E) t10_pi(E) virtio(E) i2c_piix4(E) floppy(E)
> [ 14.107400] === do_blk_trace_setup(6) using what debugfs_lookup() gave
> [ 14.108939] CR2: 00000000000000a0
> [ 14.110589] === do_blk_trace_setup(6) end with ret: 0
> [ 14.111592] ---[ end trace 7a783b33b9614db9 ]---
>
> The root cause of this issue is that debugfs_lookup() can find a
> previous incarnation's directory of the same name, which is about to
> be removed by a work item that has not yet been scheduled.
>
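
The ordering described above can be sketched with a small userspace model.
This is only an illustration under stated assumptions, not kernel code and
not the patch itself: every name below (toy_dir, toy_queue, toy_lookup(),
toy_create(), toy_remove(), toy_trace_is_stale()) is hypothetical, and a
"removed" directory is only flagged rather than freed, so that the stale
pointer can be inspected safely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for a debugfs directory; NOT the kernel's struct dentry. */
struct toy_dir {
	char name[16];
	bool present;	/* false once the directory has been removed */
};

/* Toy stand-in for a request queue with a blktrace attached. */
struct toy_queue {
	struct toy_dir *debugfs_dir;	/* directory this queue created */
	struct toy_dir *trace_dir;	/* directory the trace setup uses */
};

#define TOY_MAX_DIRS 8
static struct toy_dir toy_dirs[TOY_MAX_DIRS];
static size_t toy_next;	/* slots are never reused within one run */

static void toy_reset(void)
{
	memset(toy_dirs, 0, sizeof(toy_dirs));
	toy_next = 0;
}

/* Name-based lookup: first directory with this name that is still present. */
static struct toy_dir *toy_lookup(const char *name)
{
	for (size_t i = 0; i < toy_next; i++)
		if (toy_dirs[i].present && strcmp(toy_dirs[i].name, name) == 0)
			return &toy_dirs[i];
	return NULL;
}

/* Creation fails (NULL) if a directory of that name is still present. */
static struct toy_dir *toy_create(const char *name)
{
	struct toy_dir *d;

	if (toy_lookup(name) || toy_next == TOY_MAX_DIRS)
		return NULL;
	d = &toy_dirs[toy_next++];
	snprintf(d->name, sizeof(d->name), "%s", name);
	d->present = true;
	return d;
}

static void toy_remove(struct toy_dir *d)
{
	if (d)
		d->present = false;
}

/*
 * Replays: create, delete (release deferred), re-create, trace setup,
 * deferred release runs. Returns true if the new queue's trace ends up
 * pointing at a directory that has since been removed.
 */
bool toy_trace_is_stale(bool use_lookup)
{
	struct toy_queue old_q = {0}, new_q = {0};
	bool ok;

	toy_reset();
	old_q.debugfs_dir = toy_create("loop0");	/* first incarnation */
	/* Device deleted here; old_q's release work has not run yet. */
	new_q.debugfs_dir = toy_create("loop0");	/* name clash: NULL */

	if (use_lookup)		/* borrow whatever currently has this name */
		new_q.trace_dir = toy_lookup("loop0");
	else			/* only ever use the queue's own directory */
		new_q.trace_dir = new_q.debugfs_dir;
	ok = new_q.trace_dir != NULL;

	toy_remove(old_q.debugfs_dir);	/* deferred release finally runs */
	return ok && !new_q.trace_dir->present;
}
```

In this model, toy_trace_is_stale(true) reports a stale directory because
the name lookup hands back the old incarnation's entry, while
toy_trace_is_stale(false) does not, since setup refuses to borrow a
directory the queue does not own.
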
> We can fix the UAF by simply using a debugfs directory which, moving
> forward, will always be accessible if debugfs is enabled. This way
> it is always allocated and available for both request-based
> (multiqueue) block drivers and make_request block drivers.
>
> This simplifies the code considerably, with the only penalty now being
> that we're always creating the request queue debugfs directory for the
> request-based block device drivers.
>
> The UAF then is not a core debugfs issue, but instead a misuse of
> debugfs, and this issue can only be triggered if you are root and
> misuse blktrace.
>
> This issue can be reproduced with break-blktrace [2] using:
>
> break-blktrace -c 10 -d -s
>
> This patch fixes this issue. Note that there is also a related UAF
> in the ioctl path [3]; this patch should fix that issue as well.
>
> This patch then also disputes the severity of CVE-2019-19770 as
> this issue is only possible by being root and using blktrace.
>
> It is not a core debugfs issue.
>
> [0] https://bugzilla.kernel.org/show_bug.cgi?id=205713
> [1] https://nvd.nist.gov/vuln/detail/CVE-2019-19770
> [2] https://github.com/mcgrof/break-blktrace
> [3] https://lore.kernel.org/lkml/000000000000ec635b059f752700@xxxxxxxxxx/
>
> Cc: Bart Van Assche <bvanassche@xxxxxxx>
> Cc: Omar Sandoval <osandov@xxxxxx>
> Cc: Hannes Reinecke <hare@xxxxxxxx>
> Cc: Nicolai Stange <nstange@xxxxxxx>
> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxxxx>
> Cc: yu kuai <yukuai3@xxxxxxxxxx>
> Reported-by: syzbot+603294af2d01acfdd6da@xxxxxxxxxxxxxxxxxxxxxxxxx
> Fixes: 6ac93117ab00 ("blktrace: use existing disk debugfs directory")
> Signed-off-by: Luis Chamberlain <mcgrof@xxxxxxxxxx>

Reviewed-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>