Re: [PATCH v1 1/3] blk-mq: don't complete un-started request in timeout handler

From: Hannes Reinecke
Date: Sat Mar 18 2017 - 07:25:34 EST


On 03/17/2017 10:57 AM, Ming Lei wrote:
> When iterating busy requests in timeout handler,
> if the STARTED flag of one request isn't set, that means
> the request is being processed in block layer or driver, and
> isn't dispatch to hardware yet.
>
> In current implementation of blk_mq_check_expired(),
> if the request queue becomes dying, un-started requests are
> handled as being completed/freed immediately. This way is
> wrong, and can cause use-after-free issue easily[1][2], when
> doing I/O and removing&resetting NVMe device at the sametime.
>
> This patch fixes several issues reported by Yi Zhang.
>
> [1]. oops log 1
> [ 581.789754] ------------[ cut here ]------------
> [ 581.789758] kernel BUG at block/blk-mq.c:374!
> [ 581.789760] invalid opcode: 0000 [#1] SMP
> [ 581.789761] Modules linked in: vfat fat ipmi_ssif intel_rapl sb_edac
> edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm nvme
> irqbypass crct10dif_pclmul nvme_core crc32_pclmul ghash_clmulni_intel
> intel_cstate ipmi_si mei_me ipmi_devintf intel_uncore sg ipmi_msghandler
> intel_rapl_perf iTCO_wdt mei iTCO_vendor_support mxm_wmi lpc_ich dcdbas shpchp
> pcspkr acpi_power_meter wmi nfsd auth_rpcgss nfs_acl lockd dm_multipath grace
> sunrpc ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper
> syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci libahci
> crc32c_intel tg3 libata megaraid_sas i2c_core ptp fjes pps_core dm_mirror
> dm_region_hash dm_log dm_mod
> [ 581.789796] CPU: 1 PID: 1617 Comm: kworker/1:1H Not tainted 4.10.0.bz1420297+ #4
> [ 581.789797] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.2.5 09/06/2016
> [ 581.789804] Workqueue: kblockd blk_mq_timeout_work
> [ 581.789806] task: ffff8804721c8000 task.stack: ffffc90006ee4000
> [ 581.789809] RIP: 0010:blk_mq_end_request+0x58/0x70
> [ 581.789810] RSP: 0018:ffffc90006ee7d50 EFLAGS: 00010202
> [ 581.789811] RAX: 0000000000000001 RBX: ffff8802e4195340 RCX: ffff88028e2f4b88
> [ 581.789812] RDX: 0000000000001000 RSI: 0000000000001000 RDI: 0000000000000000
> [ 581.789813] RBP: ffffc90006ee7d60 R08: 0000000000000003 R09: ffff88028e2f4b00
> [ 581.789814] R10: 0000000000001000 R11: 0000000000000001 R12: 00000000fffffffb
> [ 581.789815] R13: ffff88042abe5780 R14: 000000000000002d R15: ffff88046fbdff80
> [ 581.789817] FS: 0000000000000000(0000) GS:ffff88047fc00000(0000) knlGS:0000000000000000
> [ 581.789818] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 581.789819] CR2: 00007f64f403a008 CR3: 000000014d078000 CR4: 00000000001406e0
> [ 581.789820] Call Trace:
> [ 581.789825] blk_mq_check_expired+0x76/0x80
> [ 581.789828] bt_iter+0x45/0x50
> [ 581.789830] blk_mq_queue_tag_busy_iter+0xdd/0x1f0
> [ 581.789832] ? blk_mq_rq_timed_out+0x70/0x70
> [ 581.789833] ? blk_mq_rq_timed_out+0x70/0x70
> [ 581.789840] ? __switch_to+0x140/0x450
> [ 581.789841] blk_mq_timeout_work+0x88/0x170
> [ 581.789845] process_one_work+0x165/0x410
> [ 581.789847] worker_thread+0x137/0x4c0
> [ 581.789851] kthread+0x101/0x140
> [ 581.789853] ? rescuer_thread+0x3b0/0x3b0
> [ 581.789855] ? kthread_park+0x90/0x90
> [ 581.789860] ret_from_fork+0x2c/0x40
> [ 581.789861] Code: 48 85 c0 74 0d 44 89 e6 48 89 df ff d0 5b 41 5c 5d c3 48
> 8b bb 70 01 00 00 48 85 ff 75 0f 48 89 df e8 7d f0 ff ff 5b 41 5c 5d c3 <0f>
> 0b e8 71 f0 ff ff 90 eb e9 0f 1f 40 00 66 2e 0f 1f 84 00 00
> [ 581.789882] RIP: blk_mq_end_request+0x58/0x70 RSP: ffffc90006ee7d50
> [ 581.789889] ---[ end trace bcaf03d9a14a0a70 ]---
>
> [2]. oops log2
> [ 6984.857362] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> [ 6984.857372] IP: nvme_queue_rq+0x6e6/0x8cd [nvme]
> [ 6984.857373] PGD 0
> [ 6984.857374]
> [ 6984.857376] Oops: 0000 [#1] SMP
> [ 6984.857379] Modules linked in: ipmi_ssif vfat fat intel_rapl sb_edac
> edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
> irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_si iTCO_wdt
> iTCO_vendor_support mxm_wmi ipmi_devintf intel_cstate sg dcdbas intel_uncore
> mei_me intel_rapl_perf mei pcspkr lpc_ich ipmi_msghandler shpchp
> acpi_power_meter wmi nfsd auth_rpcgss dm_multipath nfs_acl lockd grace sunrpc
> ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea
> sysfillrect crc32c_intel sysimgblt fb_sys_fops ttm nvme drm nvme_core ahci
> libahci i2c_core tg3 libata ptp megaraid_sas pps_core fjes dm_mirror
> dm_region_hash dm_log dm_mod
> [ 6984.857416] CPU: 7 PID: 1635 Comm: kworker/7:1H Not tainted
> 4.10.0-2.el7.bz1420297.x86_64 #1
> [ 6984.857417] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.2.5 09/06/2016
> [ 6984.857427] Workqueue: kblockd blk_mq_run_work_fn
> [ 6984.857429] task: ffff880476e3da00 task.stack: ffffc90002e90000
> [ 6984.857432] RIP: 0010:nvme_queue_rq+0x6e6/0x8cd [nvme]
> [ 6984.857433] RSP: 0018:ffffc90002e93c50 EFLAGS: 00010246
> [ 6984.857434] RAX: 0000000000000000 RBX: ffff880275646600 RCX: 0000000000001000
> [ 6984.857435] RDX: 0000000000000fff RSI: 00000002fba2a000 RDI: ffff8804734e6950
> [ 6984.857436] RBP: ffffc90002e93d30 R08: 0000000000002000 R09: 0000000000001000
> [ 6984.857437] R10: 0000000000001000 R11: 0000000000000000 R12: ffff8804741d8000
> [ 6984.857438] R13: 0000000000000040 R14: ffff880475649f80 R15: ffff8804734e6780
> [ 6984.857439] FS: 0000000000000000(0000) GS:ffff88047fcc0000(0000) knlGS:0000000000000000
> [ 6984.857440] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 6984.857442] CR2: 0000000000000010 CR3: 0000000001c09000 CR4: 00000000001406e0
> [ 6984.857443] Call Trace:
> [ 6984.857451] ? mempool_free+0x2b/0x80
> [ 6984.857455] ? bio_free+0x4e/0x60
> [ 6984.857459] blk_mq_dispatch_rq_list+0xf5/0x230
> [ 6984.857462] blk_mq_process_rq_list+0x133/0x170
> [ 6984.857465] __blk_mq_run_hw_queue+0x8c/0xa0
> [ 6984.857467] blk_mq_run_work_fn+0x12/0x20
> [ 6984.857473] process_one_work+0x165/0x410
> [ 6984.857475] worker_thread+0x137/0x4c0
> [ 6984.857478] kthread+0x101/0x140
> [ 6984.857480] ? rescuer_thread+0x3b0/0x3b0
> [ 6984.857481] ? kthread_park+0x90/0x90
> [ 6984.857489] ret_from_fork+0x2c/0x40
> [ 6984.857490] Code: 8b bd 70 ff ff ff 89 95 50 ff ff ff 89 8d 58 ff ff ff 44
> 89 95 60 ff ff ff e8 b7 dd 12 e1 8b 95 50 ff ff ff 48 89 85 68 ff ff ff <4c>
> 8b 48 10 44 8b 58 18 8b 8d 58 ff ff ff 44 8b 95 60 ff ff ff
> [ 6984.857511] RIP: nvme_queue_rq+0x6e6/0x8cd [nvme] RSP: ffffc90002e93c50
> [ 6984.857512] CR2: 0000000000000010
> [ 6984.895359] ---[ end trace 2d7ceb528432bf83 ]---
>
> Cc: stable@xxxxxxxxxxxxxxx
> Reported-by: Yi Zhang <yizhan@xxxxxxxxxx>
> Tested-by: Yi Zhang <yizhan@xxxxxxxxxx>
> Reviewed-by: Bart Van Assche <bart.vanassche@xxxxxxxxxxx>
> Signed-off-by: Ming Lei <tom.leiming@xxxxxxxxx>
> ---
> block/blk-mq.c | 11 +----------
> 1 file changed, 1 insertion(+), 10 deletions(-)
Reviewed-by: Hannes Reinecke <hare@xxxxxxxx>

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@xxxxxxx +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)