Re: [PATCH v2] blk-mq: release scheduler resource when request complete

From: kernel test robot
Date: Thu Aug 17 2023 - 10:44:34 EST




Hello,

kernel test robot noticed "WARNING:at_block/mq-deadline.c:#dd_exit_sched" on:

commit: 567b5a4f2f8b9e9fe97d5f8dca266d95c734ae91 ("[PATCH v2] blk-mq: release scheduler resource when request complete")
url: https://github.com/intel-lab-lkp/linux/commits/chengming-zhou-linux-dev/blk-mq-release-scheduler-resource-when-request-complete/20230813-232513
base: https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-next
patch link: https://lore.kernel.org/all/20230813152325.3017343-1-chengming.zhou@xxxxxxxxx/
patch subject: [PATCH v2] blk-mq: release scheduler resource when request complete

this also caused xfstests.generic.704.fail

in testcase: xfstests
version: xfstests-x86_64-bb8af9c-1_20230807
with following parameters:

disk: 4HDD
fs: ext4
test: generic-704



compiler: gcc-12
test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (Skylake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202308172100.8ce4b853-oliver.sang@xxxxxxxxx


[ 222.617396][ T2216] ------------[ cut here ]------------
[ 222.622837][ T2216] statistics for priority 1: i 276 m 0 d 276 c 278
[ 222.629307][ T2216] WARNING: CPU: 0 PID: 2216 at block/mq-deadline.c:680 dd_exit_sched (block/mq-deadline.c:680 (discriminator 3))
[ 222.638218][ T2216] Modules linked in: scsi_debug(-) dm_mod btrfs blake2b_generic xor intel_rapl_msr raid6_pq intel_rapl_common zstd_compress libcrc32c x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel sd_mod t10_pi kvm crc64_rocksoft_generic crc64_rocksoft crc64 irqbypass sg crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 ipmi_devintf ipmi_msghandler mei_wdt i915 rapl drm_buddy intel_cstate intel_gtt wmi_bmof drm_display_helper intel_uncore drm_kms_helper ahci libahci mei_me libata intel_pch_thermal ttm mei video wmi intel_pmc_core acpi_pad drm fuse ip_tables
[ 222.690679][ T2216] CPU: 0 PID: 2216 Comm: modprobe Tainted: G I 6.5.0-rc5-00190-g567b5a4f2f8b #1
[ 222.700878][ T2216] Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.1.1 10/07/2015
[ 222.708919][ T2216] RIP: 0010:dd_exit_sched (block/mq-deadline.c:680 (discriminator 3))
[ 222.714130][ T2216] Code: 89 da 8b 4b 04 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e c4 00 00 00 8b 13 44 89 e6 48 c7 c7 20 d3 e2 83 e8 e2 d1 08 ff <0f> 0b e9 f4 fe ff ff 0f 0b e9 78 fe ff ff 48 89 ee 48 c7 c7 20 cb
All code
========
0: 89 da mov %ebx,%edx
2: 8b 4b 04 mov 0x4(%rbx),%ecx
5: 48 c1 ea 03 shr $0x3,%rdx
9: 0f b6 04 02 movzbl (%rdx,%rax,1),%eax
d: 84 c0 test %al,%al
f: 74 08 je 0x19
11: 3c 03 cmp $0x3,%al
13: 0f 8e c4 00 00 00 jle 0xdd
19: 8b 13 mov (%rbx),%edx
1b: 44 89 e6 mov %r12d,%esi
1e: 48 c7 c7 20 d3 e2 83 mov $0xffffffff83e2d320,%rdi
25: e8 e2 d1 08 ff callq 0xffffffffff08d20c
2a:* 0f 0b ud2 <-- trapping instruction
2c: e9 f4 fe ff ff jmpq 0xffffffffffffff25
31: 0f 0b ud2
33: e9 78 fe ff ff jmpq 0xfffffffffffffeb0
38: 48 89 ee mov %rbp,%rsi
3b: 48 rex.W
3c: c7 .byte 0xc7
3d: c7 (bad)
3e: 20 cb and %cl,%bl

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: e9 f4 fe ff ff jmpq 0xfffffffffffffefb
7: 0f 0b ud2
9: e9 78 fe ff ff jmpq 0xfffffffffffffe86
e: 48 89 ee mov %rbp,%rsi
11: 48 rex.W
12: c7 .byte 0xc7
13: c7 (bad)
14: 20 cb and %cl,%bl
[ 222.733468][ T2216] RSP: 0018:ffffc9000092f858 EFLAGS: 00010282
[ 222.739367][ T2216] RAX: 0000000000000000 RBX: ffff88819e33e0b0 RCX: ffffffff821fb87e
[ 222.747166][ T2216] RDX: 1ffff110fe1c57b8 RSI: 0000000000000008 RDI: ffffc9000092f610
[ 222.754975][ T2216] RBP: 0000000000000116 R08: 0000000000000001 R09: fffff52000125ec2
[ 222.762782][ T2216] R10: ffffc9000092f617 R11: 0000000000000001 R12: 0000000000000001
[ 222.770591][ T2216] R13: dffffc0000000000 R14: ffff88819e33e000 R15: ffff88819e33e0bc
[ 222.778386][ T2216] FS: 00007fd9aea5e540(0000) GS:ffff8887f0e00000(0000) knlGS:0000000000000000
[ 222.787144][ T2216] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 222.793561][ T2216] CR2: 00007fbe00432bb0 CR3: 0000000126856005 CR4: 00000000003706f0
[ 222.801359][ T2216] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 222.809156][ T2216] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 222.816970][ T2216] Call Trace:
[ 222.820093][ T2216] <TASK>
[ 222.822872][ T2216] ? __warn (kernel/panic.c:673)
[ 222.826794][ T2216] ? dd_exit_sched (block/mq-deadline.c:680 (discriminator 3))
[ 222.831403][ T2216] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[ 222.835752][ T2216] ? handle_bug (arch/x86/kernel/traps.c:324)
[ 222.839912][ T2216] ? exc_invalid_op (arch/x86/kernel/traps.c:345 (discriminator 1))
[ 222.844433][ T2216] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568)
[ 222.849286][ T2216] ? llist_add_batch (lib/llist.c:33 (discriminator 14))
[ 222.853980][ T2216] ? dd_exit_sched (block/mq-deadline.c:680 (discriminator 3))
[ 222.858586][ T2216] ? dd_exit_sched (block/mq-deadline.c:680 (discriminator 3))
[ 222.863193][ T2216] blk_mq_exit_sched (block/blk-mq-sched.c:550)
[ 222.867973][ T2216] ? blk_mq_sched_free_rqs (block/blk-mq-sched.c:527)
[ 222.873257][ T2216] ? __mutex_lock_slowpath (kernel/locking/mutex.c:282)
[ 222.878384][ T2216] ? ioc_clear_queue (include/linux/list.h:292 block/blk-ioc.c:174)
[ 222.883076][ T2216] ? blk_mq_poll (block/blk-mq.c:4832)
[ 222.887324][ T2216] elevator_exit (block/elevator.c:168)
[ 222.891572][ T2216] del_gendisk (block/genhd.c:701)
[ 222.895834][ T2216] ? diskstats_show (block/genhd.c:631)
[ 222.900526][ T2216] ? __pm_runtime_resume (drivers/base/power/runtime.c:1174)
[ 222.905568][ T2216] sd_remove (drivers/scsi/sd.c:3736) sd_mod
[ 222.910436][ T2216] device_release_driver_internal (drivers/base/dd.c:1272 drivers/base/dd.c:1293)
[ 222.916335][ T2216] bus_remove_device (include/linux/kobject.h:191 drivers/base/base.h:73 drivers/base/bus.c:581)
[ 222.921225][ T2216] device_del (drivers/base/core.c:3815)
[ 222.925388][ T2216] ? attribute_container_device_trigger (drivers/base/attribute_container.c:357)
[ 222.931807][ T2216] ? __device_link_del (drivers/base/core.c:3769)
[ 222.936760][ T2216] ? attribute_container_device_trigger_safe (drivers/base/attribute_container.c:357)
[ 222.943612][ T2216] __scsi_remove_device (drivers/scsi/scsi_sysfs.c:1489)
[ 222.948651][ T2216] scsi_forget_host (drivers/scsi/scsi_scan.c:1988)
[ 222.953243][ T2216] scsi_remove_host (drivers/scsi/hosts.c:182)
[ 222.957838][ T2216] sdebug_driver_remove (drivers/scsi/scsi_debug.c:7841) scsi_debug
[ 222.963933][ T2216] ? kernfs_remove_by_name_ns (fs/kernfs/dir.c:1679)
[ 222.969492][ T2216] device_release_driver_internal (drivers/base/dd.c:1272 drivers/base/dd.c:1293)
[ 222.975392][ T2216] bus_remove_device (include/linux/kobject.h:191 drivers/base/base.h:73 drivers/base/bus.c:581)
[ 222.980173][ T2216] device_del (drivers/base/core.c:3815)
[ 222.984349][ T2216] ? __device_link_del (drivers/base/core.c:3769)
[ 222.989299][ T2216] ? mutex_unlock (arch/x86/include/asm/atomic64_64.h:109 include/linux/atomic/atomic-arch-fallback.h:4303 include/linux/atomic/atomic-long.h:1499 include/linux/atomic/atomic-instrumented.h:4446 kernel/locking/mutex.c:181 kernel/locking/mutex.c:540)
[ 222.993645][ T2216] ? __mutex_unlock_slowpath+0x2b0/0x2b0
[ 223.000160][ T2216] device_unregister (drivers/base/core.c:3732 drivers/base/core.c:3845)
[ 223.004772][ T2216] sdebug_do_remove_host (drivers/scsi/scsi_debug.c:7299) scsi_debug
[ 223.011039][ T2216] scsi_debug_exit (drivers/scsi/scsi_debug.c:7670) scsi_debug
[ 223.016625][ T2216] __do_sys_delete_module+0x316/0x540
[ 223.022880][ T2216] ? module_flags (kernel/module/main.c:698)
[ 223.027401][ T2216] ? task_work_cancel (kernel/task_work.c:147)
[ 223.032081][ T2216] ? _raw_spin_lock (arch/x86/include/asm/atomic.h:115 include/linux/atomic/atomic-arch-fallback.h:2155 include/linux/atomic/atomic-instrumented.h:1296 include/asm-generic/qspinlock.h:111 include/linux/spinlock.h:187 include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154)
[ 223.036590][ T2216] ? exit_to_user_mode_loop (include/linux/sched.h:2337 include/linux/resume_user_mode.h:61 kernel/entry/common.c:171)
[ 223.041890][ T2216] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 223.046154][ T2216] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 223.051867][ T2216] RIP: 0033:0x7fd9aeb83417
[ 223.056129][ T2216] Code: 73 01 c3 48 8b 0d 79 1a 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 49 1a 0d 00 f7 d8 64 89 01 48
All code
========
0: 73 01 jae 0x3
2: c3 retq
3: 48 8b 0d 79 1a 0d 00 mov 0xd1a79(%rip),%rcx # 0xd1a83
a: f7 d8 neg %eax
c: 64 89 01 mov %eax,%fs:(%rcx)
f: 48 83 c8 ff or $0xffffffffffffffff,%rax
13: c3 retq
14: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
1b: 00 00 00
1e: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
23: b8 b0 00 00 00 mov $0xb0,%eax
28: 0f 05 syscall
2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
30: 73 01 jae 0x33
32: c3 retq
33: 48 8b 0d 49 1a 0d 00 mov 0xd1a49(%rip),%rcx # 0xd1a83
3a: f7 d8 neg %eax
3c: 64 89 01 mov %eax,%fs:(%rcx)
3f: 48 rex.W

Code starting with the faulting instruction
===========================================
0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
6: 73 01 jae 0x9
8: c3 retq
9: 48 8b 0d 49 1a 0d 00 mov 0xd1a49(%rip),%rcx # 0xd1a59
10: f7 d8 neg %eax
12: 64 89 01 mov %eax,%fs:(%rcx)
15: 48 rex.W


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230817/202308172100.8ce4b853-oliver.sang@xxxxxxxxx



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki