Re: [BUG?] INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-.... } 3 jiffies s: 309 root: 0x1/.

From: Rui Salvaterra
Date: Thu Apr 13 2023 - 03:30:20 EST


Hi again, everyone.

So, while preparing to file the bug report with the requested
information, I got a trace completely unrelated to DRM (on a swapon
call, it seems).

[ 4.868340] rcu: INFO: rcu_sched detected expedited stalls on
CPUs/tasks: { 4-.... } 3 jiffies s: 265 root: 0x10/.
[ 4.868349] rcu: blocking rcu_node structures (internal RCU debug):
[ 4.868351] Sending NMI from CPU 3 to CPUs 4:
[ 4.868355] NMI backtrace for cpu 4
[ 4.868357] CPU: 4 PID: 462 Comm: swapon Not tainted 6.3.0-rc6-debug+ #57
[ 4.868359] Hardware name: Apple Inc.
Macmini6,2/Mac-F65AE981FFA204ED, BIOS 429.0.0.0.0 03/18/2022
[ 4.868360] RIP: 0010:zram_submit_bio+0x57c/0x940
[ 4.868365] Code: 04 4c 01 f0 48 8d 48 08 f0 48 0f ba 68 08 0d 0f
82 80 00 00 00 4c 89 ef e8 01 eb ff ff 49 8b 45 00 4a 8d 44 30 09 f0
80 20 df <f0> 48 ff 45 00 48 81 eb 00 10 00 00 41 83 c4 01 48 81 fb ff
0f 00
[ 4.868366] RSP: 0018:ffff8881057dbcd8 EFLAGS: 00000246
[ 4.868368] RAX: ffffc90001c186d9 RBX: 000000003e893000 RCX: ffffc90001c186d8
[ 4.868369] RDX: ffffc90001c186d0 RSI: 0000000000000000 RDI: ffff88810083b400
[ 4.868369] RBP: ffff88810083b470 R08: 0000000000027e40 R09: 0000000000025850
[ 4.868370] R10: 000000000014b212 R11: ffff88810ba03180 R12: 00000000000c176d
[ 4.868371] R13: ffff88810083b400 R14: 0000000000c176d0 R15: 0000000000000000
[ 4.868372] FS: 00007fbd8f8ce800(0000) GS:ffff888266100000(0000)
knlGS:0000000000000000
[ 4.868373] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.868374] CR2: 0000563005371000 CR3: 000000010355c003 CR4: 00000000001706e0
[ 4.868375] Call Trace:
[ 4.868377] <TASK>
[ 4.868378] ? block_read_full_folio+0x23e/0x2e0
[ 4.868383] ? kmem_cache_alloc+0x1b/0x110
[ 4.868385] ? mempool_alloc+0x37/0x140
[ 4.868388] ? pcpu_block_update_hint_alloc+0xce/0x2f0
[ 4.868390] __submit_bio+0x41/0xd0
[ 4.868394] submit_bio_noacct_nocheck+0xc4/0x2b0
[ 4.868396] blk_next_bio+0x55/0x70
[ 4.868398] __blkdev_issue_discard+0xc8/0x180
[ 4.868401] blkdev_issue_discard+0x3c/0x80
[ 4.868403] __x64_sys_swapon+0xb71/0x1120
[ 4.868407] do_syscall_64+0x2b/0x50
[ 4.868410] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 4.868414] RIP: 0033:0x7fbd8f712d5b
[ 4.868416] Code: 73 01 c3 48 8b 0d bd 30 0e 00 f7 d8 64 89 01 48
83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 a7 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8d 30 0e 00 f7 d8 64 89
01 48
[ 4.868417] RSP: 002b:00007ffcaf9a3448 EFLAGS: 00000246 ORIG_RAX:
00000000000000a7
[ 4.868418] RAX: ffffffffffffffda RBX: 0000000000018064 RCX: 00007fbd8f712d5b
[ 4.868419] RDX: 0000000000018064 RSI: 0000000000018064 RDI: 000056300535fb10
[ 4.868420] RBP: 00007ffcaf9a3530 R08: 000000014b213000 R09: 00007fbd8f7f70f0
[ 4.868420] R10: 0000000000001000 R11: 0000000000000246 R12: 000056300535fb10
[ 4.868421] R13: 0000000000000064 R14: 00007ffcaf9a3530 R15: 0000000000000000
[ 4.868423] </TASK>

Could it be that RCU is reporting expedited stalls too eagerly? And,
if so, why only on this machine?