6.6.8 stable: crash in folio_mark_dirty

From: Genes Lists
Date: Sat Dec 30 2023 - 10:33:12 EST



Apologies in advance, but I cannot git bisect this, since the machine
had been running on 6.6.8 for 10 days before this happened.

Reporting in case it's useful (and not a hardware failure).

There is nothing interesting in the journal ahead of the crash: the
previous entry, 2 minutes earlier, is from the user-space DHCP server.

- Root and EFI are on nvme
- Spare root and EFI are on sdg
- md raid6 on sda-sd with lvmcache from one partition on the nvme drive.
- All filesystems are ext4 (other than EFI).
- 32 GB mem.


regards

gene

Details are attached; they show:

Dec 30 07:00:36 s6 kernel: <TASK>
Dec 30 07:00:36 s6 kernel: ? __folio_mark_dirty+0x21c/0x2a0
Dec 30 07:00:36 s6 kernel: ? __warn+0x81/0x130
Dec 30 07:00:36 s6 kernel: ? __folio_mark_dirty+0x21c/0x2a0
Dec 30 07:00:36 s6 kernel: ? report_bug+0x171/0x1a0
Dec 30 07:00:36 s6 kernel: ? handle_bug+0x3c/0x80
Dec 30 07:00:36 s6 kernel: ? exc_invalid_op+0x17/0x70
Dec 30 07:00:36 s6 kernel: ? asm_exc_invalid_op+0x1a/0x20
Dec 30 07:00:36 s6 kernel: ? __folio_mark_dirty+0x21c/0x2a0
Dec 30 07:00:36 s6 kernel: block_dirty_folio+0x8a/0xb0
Dec 30 07:00:36 s6 kernel: unmap_page_range+0xd17/0x1120
Dec 30 07:00:36 s6 kernel: unmap_vmas+0xb5/0x190
Dec 30 07:00:36 s6 kernel: exit_mmap+0xec/0x340
Dec 30 07:00:36 s6 kernel: __mmput+0x3e/0x130
Dec 30 07:00:36 s6 kernel: do_exit+0x31c/0xb20
Dec 30 07:00:36 s6 kernel: do_group_exit+0x31/0x80
Dec 30 07:00:36 s6 kernel: __x64_sys_exit_group+0x18/0x20
Dec 30 07:00:36 s6 kernel: do_syscall_64+0x5d/0x90
Dec 30 07:00:36 s6 kernel: ? count_memcg_events.constprop.0+0x1a/0x30
Dec 30 07:00:36 s6 kernel: ? handle_mm_fault+0xa2/0x360
Dec 30 07:00:36 s6 kernel: ? do_user_addr_fault+0x30f/0x660
Dec 30 07:00:36 s6 kernel: ? exc_page_fault+0x7f/0x180
Dec 30 07:00:36 s6 kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Dec 30 07:00:36 s6 kernel: RIP: 0033:0x7fb3c581ee2d
Dec 30 07:00:36 s6 kernel: Code: Unable to access opcode bytes at 0x7fb3c581ee03.
Dec 30 07:00:36 s6 kernel: RSP: 002b:00007fff620541e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
Dec 30 07:00:36 s6 kernel: RAX: ffffffffffffffda RBX: 00007fb3c591efa8 RCX: 00007fb3c581ee2d
Dec 30 07:00:36 s6 kernel: RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000000
Dec 30 07:00:36 s6 kernel: RBP: 0000000000000002 R08: 0000000000000000 R09: 00007fb3c5924920
Dec 30 07:00:36 s6 kernel: R10: 00005650f2e615f0 R11: 0000000000000206 R12: 0000000000000000
Dec 30 07:00:36 s6 kernel: R13: 0000000000000000 R14: 00007fb3c591d680 R15: 00007fb3c591efc0
Dec 30 07:00:36 s6 kernel: </TASK>

Dec 30 07:00:36 s6 kernel: ------------[ cut here ]------------
Dec 30 07:00:36 s6 kernel: WARNING: CPU: 0 PID: 521524 at mm/page-writeback.c:2668 __folio_mark_dirty (??:?)
Dec 30 07:00:36 s6 kernel: Modules linked in: algif_hash af_alg rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netfs nft_nat nft_chain_nat nf_nat nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables rpcrdma rdma>
Dec 30 07:00:36 s6 kernel: async_xor rapl joydev async_tx intel_cstate mei_me nls_iso8859_1 vfat i2c_i801 xor cec snd raid6_pq libcrc32c intel_uncore mxm_wmi pcspkr e1000e i2c_smbus intel_wmi_thunderbolt soundcore mei>
Dec 30 07:00:36 s6 kernel: CPU: 0 PID: 521524 Comm: rsync Not tainted 6.6.8-stable-1 #13 d238f5ab6a206cdb0cc5cd72f8688230f23d58df
Dec 30 07:00:36 s6 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z370 Extreme4, BIOS P4.20 10/31/2019
Dec 30 07:00:36 s6 kernel: RIP: 0010:__folio_mark_dirty (??:?)
Dec 30 07:00:36 s6 kernel: Code: 89 fe e8 57 22 14 00 65 ff 0d b8 ff f2 62 0f 84 8d 00 00 00 49 8b 3c 24 e9 47 fe ff ff 4c 89 ff e8 b9 18 08 00 48 89 c6 eb 85 <0f> 0b e9 27 fe ff ff 48 8b 52 10 e9 56 ff ff ff 48 c7 04 >
All code
========
0: 89 fe mov %edi,%esi
2: e8 57 22 14 00 call 0x14225e
7: 65 ff 0d b8 ff f2 62 decl %gs:0x62f2ffb8(%rip) # 0x62f2ffc6
e: 0f 84 8d 00 00 00 je 0xa1
14: 49 8b 3c 24 mov (%r12),%rdi
18: e9 47 fe ff ff jmp 0xfffffffffffffe64
1d: 4c 89 ff mov %r15,%rdi
20: e8 b9 18 08 00 call 0x818de
25: 48 89 c6 mov %rax,%rsi
28: eb 85 jmp 0xffffffffffffffaf
2a:* 0f 0b ud2 <-- trapping instruction
2c: e9 27 fe ff ff jmp 0xfffffffffffffe58
31: 48 8b 52 10 mov 0x10(%rdx),%rdx
35: e9 56 ff ff ff jmp 0xffffffffffffff90
3a: 48 rex.W
3b: c7 .byte 0xc7
3c: 04 00 add $0x0,%al

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: e9 27 fe ff ff jmp 0xfffffffffffffe2e
7: 48 8b 52 10 mov 0x10(%rdx),%rdx
b: e9 56 ff ff ff jmp 0xffffffffffffff66
10: 48 rex.W
11: c7 .byte 0xc7
12: 04 00 add $0x0,%al
Dec 30 07:00:36 s6 kernel: RSP: 0018:ffffc9000c037b00 EFLAGS: 00010046
Dec 30 07:00:36 s6 kernel: RAX: 02ffff6000008030 RBX: 0000000000000286 RCX: ffff8885d44dff08
Dec 30 07:00:36 s6 kernel: RDX: 0000000000000001 RSI: ffff88810d015ca8 RDI: ffff88810d015cb0
Dec 30 07:00:36 s6 kernel: RBP: ffff88810d015cb0 R08: ffff8885208c1300 R09: 0000000000000000
Dec 30 07:00:36 s6 kernel: R10: 0000000000000200 R11: 0000000000000002 R12: ffff88810d015ca8
Dec 30 07:00:36 s6 kernel: R13: 0000000000000001 R14: ffff88851ec72fc0 R15: ffffea00105c5e00
Dec 30 07:00:36 s6 kernel: FS: 0000000000000000(0000) GS:ffff88889ee00000(0000) knlGS:0000000000000000
Dec 30 07:00:36 s6 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 30 07:00:36 s6 kernel: CR2: 00007fb3c593b020 CR3: 0000000690e20003 CR4: 00000000003706f0
Dec 30 07:00:36 s6 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 30 07:00:36 s6 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Dec 30 07:00:36 s6 kernel: Call Trace:
Dec 30 07:00:36 s6 kernel: <TASK>
Dec 30 07:00:36 s6 kernel: ? __folio_mark_dirty (??:?)
Dec 30 07:00:36 s6 kernel: ? __warn (??:?)
Dec 30 07:00:36 s6 kernel: ? __folio_mark_dirty (??:?)
Dec 30 07:00:36 s6 kernel: ? report_bug (??:?)
Dec 30 07:00:36 s6 kernel: ? handle_bug (??:?)
Dec 30 07:00:36 s6 kernel: ? exc_invalid_op (??:?)
Dec 30 07:00:36 s6 kernel: ? asm_exc_invalid_op (??:?)
Dec 30 07:00:36 s6 kernel: ? __folio_mark_dirty (??:?)
Dec 30 07:00:36 s6 kernel: block_dirty_folio (??:?)
Dec 30 07:00:36 s6 kernel: unmap_page_range (??:?)
Dec 30 07:00:36 s6 kernel: unmap_vmas (??:?)
Dec 30 07:00:36 s6 kernel: exit_mmap (??:?)
Dec 30 07:00:36 s6 kernel: __mmput (??:?)
Dec 30 07:00:36 s6 kernel: do_exit (??:?)
Dec 30 07:00:36 s6 kernel: do_group_exit (??:?)
Dec 30 07:00:36 s6 kernel: __x64_sys_exit_group (??:?)
Dec 30 07:00:36 s6 kernel: do_syscall_64 (??:?)
Dec 30 07:00:36 s6 kernel: ? count_memcg_events.constprop.0 (??:?)
Dec 30 07:00:36 s6 kernel: ? handle_mm_fault (??:?)
Dec 30 07:00:36 s6 kernel: ? do_user_addr_fault (??:?)
Dec 30 07:00:36 s6 kernel: ? exc_page_fault (??:?)
Dec 30 07:00:36 s6 kernel: entry_SYSCALL_64_after_hwframe (??:?)
Dec 30 07:00:36 s6 kernel: RIP: 0033:0x7fb3c581ee2d
Dec 30 07:00:36 s6 kernel: Code: Unable to access opcode bytes at 0x7fb3c581ee03.
Dec 30 07:00:36 s6 kernel: RSP: 002b:00007fff620541e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
Dec 30 07:00:36 s6 kernel: RAX: ffffffffffffffda RBX: 00007fb3c591efa8 RCX: 00007fb3c581ee2d
Dec 30 07:00:36 s6 kernel: RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000000
Dec 30 07:00:36 s6 kernel: RBP: 0000000000000002 R08: 0000000000000000 R09: 00007fb3c5924920
Dec 30 07:00:36 s6 kernel: R10: 00005650f2e615f0 R11: 0000000000000206 R12: 0000000000000000
Dec 30 07:00:36 s6 kernel: R13: 0000000000000000 R14: 00007fb3c591d680 R15: 00007fb3c591efc0
Dec 30 07:00:36 s6 kernel: </TASK>
Dec 30 07:00:36 s6 kernel: ---[ end trace 0000000000000000 ]---
Dec 30 07:00:36 s6 kernel: BUG: Bad rss-counter state mm:000000008e24d57a type:MM_FILEPAGES val:-1
Dec 30 07:00:36 s6 kernel: BUG: Bad rss-counter state mm:000000008e24d57a type:MM_ANONPAGES val:1
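
A side note on the general protection fault that follows, two minutes
after the warning: the non-canonical address 0x6d65532d66697975 looks
like it could be text rather than a damaged pointer. The sketch below is
only a guess at how to check that. It assumes the reported address is
the effective address of the trapping instruction, mov 0x710(%rcx),%rdx,
so the value held in RCX would be the address minus 0x710 (RCX itself is
not quoted above, so this is not confirmed). The helper names are made
up for illustration.

```python
# Hypothetical helpers, not part of the original report.
FAULT_ADDR = 0x6d65532d66697975  # from the "general protection fault" line
DISPLACEMENT = 0x710             # from "mov 0x710(%rcx),%rdx"


def base_pointer(fault_addr: int, displacement: int) -> int:
    """Undo the instruction displacement (assumes addr = base + disp)."""
    return fault_addr - displacement


def as_ascii(value: int) -> str:
    """Render a 64-bit value as it would sit in x86-64 memory (little-endian).

    Bytes outside the printable ASCII range are shown as '.'.
    """
    raw = value.to_bytes(8, "little")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    base = base_pointer(FAULT_ADDR, DISPLACEMENT)
    print(hex(base))       # 0x6d65532d66697265
    print(as_ascii(base))  # erif-Sem
```

If the assumption holds, all eight bytes of the pointer are printable
ASCII ("erif-Sem"), which would be more consistent with string data
overwriting a kernel structure than with a single flipped bit. That is
speculation, not a diagnosis.
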
Dec 30 07:02:23 s6 kernel: general protection fault, probably for non-canonical address 0x6d65532d66697975: 0000 [#1] PREEMPT SMP PTI
Dec 30 07:02:23 s6 kernel: CPU: 7 PID: 521578 Comm: rsync Tainted: G W 6.6.8-stable-1 #13 d238f5ab6a206cdb0cc5cd72f8688230f23d58df
Dec 30 07:02:23 s6 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z370 Extreme4, BIOS P4.20 10/31/2019
Dec 30 07:02:23 s6 kernel: RIP: 0010:__mod_memcg_lruvec_state (??:?)
Dec 30 07:02:23 s6 kernel: Code: ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 48 8b 8f 40 0b 00 00 48 63 c2 89 f6 48 c1 e6 03 <48> 8b 91 10 07 00 00 48 01 f2 65 48 01 02 48 03 b7 28 06 >
All code
========
0: ff 90 90 90 90 90 call *-0x6f6f6f70(%rax)
6: 90 nop
7: 90 nop
8: 90 nop
9: 90 nop
a: 90 nop
b: 90 nop
c: 90 nop
d: 90 nop
e: 90 nop
f: 90 nop
10: 90 nop
11: 66 0f 1f 00 nopw (%rax)
15: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
1a: 48 8b 8f 40 0b 00 00 mov 0xb40(%rdi),%rcx
21: 48 63 c2 movslq %edx,%rax
24: 89 f6 mov %esi,%esi
26: 48 c1 e6 03 shl $0x3,%rsi
2a:* 48 8b 91 10 07 00 00 mov 0x710(%rcx),%rdx <-- trapping instruction
31: 48 01 f2 add %rsi,%rdx
34: 65 48 01 02 add %rax,%gs:(%rdx)
38: 48 rex.W
39: 03 .byte 0x3
3a: b7 28 mov $0x28,%bh
3c: 06 (bad)
...

Code starting with the faulting instruction
===========================================
0: 48 8b 91 10 07 00 00 mov 0x710(%rcx),%rdx
7: 48 01 f2 add %rsi,%rdx
a: 65 48 01 02 add %rax,%gs:(%rdx)
e: 48 rex.W
f: 03 .byte 0x3
10: b7 28 mov $0x28,%bh
12: 06 (bad)
...
Dec 30 07:02:23 s6 kernel: RSP: 0018:ffffc9000c12fb68 EFLAGS: 00010206