Re: Kernel null pointer dereference on stopping raid device

From: Ayush Jain
Date: Wed Jun 14 2023 - 03:11:00 EST


Hello,

On 6/14/2023 1:42 AM, Jain, Ayush wrote:
Hello all,

On the next-20230613 release, stopping a RAID device right after creating
it hits a kernel NULL pointer dereference on AMD x86 systems.

Kernel: 6.4.0-rc6-next-20230613
Commit: 1f6ce8392d6ff48

 $ mdadm --create --assume-clean /dev/md/mdsraid --level=0 --raid-devices=1 /dev/loop0 --metadata=1.2 --verbose --force
 $ mdadm --stop /dev/md/mdsraid
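Below is a minimal reproducer sketch that wraps the two commands above. It assumes the loop device is backed by a fresh sparse file. The report does not say how /dev/loop0 was set up or how large it was, so the 1 GiB size, the /tmp/md-repro.img path and the REPRO_RUN switch are illustrative assumptions. Because the final step oopses an affected kernel, the script only prints the commands unless REPRO_RUN=1 is set:

```shell
#!/bin/sh
# Reproducer sketch for the "mdadm --stop" oops reported in this thread.
# Assumptions (not from the report): a 1 GiB sparse backing file at
# /tmp/md-repro.img attached with losetup; the original used an existing
# /dev/loop0 of unknown size.
# WARNING: on an affected kernel (e.g. next-20230613) the --stop step
# oopses the kernel. Commands are only printed unless REPRO_RUN=1.
set -eu

IMG=${IMG:-/tmp/md-repro.img}
MD=/dev/md/mdsraid

run() {
    # Always echo the command; execute it only when explicitly requested.
    echo "+ $*"
    if [ "${REPRO_RUN:-0}" = 1 ]; then
        "$@"
    fi
}

run truncate -s 1G "$IMG"

if [ "${REPRO_RUN:-0}" = 1 ]; then
    # losetup --find --show attaches the file and prints the loop device.
    LOOP=$(losetup --find --show "$IMG")
else
    LOOP=/dev/loopN    # placeholder used only in dry-run mode
    echo "+ losetup --find --show $IMG"
fi

# Same create/stop sequence as in the report.
run mdadm --create --assume-clean "$MD" --level=0 --raid-devices=1 "$LOOP" \
    --metadata=1.2 --verbose --force
run mdadm --stop "$MD"

# Only reached if the stop did not crash (e.g. on a fixed kernel).
run losetup -d "$LOOP"
```

Run it as root on a disposable test machine with REPRO_RUN=1 set; on an affected kernel the mdadm --stop step is expected to produce an oops in export_rdev like the trace in this report.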


Kernel trace:
[   32.260763] PEFILE: Unsigned PE binary
[  117.236671] block device autoloading is deprecated and will be removed.
[  117.262329] md127: detected capacity change from 0 to 25581568
[  180.249007] md127: detected capacity change from 25581568 to 0
[  180.255540] md: md127 stopped.
[  180.268433] BUG: kernel NULL pointer dereference, address: 00000000000000a4
[  180.276210] #PF: supervisor read access in kernel mode
[  180.281947] #PF: error_code(0x0000) - not-present page
[  180.287676] PGD 0 P4D 0
[  180.290508] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  180.295374] CPU: 5 PID: 7674 Comm: mdadm Kdump: loaded Not tainted 6.4.0-rc6-next-20230613 #1
[  180.315092] RIP: 0010:export_rdev+0xb2/0x1f0
[  180.319869] Code: c7 43 40 00 00 00 00 48 8d bb 48 01 00 00 e8 c5 c0 c5 ff 48 8b 83 b8 00 00 00 a8 10 74 0c 48 8b 43 30 8b 78 34 e8 ae fe ff ff <83> bd a4 00 00 00 fe 48 c7 c6 c0 f9 aa 9d 48 8b 7b 30 48 0f 45 f3
[  180.340820] RSP: 0018:ffffb1dadc677da0 EFLAGS: 00010246
[  180.346655] RAX: 0000000000000002 RBX: ffff9ca944130e00 RCX: 0000000080080007
[  180.354622] RDX: 0000000080080008 RSI: fffffc7fc20f2c00 RDI: 0000000000000000
[  180.362588] RBP: 0000000000000000 R08: ffff9d0943cb0000 R09: 0000000080080007
[  180.370553] R10: 0000000040000000 R11: 0000000000000001 R12: 0000000000000000
[  180.378512] R13: 0000000000000000 R14: ffff9d0943cb21d8 R15: ffff9ca94307c400
[  180.386470] FS:  00007f2a63448740(0000) GS:ffff9ca8fef40000(0000) knlGS:0000000000000000
[  180.395502] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  180.401917] CR2: 00000000000000a4 CR3: 0000000102fcc000 CR4: 00000000003506e0
[  180.409875] Call Trace:
[  180.412608]  <TASK>
[  180.414957]  ? __die+0x24/0x70
[  180.418372]  ? page_fault_oops+0x82/0x150
[  180.422852]  ? exc_page_fault+0x69/0x150
[  180.427237]  ? asm_exc_page_fault+0x26/0x30
[  180.431916]  ? export_rdev+0xb2/0x1f0
[  180.436005]  ? md_kick_rdev_from_array+0x118/0x150
[  180.441354]  do_md_stop+0x28e/0x580
[  180.445241]  ? security_capable+0x3a/0x60
[  180.449721]  md_ioctl+0x540/0x940
[  180.453423]  ? selinux_bprm_creds_for_exec+0x291/0x2a0
[  180.459163]  blkdev_ioctl+0x142/0x280
[  180.463255]  __x64_sys_ioctl+0x91/0xd0
[  180.467447]  do_syscall_64+0x3f/0x90
[  180.471440]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  180.477081] RIP: 0033:0x7f2a6323ec6b
[  180.481073] Code: 73 01 c3 48 8b 0d b5 b1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 85 b1 1b 00 f7 d8 64 89 01 48
[  180.502032] RSP: 002b:00007ffc29d52238 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  180.510484] RAX: ffffffffffffffda RBX: 0000000000000019 RCX: 00007f2a6323ec6b
[  180.518449] RDX: 0000000000000000 RSI: 0000000000000932 RDI: 0000000000000003
[  180.526415] RBP: 0000000000000003 R08: 0000000000000207 R09: 00007ffc29d51eb5
[  180.534373] R10: 000000000000007f R11: 0000000000000246 R12: 0000555c79876280
[  180.542338] R13: 00007ffc29d55379 R14: 00007ffc29d52330 R15: 00007ffc29d523d0
[  180.550305]  </TASK>


After reverting the following commit, the problem is resolved:

commit 2736e8eeb0ccdc71d1f4256c9c9a28f58cc43307
Author: Christoph Hellwig <hch@xxxxxx>
Date: Thu Jun 8 13:02:43 2023 +0200

block: use the holder as indication for exclusive opens

Christoph, could you please take a look at this issue?

Thanks & Regards,
Ayush Jain
