Re: [syzbot] [mm?] kernel BUG in validate_mm (3)

From: Yajun Deng
Date: Tue Jan 30 2024 - 06:34:50 EST



On 2024/1/29 23:22, Liam R. Howlett wrote:
Yajun,


* syzbot <syzbot+39a72b995ba73633c1a7@xxxxxxxxxxxxxxxxxxxxxxxxx> [240129 06:15]:
Hello,

syzbot found the following issue on:

HEAD commit: 596764183be8 Add linux-next specific files for 20240129
git tree: linux-next
console+strace: https://syzkaller.appspot.com/x/log.txt?x=142042f3e80000
kernel config: https://syzkaller.appspot.com/x/.config?x=584144ad19f381aa
dashboard link: https://syzkaller.appspot.com/bug?extid=39a72b995ba73633c1a7
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11844ba7e80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15bd01efe80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/b647c038857b/disk-59676418.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/729e26c3ac55/vmlinux-59676418.xz
kernel image: https://storage.googleapis.com/syzbot-assets/15aa5e287059/bzImage-59676418.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+39a72b995ba73633c1a7@xxxxxxxxxxxxxxxxxxxxxxxxx

arg_start 7fffd9277efb arg_end 7fffd9277f14 env_start 7fffd9277f14 env_end 7fffd9277fdf
binfmt ffffffff8d9c5c00 flags 80007fd
ioctx_table 0000000000000000
owner ffff88802c0cda00 exe_file ffff88801ff60500
notifier_subscriptions 0000000000000000
numa_next_scan 0 numa_scan_offset 0 numa_scan_seq 0
tlb_flush_pending 0
def_flags: 0x0()
------------[ cut here ]------------
kernel BUG at mm/mmap.c:328!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 1 PID: 5058 Comm: syz-executor310 Not tainted 6.8.0-rc1-next-20240129-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
RIP: 0010:validate_mm+0x3f3/0x4b0 mm/mmap.c:328
Code: 0f 84 a4 fd ff ff e9 47 ff ff ff e8 77 91 b9 ff 44 89 f2 89 de 48 c7 c7 e0 af 19 8b e8 56 69 9b ff 4c 89 ff e8 ce c4 fa ff 90 <0f> 0b e8 56 91 b9 ff 0f b6 15 1f dd b1 0d 31 ff 89 d6 88 14 24 e8
RSP: 0018:ffffc900035df958 EFLAGS: 00010282
RAX: 000000000000032a RBX: 000000000000000d RCX: ffffffff816e2f59
RDX: 0000000000000000 RSI: ffffffff816eb7e6 RDI: 0000000000000005
RBP: dffffc0000000000 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000080000000 R11: 0000000000000001 R12: 00007fffd92ff000
R13: 0000000000000000 R14: 000000000000000e R15: ffff88802c6b8000
FS: 0000555557046380(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f10ada208a0 CR3: 000000007b434000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
vma_merge+0x16a9/0x3d70 mm/mmap.c:1033
vma_merge_new_vma mm/mmap.c:2465 [inline]
mmap_region+0x206b/0x2760 mm/mmap.c:2841
do_mmap+0x8ae/0xf10 mm/mmap.c:1380
vm_mmap_pgoff+0x1ab/0x3c0 mm/util.c:573
ksys_mmap_pgoff+0x425/0x5b0 mm/mmap.c:1426
__do_sys_mmap arch/x86/kernel/sys_x86_64.c:93 [inline]
__se_sys_mmap arch/x86/kernel/sys_x86_64.c:86 [inline]
__x64_sys_mmap+0x125/0x190 arch/x86/kernel/sys_x86_64.c:86
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xd2/0x260 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x6d/0x75
I just tested the reproducer against linux-next with and without your
patch [1], and confirmed that it is the cause of this validation failure.

The validation code is seeing an extra vma in the tree compared to the
vma count:

[ 57.065418] mmap: map_count 24 vma iterator 25
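
For readers following along, the kind of consistency check behind that log line can be sketched roughly as below. This is only a simplified userspace model, not the code in mm/mmap.c: the toy_vma and toy_mm types and toy_validate_mm() are hypothetical names made up for illustration. It counts the entries found by walking the stored VMAs and compares that with the separately maintained counter.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

struct toy_mm {
	const struct toy_vma *vmas;	/* VMAs as stored, in address order */
	size_t nr_stored;		/* how many entries a walk would visit */
	int map_count;			/* separately maintained VMA counter */
};

/*
 * Walk the stored VMAs, count them, and compare with map_count.
 * Returns 0 when both agree, otherwise (walked - map_count).
 */
static int toy_validate_mm(const struct toy_mm *mm)
{
	int walked = 0;
	size_t i;

	for (i = 0; i < mm->nr_stored; i++)
		walked++;

	return walked - mm->map_count;
}
```

With 25 stored entries and a map_count of 24, as in the log line above, this model reports a difference of 1, i.e. one VMA in the tree that the counter does not account for.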

There is a C reproducer from the bot. Can you please have a look to
figure out what is missing?

It should be like this:

                                   ******
                     PPPPP             NNNNN

But it was treated as:

                                   ******
                     PPPPPCCCCCNNNNN
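
To make the two pictures concrete, here is a minimal sketch of the adjacency tests involved. It is an assumption-laden model, not the logic in vma_merge(): the sketch_vma type and both helpers are hypothetical, and VMA flags, files and anon_vma compatibility are ignored. The point is only that a new range [addr, end) can extend prev when it starts exactly at prev's end, can extend next when it ends exactly at next's start, and that the number of VMAs changes differently in each case.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified VMA covering [vm_start, vm_end). */
struct sketch_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/* The new range can extend prev only if it starts where prev ends. */
static bool sketch_touches_prev(const struct sketch_vma *prev,
				unsigned long addr)
{
	return prev != NULL && prev->vm_end == addr;
}

/* The new range can extend next only if it ends where next starts. */
static bool sketch_touches_next(const struct sketch_vma *next,
				unsigned long end)
{
	return next != NULL && end == next->vm_start;
}

/*
 * Expected change in the VMA count when mapping [addr, end) into a gap:
 *   touches both  -> prev absorbs the range and next: -1
 *   touches one   -> that neighbour grows:             0
 *   touches none  -> a new VMA is needed:             +1
 */
static int sketch_count_delta(const struct sketch_vma *prev,
			      const struct sketch_vma *next,
			      unsigned long addr, unsigned long end)
{
	bool p = sketch_touches_prev(prev, addr);
	bool n = sketch_touches_next(next, end);

	if (p && n)
		return -1;
	if (p || n)
		return 0;
	return 1;
}
```

If the case is misclassified, for example a gap treated as though it were fully covered, the counter and the tree can end up disagreeing by one, which is the kind of mismatch the validation reports.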


I haven't yet found what should be added to the check. I'll continue tomorrow.


It would really help if you had that complicated if statement expanded
with comments:

if (prev == curr || /* ??? */
addr != curr->vm_start.......


Thanks,
Liam

[1]. https://lore.kernel.org/linux-mm/20240125034922.1004671-3-yajun.deng@xxxxxxxxx/