Re: [syzbot] [usb?] KASAN: slab-out-of-bounds Read in read_descriptors (3)

From: Khazhy Kumykov
Date: Fri Jul 21 2023 - 14:10:47 EST


On Mon, Jun 19, 2023 at 7:56 PM syzbot
<syzbot+18996170f8096c6174d0@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 40f71e7cd3c6 Merge tag 'net-6.4-rc7' of git://git.kernel.o..
> git tree: upstream
> console+strace: https://syzkaller.appspot.com/x/log.txt?x=1581445b280000
> kernel config: https://syzkaller.appspot.com/x/.config?x=ac246111fb601aec
> dashboard link: https://syzkaller.appspot.com/bug?extid=18996170f8096c6174d0
> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15d23487280000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16613ed3280000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/30922ad38c58/disk-40f71e7c.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/3bd12e7503b8/vmlinux-40f71e7c.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/1dcd340b18d4/bzImage-40f71e7c.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+18996170f8096c6174d0@xxxxxxxxxxxxxxxxxxxxxxxxx
>
> ==================================================================
> BUG: KASAN: slab-out-of-bounds in read_descriptors+0x263/0x280 drivers/usb/core/sysfs.c:883
> Read of size 8 at addr ffff88801e78b8c8 by task udevd/5011
>
> CPU: 0 PID: 5011 Comm: udevd Not tainted 6.4.0-rc6-syzkaller-00195-g40f71e7cd3c6 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:88 [inline]
> dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
> print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:351
> print_report mm/kasan/report.c:462 [inline]
> kasan_report+0x11c/0x130 mm/kasan/report.c:572

sysfs.c:883 is "src = udev->rawdescriptors[cfgno]", so the faulting
access is just the read of the rawdescriptors pointer array itself.

> read_descriptors+0x263/0x280 drivers/usb/core/sysfs.c:883
> sysfs_kf_bin_read+0x19a/0x270 fs/sysfs/file.c:97
> kernfs_file_read_iter fs/kernfs/file.c:251 [inline]
> kernfs_fop_read_iter+0x387/0x690 fs/kernfs/file.c:280
> call_read_iter include/linux/fs.h:1862 [inline]
> new_sync_read fs/read_write.c:389 [inline]
> vfs_read+0x4b1/0x8a0 fs/read_write.c:470
> ksys_read+0x12b/0x250 fs/read_write.c:613
> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7f07c7916b6a
> Code: 00 3d 00 00 41 00 75 0d 50 48 8d 3d 2d 08 0a 00 e8 ea 7d 01 00 31 c0 e9 07 ff ff ff 64 8b 04 25 18 00 00 00 85 c0 75 1b 0f 05 <48> 3d 00 f0 ff ff 76 6c 48 8b 15 8f a2 0d 00 f7 d8 64 89 02 48 83
> RSP: 002b:00007ffdf34973d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f07c7916b6a
> RDX: 0000000000010011 RSI: 00007ffdf3497407 RDI: 0000000000000008
> RBP: 0000000000000008 R08: 0000000000000003 R09: f4f13e10193fbafe
> R10: 0000000000000000 R11: 0000000000000246 R12: 000055be37470e10
> R13: 00007ffdf34a7ae8 R14: 00007ffdf34a8138 R15: 00007ffdf3497407
> </TASK>
>
> Allocated by task 758:
> kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
> kasan_set_track+0x25/0x30 mm/kasan/common.c:52
> ____kasan_kmalloc mm/kasan/common.c:374 [inline]
> ____kasan_kmalloc mm/kasan/common.c:333 [inline]
> __kasan_kmalloc+0xa2/0xb0 mm/kasan/common.c:383
> kasan_kmalloc include/linux/kasan.h:196 [inline]
> __do_kmalloc_node mm/slab_common.c:966 [inline]
> __kmalloc+0x5e/0x190 mm/slab_common.c:979
> kmalloc include/linux/slab.h:563 [inline]
> kzalloc include/linux/slab.h:680 [inline]

kzalloc(length): this length is derived from
dev->descriptor.bNumConfigurations.

The corresponding kfree is in usb_destroy_configuration() (makes sense),
which also sets rawdescriptors to NULL. If we were racing with that
teardown, I'd also expect some sort of NULL-deref report...

Stumbled upon https://lore.kernel.org/all/1599201467-11000-1-git-send-email-prime.zeng@xxxxxxxxxxxxx/T/,
which suggests that we can, instead, race with a descriptor change,
which sounds plausible - descriptor changes, bNumConfigurations no
longer lines up with our kmalloc... so we may run past the end of it.
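
A rough userspace model of that mismatch, in case it helps. This is
not kernel code: struct model_dev and first_oob_index() are made-up
names, and the only thing it shows is a loop bounded by a count that
has drifted away from the number of slots that were allocated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant bits of struct usb_device. */
struct model_dev {
	unsigned char num_configs;  /* plays descriptor.bNumConfigurations */
	unsigned int alloc_slots;   /* slots actually allocated for rawdescriptors */
};

/*
 * Walk the configurations the way a reader bounded by num_configs would.
 * Return the first index that falls outside the allocation, or -1 if
 * every index stays in bounds.
 */
static int first_oob_index(const struct model_dev *dev)
{
	assert(dev != NULL);
	for (unsigned int cfgno = 0; cfgno < dev->num_configs; cfgno++) {
		if (cfgno >= dev->alloc_slots)
			return (int)cfgno;
	}
	return -1;
}
```

With one slot allocated and num_configs later bumped to 2, the first
bad index is 1. On a 64-bit build that is byte offset 8 in an array of
pointers, which would be consistent with the "0 bytes to the right of
allocated 8-byte region" line in the report, assuming the array was
sized for a single configuration.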

Looking at hub_port_connect_change(), we seem to read directly into
udev->descriptor, check if it changed, and if it did, set
udev->descriptor back to the old one...? If we have an ongoing sysfs
read, which directly touches udev->descriptor, there might be
trouble...

The descriptor read seems to be reached from both
hub_port_connect_change() and usb_reset_and_verify_device(), and both
appear to take the port_dev lock ("port_dev->status_lock"). That looks
like a different lock from the one usb_lock_device_interruptible()
would grab, so maybe the code has changed since that report in 2020.
Either way, it seems to suggest the sysfs path would need to take this
lock to read udev->descriptor safely.
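
To illustrate the locking idea only, here is a userspace sketch using a
pthread mutex as a stand-in for the port lock. All names (model_udev,
model_reprobe, model_read_bound) are hypothetical, and whether the real
sysfs reader can or should take that lock is an open question.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these names are kernel APIs. */
struct model_udev {
	pthread_mutex_t lock;       /* plays the role of port_dev->status_lock */
	unsigned char num_configs;  /* plays descriptor.bNumConfigurations */
};

/* Writer: overwrite the shared field in place, then put the old value back. */
static void model_reprobe(struct model_udev *udev, unsigned char probed)
{
	assert(udev != NULL);
	pthread_mutex_lock(&udev->lock);
	unsigned char old = udev->num_configs;
	udev->num_configs = probed;       /* transient value, hidden by the lock */
	if (udev->num_configs != old)
		udev->num_configs = old;  /* descriptor changed: restore */
	pthread_mutex_unlock(&udev->lock);
}

/*
 * Reader: sample the loop bound while holding the same lock. A real
 * reader would have to hold the lock for the whole copy, not just for
 * this one load.
 */
static unsigned char model_read_bound(struct model_udev *udev)
{
	assert(udev != NULL);
	pthread_mutex_lock(&udev->lock);
	unsigned char n = udev->num_configs;
	pthread_mutex_unlock(&udev->lock);
	return n;
}
```

Because both sides take the same mutex, a reader can only see the value
from before or after the overwrite-and-restore sequence, never the
transient one in the middle.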

(I'm not clear on when the sysfs files get added/removed, since that
happens in usb_bus_notify()... The two functions above that touch
udev->descriptor don't look to me like they send
BUS_NOTIFY_ADD/DEL_DEVICE, so the race seems plausible.)

> usb_get_configuration+0x1f7/0x5170 drivers/usb/core/config.c:887
> usb_enumerate_device drivers/usb/core/hub.c:2407 [inline]
> usb_new_device+0x12b0/0x19d0 drivers/usb/core/hub.c:2545
> hub_port_connect drivers/usb/core/hub.c:5407 [inline]
> hub_port_connect_change drivers/usb/core/hub.c:5551 [inline]
> port_event drivers/usb/core/hub.c:5711 [inline]
> hub_event+0x2d9e/0x4e40 drivers/usb/core/hub.c:5793
> process_one_work+0x99a/0x15e0 kernel/workqueue.c:2405
> worker_thread+0x67d/0x10c0 kernel/workqueue.c:2552
> kthread+0x344/0x440 kernel/kthread.c:379
> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
>
> The buggy address belongs to the object at ffff88801e78b8c0
> which belongs to the cache kmalloc-8 of size 8
> The buggy address is located 0 bytes to the right of
> allocated 8-byte region [ffff88801e78b8c0, ffff88801e78b8c8)
>
> The buggy address belongs to the physical page:
> page:ffffea000079e2c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1e78b
> anon flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
> page_type: 0xffffffff()
> raw: 00fff00000000200 ffff888012441280 0000000000000000 dead000000000001
> raw: 0000000000000000 0000000000660066 00000001ffffffff 0000000000000000
> page dumped because: kasan: bad access detected
> page_owner tracks the page as allocated
> page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 1, tgid 1 (swapper/0), ts 8298345549, free_ts 8292702290
> set_page_owner include/linux/page_owner.h:31 [inline]
> post_alloc_hook+0x2db/0x350 mm/page_alloc.c:1731
> prep_new_page mm/page_alloc.c:1738 [inline]
> get_page_from_freelist+0xf41/0x2c00 mm/page_alloc.c:3502
> __alloc_pages+0x1cb/0x4a0 mm/page_alloc.c:4768
> alloc_page_interleave+0x1e/0x200 mm/mempolicy.c:2112
> alloc_pages+0x233/0x270 mm/mempolicy.c:2274
> alloc_slab_page mm/slub.c:1851 [inline]
> allocate_slab+0x25f/0x390 mm/slub.c:1998
> new_slab mm/slub.c:2051 [inline]
> ___slab_alloc+0xa91/0x1400 mm/slub.c:3192
> __slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3291
> __slab_alloc_node mm/slub.c:3344 [inline]
> slab_alloc_node mm/slub.c:3441 [inline]
> __kmem_cache_alloc_node+0x136/0x320 mm/slub.c:3490
> __do_kmalloc_node mm/slab_common.c:965 [inline]
> __kmalloc_node_track_caller+0x4f/0x1a0 mm/slab_common.c:986
> kstrdup+0x3f/0x70 mm/util.c:62
> kstrdup_const+0x57/0x80 mm/util.c:85
> kvasprintf_const+0x10c/0x190 lib/kasprintf.c:48
> kobject_set_name_vargs+0x5a/0x150 lib/kobject.c:267
> dev_set_name+0xbf/0xf0 drivers/base/core.c:3429
> tty_register_device_attr+0x301/0x7d0 drivers/tty/tty_io.c:3243
> page last free stack trace:
> reset_page_owner include/linux/page_owner.h:24 [inline]
> free_pages_prepare mm/page_alloc.c:1302 [inline]
> free_unref_page_prepare+0x62e/0xcb0 mm/page_alloc.c:2564
> free_unref_page+0x33/0x370 mm/page_alloc.c:2659

Huh, why did our page get vfree'd, when it was kmalloc'd? Maybe the
memory was reused multiple times before generating this report...?

> vfree+0x180/0x7e0 mm/vmalloc.c:2798
> delayed_vfree_work+0x57/0x70 mm/vmalloc.c:2719
> process_one_work+0x99a/0x15e0 kernel/workqueue.c:2405
> worker_thread+0x67d/0x10c0 kernel/workqueue.c:2552
> kthread+0x344/0x440 kernel/kthread.c:379
> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
>
> Memory state around the buggy address:
> ffff88801e78b780: 00 fc fc fc fc fa fc fc fc fc fa fc fc fc fc fa
> ffff88801e78b800: fc fc fc fc 00 fc fc fc fc fa fc fc fc fc fa fc
> >ffff88801e78b880: fc fc fc fa fc fc fc fc 00 fc fc fc fc 00 fc fc
> ^
> ffff88801e78b900: fc fc 00 fc fc fc fc fa fc fc fc fc 00 fc fc fc
> ffff88801e78b980: fc 00 fc fc fc fc fa fc fc fc fc 00 fc fc fc fc
> ==================================================================
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@xxxxxxxxxxxxxxxx.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> If the bug is already fixed, let syzbot know by replying with:
> #syz fix: exact-commit-title
>
> If you want syzbot to run the reproducer, reply with:
> #syz test: git://repo/address.git branch-or-commit-hash
> If you attach or paste a git patch, syzbot will apply it before testing.
>
> If you want to change bug's subsystems, reply with:
> #syz set subsystems: new-subsystem
> (See the list of subsystem names on the web dashboard)
>
> If the bug is a duplicate of another bug, reply with:
> #syz dup: exact-subject-of-another-report
>
> If you want to undo deduplication, reply with:
> #syz undup
