Re: kernel BUG in set_state_bits

From: David Sterba
Date: Mon Aug 14 2023 - 15:20:28 EST


On Mon, Aug 14, 2023 at 05:31:41PM +0800, Qu Wenruo wrote:
> On 2023/8/14 14:23, Yikebaer Aizezi wrote:
> > Hello,
> >
> > When using Healer to fuzz Linux 6.5-rc5, the following crash
> > was triggered.
> >
> > HEAD commit: 52a93d39b17dc7eb98b6aa3edb93943248e03b2f (tag: v6.5-rc5)
> > git tree: upstream
> >
> > console output:
> > https://drive.google.com/file/d/1KuE7x7TW_pt_aNWWr2GAdehfYixsgeOO/view?usp=drive_link
> > kernel config:
> > https://drive.google.com/file/d/1b_em6R2Zl98np83b818BzE1FrxbiaGuh/view?usp=drive_link
> > C reproducer:
> > https://drive.google.com/file/d/1HlzFbWr3wqzlLi8I2_ZCQumS71WDLXj1/view?usp=drive_link
> > Syzlang reproducer:
> > https://drive.google.com/file/d/1Bu70LrWxOzsbkilELLuxo8VnjcAFiH1Y/view?usp=drive_link
> >
> > If you fix this issue, please add the following tag to the commit:
> > Reported-by: Yikebaer Aizezi <yikebaer61@xxxxxxxxx>
> >
> >
> > memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=8428 'syz-executor'
> > loop1: detected capacity change from 0 to 32768
> > BTRFS: device fsid 84eb0a0b-d357-4bc1-8741-9d3223c15974 devid 1 transid 7 /dev/loop1 scanned by syz-executor (8428)
> > BTRFS info (device loop1): using xxhash64 (xxhash64-generic) checksum algorithm
> > BTRFS info (device loop1): disk space caching is enabled
> > BTRFS info (device loop1): enabling ssd optimizations
> > BTRFS info (device loop1): auto enabling async discard
> > FAULT_INJECTION: forcing a failure.
> > name failslab, interval 1, probability 0, space 0, times 1
> > CPU: 0 PID: 8428 Comm: syz-executor Not tainted 6.5.0-rc5 #1
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
> > Call Trace:
> > <TASK>
> > __dump_stack lib/dump_stack.c:88 [inline]
> > dump_stack_lvl+0x132/0x150 lib/dump_stack.c:106
> > fail_dump lib/fault-inject.c:52 [inline]
> > should_fail_ex+0x49f/0x5b0 lib/fault-inject.c:153
> > should_failslab+0x5/0x10 mm/slab_common.c:1471
> > slab_pre_alloc_hook mm/slab.h:711 [inline]
> > slab_alloc_node mm/slub.c:3452 [inline]
> > __kmem_cache_alloc_node+0x61/0x350 mm/slub.c:3509
> > kmalloc_trace+0x22/0xd0 mm/slab_common.c:1076
> > kmalloc include/linux/slab.h:582 [inline]
> > ulist_add_merge fs/btrfs/ulist.c:210 [inline]
> > ulist_add_merge+0x16f/0x660 fs/btrfs/ulist.c:198
> > add_extent_changeset fs/btrfs/extent-io-tree.c:191 [inline]
>
> If you check the call site, it is doing a GFP_ATOMIC allocation inside a
> critical section.
>
> Blind error injection like this, without any analysis, is not really
> helping here. You can even inject errors into NOFAIL call sites, and
> nobody would take the result seriously.
>
> IIRC even syzbot no longer reports errors found by blind error
> injection.

Error injection makes sense for realistic errors that are hard to hit.
The memory allocation failure injected in this case is possible but not
realistic. Fixing it is desirable but otherwise has low priority.
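
For readers unfamiliar with the mechanism, here is a minimal userspace
sketch in C of what "times 1" fault injection amounts to, and of a
caller that returns the error instead of asserting on it. This is not
kernel code and not the btrfs implementation: inject_failures,
alloc_may_fail(), struct range_rec and record_range() are hypothetical
names made up for illustration, and malloc() stands in for the kernel
allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the failslab "times" counter: the number of
 * upcoming allocations that will be forced to fail. */
static int inject_failures;

/* Allocation wrapper: fails on demand, otherwise defers to malloc(). */
static void *alloc_may_fail(size_t size)
{
    if (inject_failures > 0) {
        inject_failures--;
        return NULL;
    }
    return malloc(size);
}

/* Hypothetical record of a byte range (illustration only). */
struct range_rec {
    unsigned long start;
    unsigned long len;
};

/* Caller that propagates an allocation failure to its own caller.
 * Returns 0 on success, -1 if the allocation failed; *out is left
 * untouched on failure. */
static int record_range(struct range_rec **out,
                        unsigned long start, unsigned long len)
{
    struct range_rec *rec;

    assert(out != NULL); /* programming error, not a runtime failure */

    rec = alloc_may_fail(sizeof(*rec));
    if (!rec)
        return -1;

    rec->start = start;
    rec->len = len;
    *out = rec;
    return 0;
}
```

With inject_failures set to 1, the first record_range() call returns -1
and the next one succeeds. Whether a real call site can unwind this
cleanly while holding a lock is exactly the hard part, and the sketch
does not address it.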