Re: [syzbot] WARNING in iov_iter_revert (3)

From: Theodore Ts'o
Date: Tue Nov 29 2022 - 10:55:29 EST


On Tue, Nov 29, 2022 at 04:04:35AM +0000, Al Viro wrote:
> On Mon, Nov 28, 2022 at 02:57:49PM -0800, syzbot wrote:
> > syzbot has found a reproducer for the following issue on:
>
> [snip]
>
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17219fbb880000
>
> "syz_mount_image$ntfs3(" followed by arseloads of garbage. And the thing
> conspicuously missing? Why, any ntfs3 maintainers in Cc... Or lists,
> for that matter...
>
> > generic_file_read_iter+0x3d4/0x540 mm/filemap.c:2804
> > do_iter_read+0x6e3/0xc10 fs/read_write.c:796
> > vfs_readv fs/read_write.c:916 [inline]
> > do_preadv+0x1f4/0x330 fs/read_write.c:1008
> > do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
> > entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
> At a guess - something's screwed in ntfs3 ->direct_IO() (return value, most
> likely). And something's screwed in syzbot. If you are fuzzing some
> filesystem, YOU REALLY OUGHT TO CC THE MAINTAINERS OF THAT FILESYSTEM.
> Even if nothing in the stack trace happens to be in that fs.

The scheme which syzbot appears to use involves looking at the symbol
in EIP from the stack trace to determine who to CC. This... mostly
works, but occasionally results in hilarity.

For example, there was the time when the fuzzing program fed some
other file system (f2fs, as I recall) several hundred invalid file
systems, and then for some reason it fed ext4 an invalid file system,
and ext4 tripped on an invalid pointer dereference. Of course, feeding
ext4 that invalid file system on its own caused no issues, and a human
being might have intuited that the several hundred invalid f2fs file
systems triggered some kind of memory corruption which ext4 then
tripped across --- but since the EIP was in the ext4 file system, the
ext4 maintainers got cc'ed, and if you look in the dashboard, it just
shows an ext4 symbol, so it's unlikely the f2fs developers would ever
have discovered it on their own. (I did cc it to them, but they
weren't able to get to it immediately, and it'll be hard to find it
again, since we don't have a bug tracking system and there's no way to
set some kind of "subsystem really at fault" state in the Syzkaller
dashboard.)
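
To make the difference concrete, here is a rough Python sketch of the
two CC policies under discussion: picking maintainers from the topmost
stack frame only, versus also adding the maintainers of any file
system the reproducer mounts via syz_mount_image$FS. This is not
syzbot's actual code; the MAINTAINERS table, the list addresses, and
all function names are made up for illustration.

```python
import re

# Hypothetical source-path-prefix -> CC list table (placeholder addresses).
MAINTAINERS = {
    "mm/": ["mm-list@example.org"],
    "fs/ntfs3/": ["ntfs3-list@example.org"],
    "fs/ext4/": ["ext4-list@example.org"],
    "fs/": ["fsdevel-list@example.org"],
}

# Matches frames such as "do_iter_read+0x6e3/0xc10 fs/read_write.c:796".
FRAME_RE = re.compile(
    r"^\s*(?P<sym>[\w.]+)(?:\+0x[0-9a-f]+/0x[0-9a-f]+)?\s+(?P<path>\S+?):\d+"
)
# Matches "syz_mount_image$ntfs3(" in a syz reproducer and captures "ntfs3".
MOUNT_RE = re.compile(r"syz_mount_image\$(\w+)\(")


def owners_for_path(path):
    """Return the CC list for the longest matching path prefix."""
    best = max((p for p in MAINTAINERS if path.startswith(p)),
               key=len, default=None)
    return MAINTAINERS[best] if best else []


def cc_from_top_frame(trace):
    """Policy 1: CC whoever owns the first frame that has a source path."""
    for line in trace.splitlines():
        m = FRAME_RE.match(line)
        if m:
            return set(owners_for_path(m.group("path")))
    return set()


def cc_with_fuzzed_fs(trace, repro):
    """Policy 2: policy 1 plus maintainers of every fs the reproducer mounts."""
    cc = cc_from_top_frame(trace)
    for fs in MOUNT_RE.findall(repro):
        cc |= set(MAINTAINERS.get(f"fs/{fs}/", []))
    return cc
```

With the trace frames quoted earlier and a reproducer containing
"syz_mount_image$ntfs3(", the first function yields only the mm list,
while the second also picks up the ntfs3 list.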

> Folks, it's that simple - "our bot needs to remember that fuzzing $FS
> automatically puts maintainers of $FS into the set of people we need to Cc"
> vs. "maintainers of each filesystem need to dig into every syzbot posting
> on fsdevel (and follow links, no less) to check if their fs might be
> involved". If you can't be bothered to take care of the former, why
> would you expect $BIGNUM people to bother with the latter, again and
> again and again?

The strength and weakness of syzkaller is that it will combine fuzzing
with, say, setting up and tearing down a gazillion wireguard tunnels,
or some other random set of system calls. Sometimes that finds a real
bug. Other times, for some strange reason, the syzkaller minimizer
can't figure out that setting up and tearing down the network
connections is extraneous noise, except that on the specific
hardware/VM used by syzkaller, it makes it easier to trigger a
timing-related bug --- but $DEITY help you if you try to reproduce on
anything other than the specific VM used by the syzkaller bot.

And then, of course, there are cases where the reproducer is only
doing one thing, such as say messing with ntfs3, and the syzbot
*should* be able to figure out a better set of maintainers to notify
--- but then, given that the syzbot subject line/summary is something
generic, such as iov_iter_XXX, and there's no way to set up an
affected subsystem state in the dashboard, good luck having anyone
else find it in the syzkaller dashboard later on.

> Fix your bot, already. It's not the first time this had been brought
> to your attention and the problem is still there.

I've brought this to the Syzkaller team's attention multiple times.
Unfortunately, it's not exactly a trivial problem to solve, and other
things have been considered higher priority.

(Hint to the Syzkaller team; if you can prioritize and share a roadmap
with upstream developers vis-a-vis upstream concerns, it might make
some upstream developers a bit more receptive to patches designed to
make life easier for syzkaller to find EVEN MORE FILESYSTEM FUZZING
BUGS, such as [1]. Otherwise, it is perhaps understandable why some
might consider this more of a threat than a benefit... Note:
[1] doesn't make a difference for ext4 either way, since metadata
checksums are a file system feature that can be enabled or disabled at
mkfs time; this patch series is about finding more file system bugs
for file systems which don't make disabling checksums an option,
such as XFS.)

[1] https://lore.kernel.org/all/20221014084837.1787196-1-hrkanabar@xxxxxxxxx/