Re: [PATCH] ext4: fix underflow in group bitmap calculation

From: Theodore Ts'o
Date: Thu Dec 22 2022 - 12:42:17 EST


On Thu, Dec 22, 2022 at 10:02:44AM +0800, Jun Nie wrote:
> There is case that s_first_data_block is not 0 and block nr is smaller than
> s_first_data_block when calculating group bitmap during allocation. This
> underflow make index exceed es->s_groups_count in ext4_get_group_info()
> and trigger the BUG_ON.
>
> Fix it with protection of underflow.

When was this happening, and why?  If blocknr is less than
s_first_data_block, this is either insufficient input validation,
insufficient validation to detect file system corruption, or some
other kernel bug.

Looking quickly at the code and the repro, it appears that the issue
is that FS_IOC_GETFSMAP is getting passed a starting physical block of
0 in fmh_keys[0] when on a file system with a blocksize of 1k (in
which case s_first_data_block is 1).  It's unclear to me what
FS_IOC_GETFSMAP should *do* when passed a value which requests that it
provide a mapping for a block which is out of bounds (either too big,
or too small).  Should it return an error?  Should it simply not
return a mapping?  The man page for ioctl_getfsmap() doesn't shed any
light on this question.
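
To make the arithmetic concrete, here is a minimal userspace sketch of
the underflow.  It is not the kernel code: struct sb_sketch and both
helper functions are hypothetical stand-ins, and the bounds check only
illustrates one possible validation, not what FS_IOC_GETFSMAP is
supposed to do.

```c
#include <stdint.h>

/* Hypothetical stand-in for the superblock fields involved. */
struct sb_sketch {
	uint32_t first_data_block;  /* 1 when the block size is 1k */
	uint32_t blocks_per_group;
	uint32_t groups_count;
	uint64_t blocks_count;      /* total blocks in the file system */
};

/*
 * Unchecked group calculation.  If blocknr < first_data_block, the
 * unsigned subtraction wraps around and the result is far larger
 * than groups_count.
 */
static uint64_t group_of_block_unchecked(const struct sb_sketch *sb,
					 uint64_t blocknr)
{
	return (blocknr - sb->first_data_block) / sb->blocks_per_group;
}

/*
 * Checked variant: returns 0 and stores the group number on success,
 * or -1 if blocknr lies outside [first_data_block, blocks_count).
 */
static int group_of_block_checked(const struct sb_sketch *sb,
				  uint64_t blocknr, uint32_t *group)
{
	if (blocknr < sb->first_data_block || blocknr >= sb->blocks_count)
		return -1;
	*group = (uint32_t)((blocknr - sb->first_data_block) /
			    sb->blocks_per_group);
	return 0;
}
```

With first_data_block set to 1, passing block 0 to the unchecked
helper wraps to a huge value, while the checked helper rejects it.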

Darrick, you designed the interface and wrote most of fs/ext4/fsmap.c.
Can you let us know what is supposed to happen in this case? Many
thanks!!

> Fixes: 72b64b594081ef ("ext4 uninline ext4_get_group_no_and_offset()")

This makes ***no*** sense; the commit in question is from 2006, which
means that in some jurisdictions it's old enough to drive a car.  :-)
Furthermore, all it does is move the function from an inline function
to a C file (in this case, balloc.c).  It also long predates the
introduction of FS_IOC_GETFSMAP support, which was in 2017.

I'm guessing you just did a "git blame" and blindly assumed that
whatever commit last touched the C code in question was what
introduced the problem?

Anyway, please try to understand what is going on instead of doing the
moral equivalent of taking a sledgehammer to the code until the
reproducer stops triggering a BUG. It's not enough to shut up the
reproducer; you should understand what is happening, and why, and then
strive to find the best fix to the problem. Papering over problems in
the end will result in more fragile code, and the goal of syzkaller is
to improve kernel quality.  But syzkaller is just a tool, and used
wrongly, it can have the opposite effect.

Regards,

- Ted