Re: [syzbot] Monthly xfs report

From: Dave Chinner
Date: Wed Apr 12 2023 - 17:54:25 EST


On Mon, Apr 10, 2023 at 09:35:17PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 11, 2023 at 11:35:12AM +1000, Dave Chinner wrote:
> > On Thu, Mar 30, 2023 at 02:58:43AM -0700, syzbot wrote:
> >
> > > 13 Yes general protection fault in __xfs_free_extent
> > > https://syzkaller.appspot.com/bug?extid=bfbc1eecdfb9b10e5792
> >
> > Growfs issue. Looks like a NULL pag, which means the fsbno passed
> > to __xfs_free_extent() is invalid. Without looking further, this
> > looks like it's a corrupt AGF length or superblock size and this has
> > resulted in the calculated fsbno starting beyond the end of the last
> > AG that we are about to grow. That means the agno is beyond EOFS,
> > xfs_perag_get(agno) ends up NULL, and __xfs_free_extent() goes
> > splat. Likely requires corruption to trigger.
> >
> > Low priority, low severity.
>
> I've been wondering for quite a while if the code that creates those
> defer items ought to be shutting down the fs if they can't get a perag
> to stuff in the intent. xfs_perag_intent_get seems like a reasonable
> place to shut down the fs with a corruption warning if someone feeds in
> a totally garbage fsblock range.

You know, I think this might be the same as thex case below where
a bogus AGF field is getting past the verifiers in recovery...

>
> > > 5 Yes KASAN: use-after-free Read in xfs_btree_lookup_get_block
> > > https://syzkaller.appspot.com/bug?extid=7e9494b8b399902e994e
> >
> > Recovery of reflink COW extents, we have a corrupted journal
> >
> > [ 52.495566][ T5067] XFS (loop0): Mounting V5 Filesystem bfdc47fc-10d8-4eed-a562-11a831b3f791
> > [ 52.599681][ T5067] XFS (loop0): Torn write (CRC failure) detected at log block 0x180. Truncating head block from 0x200.
> > [ 52.636680][ T5067] XFS (loop0): Starting recovery (logdev: internal)
> >
> > And then it looks to have a UAF on the refcountbt cursor that is
> > first initialised in xfs_refcount_recover_cow_leftovers(). Likely
> > tripping over a corrupted refcount btree of some kind. Probably one
> > for Darrick to look into.
>
> Somehow the bogus refcount level field in the AGF is getting past the
> verifiers. I'll look into this later.

... because like this one, it seems to require corruption getting
deep into the modification operation without being detected.

As for shutdown when a perag cannot be obtained by defer items, I'm
hoping that the perag get operations slowly disappear from those as
we slowly move the perag references higher up the heirarchy. The
perag should not go away in the middle of a defer chain, so I don't
think we should ever get a NULL from a lookup except in the case of
buggy code....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx