Re: Kernel NULL pointer deref and data corruptions with xfs on 6.1

From: Frederick Lawler
Date: Wed Aug 30 2023 - 16:30:36 EST


Hi Matthew,

On Fri, Aug 04, 2023 at 11:57:22AM -0500, Frederick Lawler wrote:
> Hi Matthew,
>
> On Thu, Jul 27, 2023 at 01:27:56PM +0100, Matthew Wilcox wrote:
> > On Thu, Jul 27, 2023 at 11:25:33AM +0100, Daniel Dao wrote:
> > > On Thu, Jul 27, 2023 at 4:27 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Fri, Jul 21, 2023 at 11:49:04AM +0100, Daniel Dao wrote:
> > > > > > We do not have a reproducer yet, but we now have more debugging data
> > > > > > which hopefully should help narrow this down. Details as follows:
> > > > >
> > > > > > 1. Kernel NULL pointer dereferences in __filemap_get_folio
> > > > >
> > > > > > This happened on a few different hosts, with a few different repeated addresses.
> > > > > > The addresses are 0000000000000036, 0000000000000076, 00000000000000f6.
> > > > > > This looks like the xarray is corrupted and we were trying to do some work on a
> > > > > > sibling entry.
> > > >
> > > > I think I have a fix for this one. Please try the attached.
> > >
> > > For some reason I do not see the attached patch. Can you resend it, or
> > > is it the same one as in https://bugzilla.kernel.org/show_bug.cgi?id=216646#c31 ?
> >
> > Yes, that's the one, sorry.
>
> I set up a kernel with this patch to deploy out. It'll take some time to
> see any results from that. I did run your multiorder.c changes with/without
> the change to lib/xarray.c and that seemed to work as intended. I didn't see
> any regressions across multiple seeds with our kernel config.
>
> Fred

We deployed the xarray lib fix to our fleet and haven't noticed any further
issues with this error, among other oddities. LGTM.

Fred