Re: [PATCH] jfs: fix shift-out-of-bounds in dbJoin

From: Matthew Wilcox
Date: Mon Jan 29 2024 - 17:13:50 EST


On Mon, Jan 29, 2024 at 03:17:27PM -0600, Dave Kleikamp wrote:
> On 1/29/24 12:29PM, Matthew Wilcox wrote:
> > On Mon, Jan 29, 2024 at 09:00:56AM -0600, Dave Kleikamp wrote:
> > > On 1/29/24 8:55AM, Matthew Wilcox wrote:
> > > > On Mon, Jan 29, 2024 at 08:39:18AM -0600, Dave Kleikamp wrote:
> > > > > On 1/28/24 2:49PM, Matthew Wilcox wrote:
> > > > > > On Wed, Oct 11, 2023 at 08:09:37PM +0530, Manas Ghandat wrote:
> > > > > > > > Currently, while joining the leaf in a buddy system, there is a
> > > > > > > > shift-out-of-bounds error in the calculation of BUDSIZE. Added the
> > > > > > > > required check to BUDSIZE and fixed the documentation as well.
> > > > > >
> > > > > > This patch causes xfstests to fail frequently. The one this trace is
> > > > > > from was generic/074.
> > > > >
> > > > > Thanks for catching this. The sanity test is not right, so we need to revert
> > > > > that one.
> > > >
> > > > Unfortunately, my overnight test run with this patch reverted crashed
> > > > again with the same signature. I also reverted the parent commit,
> > > > and when that crashed I also reverted the parent of that. Which also
> > > > crashed.
> > > >
> > > > So maybe there's something else that makes this unstable. Or maybe my
> > > > bisect went wrong. Or _something_. Anyway, I'm going to spend much of
> > > > today hammering on generic/074 with various kernel versions and see what
> > > > I can deduce.
> > > >
> > > > So far I see no evidence that v6.7 crashes with g/074. And I know that
> > > > next-20240125 does crash with g/074. I'm pretty sure that v6.8-rc1 also
> > > > crashes with g/074, but will confirm that.
> > >
> > > I'll try to beat on it too and see what I find.
> > >
> > > > Sasha, maybe hold off on all the jfs patches for the time being.
> >
> > I have it reproducing easily on cca974daeb6c. I ran it a lot on
> > e0e1958f4c36 and have not reproduced it. So I'm going back to my
> > earlier assertion that cca974daeb6c is bad. Now, maybe other commits
> > are also bad?
>
> I was able to reproduce it too, but not after reverting that one. I believe
> it is the only one causing problems.
>
> I only asked Sasha to hold the other ones as a precaution until we were more
> confident that this one was the problem.

I can't reproduce any problem with v6.8-rc1 + this one reverted.
So I'm not sure what my overnight soak test found. I'll try a few other
things ...