Re: [PATCH] Revert "nios2: Convert __pte_free_tlb() to use ptdescs"

From: Paul E. McKenney
Date: Tue Jun 27 2023 - 19:41:43 EST


On Tue, Jun 27, 2023 at 03:35:45PM -0700, Linus Torvalds wrote:
> On Tue, 27 Jun 2023 at 15:14, Dinh Nguyen <dinguyen@xxxxxxxxxx> wrote:
> >
> > This reverts commit 6ebe94baa2b9ddf3ccbb7f94df6ab26234532734.
> >
> > The patch "nios2: Convert __pte_free_tlb() to use ptdescs" was supposed
> > to go together with a patchset that Vishal Moola had planned taking it
> > through the mm tree. By just having this patch, all NIOS2 builds are
> > broken.
>
> This is now at least the third time just this merge window where some
> base tree was broken, and people thought that linux-next is some kind
> of testing ground for it all.
>
> NO!
>
> Linux-next is indeed for testing, and for finding situations where
> there are interactions between different trees.
>
> But linux-next is *not* a replacement for "this tree has to work on
> its own". THAT testing needs to be done independently, and *before* a
> tree hits linux-next.
>
> It is *NOT* ok to say "this will work in combination with that other
> tree". EVERY SINGLE TREE needs to work on its own, because otherwise
> you cannot bisect the end result sanely.
>
> We apparently had the NIOS2 tree being broken. And the RCU tree was
> broken. And the KUnit tree was broken.
>
> In all those cases, the base tree did not compile properly on its own,
> and linux-next "magically fixed" it by either having Stephen Rothwell
> literally fix the build breakage by hand, or by having some other tree
> hide the problem.
>
> This is very much not ok.
>
> I'm not sure why it happened so much this release, but this needs to
> stop. People need to realize that you can't just throw shit at the
> wall and see if it sticks. You need to test your own trees *first*,
> and *independently* of other peoples trees.
>
> Then, if you have done basic testing, you can then have it in
> linux-next and that hopefully then finds any issues with bad
> interactions with other trees, and maybe also ends up getting more
> coverage testing on odd architectures and with odd configurations.
>
> But linux-next must not in *any* way be a replacement for doing basic
> testing on your own tree first.

On the off-chance that it helps someone else avoid my stupid mistakes,
here is exactly how I messed this up so badly:

1. This API-name-change series went well, except for the usual
lagging changes. This *should* not be a problem, as you
simply leave the old API in however long it takes for the
change to get in.

2. At some point -next was a single-argument kfree_rcu()-free zone.
So I queued the offending commit on the -rcu tree's rcu/next
branch, followed by a revert for my own testing. The idea was
to make new uses fail in -next testing.

So far, so good.

3. I noticed that -next was now free of kfree_rcu() calls.

At this point, I made three stupid mistakes:

a. I failed to wait for mainline itself to be free of the
single-argument kfree_rcu(), thus pulling the offending
single-argument kfree_rcu() removal commit into my pull
request a merge window too soon. This is of course
especially stupid since I tend to send you the RCU pull
request early.

b. I failed to identify exactly which -next commit eliminated
single-argument kfree_rcu(). Had I done so, I would
have seen that this was in fact Stephen's rcu/next
merge commit, which was never going to go to mainline.

c. Worst yet, out of force of habit, I left the revert
from #2 above in my testing, thus failing to see the
-rcu failure due to that remaining single-argument
kfree_rcu() call.

So a combination of three stupid mistakes on my part made the RCU
happen.

As you say, testing *exactly* the commit heading up the pull request
merged with your master branch would have spotted this, and I will
of course make sure that I do this in the future.

And again, please accept my apologies for this mess.

Thanx, Paul