Re: [GIT PULL] RCU changes for v6.7

From: Paul E. McKenney
Date: Wed Nov 01 2023 - 13:13:12 EST


On Tue, Oct 31, 2023 at 06:07:57PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 31, 2023 at 01:06:44PM -1000, Linus Torvalds wrote:

[ . . . ]

> > I really think that we should *never* have any kind of notifiers for
> > kernel bugs. They cause problems. The *one* exception is an actual
> > honest-to-goodness kernel debugger, and then it should literally
> > *only* be the debugger that can register a notifier, so that you are
> > *never* in the situation that a kernel without a debugger will just
> > hang because of some bogus debug notifier.

Here you might have been suggesting that I use gdb and just set a
breakpoint in check_cpu_stall(), and then use gdb commands to read out
the state. And yes, this works well in some situations. In fact, there
is a --gdb parameter to the rcutorture scripting for just this purpose.

Except that I normally run a few hundred rcutorture guest OSes spread
across 20 systems, and sometimes more than a thousand guest OSes across
50 systems for hard-to-reproduce bugs. In my experience, managing that
many remote gdb sessions is cranky and unreliable, which is not helpful
when debugging. Writing a few tens of lines of C code in the kernel is
much simpler and more reliable.

Assuming of course that I avoid the traps you point out. Which I have
done thus far. (Famous last words...)

Thanx, Paul