Re: [GIT PULL] RCU changes for v5.10

From: Paul E. McKenney
Date: Mon Oct 12 2020 - 19:54:29 EST


On Mon, Oct 12, 2020 at 02:59:41PM -0700, Linus Torvalds wrote:
> On Mon, Oct 12, 2020 at 2:44 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >
> > So that RCU can tell, even in CONFIG_PREEMPT_NONE=y kernels, whether it
> > is safe to invoke the memory allocator.
>
> So in what situation is RCU called from random contexts that it can't even tell?

In CONFIG_PREEMPT_NONE=y kernels, RCU has no way to tell whether or
not its caller holds a raw spinlock, which some callers do. And if its
caller holds a raw spinlock, then RCU cannot invoke the memory allocator
because the allocator acquires non-raw spinlocks, which in turn results
in lockdep splats. Making CONFIG_PREEMPT_COUNT unconditional allows
RCU to make this determination.
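As a rough illustration of why the count matters, here is a simplified user-space model. It is not kernel code, and every name in it is hypothetical. It mimics how a nonzero preemption count lets a callee notice that its caller holds a raw spinlock. It also shows how that signal disappears when the counter is compiled out, as preempt_disable() is in CONFIG_PREEMPT_COUNT=n kernels.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model, not kernel code.
 * MODEL_PREEMPT_COUNT mimics CONFIG_PREEMPT_COUNT: when 0, the
 * counter updates compile away, as preempt_disable() does in
 * CONFIG_PREEMPT_COUNT=n kernels.
 */
#ifndef MODEL_PREEMPT_COUNT
#define MODEL_PREEMPT_COUNT 1
#endif

static int model_count;	/* stands in for the per-CPU preemption count */

static void model_preempt_disable(void)
{
#if MODEL_PREEMPT_COUNT
	model_count++;
#endif
}

static void model_preempt_enable(void)
{
#if MODEL_PREEMPT_COUNT
	model_count--;
#endif
}

/* Acquiring a raw spinlock disables preemption, as in the kernel. */
static void model_raw_spin_lock(void)   { model_preempt_disable(); }
static void model_raw_spin_unlock(void) { model_preempt_enable(); }

/*
 * Hypothetical check that a callee such as RCU could make before
 * calling an allocator that takes non-raw locks.  With the counter
 * compiled out, it answers "yes" even under a raw spinlock.
 */
static bool model_may_allocate(void)
{
	return model_count == 0;
}
```

Building with -DMODEL_PREEMPT_COUNT=0 makes model_may_allocate() return true even while the lock is held. That is the ambiguity RCU faces in CONFIG_PREEMPT_COUNT=n kernels.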

Please note that RCU always provides a fallback for memory-allocation
failure, but such failure needs to be rare, at least in non-OOM
situations.
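The "allocate if safe, otherwise fall back" shape might look roughly like the minimal sketch below. It assumes a hypothetical object that embeds its own link field, standing in for struct rcu_head. The fast path records a pointer in a separately allocated array. If allocation is not permitted, fails, or the array is full, the object is chained through its embedded link instead, so no request is lost. The names are illustrative and are not RCU's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an object embedding a struct rcu_head. */
struct obj {
	struct obj *next;	/* embedded link, used only by the fallback */
	int payload;
};

#define BLOCK_SLOTS 8

/* Hypothetical array of deferred pointers (the fast path). */
struct ptr_block {
	size_t nr;
	struct obj *ptrs[BLOCK_SLOTS];
};

struct deferred {
	struct ptr_block *block;	/* NULL until an allocation succeeds */
	struct obj *fallback_list;	/* objects queued via embedded link */
};

/*
 * Queue @o for later processing.  @may_allocate models whether the
 * caller's context permits calling the allocator.  Returns true if
 * the array path was used, false if the fallback list was used.
 */
static bool defer_obj(struct deferred *d, struct obj *o, bool may_allocate)
{
	if (!d->block && may_allocate)
		d->block = calloc(1, sizeof(*d->block));	/* may fail */

	if (d->block && d->block->nr < BLOCK_SLOTS) {
		d->block->ptrs[d->block->nr++] = o;
		return true;
	}

	/* Fallback: cannot fail, because the link lives in the object. */
	o->next = d->fallback_list;
	d->fallback_list = o;
	return false;
}
```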

The alternatives to this approach are:

1. Lockless memory allocation, which was provided by an earlier
patch series. Again, the relevant maintainers are not happy
with this approach.

2. Defer memory allocation to a clean environment. However,
even softirq handlers are not clean enough, so this approach
incurs a full scheduling delay. And this delay is incurred
unconditionally in kernels built with CONFIG_PREEMPT_COUNT=n,
even if the system has memory coming out of its ears, and even
if RCU's caller happens to be a clean environment.

3. A long and sad litany of subtly broken approaches.
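Alternative 2 can be sketched as follows. The sketch assumes a hypothetical one-slot cache that only a "clean" context may fill; that context is modeled by an explicit refill call, standing in for something like a workqueue handler. A caller in an unclean context can only take an already-cached block or request a refill. Until the refill runs, the caller has to use its fallback path, even when memory is plentiful. That is where the scheduling delay bites.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical one-slot cache, filled only from a "clean" context. */
struct block_cache {
	void *cached;		/* preallocated block, or NULL */
	bool refill_pending;	/* set by callers that found the cache empty */
};

/*
 * Called from a context that must not allocate.  Takes the cached
 * block if present; otherwise requests a refill and returns NULL,
 * so the caller must use its fallback path for now.
 */
static void *cache_take(struct block_cache *c)
{
	void *p = c->cached;

	if (p)
		c->cached = NULL;
	else
		c->refill_pending = true;	/* in a kernel, e.g. queue work */
	return p;
}

/*
 * Runs later in a context where allocation is safe.  Under
 * alternative 2, this happens only after a scheduling delay.
 * If malloc() fails, the next cache_take() will request again.
 */
static void cache_refill(struct block_cache *c, size_t size)
{
	if (c->refill_pending && !c->cached) {
		c->cached = malloc(size);
		c->refill_pending = false;
	}
}
```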

> > But either way, please let me know how you would like us to proceed.
>
> Well, AT A MINIMUM, the pull request should damn well have made it
> 1000% clear that this removes a case that has existed for decades, and
> that potentially makes a difference for small kernels in particular.

Got it, thank you.

> In fact, my personal config option - still to this day - is
> CONFIG_PREEMPT_VOLUNTARY and on the kernel I'm running,
> CONFIG_PREEMPT_COUNT isn't actually set.
>
> Because honestly, the code generation of some core code looks better
> that way (in places where I've historically looked at things), and the
> latency arguments against it simply aren't relevant when you have 8
> cores or more.
>
> So I don't think that "make preempt count unconditional" is some small
> meaningless detail.

Understood and agreed. And to take your point one step further, not
just CONFIG_PREEMPT_VOLUNTARY but also CONFIG_PREEMPT_NONE is in
extremely heavy use, including by my employer.

And understood on kernel text size. Raw performance is a different story:
even microbenchmarks showed no statistically significant performance
change relative to CONFIG_PREEMPT_COUNT=n, and system-level benchmarks
showed no difference whatsoever.

So would it help if CONFIG_PREEMPT_COUNT=y became unconditional only for
CONFIG_SMP=y kernels? RCU does have other options for CONFIG_SMP=n. Or
do your small-kernel concerns extend beyond single-CPU microcontrollers?

> What is so magical about RCU allocating memory? I assume it's some
> debug case? Why does that debug case then have a
>
> select PREEMPT_COUNT
>
> like is done for PROVE_LOCKING?

Sadly, no, it is not just a debug case.

This memory allocation enables a cache-locality optimization for
callback processing that reduces cache misses. This optimization
is currently implemented only for kvfree_rcu(), where it reduces
callback-invocation-time cache misses by a factor of eight on typical
x86 systems, which produces decent system-level benefits. So it would
be good to also apply this optimization to call_rcu().
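One plausible reading of the factor of eight assumes 64-byte cache lines and 8-byte pointers, as on x86-64. Chasing a linked list of rcu_head structures embedded in separately allocated objects touches roughly one new cache line per callback. Scanning a packed, line-aligned array of pointers touches one line per eight callbacks. The back-of-the-envelope model below captures only that arithmetic. It ignores hardware prefetching and the allocator's own memory accesses, and it does not describe the measurements mentioned above.

```c
#include <assert.h>

/* Assumptions for a typical x86-64 system. */
#define CACHE_LINE_BYTES 64u
#define POINTER_BYTES     8u

/*
 * Linked-list layout: each callback's link sits in a different
 * object, so (pessimistically) every hop touches a new line.
 */
static unsigned int lines_linked_list(unsigned int ncallbacks)
{
	return ncallbacks;
}

/* Array layout: pointers are packed, eight per 64-byte line. */
static unsigned int lines_pointer_array(unsigned int ncallbacks)
{
	unsigned int bytes = ncallbacks * POINTER_BYTES;

	return (bytes + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;
}
```

For 1024 callbacks, this gives 1024 lines for the list versus 128 for the array, a ratio of 8.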

> > I based my
> > optimism in part on your not having complained about either the patch
> > series or the pull request, both of which I CCed you on:
>
> I had already raised my concerns when that patch series was posted by
> Thomas originally. I did not feel like I needed to re-raise them just
> because the series got reposted by somebody else.

OK, I did not know, but I do know it now!

Thanx, Paul