Re: [PATCH v1] rcu: Fix and improve RCU read lock checks when !CONFIG_DEBUG_LOCK_ALLOC

From: Joel Fernandes
Date: Wed Jul 12 2023 - 13:03:02 EST


On Tue, Jul 11, 2023 at 7:38 PM Sandeep Dhavale <dhavale@xxxxxxxxxx> wrote:
>
> Currently if CONFIG_DEBUG_LOCK_ALLOC is not set
>
> - rcu_read_lock_held() always returns 1
> - rcu_read_lock_any_held() may return 0 with CONFIG_PREEMPT_RCU
>
> This is inconsistent, and it was discovered while trying to fix the
> problem reported in [1] with CONFIG_DEBUG_LOCK_ALLOC not set and
> CONFIG_PREEMPT_RCU enabled. The gist of the problem is that EROFS
> wants to detect atomic context so that it can do inline
> decompression whenever possible, which is an important performance
> optimization. It turns out that z_erofs_decompressqueue_endio()
> can be called from blk_mq_flush_plug_list() with the RCU read lock
> held, and hence the fix uses rcu_read_lock_any_held() to decide
> between sync/inline decompression and async decompression.
>
> Per the documentation, we should report that the lock is held if we
> aren't certain. But it seems we can improve the checks for whether
> the lock is held even when CONFIG_DEBUG_LOCK_ALLOC is not set,
> instead of hardcoding them to always return true.
>
> * rcu_read_lock_held()
> - For CONFIG_PREEMPT_RCU, using rcu_preempt_depth()
> - Using preemptible() (indirectly preempt_count())
>
> * rcu_read_lock_bh_held()
> - For CONFIG_PREEMPT_RT, using in_softirq() (indirectly softirq_count())
> - Using preemptible() (indirectly preempt_count())
>
> Lastly, to fix the inconsistency, rcu_read_lock_any_held() is updated
> to use the other rcu_read_lock_*_held() checks.
>
> Two of the improved checks are moved to kernel/rcu/update.c because
> rcupdate.h is included from very low-level headers that do not know
> what current (task_struct) is, so the rcu_preempt_depth() macro cannot
> be expanded in rcupdate.h. See the original comment for
> rcu_preempt_depth() in the patch at [2] for more information.
>
> [1]
> https://lore.kernel.org/all/20230621220848.3379029-1-dhavale@xxxxxxxxxx/
> [2]
> https://lore.kernel.org/all/1281392111-25060-8-git-send-email-paulmck@xxxxxxxxxxxxxxxxxx/
>
> Reported-by: Will Shiu <Will.Shiu@xxxxxxxxxxxx>
> Signed-off-by: Sandeep Dhavale <dhavale@xxxxxxxxxx>
> ---
> include/linux/rcupdate.h | 12 +++---------
> kernel/rcu/update.c | 21 ++++++++++++++++++++-
> 2 files changed, 23 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 5e5f920ade90..0d1d1d8c2360 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -319,14 +319,11 @@ int rcu_read_lock_any_held(void);
> # define rcu_lock_acquire(a) do { } while (0)
> # define rcu_lock_release(a) do { } while (0)
>
> -static inline int rcu_read_lock_held(void)
> -{
> - return 1;
> -}
> +int rcu_read_lock_held(void);
>
> static inline int rcu_read_lock_bh_held(void)
> {
> - return 1;
> + return !preemptible() || in_softirq();
> }
>
> static inline int rcu_read_lock_sched_held(void)
> @@ -334,10 +331,7 @@ static inline int rcu_read_lock_sched_held(void)
> return !preemptible();
> }
>
> -static inline int rcu_read_lock_any_held(void)
> -{
> - return !preemptible();
> -}
> +int rcu_read_lock_any_held(void);
>
> static inline int debug_lockdep_rcu_enabled(void)
> {
> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index 19bf6fa3ee6a..b34fc5bb96cf 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -390,8 +390,27 @@ int rcu_read_lock_any_held(void)
> }
> EXPORT_SYMBOL_GPL(rcu_read_lock_any_held);
>
> -#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
> +#else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
>
> +int rcu_read_lock_held(void)
> +{
> + if (IS_ENABLED(CONFIG_PREEMPT_RCU))
> + return rcu_preempt_depth();
> + return !preemptible();
> +}
> +EXPORT_SYMBOL_GPL(rcu_read_lock_held);
> +
> +int rcu_read_lock_any_held(void)
> +{
> + if (rcu_read_lock_held() ||
> + rcu_read_lock_bh_held() ||
> + rcu_read_lock_sched_held())
> + return 1;
> + return !preemptible();

Actually, even the original (lockdep) version is incorrect, since
preemptible() cannot be relied upon if CONFIG_PREEMPT_COUNT is not
set. However, that's debug code. In this case, there is a real user
(the fs code). In non-preemptible kernels, we are always in an
RCU-sched section, so you can't really tell whether anyone called your
code under rcu_read_lock(): rcu_read_lock()/rcu_read_unlock() are
effectively no-ops there. In such a kernel, this check will always tell
your code that it is in an RCU reader. That's not ideal for the erofs
code, is it?
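To make the non-preemptible case concrete, here is a hypothetical user-space model of the proposed !CONFIG_DEBUG_LOCK_ALLOC checks. struct model_cpu and the model_* functions are made up for illustration and are not kernel code. The model relies on one kernel fact: preemptible() is hard-wired to 0 without CONFIG_PREEMPT_COUNT. It also makes one assumption: CONFIG_PREEMPT_RCU only appears on kernels that also have CONFIG_PREEMPT_COUNT.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the state the proposed
 * !CONFIG_DEBUG_LOCK_ALLOC checks look at. Not kernel code.
 */
struct model_cpu {
	bool preempt_count_cfg;	/* models CONFIG_PREEMPT_COUNT */
	bool preempt_rcu_cfg;	/* models CONFIG_PREEMPT_RCU */
	int preempt_count;	/* models preempt_count() */
	int softirq_count;	/* models softirq_count() */
	bool irqs_disabled;	/* models irqs_disabled() */
	int rcu_preempt_depth;	/* models rcu_preempt_depth() */
};

static struct model_cpu model_cpu_init(bool preempt_count_cfg,
				       bool preempt_rcu_cfg)
{
	/* Modeling assumption: PREEMPT_RCU implies PREEMPT_COUNT. */
	assert(!preempt_rcu_cfg || preempt_count_cfg);
	struct model_cpu c = {
		.preempt_count_cfg = preempt_count_cfg,
		.preempt_rcu_cfg = preempt_rcu_cfg,
	};
	return c;
}

/* preemptible() is hard-wired to 0 without CONFIG_PREEMPT_COUNT. */
static bool model_preemptible(const struct model_cpu *c)
{
	if (!c->preempt_count_cfg)
		return false;
	return c->preempt_count == 0 && !c->irqs_disabled;
}

/* Mirrors the patch's out-of-line rcu_read_lock_held(). */
static int model_rcu_read_lock_held(const struct model_cpu *c)
{
	if (c->preempt_rcu_cfg)
		return c->rcu_preempt_depth;
	return !model_preemptible(c);
}

/* Mirrors the patch's inline rcu_read_lock_bh_held(). */
static int model_rcu_read_lock_bh_held(const struct model_cpu *c)
{
	return !model_preemptible(c) || c->softirq_count != 0;
}

/* Mirrors the existing inline rcu_read_lock_sched_held(). */
static int model_rcu_read_lock_sched_held(const struct model_cpu *c)
{
	return !model_preemptible(c);
}

/* Mirrors the patch's rcu_read_lock_any_held(). */
static int model_rcu_read_lock_any_held(const struct model_cpu *c)
{
	if (model_rcu_read_lock_held(c) ||
	    model_rcu_read_lock_bh_held(c) ||
	    model_rcu_read_lock_sched_held(c))
		return 1;
	return !model_preemptible(c);
}
```

With model_cpu_init(false, false), that is CONFIG_PREEMPT_COUNT=n, model_rcu_read_lock_any_held() returns 1 from plain task context with no reader at all. On a PREEMPT_RCU configuration it returns 0 until rcu_preempt_depth is raised.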

Also, per that erofs code:
/* Use (kthread_)work and sync decompression for atomic contexts only */
if (!in_task() || irqs_disabled() || rcu_read_lock_any_held()) {

I guess you are also assuming that rcu_read_lock_any_held() tells you
something about atomicity, but strictly speaking it doesn't, because
preemptible RCU readers are preemptible. You can't block, but
preemption is possible, so it is not "atomic". Maybe you meant "cannot
block"?
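A hypothetical model of that distinction follows. Inside a preemptible RCU reader, the task is not atomic because the scheduler may still preempt it, yet it still must not block. model_is_atomic() and model_cannot_block() are illustrative names, not kernel APIs. The model assumes a CONFIG_PREEMPT_RCU kernel, where rcu_read_lock() only bumps a per-task nesting count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a task on a CONFIG_PREEMPT_RCU kernel. */
struct model_task {
	int preempt_count;	/* models preempt_count() */
	bool irqs_disabled;	/* models irqs_disabled() */
	int rcu_depth;		/* models rcu_preempt_depth() */
};

/* "Atomic" in the strict sense: the scheduler cannot preempt us. */
static bool model_is_atomic(const struct model_task *t)
{
	return t->preempt_count != 0 || t->irqs_disabled;
}

/*
 * "Cannot block": atomic, or inside an RCU read-side critical section,
 * where blocking is not allowed even though preemption is.
 */
static bool model_cannot_block(const struct model_task *t)
{
	return model_is_atomic(t) || t->rcu_depth > 0;
}

/* Preemptible-RCU reader entry only increments the nesting count. */
static void model_rcu_read_lock(struct model_task *t)
{
	t->rcu_depth++;
}

static void model_rcu_read_unlock(struct model_task *t)
{
	assert(t->rcu_depth > 0);	/* an unbalanced unlock is a bug */
	t->rcu_depth--;
}
```

Between model_rcu_read_lock() and model_rcu_read_unlock(), model_is_atomic() stays false while model_cannot_block() is true. This is the gap between "atomic" and "cannot block" described above.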

As such, this patch looks correct to me. One thing I noticed is that
you could also check rcu_is_watching(), as the lockdep-enabled code
does. That also tells you whether a reader section is even possible,
because in an extended quiescent state RCU readers should not exist;
if one does, that's a bug.
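Here is a minimal sketch of that suggestion, modeled in user space. struct model_state and the model_* functions are hypothetical stand-ins for rcu_is_watching(), preemptible(), rcu_preempt_depth() and in_softirq(). Only the CONFIG_PREEMPT_RCU branch of the patch's rcu_read_lock_held() is modeled. This illustrates the idea and is not a proposed final implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU/per-task state for the model. */
struct model_state {
	bool watching;		/* models rcu_is_watching() */
	bool preemptible;	/* models preemptible() */
	int rcu_depth;		/* models rcu_preempt_depth() */
	bool in_softirq;	/* models in_softirq() */
};

/* CONFIG_PREEMPT_RCU branch of the patch's rcu_read_lock_held(). */
static int model_read_lock_held(const struct model_state *s)
{
	return s->rcu_depth;
}

/*
 * Sketch of rcu_read_lock_any_held() with an rcu_is_watching() check
 * added up front, similar in spirit to the lockdep-enabled code.
 */
static int model_rcu_read_lock_any_held(const struct model_state *s)
{
	if (!s->watching) {
		/* A reader inside an extended quiescent state is a bug. */
		assert(s->rcu_depth == 0);
		return 0;
	}
	/* held || bh_held (!preemptible || in_softirq) || sched_held. */
	if (model_read_lock_held(s) || s->in_softirq || !s->preemptible)
		return 1;
	return 0;
}
```

With the early return, an idle-loop caller (not watching, preemption disabled) is reported as "not held" rather than falling through to the conservative !preemptible() answer.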

Could you also verify that this patch does not bloat the kernel when
lockdep is disabled?

thanks,

- Joel