Re: [BUG] kmsan: instrumentation recursion problems

From: Marco Elver
Date: Fri Mar 08 2024 - 04:40:02 EST


On Fri, 8 Mar 2024 at 05:36, 'Changbin Du' via kasan-dev
<kasan-dev@xxxxxxxxxxxxxxxx> wrote:
>
> Hey, folks,
> I found two instrumentation recursion issues on the mainline kernel.
>
> 1. Recursion on the preempt count.
> __msan_metadata_ptr_for_load_4() -> kmsan_virt_addr_valid() -> preempt_disable() -> __msan_metadata_ptr_for_load_4()
>
> 2. Recursion in lockdep and RCU.
> __msan_metadata_ptr_for_load_4() -> kmsan_virt_addr_valid() -> pfn_valid() -> rcu_read_lock_sched() -> lock_acquire() -> rcu_is_watching() -> __msan_metadata_ptr_for_load_8()
>
>
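
For readers following along, the first cycle can be modelled in plain
userspace C. This is only a sketch and not kernel code: every demo_* name
is made up, the depth cap exists solely so the demo terminates, and the
explicit call in demo_preempt_disable() stands in for the call the
compiler would insert when instrumenting that function.

```c
#define DEMO_DEPTH_CAP 4	/* hypothetical cap so the demo terminates */

static int demo_depth;		/* how deeply the hook is currently nested */
static int demo_max_depth;	/* deepest nesting seen so far */

static void demo_metadata_hook(void);

/*
 * Stand-in for preempt_disable(). When such a function is itself
 * instrumented, its memory accesses call back into the metadata hook.
 */
static void demo_preempt_disable(void)
{
	if (demo_depth < DEMO_DEPTH_CAP)
		demo_metadata_hook();	/* models the compiler-inserted call */
}

/* Stand-in for kmsan_virt_addr_valid(), which the hook calls. */
static void demo_virt_addr_valid(void)
{
	demo_preempt_disable();
}

/* Stand-in for __msan_metadata_ptr_for_load_4(). */
static void demo_metadata_hook(void)
{
	demo_depth++;
	if (demo_depth > demo_max_depth)
		demo_max_depth = demo_depth;
	demo_virt_addr_valid();
	demo_depth--;
}

/* Perform one modelled load and return how deep the hook nested. */
int demo_run(void)
{
	demo_depth = 0;
	demo_max_depth = 0;
	demo_metadata_hook();
	return demo_max_depth;
}
```

Without the cap the hook would keep re-entering itself, which is the
recursion described in item 1.
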
> Here is an unofficial fix. I don't know whether it will generate false reports.
>
> $ git show
> commit 7f0120b621c1cbb667822b0f7eb89f3c25868509 (HEAD -> master)
> Author: Changbin Du <changbin.du@xxxxxxxxxx>
> Date: Fri Mar 8 20:21:48 2024 +0800
>
> kmsan: fix instrumentation recursions
>
> Signed-off-by: Changbin Du <changbin.du@xxxxxxxxxx>
>
> diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
> index 0db4093d17b8..ea925731fa40 100644
> --- a/kernel/locking/Makefile
> +++ b/kernel/locking/Makefile
> @@ -7,6 +7,7 @@ obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o
>
> # Avoid recursion lockdep -> sanitizer -> ... -> lockdep.
> KCSAN_SANITIZE_lockdep.o := n
> +KMSAN_SANITIZE_lockdep.o := n

Does this not result in false positives?

Does
KMSAN_ENABLE_CHECKS_lockdep.o := n
work as well? If it does, that is preferred because it makes sure
there are no false positives if the lockdep code unpoisons data that
is passed and used outside lockdep.

lockdep has a serious impact on performance, and not sanitizing it
with KMSAN is probably a reasonable performance trade-off.

> ifdef CONFIG_FUNCTION_TRACER
> CFLAGS_REMOVE_lockdep.o = $(CC_FLAGS_FTRACE)
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b2bccfd37c38..8935cc866e2d 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -692,7 +692,7 @@ static void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
> * Make notrace because it can be called by the internal functions of
> * ftrace, and making this notrace removes unnecessary recursion calls.
> */
> -notrace bool rcu_is_watching(void)
> +notrace __no_sanitize_memory bool rcu_is_watching(void)

For all of these, does __no_kmsan_checks instead of __no_sanitize_memory work?
Again, __no_kmsan_checks (function-only counterpart to
KMSAN_ENABLE_CHECKS_.... := n) is preferred if it works as it avoids
any potential false positives that would be introduced by not
instrumenting.

> {
> bool ret;
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9116bcc90346..33aa4df8fd82 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5848,7 +5848,7 @@ static inline void preempt_latency_start(int val)
> }
> }
>
> -void preempt_count_add(int val)
> +void __no_sanitize_memory preempt_count_add(int val)
> {
> #ifdef CONFIG_DEBUG_PREEMPT
> /*
> @@ -5880,7 +5880,7 @@ static inline void preempt_latency_stop(int val)
> trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
> }
>
> -void preempt_count_sub(int val)
> +void __no_sanitize_memory preempt_count_sub(int val)
> {
> #ifdef CONFIG_DEBUG_PREEMPT
>
>
> --
> Cheers,
> Changbin Du