Re: [PATCH] rcu: Dump memory object info if callback is invalid

From: Leizhen (ThunderTown)
Date: Fri Nov 11 2022 - 21:22:11 EST




On 2022/11/12 2:35, Paul E. McKenney wrote:
> On Fri, Nov 11, 2022 at 06:04:26PM +0800, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2022/11/11 15:45, Paul E. McKenney wrote:
>>> On Sat, Nov 05, 2022 at 10:35:19AM +0800, Zhen Lei wrote:
>>>> The callback rhp->func becomes NULL is usually caused by use-after-free.
>>>> So the information about 'rhp' is very useful. Unfortunately, nothing is
>>>> printed at the moment. Look at the panic output below, if no vmcore is
>>>> generated, there is almost no way to analyze it except to know that the
>>>> bug exists.
>>>>
>>>> Unable to handle kernel NULL pointer dereference at virtual address 0
>>>> ... ...
>>>> PC is at 0x0
>>>> LR is at rcu_do_batch+0x1c0/0x3b8
>>>> ... ...
>>>> (rcu_do_batch) from (rcu_core+0x1d4/0x284)
>>>> (rcu_core) from (__do_softirq+0x24c/0x344)
>>>> (__do_softirq) from (__irq_exit_rcu+0x64/0x108)
>>>> (__irq_exit_rcu) from (irq_exit+0x8/0x10)
>>>> (irq_exit) from (__handle_domain_irq+0x74/0x9c)
>>>> (__handle_domain_irq) from (gic_handle_irq+0x8c/0x98)
>>>> (gic_handle_irq) from (__irq_svc+0x5c/0x94)
>>>> (__irq_svc) from (arch_cpu_idle+0x20/0x3c)
>>>> (arch_cpu_idle) from (default_idle_call+0x4c/0x78)
>>>> (default_idle_call) from (do_idle+0xf8/0x150)
>>>> (do_idle) from (cpu_startup_entry+0x18/0x20)
>>>> (cpu_startup_entry) from (0xc01530)
>>>>
>>>> So add mem_dump_obj(rhp) to output some information, for example:
>>>> slab kmalloc-256 start ffff410c45019900 pointer offset 0 size 256
>>>>
>>>> Now we know the size of the memory block and the offset of rcu_head. Then
>>>> we can check the code. It's going to be slow and tiring, but it's better
>>>> than no way to start.
>>>>
>>>> Signed-off-by: Zhen Lei <thunder.leizhen@xxxxxxxxxx>
>>>
>>> I have pulled this in with the usual wordsmithing (please check!)
>>> for review and testing, thank you!
>>
>> Great! Thanks. Provides a lot of valuable debugging method information.
>>
>> In the following two lines, there are a few extra spaces after the dot.
>> I will delete it in v2.
>>
>> rhp->func to be set to NULL. This defeats the debugging prints used by
>> locate the problem. If the problem is reproducible, additional slab
>
> Please do adjust my wordsmithing as required.
>
>>> Questions include "Is 0x3 correct for functions compiled with all
>>> supported compiler options on all architectures on which the Linux
>>
>> Sorry, I found it possible that it wouldn't work on x86. Although I had
>> no problems booting up on x86 before. I ran a script today and found that
>> there were addresses that were not 4-byte aligned.
>>
>> I'll send v2 on your basis.
>
> x86 can be like that sometimes...
>
> I revert your current patch, and look forward to seeing your v2.
>
>> cat System.map | grep -E ' t | T ' | awk '{print substr($1,length($1),length($1))}' | sort | uniq -c
>> 52521 0
>> 409 1
>> 394 2
>> 417 3
>> 404 4
>> 458 5
>> 405 6
>> 393 7
>> 1205 8
>> 457 9
>> 442 a
>> 435 b
>> 421 c
>> 418 d
>> 421 e
>> 426 f
>
> Indeed, quite a few! Maybe the address check can be arch-specific,
> maybe using IS_ENABLED()?

There are more precise, more effective, but relatively time-consuming
functions. Such as is_vmalloc_addr(), virt_addr_valid().

>
>>> kernel runs?", "Is this added information useful often enough for
>>> this to be pushed to mainline?", and so on.
>
> And another question is "Should this be default?" There may be concerns
> with callback-invocation throughput during callback-flooding events.

The overhead of the if statement is small compared with that of
memory freeing. However, there is a low probability that the callback
fails, it's also wise to set it to non-default.

>
>> I originally wanted to add a member in struct rcu_head and backup
>> 'func' to the previous node. This way, when the error is detected,
>> the hook function can be printed out. This will help us quickly
>> find the user of the invalid rhp. However, the size of the struct
>> page is limited and cannot be expanded.
>
> Although that information could be clobbered just as easily as could
> the ->func value, right?

There is a low probability that the previous node and the current node
are in the same memory area. Therefore, it is relatively reliable to
back up ->func to the previous node. However, this method can be used
temporarily to locate the problem. After all, the impact is too great.

>
>> Further more, we can dump the contents of mem object.
>>
>> I have a problem that has not been resolved and has not reproduced.
>> The surrounding contents of 'rhp' have been dumped, as below.
>> You can highlight 00000024 and 00000030, you'll see that this is a
>> fixed 80-bytes structure. There is also a bidirectional linked list
>> in the structure. If I have mem_dump_obj(rhp) information, I can
>> narrow it down considerably.
>>
>> [20220928044206]5390: 00000024 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> [20220928044206]53b0: 00000000 00000000 00000000 00000000 cfa4d580 ffff4596 00000000 00000000
>> [20220928044206]53d0: 7438f148 ffff4596 7438f148 ffff4596 00000024 00000000 0b828cfa 0f00aaf4
>> [20220928044206]53f0: 00000000 00000000 00000000 00000000 496653c0 ffff4596 00000000 00000000
>> [20220928044206]5410: 00000000 00000000 00000000 00000000 ae0769e0 ffff4596 ae0769e0 ffff4596
>> [20220928044206]5430: 00000030 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> [20220928044206]5450: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> [20220928044206]5470: ae076988 ffff4596 ae076988 ffff4596 00000024 00000000 00000000 00000000
>
> OK, I consider the "is this useful" question to be answered in the
> affirmative.
>
> Thanx, Paul
>
>>>> ---
>>>> kernel/rcu/rcu.h | 7 +++++++
>>>> kernel/rcu/srcutiny.c | 1 +
>>>> kernel/rcu/srcutree.c | 1 +
>>>> kernel/rcu/tasks.h | 1 +
>>>> kernel/rcu/tiny.c | 1 +
>>>> kernel/rcu/tree.c | 1 +
>>>> 6 files changed, 12 insertions(+)
>>>>
>>>> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
>>>> index 70c79adfdc7046c..4844dec36bddb48 100644
>>>> --- a/kernel/rcu/rcu.h
>>>> +++ b/kernel/rcu/rcu.h
>>>> @@ -10,6 +10,7 @@
>>>> #ifndef __LINUX_RCU_H
>>>> #define __LINUX_RCU_H
>>>>
>>>> +#include <linux/mm.h>
>>>> #include <trace/events/rcu.h>
>>>>
>>>> /*
>>>> @@ -211,6 +212,12 @@ static inline void debug_rcu_head_unqueue(struct rcu_head *head)
>>>> }
>>>> #endif /* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */
>>>>
>>>> +static inline void debug_rcu_head_callback(struct rcu_head *rhp)
>>>> +{
>>>> + if (unlikely(!rhp->func || (unsigned long)rhp->func & 0x3))
>>>> + mem_dump_obj(rhp);
>>>> +}
>>>> +
>>>> extern int rcu_cpu_stall_suppress_at_boot;
>>>>
>>>> static inline bool rcu_stall_is_suppressed_at_boot(void)
>>>> diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
>>>> index 33adafdad261389..5e7f336baa06ae0 100644
>>>> --- a/kernel/rcu/srcutiny.c
>>>> +++ b/kernel/rcu/srcutiny.c
>>>> @@ -138,6 +138,7 @@ void srcu_drive_gp(struct work_struct *wp)
>>>> while (lh) {
>>>> rhp = lh;
>>>> lh = lh->next;
>>>> + debug_rcu_head_callback(rhp);
>>>> local_bh_disable();
>>>> rhp->func(rhp);
>>>> local_bh_enable();
>>>> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
>>>> index ca4b5dcec675bac..294972e66b31863 100644
>>>> --- a/kernel/rcu/srcutree.c
>>>> +++ b/kernel/rcu/srcutree.c
>>>> @@ -1631,6 +1631,7 @@ static void srcu_invoke_callbacks(struct work_struct *work)
>>>> rhp = rcu_cblist_dequeue(&ready_cbs);
>>>> for (; rhp != NULL; rhp = rcu_cblist_dequeue(&ready_cbs)) {
>>>> debug_rcu_head_unqueue(rhp);
>>>> + debug_rcu_head_callback(rhp);
>>>> local_bh_disable();
>>>> rhp->func(rhp);
>>>> local_bh_enable();
>>>> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
>>>> index b0b885e071fa8dc..b7f8c67c586cdc4 100644
>>>> --- a/kernel/rcu/tasks.h
>>>> +++ b/kernel/rcu/tasks.h
>>>> @@ -478,6 +478,7 @@ static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp, struct rcu_tasks_percpu
>>>> raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
>>>> len = rcl.len;
>>>> for (rhp = rcu_cblist_dequeue(&rcl); rhp; rhp = rcu_cblist_dequeue(&rcl)) {
>>>> + debug_rcu_head_callback(rhp);
>>>> local_bh_disable();
>>>> rhp->func(rhp);
>>>> local_bh_enable();
>>>> diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
>>>> index bb8f7d270f01747..56e9a5d91d97ec5 100644
>>>> --- a/kernel/rcu/tiny.c
>>>> +++ b/kernel/rcu/tiny.c
>>>> @@ -97,6 +97,7 @@ static inline bool rcu_reclaim_tiny(struct rcu_head *head)
>>>>
>>>> trace_rcu_invoke_callback("", head);
>>>> f = head->func;
>>>> + debug_rcu_head_callback(head);
>>>> WRITE_ONCE(head->func, (rcu_callback_t)0L);
>>>> f(head);
>>>> rcu_lock_release(&rcu_callback_map);
>>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>>>> index 93c286b98c8f03d..3b93b9df8042a84 100644
>>>> --- a/kernel/rcu/tree.c
>>>> +++ b/kernel/rcu/tree.c
>>>> @@ -2256,6 +2256,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
>>>> trace_rcu_invoke_callback(rcu_state.name, rhp);
>>>>
>>>> f = rhp->func;
>>>> + debug_rcu_head_callback(rhp);
>>>> WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
>>>> f(rhp);
>>>>
>>>> --
>>>> 2.25.1
>>>>
>>> .
>>>
>>
>> --
>> Regards,
>> Zhen Lei
> .
>

--
Regards,
Zhen Lei