Re: [PATCH] gcov: clang: fix the buffer overflow issue

From: Mukesh Ojha
Date: Fri Nov 04 2022 - 15:59:04 EST


Hi Nick,

Thanks for looking into this.

On 11/4/2022 11:18 PM, Nick Desaulniers wrote:
On Fri, Nov 4, 2022 at 6:23 AM Mukesh Ojha <quic_mojha@xxxxxxxxxxx> wrote:

Currently, in clang version of gcov code when module is getting removed
gcov_info_add() incorrectly adds the sfn_ptr->counter to all the
dst->functions and it result in the kernel panic in below crash report.
Fix this by properly handling it.

[ 8.899094][ T599] Unable to handle kernel write to read-only memory at virtual address ffffff80461cc000
[ 8.899100][ T599] Mem abort info:
[ 8.899102][ T599] ESR = 0x9600004f
[ 8.899103][ T599] EC = 0x25: DABT (current EL), IL = 32 bits
[ 8.899105][ T599] SET = 0, FnV = 0
[ 8.899107][ T599] EA = 0, S1PTW = 0
[ 8.899108][ T599] FSC = 0x0f: level 3 permission fault
[ 8.899110][ T599] Data abort info:
[ 8.899111][ T599] ISV = 0, ISS = 0x0000004f
[ 8.899113][ T599] CM = 0, WnR = 1
[ 8.899114][ T599] swapper pgtable: 4k pages, 39-bit VAs, pgdp=00000000ab8de000
[ 8.899116][ T599] [ffffff80461cc000] pgd=18000009ffcde003, p4d=18000009ffcde003, pud=18000009ffcde003, pmd=18000009ffcad003, pte=00600000c61cc787
[ 8.899124][ T599] Internal error: Oops: 9600004f [#1] PREEMPT SMP
[ 8.899265][ T599] Skip md ftrace buffer dump for: 0x1609e0
....
..,
[ 8.899544][ T599] CPU: 7 PID: 599 Comm: modprobe Tainted: G S OE 5.15.41-android13-8-g38e9b1af6bce #1
[ 8.899547][ T599] Hardware name: XXX (DT)
[ 8.899549][ T599] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
[ 8.899551][ T599] pc : gcov_info_add+0x9c/0xb8
[ 8.899557][ T599] lr : gcov_event+0x28c/0x6b8
[ 8.899559][ T599] sp : ffffffc00e733b00
[ 8.899560][ T599] x29: ffffffc00e733b00 x28: ffffffc00e733d30 x27: ffffffe8dc297470
[ 8.899563][ T599] x26: ffffffe8dc297000 x25: ffffffe8dc297000 x24: ffffffe8dc297000
[ 8.899566][ T599] x23: ffffffe8dc0a6200 x22: ffffff880f68bf20 x21: 0000000000000000
[ 8.899569][ T599] x20: ffffff880f68bf00 x19: ffffff8801babc00 x18: ffffffc00d7f9058
[ 8.899572][ T599] x17: 0000000000088793 x16: ffffff80461cbe00 x15: 9100052952800785
[ 8.899575][ T599] x14: 0000000000000200 x13: 0000000000000041 x12: 9100052952800785
[ 8.899577][ T599] x11: ffffffe8dc297000 x10: ffffffe8dc297000 x9 : ffffff80461cbc80
[ 8.899580][ T599] x8 : ffffff8801babe80 x7 : ffffffe8dc2ec000 x6 : ffffffe8dc2ed000
[ 8.899583][ T599] x5 : 000000008020001f x4 : fffffffe2006eae0 x3 : 000000008020001f
[ 8.899586][ T599] x2 : ffffff8027c49200 x1 : ffffff8801babc20 x0 : ffffff80461cb3a0
[ 8.899589][ T599] Call trace:
[ 8.899590][ T599] gcov_info_add+0x9c/0xb8
[ 8.899592][ T599] gcov_module_notifier+0xbc/0x120
[ 8.899595][ T599] blocking_notifier_call_chain+0xa0/0x11c
[ 8.899598][ T599] do_init_module+0x2a8/0x33c
[ 8.899600][ T599] load_module+0x23cc/0x261c
[ 8.899602][ T599] __arm64_sys_finit_module+0x158/0x194
[ 8.899604][ T599] invoke_syscall+0x94/0x2bc
[ 8.899607][ T599] el0_svc_common+0x1d8/0x34c
[ 8.899609][ T599] do_el0_svc+0x40/0x54
[ 8.899611][ T599] el0_svc+0x94/0x2f0
[ 8.899613][ T599] el0t_64_sync_handler+0x88/0xec
[ 8.899615][ T599] el0t_64_sync+0x1b4/0x1b8
[ 8.899618][ T599] Code: f905f56c f86e69ec f86e6a0f 8b0c01ec (f82e6a0c)
[ 8.899620][ T599] ---[ end trace ed5218e9e5b6e2e6 ]---

Signed-off-by: Mukesh Ojha <quic_mojha@xxxxxxxxxxx>
---
kernel/gcov/clang.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/gcov/clang.c b/kernel/gcov/clang.c
index cbb0bed..0aabb9a 100644
--- a/kernel/gcov/clang.c
+++ b/kernel/gcov/clang.c
@@ -271,15 +271,20 @@ int gcov_info_is_compatible(struct gcov_info *info1, struct gcov_info *info2)
*/
void gcov_info_add(struct gcov_info *dst, struct gcov_info *src)
{
- struct gcov_fn_info *dfn_ptr;
- struct gcov_fn_info *sfn_ptr = list_first_entry_or_null(&src->functions,
- struct gcov_fn_info, head);

Hi Mukesh,
Thanks for the report and patch!

Looking closer at the existing implementation, it looks curious to me
that we use list_first_entry_or_null() since that may return NULL,
which we never check for. I'm curious if that's safe to remove?
Probably, since we haven't had any issues reported thus far.

+ struct gcov_fn_info *sfn_ptr;
+ struct gcov_fn_info *dfn_ptr = list_first_entry_or_null(
+ &dst->functions, struct gcov_fn_info, head);

- list_for_each_entry(dfn_ptr, &dst->functions, head) {
+ list_for_each_entry(sfn_ptr, &src->functions, head) {

This seems to be iterating BOTH src and dest, whereas previously we
were only iterating dest AFAICT. Is this correct? Seems to be a
change of behavior, at the least, which seems orthogonal to fixing the
panic.

Can you just check the implementation here once ?

https://elixir.bootlin.com/linux/v6.1-rc3/source/kernel/gcov/gcc_4_7.c#L241

By looking at the above link clang version does not seem to doing right ?


Otherwise it sounds like we could just add NULL ptr checks against
sfn_ptr outside the loop, and against dfn_ptr inside the loop.
Something like this?
```
diff --git a/kernel/gcov/clang.c b/kernel/gcov/clang.c
index cbb0bed958ab..5d4cb801aa9c 100644
--- a/kernel/gcov/clang.c
+++ b/kernel/gcov/clang.c
@@ -275,10 +275,13 @@ void gcov_info_add(struct gcov_info *dst, struct
gcov_info *src)
struct gcov_fn_info *sfn_ptr = list_first_entry_or_null(&src->functions,
struct gcov_fn_info, head);

- list_for_each_entry(dfn_ptr, &dst->functions, head) {
- u32 i;
+ if (!sfn_ptr)
+ return;

- for (i = 0; i < sfn_ptr->num_counters; i++)
+ list_for_each_entry(dfn_ptr, &dst->functions, head) {
+ if (!dfn_ptr)
+ continue;
+ for (u32 i = 0, e = sfn_ptr->num_counters; i != e; ++i)
dfn_ptr->counters[i] += sfn_ptr->counters[i];
}
}
```
Can you test the above hunk or comment on whether it addresses the issue?


BTW, it just handles NUL pointer issue and not the one which is mentioned here.

"Unable to handle kernel write to read-only memory at virtual address ffffff80461cc000"

-Mukesh


u32 i;

+ if (!dfn_ptr)
+ return;
+
for (i = 0; i < sfn_ptr->num_counters; i++)
dfn_ptr->counters[i] += sfn_ptr->counters[i];
+
+ dfn_ptr = list_next_entry(dfn_ptr, head);
}
}

--
2.7.4