Re: [PATCH] tracing: Optimize event type allocation with IDA

From: Yujie Liu
Date: Tue Nov 22 2022 - 21:43:47 EST


Hi Yejian,

On 11/18/2022 14:41, Zheng Yejian wrote:
On Wed, 16 Nov 2022 23:50:04 +0800, kernel test robot <yujie.liu@xxxxxxxxx> wrote:
[-- Attachment #1: Type: text/plain, Size: 8013 bytes --]

Greeting,

FYI, we noticed BUG:KASAN:use-after-free_in_string due to commit (built with gcc-11):

commit: bb10be779a5fc1147d5765257c64a2fbea9607c5 ("[PATCH] tracing: Optimize event type allocation with IDA")
url: https://github.com/intel-lab-lkp/linux/commits/Zheng-Yejian/tracing-Optimize-event-type-allocation-with-IDA/20221109-112530
base: https://git.kernel.org/cgit/linux/kernel/git/rostedt/linux-trace.git for-next
patch link: https://lore.kernel.org/all/20221109032352.254502-1-zhengyejian1@xxxxxxxxxx/
patch subject: [PATCH] tracing: Optimize event type allocation with IDA

in testcase: kernel-selftests
version: kernel-selftests-x86_64-9313ba54-1_20221017
with following parameters:

group: ftrace

test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
test-url: https://www.kernel.org/doc/Documentation/kselftest.txt

on test machine: 4 threads Intel(R) Xeon(R) CPU E3-1225 v5 @ 3.30GHz (Skylake) with 16G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


[ 341.472391][ T5846] BUG: KASAN: use-after-free in string (vsprintf.c:?)
[ 341.478762][ T5846] Read of size 1 at addr ffff88812a630000 by task grep/5846
[ 341.485921][ T5846]
[ 341.488122][ T5846] CPU: 3 PID: 5846 Comm: grep Not tainted 6.0.0-rc7-00021-gbb10be779a5f #1
[ 341.496580][ T5846] Hardware name: HP HP Z238 Microtower Workstation/8183, BIOS N51 Ver. 01.63 10/05/2017
[ 341.506170][ T5846] Call Trace:
[ 341.509326][ T5846] <TASK>
[ 341.512133][ T5846] dump_stack_lvl (??:?)
[ 341.516511][ T5846] print_address_description+0x1f/0x1e0
[ 341.522974][ T5846] print_report.cold (report.c:?)
[ 341.527698][ T5846] ? do_raw_spin_lock (??:?)
[ 341.532596][ T5846] ? string (vsprintf.c:?)
[ 341.536625][ T5846] kasan_report (??:?)
[ 341.540909][ T5846] ? kallsyms_lookup_buildid (kallsyms.c:?)
[ 341.546332][ T5846] ? string (vsprintf.c:?)
[ 341.550362][ T5846] string (vsprintf.c:?)
[ 341.554220][ T5846] ? ip6_addr_string_sa (vsprintf.c:?)
[ 341.559293][ T5846] ? enable_ptr_key_workfn (vsprintf.c:?)
[ 341.564454][ T5846] vsnprintf (??:?)
[ 341.568657][ T5846] ? pointer (??:?)
[ 341.572771][ T5846] ? pointer (??:?)
[ 341.576889][ T5846] seq_buf_vprintf (??:?)
[ 341.581440][ T5846] trace_seq_printf (??:?)
[ 341.586165][ T5846] ? trace_seq_bitmask (??:?)
[ 341.591151][ T5846] ? trace_seq_bitmask (??:?)
[ 341.596136][ T5846] ? memcpy (??:?)
[ 341.599991][ T5846] ? seq_buf_putmem (??:?)
[ 341.604631][ T5846] ? print_type_symbol (??:?)
[ 341.609440][ T5846] print_type_string (??:?)
[ 341.614165][ T5846] ? print_type_symbol (??:?)
[ 341.618976][ T5846] print_kprobe_event (trace_kprobe.c:?)
[ 341.623876][ T5846] print_trace_line (??:?)
[ 341.628600][ T5846] ? tracing_buffers_read (??:?)
[ 341.633844][ T5846] ? trace_hardirqs_on (??:?)
[ 341.638737][ T5846] ? _raw_spin_unlock_irqrestore (??:?)
[ 341.644417][ T5846] ? trace_find_next_entry_inc (??:?)
[ 341.650099][ T5846] s_show (??:?)
[ 341.653865][ T5846] seq_read_iter (??:?)
[ 341.658419][ T5846] ? cp_new_stat (stat.c:?)
[ 341.662882][ T5846] seq_read (??:?)
[ 341.666911][ T5846] ? seq_read_iter (??:?)
[ 341.671729][ T5846] ? fsnotify_perm+0x13b/0x4a0
[ 341.676973][ T5846] vfs_read (??:?)
[ 341.681003][ T5846] ? kernel_read (??:?)
[ 341.685470][ T5846] ? syscall_exit_to_user_mode (??:?)
[ 341.690975][ T5846] ? __fget_light (file.c:?)
[ 341.695438][ T5846] ksys_read (??:?)
[ 341.699467][ T5846] ? __ia32_sys_pwrite64 (??:?)
[ 341.704620][ T5846] ? lockdep_hardirqs_on_prepare (lockdep.c:?)
[ 341.711083][ T5846] ? syscall_enter_from_user_mode (??:?)
[ 341.716853][ T5846] do_syscall_64 (??:?)
[ 341.721143][ T5846] ? lockdep_hardirqs_on_prepare (lockdep.c:?)
[ 341.727609][ T5846] entry_SYSCALL_64_after_hwframe (??:?)
[ 341.733374][ T5846] RIP: 0033:0x7f9da419902d
[ 341.737663][ T5846] Code: 31 c0 e9 c6 fe ff ff 50 48 8d 3d b6 55 0a 00 e8 99 fe 01 00 66 0f 1f 84 00 00 00 00 00 80 3d b1 25 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec
All code
========
0: 31 c0 xor %eax,%eax
2: e9 c6 fe ff ff jmpq 0xfffffffffffffecd
7: 50 push %rax
8: 48 8d 3d b6 55 0a 00 lea 0xa55b6(%rip),%rdi # 0xa55c5
f: e8 99 fe 01 00 callq 0x1fead
14: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
1b: 00 00
1d: 80 3d b1 25 0e 00 00 cmpb $0x0,0xe25b1(%rip) # 0xe25d5
24: 74 17 je 0x3d
26: 31 c0 xor %eax,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 5b ja 0x8d
32: c3 retq
33: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
3a: 00 00 00
3d: 48 rex.W
3e: 83 .byte 0x83
3f: ec in (%dx),%al

Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 5b ja 0x63
8: c3 retq
9: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
10: 00 00 00
13: 48 rex.W
14: 83 .byte 0x83
15: ec in (%dx),%al
[ 341.757163][ T5846] RSP: 002b:00007fff7b2e0818 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 341.765446][ T5846] RAX: ffffffffffffffda RBX: 0000000000018000 RCX: 00007f9da419902d
[ 341.773296][ T5846] RDX: 0000000000018000 RSI: 00005612d0f5e000 RDI: 0000000000000005
[ 341.781148][ T5846] RBP: 00005612d0f5e000 R08: 00000000000395e1 R09: 0000000000000000
[ 341.789000][ T5846] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff7b2e08e0
[ 341.796852][ T5846] R13: 0000000000000005 R14: 0000000000000000 R15: 0000000000000005
[ 341.804710][ T5846] </TASK>
[ 341.807601][ T5846]
[ 341.809798][ T5846] The buggy address belongs to the physical page:
[ 341.816082][ T5846] page:00000000345a71f1 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12a630
[ 341.826198][ T5846] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[ 341.833443][ T5846] raw: 0017ffffc0000000 ffffea0005b73208 ffffea0004a98e08 0000000000000000
[ 341.841901][ T5846] raw: 0000000000000000 0000000000070000 00000000ffffffff 0000000000000000
[ 341.850361][ T5846] page dumped because: kasan: bad access detected
[ 341.856650][ T5846]
[ 341.858850][ T5846] Memory state around the buggy address:
[ 341.864354][ T5846] ffff88812a62ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 341.872284][ T5846] ffff88812a62ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 341.880221][ T5846] >ffff88812a630000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 341.888159][ T5846] ^
[ 341.892100][ T5846] ffff88812a630080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 341.900036][ T5846] ffff88812a630100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 341.907978][ T5846] ==================================================================
[ 341.915979][ T5846] Disabling lock debugging due to kernel taint


If you fix the issue, kindly add following tag
| Reported-by: kernel test robot <yujie.liu@xxxxxxxxx>
| Link: https://lore.kernel.org/oe-lkp/202211162205.6cb42ae6-yujie.liu@xxxxxxxxx


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.


Thanks for your test!

Looking into above call trace, it seems something wrong happened when read
'string' type data of some kprobe events.

But I currently feel like it hard to relate this issue to my modification
since I change event type allocation in the patch and it can only affect
event registration.

I cannot run the lkp-tests in my environment, sorry for that. I'd like to
know if this problem would always happen after applying my patch? Could you
please provide exact testcase or more detailed logs? Thanks!

This problem is not 100% reproducible in our test. We did 29 rounds of test,
19 of which can reproduce it, while the base commit is always clean.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/group:
lkp-skl-d06/kernel-selftests/debian-12-x86_64-20220629.cgz/x86_64-rhel-8.3-kselftests/gcc-11/ftrace

commit:
0ce0638edf5ec ("ftrace: Properly unset FTRACE_HASH_FL_MOD")
bb10be779a5fc ("tracing: Optimize event type allocation with IDA")

0ce0638edf5ec bb10be779a5fc
---------------- --------------- ------------
fail:runs %reproduction fail:runs
| | |
:20 95% 19:29 dmesg.BUG:KASAN:use-after-free_in_string


Running lkp-tests seems to be a bit complex, sorry about that. We reconfirmed
that the problem can be reproduced by simply run a ftrace selftest case below.
Could you please have a try?

$ cd linux/tools/testing/selftests/ftrace
$ ./ftracetest test.d/kprobe/kprobe_args_string.tc
$ dmesg
...
[ 339.517182][ T5803] ==================================================================
[ 339.525127][ T5803] BUG: KASAN: use-after-free in string+0x29c/0x320
[ 339.531492][ T5803] Read of size 1 at addr ffff888108d7c578 by task grep/5803
[ 339.538645][ T5803]
[ 339.540845][ T5803] CPU: 3 PID: 5803 Comm: grep Not tainted 6.0.0-rc7-00021-gbb10be779a5f #1
[ 339.549295][ T5803] Hardware name: HP HP Z238 Microtower Workstation/8183, BIOS N51 Ver. 01.63 10/05/2017
[ 339.558877][ T5803] Call Trace:
[ 339.562030][ T5803] <TASK>
[ 339.564835][ T5803] dump_stack_lvl+0x45/0x59
[ 339.569210][ T5803] print_address_description+0x1f/0x1e0
[ 339.575663][ T5803] print_report.cold+0x55/0x232
[ 339.580380][ T5803] ? do_raw_spin_lock+0x12e/0x270
[ 339.585275][ T5803] ? string+0x29c/0x320
[ 339.589299][ T5803] kasan_report+0xb1/0x190
[ 339.593582][ T5803] ? kallsyms_lookup_buildid+0xb0/0x2c0
[ 339.598998][ T5803] ? string+0x29c/0x320
[ 339.603026][ T5803] string+0x29c/0x320
...


The full dmesg and test log are attached in the original report, please check
them for more details.

https://lore.kernel.org/oe-lkp/202211162205.6cb42ae6-yujie.liu@xxxxxxxxx

--
Best Regards,
Yujie