Re: [PATCH v5 0/9] Add OPTPROBES feature on RISCV

From: Xim
Date: Wed Jan 04 2023 - 03:39:24 EST


Hi Björn,

Thanks for your detailed review! I made tests mainly on some syscall/timer related
functions where these issues were not triggered. I will check all these issues as well
as comments that spread per-patch before a new version of patch set is sent.

FYI the 32b support is included and was tested with mostly same cases as the 64b one.

Regards,
Guokai Chen

> 2023年1月3日 02:02,Björn Töpel <bjorn@xxxxxxxxxx> 写道:
>
> Chen Guokai <chenguokai17@xxxxxxxxxxxxxxxx> writes:
>
>> Add jump optimization support for RISC-V.
>
> Thank you for continuing to work on the series! I took the series for a
> spin, and ran into a number of issues that makes me wonder how you test
> the series, and how the testing is different from my runs.
>
> I'll outline the general/big issues here, and leave the specifics per-patch.
>
> I've done simple testing, using "Kprobe-based Event Tracing"
> (CONFIG_KPROBE_EVENTS=y) via tracefs.
>
> All the tests were run on commit 88603b6dc419 ("Linux 6.2-rc2") with the
> series applied. All the bugs were trigged by setting different probes to
> do_sys_openat2. Code:
>
> do_sys_openat2:
> ...snip...
> ffffffff802d138c: 89aa c.mv s3,a0 // +44
> ffffffff802d138e: 892e c.mv s2,a1 // +46
> ffffffff802d1390: 8532 c.mv a0,a2
> ffffffff802d1392: fa040593 addi a1,s0,-96
> ffffffff802d1396: 84b2 c.mv s1,a2
> ffffffff802d1398: fa043023 sd zero,-96(s0)
> ffffffff802d139c: fa043423 sd zero,-88(s0)
> ffffffff802d13a0: fa042823 sw zero,-80(s0)
> ffffffff802d13a4: 00000097 auipc ra,0x0
> ...snip...
>
>
> 1. Fail to register kprobe to c.mv
>
> Add a kprobe:
> echo 'p do_sys_openat2+44' > /sys/kernel/debug/tracing/kprobe_events
>
> register_kprobe returns -22 (EINVAL). This is due to a bug in the
> instruction decoder. I've sent to fix upstream [1].
>
> 2. (with [1] applied) Oops when register a probe
>
> Add a kprobe:
> echo 'p do_sys_openat2+44' > /sys/kernel/debug/tracing/kprobe_events
>
> You get a splat:
> Unable to handle kernel access to user memory without uaccess routines at virtual address 0000000000000008
> Oops [#1]
> Modules linked in:
> CPU: 1 PID: 242 Comm: bash Tainted: G W 6.2.0-rc2-00010-g09ff1aa7b1f9-dirty #14
> Hardware name: riscv-virtio,qemu (DT)
> epc : riscv_probe_decode_insn+0x16a/0x192
> ra : riscv_probe_decode_insn+0x32/0x192
> epc : ffffffff8127b2bc ra : ffffffff8127b184 sp : ff2000000173bac0
> gp : ffffffff82533f70 tp : ff60000086ab2b40 t0 : 0000000000000000
> t1 : 0000000000000850 t2 : 65646f6365642054 s0 : ff2000000173bae0
> s1 : 0000000000000017 a0 : 000000000000e001 a1 : 000000000000003f
> a2 : 0000000000009002 a3 : 0000000000000017 a4 : 000000000000c001
> a5 : ffffffff8127b38a a6 : ff6000047d666000 a7 : 0000000000040000
> s2 : 0000000000000000 s3 : 0000000000000006 s4 : ff6000008558f718
> s5 : ff6000008558f718 s6 : 0000000000000001 s7 : ff6000008558f768
> s8 : 0000000000000007 s9 : 0000000000000003 s10: 0000000000000002
> s11: 00aaaaaad62baf78 t3 : 0000000000000000 t4 : 8dd70b0100000000
> t5 : ffffffffffffe000 t6 : ff2000000173b8c8
> status: 0000000200000120 badaddr: 0000000000000008 cause: 000000000000000f
> [<ffffffff81257e48>] arch_prepare_optimized_kprobe+0xc2/0x4ec
> [<ffffffff8125b420>] alloc_aggr_kprobe+0x5c/0x6a
> [<ffffffff8125ba0a>] register_kprobe+0x5dc/0x6a2
> [<ffffffff8016f266>] __register_trace_kprobe.part.0+0x98/0xbc
> [<ffffffff80170544>] __trace_kprobe_create+0x6ea/0xbcc
> [<ffffffff80176cee>] trace_probe_create+0x6c/0x7c
> [<ffffffff8016f1a2>] create_or_delete_trace_kprobe+0x24/0x50
> [<ffffffff80150642>] trace_parse_run_command+0x9e/0x12a
> [<ffffffff8016f176>] probes_write+0x18/0x20
> [<ffffffff802d494a>] vfs_write+0xca/0x41e
> [<ffffffff802d4f96>] ksys_write+0x70/0xee
> [<ffffffff802d5036>] sys_write+0x22/0x2a
> [<ffffffff80004196>] ret_from_syscall+0x0/0x2
>
> This is because a call to riscv_probe_decode_insn(probe_opcode_t *addr,
> struct arch_probe_insn *api), where api is NULL (and tripping over
> auipc). Should be a common scenario...
>
> 3. No bound check for instructions
>
> Add a probe to a non-valid instruction (in the middle of addi):
> echo 'p 0xffffffff802d1394' > /sys/kernel/debug/tracing/kprobe_events
>
> You get the same splat as above from the auipc NULL-pointer, but the
> "half" addi-instruction is parsed as a correct instruction.
>
> 4. Lockdep splat
>
> Might be false positive; When enabling a probe, e.g.
> echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
>
>
> ======================================================
> WARNING: possible circular locking dependency detected
>
> ------------------------------------------------------
> bash/244 is trying to acquire lock:
> ffffffff8223ee90 (cpu_hotplug_lock){++++}-{0:0}, at: stop_machine+0x2c/0x54
>
> but task is already holding lock:
> ffffffff82249f70 (text_mutex){+.+.}-{3:3}, at: ftrace_arch_code_modify_prepare+0x1a/0x22
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (text_mutex){+.+.}-{3:3}:
> lock_acquire+0x10a/0x328
> __mutex_lock+0xa8/0x770
> mutex_lock_nested+0x28/0x30
> register_kprobe+0x3ae/0x5ea
> __register_trace_kprobe.part.0+0x98/0xbc
> __trace_kprobe_create+0x6ea/0xbcc
> trace_probe_create+0x6c/0x7c
> create_or_delete_trace_kprobe+0x24/0x50
> trace_parse_run_command+0x9e/0x12a
> probes_write+0x18/0x20
> vfs_write+0xca/0x41e
> ksys_write+0x70/0xee
> sys_write+0x22/0x2a
> ret_from_syscall+0x0/0x2
>
> -> #0 (cpu_hotplug_lock){++++}-{0:0}:
> check_noncircular+0x122/0x13a
> __lock_acquire+0x1058/0x20e4
> lock_acquire+0x10a/0x328
> cpus_read_lock+0x4c/0x11c
> stop_machine+0x2c/0x54
> arch_ftrace_update_code+0x2e/0x4c
> ftrace_startup+0xd0/0x15e
> register_ftrace_function+0x32/0x7c
> arm_kprobe+0x132/0x198
> enable_kprobe+0x9c/0xc0
> enable_trace_kprobe+0x6e/0xea
> kprobe_register+0x64/0x6c
> __ftrace_event_enable_disable+0x72/0x246
> event_enable_write+0x94/0xe4
> vfs_write+0xca/0x41e
> ksys_write+0x70/0xee
> sys_write+0x22/0x2a
> ret_from_syscall+0x0/0x2
>
> other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(text_mutex);
> lock(cpu_hotplug_lock);
> lock(text_mutex);
> lock(cpu_hotplug_lock);
>
> *** DEADLOCK ***
>
> 5 locks held by bash/244:
> #0: ff60000080f49438 (sb_writers#12){.+.+}-{0:0}, at: ksys_write+0x70/0xee
> #1: ffffffff822d9468 (event_mutex){+.+.}-{3:3}, at: event_enable_write+0x7c/0xe4
> #2: ffffffff822d3fa8 (kprobe_mutex){+.+.}-{3:3}, at: enable_kprobe+0x32/0xc0
> #3: ffffffff822d56d8 (ftrace_lock){+.+.}-{3:3}, at: register_ftrace_function+0x26/0x7c
> #4: ffffffff82249f70 (text_mutex){+.+.}-{3:3}, at: ftrace_arch_code_modify_prepare+0x1a/0x22
>
> stack backtrace:
> CPU: 2 PID: 244 Comm: bash Not tainted 6.2.0-rc1-00008-g544b2c59fd81 #1
> Hardware name: riscv-virtio,qemu (DT)
> Call Trace:
> [<ffffffff80006e80>] dump_backtrace+0x30/0x38
> [<ffffffff81256e82>] show_stack+0x40/0x4c
> [<ffffffff8126e054>] dump_stack_lvl+0x62/0x84
> [<ffffffff8126e08e>] dump_stack+0x18/0x20
> [<ffffffff8009b37e>] print_circular_bug+0x2ac/0x318
> [<ffffffff8009b50c>] check_noncircular+0x122/0x13a
> [<ffffffff8009e020>] __lock_acquire+0x1058/0x20e4
> [<ffffffff8009f90c>] lock_acquire+0x10a/0x328
> [<ffffffff8002fb8a>] cpus_read_lock+0x4c/0x11c
> [<ffffffff8011ed60>] stop_machine+0x2c/0x54
> [<ffffffff8013aec6>] arch_ftrace_update_code+0x2e/0x4c
> [<ffffffff8013e796>] ftrace_startup+0xd0/0x15e
> [<ffffffff8013e856>] register_ftrace_function+0x32/0x7c
> [<ffffffff8012f928>] arm_kprobe+0x132/0x198
> [<ffffffff8012fa2a>] enable_kprobe+0x9c/0xc0
> [<ffffffff8016ff62>] enable_trace_kprobe+0x6e/0xea
> [<ffffffff801700da>] kprobe_register+0x64/0x6c
> [<ffffffff8015eba6>] __ftrace_event_enable_disable+0x72/0x246
> [<ffffffff8015eeea>] event_enable_write+0x94/0xe4
> [<ffffffff802d5e1a>] vfs_write+0xca/0x41e
> [<ffffffff802d6466>] ksys_write+0x70/0xee
> [<ffffffff802d6506>] sys_write+0x22/0x2a
> [<ffffffff80004196>] ret_from_syscall+0x0/0x2
>
>
> 5. 32b support?
>
> I've noticed that there code supports rv32. Is this tested? Does regular
> kprobes work on 32b?
>
>
> Thanks,
> Björn
>
>
> [1] https://lore.kernel.org/linux-riscv/20230102160748.1307289-1-bjorn@xxxxxxxxxx/
>