Re: [PATCH v5 3/6] arm64: Kprobes with single stepping support

From: Masami Hiramatsu
Date: Fri Feb 27 2015 - 03:27:34 EST


(2015/02/27 15:38), David Long wrote:
> On 02/18/15 09:59, Masami Hiramatsu wrote:
>> Hi,
>>
>> (2015/02/18 8:11), David Long wrote:
>>> From: Sandeepa Prabhu <sandeepa.prabhu@xxxxxxxxxx>
>>>
>>> Add support for basic kernel probes(kprobes) and jump probes
>>> (jprobes) for ARM64.
>>>
>>> Kprobes will utilize software breakpoint and single step debug
>>> exceptions supported on ARM v8.
>>>
>>> Software breakpoint is placed at the probe address to trap the
>>> kernel execution into kprobe handler.
>>>
>>> ARM v8 supports single stepping to be enabled while exception return
>>> (ERET) with next PC in exception return address (ELR_EL1). The
>>> kprobe handler prepares an executable memory slot for out-of-line
>>> execution with a copy of the original instruction being probed, and
>>> enables single stepping from the instruction slot. With this scheme,
>>> the instruction is executed with the exact same register context
>>> 'except PC' that points to instruction slot.
>>>
>>> Debug mask(PSTATE.D) is enabled only when single stepping a recursive
>>
>> Is that same as "debug flag" in the code commment?
>>
>
> Yes. The description in the commit might benefit from a little rewording.
>
>>> kprobe, e.g.: during kprobes reenter so that probed instruction can be
>>> single stepped within the kprobe handler -exception- context.
>>> The recursion depth of kprobe is always 2, i.e. upon probe re-entry,
>>> any further re-entry is prevented by not calling handlers and the case
>>> counted as a missed kprobe).
>>
>> So, would this mean the debug mask is required for single-stepping
>> in exception context by specification?
>>
>
> Not positive I exactly understand the question but:
>
> 1) We can't single-step at all until we've both set the SS mode bit
> (PSTATE.SS) and unmasked debug exceptions (PSTATE.D). The debug
> exception is normally masked.
> 2) PSTATE.D will be set at exception entry.
> 3) We can't single-step again until we RTE from the current single-step.

Understand, thanks.

>
>>>
>>> Single stepping from slot has a drawback on PC-relative accesses
>>> like branching and symbolic literals access as offset from new PC
>>> (slot address) may not be ensured to fit in immediate value of
>>> opcode. Such instructions needs simulation, so reject
>>> probing such instructions.
>>
>> BTW, since while the single stepping IRQs are disabled, does that
>> also requires fixups for irq-mask loading instruction or something
>> like that? (sorry, I'm not good at aarch64 ISA...)
>
> No, we disable interrupts and reject probing of the special instructions
> that could reenable them *or* read the current status (since the
> single-step environment will likely see a different value for that bit).
> In previous patches I did not reject the reading of this bit.

OK, I see.


>>> Instructions generating exceptions or cpu mode change are rejected,
>>> and not allowed to insert probe for these instructions.
>>>
>>> Instructions using Exclusive Monitor are rejected too.
>>>
>>> System instructions are mostly enabled for stepping, except MSR
>>> immediate that updates "daif" flags in PSTATE, which are not safe
>>> for probing.
>>>
>>> Thanks to Steve Capper and Pratyush Anand for several suggested
>>> Changes.
>>
>> Ok, I have some comments on the code.
>>
[...]
>>> +static void __kprobes set_current_kprobe(struct kprobe *p)
>>> +{
>>> + __this_cpu_write(current_kprobe, p);
>>> +}
>>> +
>>> +/*
>>> + * Debug flag (D-flag) is disabled upon exception entry.
>>> + * Kprobes need to unmask D-flag -ONLY- in case of recursive
>>> + * probe i.e. when probe hit from kprobe handler context upon
>>> + * executing the pre/post handlers. In this case we return with
>>> + * D-flag unmasked so that single-stepping can be carried-out.
>>> + *
>>> + * Keep D-flag masked in all other cases.
>>
>> Here, we'd better choose D-flag or Debug mask.
>>
>
> "Debug Mask" and "Debug exception mask" seems to be what the ARM calls
> it. I can change it to "Debug Mask".

Yeah, that would be better to understand.

>
>>> + */
>>> +static void __kprobes
>>> +spsr_set_debug_flag(struct pt_regs *regs, int mask)
>>> +{
>>> + unsigned long spsr = regs->pstate;
>>> +
>>> + if (mask)
>>> + spsr |= PSR_D_BIT;
>>> + else
>>> + spsr &= ~PSR_D_BIT;
>>> +
>>> + regs->pstate = spsr;
>>> +}
>>> +
>>> +/*
>>> + * Interrupts need to be disabled before single-step mode is set, and not
>>> + * reenabled until after single-step mode ends.
>>> + * Without disabling interrupt on local CPU, there is a chance of
>>> + * interrupt occurrence in the period of exception return and start of
>>> + * out-of-line single-step, that result in wrongly single stepping
>>> + * the interrupt handler.
>>> + */
>>> +static void __kprobes kprobes_save_local_irqflag(struct pt_regs *regs)
>>> +{
>>> + struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>>> +
>>> + kcb->saved_irqflag = regs->pstate;
>>> + regs->pstate |= PSR_I_BIT;
>>> +}
>>> +
>>> +static void __kprobes kprobes_restore_local_irqflag(struct pt_regs *regs)
>>> +{
>>> + struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>>> +
>>> + if (kcb->saved_irqflag & PSR_I_BIT)
>>> + regs->pstate |= PSR_I_BIT;
>>> + else
>>> + regs->pstate &= ~PSR_I_BIT;
>>> +}
>>> +
>>> +static void __kprobes
>>> +set_ss_context(struct kprobe_ctlblk *kcb, unsigned long addr)
>>> +{
>>> + kcb->ss_ctx.ss_status = KPROBES_STEP_PENDING;
>>> + kcb->ss_ctx.match_addr = addr + sizeof(kprobe_opcode_t);
>>> +}
>>> +
>>> +static void __kprobes clear_ss_context(struct kprobe_ctlblk *kcb)
>>> +{
>>> + kcb->ss_ctx.ss_status = KPROBES_STEP_NONE;
>>> + kcb->ss_ctx.match_addr = 0;
>>> +}
>>> +
>>> +static void __kprobes
>>> +skip_singlestep_missed(struct kprobe_ctlblk *kcb, struct pt_regs *regs)
>>> +{
>>> + /* set return addr to next pc to continue */
>>> + instruction_pointer(regs) += sizeof(kprobe_opcode_t);
>>> +}
>>> +
>>> +static void __kprobes setup_singlestep(struct kprobe *p,
>>> + struct pt_regs *regs,
>>> + struct kprobe_ctlblk *kcb, int reenter)
>>> +{
>>> + unsigned long slot;
>>> +
>>> + if (reenter) {
>>> + save_previous_kprobe(kcb);
>>> + set_current_kprobe(p);
>>> + kcb->kprobe_status = KPROBE_REENTER;
>>> + } else {
>>> + kcb->kprobe_status = KPROBE_HIT_SS;
>>> + }
>>> +
>>> + if (p->ainsn.insn) {
>>> + /* prepare for single stepping */
>>> + slot = (unsigned long)p->ainsn.insn;
>>> +
>>> + set_ss_context(kcb, slot); /* mark pending ss */
>>> +
>>> + if (kcb->kprobe_status == KPROBE_REENTER)
>>> + spsr_set_debug_flag(regs, 0);
>>> +
>>> + /* IRQs and single stepping do not mix well. */
>>> + kprobes_save_local_irqflag(regs);
>>> + kernel_enable_single_step(regs);
>>> + instruction_pointer(regs) = slot;
>>> + } else {
>>> + BUG();
>>> + }
>>> +}
>>> +
>>> +static int __kprobes reenter_kprobe(struct kprobe *p,
>>> + struct pt_regs *regs,
>>> + struct kprobe_ctlblk *kcb)
>>> +{
>>> + switch (kcb->kprobe_status) {
>>> + case KPROBE_HIT_SSDONE:
>>> + case KPROBE_HIT_ACTIVE:
>>> + if (!p->ainsn.check_condn || p->ainsn.check_condn(p, regs)) {
>>> + kprobes_inc_nmissed_count(p);
>>> + setup_singlestep(p, regs, kcb, 1);
>>> + } else {
>>> + /* condition check failed, skip stepping */
>>> + skip_singlestep_missed(kcb, regs);
>>> + }
>>> + break;
>>> + case KPROBE_HIT_SS:
>>
>> Here, KPROBE_REENTER should be needed.
>
> Yes.
>
>> And also, do we really need to fail when KPROBE_HIT_SS case?
>> On x86, I've fixed similar code, since kprobes can hit in single stepping
>> if we put a handler in perf and it can be hit in NMI context, which
>> happens on single-stepping...
>>
>
> Pratyush at Red Hat has been trying to deal with this for uprobes and
> the best information we have from ARM is that you can't single-step
> while processing a single-step exception.

OK, that is a different point from x86.

Actually, on x86, that can happen if we get NMIs frequently on the
kernel like using perf. Recently I knew the ARM has no NMI, but
fast irq which can be masked. If the perf on ARMv8 uses the fast-irq
without masking, we have to mask it until single-stepping is done.

>>> + pr_warn("Unrecoverable kprobe detected at %p.\n", p->addr);
>>> + dump_kprobe(p);
>>> + BUG();
>>> + break;
>>> + default:
>>> + WARN_ON(1);
>>> + return 0;
>>> + }
>>> +
>>> + return 1;
>>> +}
>>> +
>>> +static void __kprobes
>>> +post_kprobe_handler(struct kprobe_ctlblk *kcb, struct pt_regs *regs)
>>> +{
>>> + struct kprobe *cur = kprobe_running();
>>> +
>>> + if (!cur)
>>> + return;
>>> +
>>> + /* return addr restore if non-branching insn */
>>> + if (cur->ainsn.restore.type == RESTORE_PC) {
>>> + instruction_pointer(regs) = cur->ainsn.restore.addr;
>>> + if (!instruction_pointer(regs))
>>> + BUG();
>>> + }
>>> +
>>> + /* restore back original saved kprobe variables and continue */
>>> + if (kcb->kprobe_status == KPROBE_REENTER) {
>>> + restore_previous_kprobe(kcb);
>>> + return;
>>> + }
>>> + /* call post handler */
>>> + kcb->kprobe_status = KPROBE_HIT_SSDONE;
>>> + if (cur->post_handler) {
>>> + /* post_handler can hit breakpoint and single step
>>> + * again, so we enable D-flag for recursive exception.
>>> + */
>>> + cur->post_handler(cur, regs, 0);
>>> + }
>>> +
>>> + reset_current_kprobe();
>>> +}
>>> +
>>> +int __kprobes kprobe_fault_handler(struct pt_regs *regs, unsigned int fsr)
>>> +{
>>> + struct kprobe *cur = kprobe_running();
>>> + struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>>> +
>>> + switch (kcb->kprobe_status) {
>>> + case KPROBE_HIT_SS:
>>> + case KPROBE_REENTER:
>>> + /*
>>> + * We are here because the instruction being single
>>> + * stepped caused a page fault. We reset the current
>>> + * kprobe and the ip points back to the probe address
>>> + * and allow the page fault handler to continue as a
>>> + * normal page fault.
>>> + */
>>> + instruction_pointer(regs) = (unsigned long)cur->addr;
>>> + if (!instruction_pointer(regs))
>>> + BUG();
>>> + if (kcb->kprobe_status == KPROBE_REENTER)
>>> + restore_previous_kprobe(kcb);
>>> + else
>>> + reset_current_kprobe();
>>> +
>>> + break;
>>> + case KPROBE_HIT_ACTIVE:
>>> + case KPROBE_HIT_SSDONE:
>>> + /*
>>> + * We increment the nmissed count for accounting,
>>> + * we can also use npre/npostfault count for accounting
>>> + * these specific fault cases.
>>> + */
>>> + kprobes_inc_nmissed_count(cur);
>>> +
>>> + /*
>>> + * We come here because instructions in the pre/post
>>> + * handler caused the page_fault, this could happen
>>> + * if handler tries to access user space by
>>> + * copy_from_user(), get_user() etc. Let the
>>> + * user-specified handler try to fix it first.
>>> + */
>>> + if (cur->fault_handler && cur->fault_handler(cur, regs, fsr))
>>> + return 1;
>>> +
>>> + /*
>>> + * In case the user-specified fault handler returned
>>> + * zero, try to fix up.
>>> + */
>>> + if (fixup_exception(regs))
>>> + return 1;
>>> +
>>> + break;
>>> + }
>>> + return 0;
>>> +}
>>> +
>>> +int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
>>> + unsigned long val, void *data)
>>> +{
>>> + return NOTIFY_DONE;
>>> +}
>>> +
>>> +void __kprobes kprobe_handler(struct pt_regs *regs)
>>> +{
>>> + struct kprobe *p, *cur;
>>> + struct kprobe_ctlblk *kcb;
>>> + unsigned long addr = instruction_pointer(regs);
>>> +
>>> + kcb = get_kprobe_ctlblk();
>>> + cur = kprobe_running();
>>> +
>>> + p = get_kprobe((kprobe_opcode_t *) addr);
>>
>> BTW, I'd like to ensure that this handler runs under normal irqs are disabled.
>> Or, we need preempt_disable() here.
>> # From the same reason, x86 implementation is a bit odd...
>
> The setup_singlestep() call in there will disable irq's before stepping.
> Is there something else in there that worries you?

I meant that preempt_disable()s in x86 kprobe handler. Actually, there is
no preempt_disable() in this handler, so I guess this runs under disabed irq.

>>
>>> +
>>> + if (p) {
>>> + if (cur) {
>>> + if (reenter_kprobe(p, regs, kcb))
>>> + return;
>>> + } else if (!p->ainsn.check_condn ||
>>> + p->ainsn.check_condn(p, regs)) {
>>> + /* Probe hit and conditional execution check ok. */
>>> + set_current_kprobe(p);
>>> + kcb->kprobe_status = KPROBE_HIT_ACTIVE;
>>> +
>>> + /*
>>> + * If we have no pre-handler or it returned 0, we
>>> + * continue with normal processing. If we have a
>>> + * pre-handler and it returned non-zero, it prepped
>>> + * for calling the break_handler below on re-entry,
>>> + * so get out doing nothing more here.
>>> + *
>>> + * pre_handler can hit a breakpoint and can step thru
>>> + * before return, keep PSTATE D-flag enabled until
>>> + * pre_handler return back.
>>> + */
>>> + if (!p->pre_handler || !p->pre_handler(p, regs)) {
>>> + kcb->kprobe_status = KPROBE_HIT_SS;
>>> + setup_singlestep(p, regs, kcb, 0);
>>> + return;
>>> + }
>>> + } else {
>>> + /*
>>> + * Breakpoint hit but conditional check failed,
>>> + * so just skip the instruction (NOP behaviour)
>>> + */
>>> + skip_singlestep_missed(kcb, regs);
>>> + return;
>>> + }
>>> + } else if (*(kprobe_opcode_t *) addr != BRK64_OPCODE_KPROBES) {
>>> + /*
>>> + * The breakpoint instruction was removed right
>>> + * after we hit it. Another cpu has removed
>>> + * either a probepoint or a debugger breakpoint
>>> + * at this address. In either case, no further
>>> + * handling of this interrupt is appropriate.
>>> + * Return back to original instruction, and continue.
>>> + */
>>> + return;
>>> + } else if (cur) {
>>> + /* We probably hit a jprobe. Call its break handler. */
>>> + if (cur->break_handler && cur->break_handler(cur, regs)) {
>>> + kcb->kprobe_status = KPROBE_HIT_SS;
>>> + setup_singlestep(cur, regs, kcb, 0);
>>> + return;
>>> + }
>>> + } else {
>>> + /* breakpoint is removed, now in a race
>>> + * Return back to original instruction & continue.
>>> + */
>>> + }
>>> +}
>>> +
>>> +static int __kprobes
>>> +kprobe_ss_hit(struct kprobe_ctlblk *kcb, unsigned long addr)
>>> +{
>>> + if ((kcb->ss_ctx.ss_status == KPROBES_STEP_PENDING)
>>> + && (kcb->ss_ctx.match_addr == addr)) {
>>> + clear_ss_context(kcb); /* clear pending ss */
>>> + return DBG_HOOK_HANDLED;
>>> + }
>>> + /* not ours, kprobes should ignore it */
>>> + return DBG_HOOK_ERROR;
>>> +}
>>> +
>>> +static int __kprobes
>>> +kprobe_single_step_handler(struct pt_regs *regs, unsigned int esr)
>>> +{
>>> + struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>>> + int retval;
>>> +
>>> + /* return error if this is not our step */
>>> + retval = kprobe_ss_hit(kcb, instruction_pointer(regs));
>>> +
>>> + if (retval == DBG_HOOK_HANDLED) {
>>> + kprobes_restore_local_irqflag(regs);
>>> + kernel_disable_single_step();
>>> +
>>> + if (kcb->kprobe_status == KPROBE_REENTER)
>>> + spsr_set_debug_flag(regs, 1);
>>> +
>>> + post_kprobe_handler(kcb, regs);
>>> + }
>>> +
>>> + return retval;
>>> +}
>>> +
>>> +static int __kprobes
>>> +kprobe_breakpoint_handler(struct pt_regs *regs, unsigned int esr)
>>> +{
>>> + kprobe_handler(regs);
>>> + return DBG_HOOK_HANDLED;
>>> +}
>>> +
>>> +int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
>>> +{
>>> + struct jprobe *jp = container_of(p, struct jprobe, kp);
>>> + struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>>> + long stack_ptr = stack_pointer(regs);
>>> +
>>> + kcb->jprobe_saved_regs = *regs;
>>> + memcpy(kcb->jprobes_stack, (void *)stack_ptr,
>>> + MIN_STACK_SIZE(stack_ptr));
>>> +
>>> + instruction_pointer(regs) = (long)jp->entry;
>>> + preempt_disable();
>>> + return 1;
>>> +}
>>> +
>>> +void __kprobes jprobe_return(void)
>>> +{
>>> + struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>>> +
>>> + /*
>>> + * Jprobe handler return by entering break exception,
>>> + * encoded same as kprobe, but with following conditions
>>> + * -a magic number in x0 to identify from rest of other kprobes.
>>> + * -restore stack addr to original saved pt_regs
>>> + */
>>> + asm volatile ("ldr x0, [%0]\n\t"
>>> + "mov sp, x0\n\t"
>>> + "ldr x0, =" __stringify(JPROBES_MAGIC_NUM) "\n\t"
>>> + "BRK %1\n\t"
>>> + "NOP\n\t"
>>> + :
>>> + : "r"(&kcb->jprobe_saved_regs.sp),
>>> + "I"(BRK64_ESR_KPROBES)
>>> + : "memory");
>>> +}
>>> +
>>> +int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
>>> +{
>>> + struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>>> + long stack_addr = kcb->jprobe_saved_regs.sp;
>>> + long orig_sp = stack_pointer(regs);
>>> + struct jprobe *jp = container_of(p, struct jprobe, kp);
>>> +
>>> + if (regs->regs[0] == JPROBES_MAGIC_NUM) {
>>> + if (orig_sp != stack_addr) {
>>> + struct pt_regs *saved_regs =
>>> + (struct pt_regs *)kcb->jprobe_saved_regs.sp;
>>> + pr_err("current sp %lx does not match saved sp %lx\n",
>>> + orig_sp, stack_addr);
>>> + pr_err("Saved registers for jprobe %p\n", jp);
>>> + show_regs(saved_regs);
>>> + pr_err("Current registers\n");
>>> + show_regs(regs);
>>> + BUG();
>>> + }
>>> + *regs = kcb->jprobe_saved_regs;
>>> + memcpy((void *)stack_addr, kcb->jprobes_stack,
>>> + MIN_STACK_SIZE(stack_addr));
>>> + preempt_enable_no_resched();
>>> + return 1;
>>> + }
>>> + return 0;
>>> +}
>>> +
>>> +/* Break Handler hook */
>>> +static struct break_hook kprobes_break_hook = {
>>> + .esr_mask = BRK64_ESR_MASK,
>>> + .esr_val = BRK64_ESR_KPROBES,
>>> + .fn = kprobe_breakpoint_handler,
>>> +};
>>> +
>>> +/* Single Step handler hook */
>>> +static struct step_hook kprobes_step_hook = {
>>> + .fn = kprobe_single_step_handler,
>>> +};
>>> +
>>> +int __init arch_init_kprobes(void)
>>> +{
>>> + register_break_hook(&kprobes_break_hook);
>>> + register_step_hook(&kprobes_step_hook);
>>
>> I really recommend you to not use this "general hook caller" for
>> kprobes, since it is called from anywhere, especially, if you
>> enable lockdep, read_lock can involve so many functions. This
>> means you can fall into the hell of the infinite loop of the
>> breakpoint exceptions...
>>
>
> OK. I hope no one will object about any code duplication that results.

Anyway, on x86, we already have kgdb, ftrace and kprobes which are directly
invoked from do_int3...

>> Moreover, no one remove these hook handlers from the list.
>>
>
> Since this is called from the architecture-independent kprobes code, and
> since that code has no module_exit() code, I don't see a clean way to
> get a call into a new arch_exit_kprobes() which could unregister these
> hooks. kprobes in general does not appear to be unloadable.

Right.

Thank you,


--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@xxxxxxxxxxx


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/