Re: Kprobes: pre-handler with interrupts enabled - is it possible?

From: Eugene Shatokhin
Date: Wed Feb 25 2015 - 06:20:18 EST


> (2015/02/24 15:04), Eugene Shatokhin wrote:
24.02.2015 06:47, Masami Hiramatsu wrote:
No, that is not allowed. I mean, you can do anything you want to do
on your handler (enabling preemption/irq etc.) but the result may be
not safe (it can crash your kernel, but it's not a kprobes' bug).

Yes, that is why I am asking.

Actually, enable interrupts on kprobe handlers can cause reentering
kprobes (by kprobes on interrupt handlers), and currently kprobe skips
all those reentered kprobes.
Is it acceptable that some of your kprobe handlers are not fired when
hitting?

I think, yes. When a software breakpoint hits, my system decodes the
instruction, finds the address that is about to be accessed and tries to
place a hardware breakpoint on that memory area.

There are only 4 hardware breakpoints a CPU can use on x86, so if the
software breakpoint hits too often, the system will not be able to
process all hits anyway because all HW breakpoints may be already in use.
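
To illustrate why dropped hits are tolerable here, a minimal user-space
sketch of the bookkeeping, assuming one pool of four slots per CPU. The
names hw_bp_pool, hw_bp_claim and hw_bp_release are hypothetical and are
not kernel APIs; the real system would go through the kernel's hardware
breakpoint interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_HW_BP 4 /* x86 provides four debug address registers, DR0-DR3 */

/* Hypothetical per-CPU bookkeeping; not a kernel structure. */
struct hw_bp_pool {
    uintptr_t addr[NUM_HW_BP];
    bool in_use[NUM_HW_BP];
};

/*
 * Try to claim a free slot for the given data address.
 * Returns the slot index, or -1 if all slots are busy, in which case
 * the caller simply skips this software breakpoint hit.
 */
static int hw_bp_claim(struct hw_bp_pool *pool, uintptr_t addr)
{
    assert(pool != NULL);
    for (int i = 0; i < NUM_HW_BP; i++) {
        if (!pool->in_use[i]) {
            pool->in_use[i] = true;
            pool->addr[i] = addr;
            return i;
        }
    }
    return -1;
}

/* Give the slot back once the delay has expired. */
static void hw_bp_release(struct hw_bp_pool *pool, int slot)
{
    assert(pool != NULL && slot >= 0 && slot < NUM_HW_BP);
    pool->in_use[slot] = false;
}
```

A fifth claim made while four are outstanding returns -1, so that hit is
lost no matter whether Kprobes skipped it or not.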

Would you mean sleep on your handler??

No, I use mdelay(). It is, in essence, a busy-wait loop as far as I
know. The delay intervals may vary, the default is 5 jiffies.

Hmm, here I couldn't understand. If mdelay() does busy-wait loop, why
would you like to enable irq??
Other code doesn't work on the core while waiting.

I'd like not to enable IRQ but rather to execute my handler with the
same (or similar) restrictions as the original instruction would. If the
insn executed with IRQ enabled, so would the handler, etc. So I am
looking for a way to avoid *additionally* disabling IRQ (and, perhaps,
preemption, although this might be harder).

The breakpoints and delays already incur a penalty on the system's
responsiveness.
However, if, say, I probe an insn executing in a process context with
IRQs enabled, the interrupts may be served on this CPU during the delay.
If, additionally, preemption is not disabled and the kernel is built
with CONFIG_PREEMPT=y then, I guess, mdelay() can be preempted allowing
some other task to run, which is good for overall responsiveness.

Usually, the longer the delays I make, the more likely the races are to
be detected, but the performance overhead increases too. I do not have
the exact numbers yet, but still.

So, while 5-10 jiffies are often enough, sometimes it could be
beneficial to wait longer. For example, when I used the system to
confirm a race between the .probe() and .ndo_open() callbacks in the
e1000 driver a year ago, I used a delay of about one second or more (for
NetworkManager to start working with the device), which would be too
much with IRQs disabled, I think. Both .probe() and .ndo_open() executed
in process context, by the way.
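
For a sense of scale: the length of a jiffy depends on the kernel's HZ
setting, so the same delay in jiffies means different wall-clock times on
different builds. A small user-space sketch of the conversion follows;
jiffies_to_ms is a hypothetical stand-in written for this example, not
the kernel's own helper.

```c
#include <assert.h>

/*
 * Convert a delay in jiffies to milliseconds for a given HZ
 * (timer ticks per second). Integer division truncates.
 */
static unsigned long jiffies_to_ms(unsigned long jiffies, unsigned int hz)
{
    assert(hz > 0);
    return jiffies * 1000UL / hz;
}
```

With HZ=250, 5 jiffies come to 20 ms and 10 jiffies to 40 ms, while a
one-second delay is 250 jiffies. With HZ=1000, 5 jiffies are only 5 ms.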

Well, I was actually thinking about something like the following (for
x86, at least).

If a Kprobe's pre_handler returns non-zero, single-step will not be
performed, right? As far as I can see in the code, Jprobes rely on that.
Preemption will still be disabled and Jprobe's handler enables it when
ready.
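
To make the control flow concrete, here is a toy user-space model of how
I understand the breakpoint path: single-step is set up only when there
is no pre_handler or it returns 0. All names (model_regs, model_probe,
model_cpu, model_int3_hit, redirect_target) are hypothetical and only
stand in for the kernel's structures; this is a sketch, not the Kprobes
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_regs {
    uintptr_t ip;
};

struct model_probe {
    int (*pre_handler)(struct model_probe *p, struct model_regs *regs);
    uintptr_t insn_slot; /* address of the copied insn */
};

struct model_cpu {
    int preempt_count;
    bool single_stepping;
};

/* Models what happens when the software breakpoint is hit. */
static void model_int3_hit(struct model_cpu *cpu, struct model_probe *p,
                           struct model_regs *regs)
{
    assert(cpu != NULL && p != NULL && regs != NULL);

    cpu->preempt_count++; /* preemption is disabled on entry */

    if (!p->pre_handler || !p->pre_handler(p, regs)) {
        /* Usual path: execute the copied insn in single-step mode. */
        regs->ip = p->insn_slot;
        cpu->single_stepping = true;
    }
    /*
     * Non-zero return: no single-step is set up, regs->ip keeps
     * whatever the handler stored, and preemption stays disabled.
     */
}

/* Hypothetical: stands in for the address of my_thunk_pre. */
static uintptr_t redirect_target;

/* A pre_handler that redirects execution and suppresses single-step. */
static int redirecting_pre_handler(struct model_probe *p,
                                   struct model_regs *regs)
{
    (void)p;
    regs->ip = redirect_target;
    return 1;
}
```

In this model the redirecting handler leaves preempt_count incremented,
which is the state the redirected code has to undo itself.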

What if I place a Kprobe on an insn of interest and the pre_handler
changes regs->ip to the address of my function, say, "my_thunk_pre" (see
below) then returns non-zero. Handling of int3 then completes, the
context is restored, the interrupts are re-enabled (if they were enabled
before int3). Preemption remains off because the Kprobe's implementation
disabled it. Execution resumes in "my_thunk_pre" that is written in
assembly and may look like this on x86_64 (x86_32 is similar):

----------------------
my_thunk_pre:
    push %rax
    /* Save the other scratch (caller-saved) registers. */
    push %rcx
    push %rdx
    push %rsi
    push %rdi
    push %r8
    push %r9
    push %r10
    push %r11

    /*
     * my_handler() is a C function with the default calling
     * convention/linkage. It returns the address of the copied insn
     * in the Kprobe's insn slot in %rax.
     */
    call my_handler

    /* Restore the scratch registers in reverse order. */
    pop %r11
    pop %r10
    pop %r9
    pop %r8
    pop %rdi
    pop %rsi
    pop %rdx
    pop %rcx

    /*
     * Restore the original value of %rax and leave the address
     * to jump to on the stack.
     */
    xchg %rax, (%rsp)

    /* Jump to the copied insn (and fix %rsp at the same time). */
    ret
----------------------

In this case, my_handler() seems to execute in the same context as the
original insn, except for disabled preemption.

It may use kprobe_running() to get the Kprobe and, perhaps, some
structure of mine that contains that Kprobe. Then, I guess, it might
call preempt_enable_no_resched() like Jprobe's handler does (maybe some
other actions are needed?). After that, my_handler can do the rest of
its job: arm the HW breakpoints, call mdelay(), etc.

my_handler will return the address of the copied insn in the Kprobe's
insn slot. The control will be passed there by my_thunk_pre().
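
The hand-off at the end of the thunk (swap the returned address with the
saved %rax on the stack, then ret) can be checked with a toy user-space
model. This is only a sketch: toy_cpu, toy_thunk and toy_handler are
hypothetical names, the other scratch registers are left out, and 0x2000
is a made-up slot address.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

struct toy_cpu {
    uint64_t rax;
    uint64_t rip;
    uint64_t stack[STACK_WORDS];
    int sp; /* index of the top of the stack; the stack grows down */
};

static void toy_push(struct toy_cpu *c, uint64_t v)
{
    assert(c->sp > 0);
    c->stack[--c->sp] = v;
}

static uint64_t toy_pop(struct toy_cpu *c)
{
    assert(c->sp < STACK_WORDS);
    return c->stack[c->sp++];
}

/* Models: push %rax; call my_handler; xchg %rax, (%rsp); ret */
static void toy_thunk(struct toy_cpu *c, uint64_t (*handler)(void))
{
    uint64_t tmp;

    toy_push(c, c->rax);       /* push %rax */
    c->rax = handler();        /* call: the result arrives in %rax */

    tmp = c->stack[c->sp];     /* xchg %rax, (%rsp) */
    c->stack[c->sp] = c->rax;
    c->rax = tmp;

    c->rip = toy_pop(c);       /* ret: pop the target into %rip */
}

/* Hypothetical handler returning a made-up insn slot address. */
static uint64_t toy_handler(void)
{
    return 0x2000;
}
```

After toy_thunk() the model has %rax back at its original value, %rip at
the slot address, and the stack pointer where it started.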

For this to work, the copied insn stored in the Kprobe's insn slot must
be followed by a jump back to the original code, that is, to the next
insn. Of course, this is not necessary for some control-transfer insns.
But my system mostly works with insns that access data rather than with
these.

Looks like Kprobes already do something similar and place such jumps in
the insn slots (Kprobes with ainsn.boostable == 1) if there is enough
space there. That is, if the size of the copied insn + 5 (size of jmp
near relative) < 16 (MAX_INSN_SIZE). However, this seems to be done
after single-step, which will not happen in my case.
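
The size condition above, written out as a small user-space check. The
constants follow the numbers quoted in this message; jump_fits_in_slot
and RELJUMP_SIZE are hypothetical names used only in this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_INSN_SIZE 16 /* slot size, as quoted above */
#define RELJUMP_SIZE  5  /* jmp near relative: opcode 0xE9 + rel32 */

/* True if a jmp rel32 can be appended after a copied insn of insn_len bytes. */
static bool jump_fits_in_slot(size_t insn_len)
{
    /* An x86 instruction is at most 15 bytes long. */
    assert(insn_len > 0 && insn_len <= 15);
    return insn_len + RELJUMP_SIZE < MAX_INSN_SIZE;
}
```

Under this condition, insns of up to 10 bytes leave room for the jump,
and insns of 11 bytes or more do not.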

Still, I could place the jumps after the insns in the slots earlier,
e.g., before I arm the Kprobes. Perhaps, it will not interfere with
other functions of Kprobes.
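
Placing such a jump by hand amounts to writing opcode 0xE9 followed by a
32-bit displacement measured from the end of the jump. A user-space
sketch of the encoding follows, assuming the slot is within +/-2 GB of
the probed code. write_reljump and append_jump_back are hypothetical
names for this example and operate on a plain buffer, not on kernel text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RELJUMP_SIZE 5 /* opcode 0xE9 + rel32 */

/*
 * Encode "jmp rel32" into buf. The jump will execute at address `from`
 * and transfer control to address `to`.
 */
static void write_reljump(uint8_t *buf, uint64_t from, uint64_t to)
{
    /* The displacement is relative to the address after the jump. */
    int64_t disp = (int64_t)(to - (from + RELJUMP_SIZE));
    uint32_t rel;

    assert(disp >= INT32_MIN && disp <= INT32_MAX);
    rel = (uint32_t)(int32_t)disp;

    buf[0] = 0xE9;
    buf[1] = (uint8_t)(rel & 0xff);         /* little-endian rel32 */
    buf[2] = (uint8_t)((rel >> 8) & 0xff);
    buf[3] = (uint8_t)((rel >> 16) & 0xff);
    buf[4] = (uint8_t)((rel >> 24) & 0xff);
}

/*
 * Append a jump after a copied insn of insn_len bytes in the slot,
 * targeting the insn that follows the probed one in the original code.
 */
static void append_jump_back(uint8_t *slot, uint64_t slot_addr,
                             uint64_t probe_addr, size_t insn_len)
{
    write_reljump(slot + insn_len, slot_addr + insn_len,
                  probe_addr + insn_len);
}
```

For a 3-byte insn probed at 0x1000 and copied to a slot at 0x2000, the
jump sits at 0x2003 and targets 0x1003, so the displacement is
0x1003 - 0x2008 = -0x1005.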

So, if all this worked, I suppose, my system would get everything it
needs: my_handler() will do the delays in the same context and with the
same restrictions as the original insn executes.

Or perhaps I am missing something critical here? Could this scheme
break Kprobes somehow?

If there are no visible pitfalls, I think I will give it a try.

So, what is your opinion?

By the way, thanks for your time; this letter of mine became unusually long.

Regards,
Eugene

--
Eugene Shatokhin, ROSA
www.rosalab.com

