Re: [PATCH v15 04/10] arm64: Kprobes with single stepping support

From: David Long
Date: Thu Aug 04 2016 - 00:55:31 EST


On 07/29/2016 05:01 AM, Daniel Thompson wrote:
> On 28/07/16 15:40, Catalin Marinas wrote:
>> On Wed, Jul 27, 2016 at 06:13:37PM -0400, David Long wrote:
>>> On 07/27/2016 07:50 AM, Daniel Thompson wrote:
>>>> On 25/07/16 23:27, David Long wrote:
>>>>> On 07/25/2016 01:13 PM, Catalin Marinas wrote:
>>>>>> The problem is that the original design was done on x86 for its
>>>>>> PCS and
>>>>>> it doesn't always fit other architectures. So we could either
>>>>>> ignore the
>>>>>> problem, hoping that no probed function requires argument passing on
>>>>>> stack or we copy all the valid data on the kernel stack:
>>>>>>
>>>>>> diff --git a/arch/arm64/include/asm/kprobes.h
>>>>>> b/arch/arm64/include/asm/kprobes.h
>>>>>> index 61b49150dfa3..157fd0d0aa08 100644
>>>>>> --- a/arch/arm64/include/asm/kprobes.h
>>>>>> +++ b/arch/arm64/include/asm/kprobes.h
>>>>>> @@ -22,7 +22,7 @@
>>>>>>
>>>>>> #define __ARCH_WANT_KPROBES_INSN_SLOT
>>>>>> #define MAX_INSN_SIZE 1
>>>>>> -#define MAX_STACK_SIZE 128
>>>>>> +#define MAX_STACK_SIZE THREAD_SIZE
>>>>>>
>>>>>> #define flush_insn_slot(p) do { } while (0)
>>>>>> #define kretprobe_blacklist_size 0
>>>>>
>>>>> I doubt the ARM PCS is unusual. At any rate I'm certain there are
>>>>> other
>>>>> architectures that pass aggregate parameters on the stack. I suspect
>>>>> other RISC(-ish) architectures have similar PCS issues and I think
>>>>> this
>>>>> is at least a big part of where this simple copy with a 64/128 limit
>>>>> comes from, or at least why it continues to exist. That said, I'm not
>>>>> enthusiastic about researching that assertion in detail as it could be
>>>>> time consuming.
>>>>
>>>> Given Mark shared a test program I *was* curious enough to take a look
>>>> at this.
>>>>
>>>> The only architecture I can find that behaves like arm64 with the
>>>> implicit pass-by-reference described by Catalin/Mark is sparc64.
>>>>
>>>> In contrast alpha, arm (32-bit), hppa64, mips64 and powerpc64 all use a
>>>> hybrid approach where the first fragments of the structure are
>>>> passed in
>>>> registers and the remainder on the stack.
>>>
>>> That's interesting. It also looks like sparc64 does not copy any
>>> stack for
>>> jprobes. I guess that approach at least makes it clear what will and
>>> won't
>>> work.
>>
>> I suggest we do the same for arm64 - avoid the copying entirely as it's
>> not safe anyway. We don't know how much to copy, nor can we be sure it
>> is safe (see Dave's DMA to the stack example). This would need to be
>> documented in the kprobes.txt file and MAX_STACK_SIZE removed from the
>> arm64 kprobes support.
>>
>> There is also the case that Daniel was talking about - passing more than
>> 8 arguments. I don't think it's worth handling this
>
> Its actually quite hard to document the (architecture specific) "no big
> structures" *and* the "8 argument" limits. It ends up as something like:
>
> Structures/unions >16 bytes must not be passed by value and the
> size of all arguments, after padding each to an 8 byte boundary, must
> be less than 64 bytes.
>
> We cannot avoid tackling big structures through documentation but when
> we impose additional limits like "only 8 arguments" we are swapping an
> architecture neutral "gotcha" that affects almost all jprobes uses (and
> can be inferred from the documentation) with an architecture specific one!
>

See new patch below. The documentation change in it could use some scrutiny.
I've tested with one-off jprobes functions in a test module and I've
verified NET_TCPPROBE doesn't cause misbehavior.

>
> > but we should at
>> least add a warning and skip the probe:
>>
>> diff --git a/arch/arm64/kernel/probes/kprobes.c
>> b/arch/arm64/kernel/probes/kprobes.c
>> index bf9768588288..84e02606ec3d 100644
>> --- a/arch/arm64/kernel/probes/kprobes.c
>> +++ b/arch/arm64/kernel/probes/kprobes.c
>> @@ -491,6 +491,10 @@ int __kprobes setjmp_pre_handler(struct kprobe
>> *p, struct pt_regs *regs)
>> struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>> long stack_ptr = kernel_stack_pointer(regs);
>>
>> + /* do not allow arguments passed on the stack */
>> + if (WARN_ON_ONCE(regs->sp != regs->regs[29]))
>> + return 0;
>> +
>
> I don't really understand this test.
>
> If we could reliably assume that the frame record was at the lowest
> address within a stack frame then we could exploit that to store the
> stacked arguments without risking overwriting volatile variables on the
> stack.
>
>
> Daniel.
>

I'm assuming the consensus is to not use the above snippet of code.

Thanks,
-dl

----------cut here--------