Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

From: Andy Lutomirski
Date: Mon Oct 08 2018 - 04:33:20 EST




> On Oct 8, 2018, at 12:21 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
>> On Sat, Oct 06, 2018 at 09:39:05AM -0400, Steven Rostedt wrote:
>> On Sat, 6 Oct 2018 14:12:11 +0200
>> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>
>>>> On Fri, Oct 05, 2018 at 09:51:11PM -0400, Steven Rostedt wrote:
>>>> +#define arch_dynfunc_trampoline(name, def) \
>>>> + asm volatile ( \
>>>> + ".globl dynfunc_" #name "; \n\t" \
>>>> + "dynfunc_" #name ": \n\t" \
>>>> + "jmp " #def " \n\t" \
>>>> + ".balign 8 \n \t" \
>>>> + : : : "memory" )
>>>
>>> Bah, what is it with you people and trampolines. Why can't we, just like
>>> jump_label, patch the call directly?
>>>
>>> The whole call+jmp thing is silly, don't do that. It just wrecks I$ and
>>> is slower for no real reason afaict.
>>
>> My first attempt was to do just that. But to add a label at the
>> call site required handling all the parameters too. See my branch:
>> ftrace/jump_function-v1 for how ugly it got (and it didn't work).
>
> Can't we hijack the relocation records for these functions before they
> get thrown out in the (final) link pass or something?

I could be talking out my arse here, but I thought we could do this, too, then changed my mind. The relocation records give us the location of the call or jump operand, but they don't give the address of the beginning of the instruction. If the instruction crosses a cache line boundary, don't we need to use the int3 patching trick? And that requires knowing which byte to patch with int3.

Or am I wrong, and the CPUs we care about can correctly handle a locked write to part of an instruction that crosses a cache line boundary?