Re: [PATCH 5/6] sched/preempt: add PREEMPT_DYNAMIC using static keys

From: Mark Rutland
Date: Wed Feb 02 2022 - 10:29:40 EST


On Mon, Dec 13, 2021 at 11:05:01PM +0100, Frederic Weisbecker wrote:
> On Tue, Nov 09, 2021 at 05:24:07PM +0000, Mark Rutland wrote:
> > Where an architecture selects HAVE_STATIC_CALL but not
> > HAVE_STATIC_CALL_INLINE, each static call has an out-of-line trampoline
> > which will either branch to a callee or return to the caller.
> >
> > On such architectures, a number of constraints can conspire to make
> > those trampolines more complicated and potentially less useful than we'd
> > like. For example:
> >
> > * Hardware and software control flow integrity schemes can require the
> >   addition of "landing pad" instructions (e.g. `BTI` for arm64), which
> >   will also be present at the "real" callee.
> >
> > * Limited branch ranges can require that trampolines generate or load an
> >   address into a register and perform an indirect branch (or at least
> >   have a slow path that does so). This loses some of the benefits of
> >   having a direct branch.
> >
> > * Interaction with SW CFI schemes can be complicated and fragile, e.g.
> >   requiring that we can recognise idiomatic codegen and remove the
> >   indirections, at least until clang provides more helpful
> >   mechanisms for dealing with this.
> >
> > For PREEMPT_DYNAMIC, we don't need the full power of static calls, as we
> > really only need to enable/disable specific preemption functions. We can
> > achieve the same effect without a number of the pain points above by
> > using static keys to fold early return cases into the preemption
> > functions themselves rather than in an out-of-line trampoline,
> > effectively inlining the trampoline into the start of the function.
> >
> > For arm64, this results in good code generation, e.g. the
> > dynamic_cond_resched() wrapper looks as follows (with the first `B` being
> > replaced with a `NOP` when the function is disabled):
> >
> > | <dynamic_cond_resched>:
> > |        bti     c
> > |        b       <dynamic_cond_resched+0x10>
> > |        mov     w0, #0x0        // #0
> > |        ret
> > |        mrs     x0, sp_el0
> > |        ldr     x0, [x0, #8]
> > |        cbnz    x0, <dynamic_cond_resched+0x8>
> > |        paciasp
> > |        stp     x29, x30, [sp, #-16]!
> > |        mov     x29, sp
> > |        bl      <preempt_schedule_common>
> > |        mov     w0, #0x1        // #1
> > |        ldp     x29, x30, [sp], #16
> > |        autiasp
> > |        ret
> >
> > ... compared to the regular form of the function:
> >
> > | <__cond_resched>:
> > |        bti     c
> > |        mrs     x0, sp_el0
> > |        ldr     x1, [x0, #8]
> > |        cbz     x1, <__cond_resched+0x18>
> > |        mov     w0, #0x0        // #0
> > |        ret
> > |        paciasp
> > |        stp     x29, x30, [sp, #-16]!
> > |        mov     x29, sp
> > |        bl      <preempt_schedule_common>
> > |        mov     w0, #0x1        // #1
> > |        ldp     x29, x30, [sp], #16
> > |        autiasp
> > |        ret
> >
> > Any architecture which implements static keys should be able to use this
> > to implement PREEMPT_DYNAMIC with similar cost to non-inlined static
> > calls.
> >
> > Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx>
> > Cc: Ard Biesheuvel <ardb@xxxxxxxxxx>
> > Cc: Frederic Weisbecker <frederic@xxxxxxxxxx>
> > Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> > Cc: Juri Lelli <juri.lelli@xxxxxxxxxx>
> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>
> Does anyone have an opinion on that? Can we do better on the arm64 static call
> side, or should we resign ourselves to the static keys direction?

From speaking with other arm64 folk, I think we're agreed that this is
preferable to implementing static calls (especially given the pain points in
their interaction with CFI).

I don't think it's fair to say we're "resigning ourselves" to using static
keys. This is vastly simpler to implement and maintain than the static call
approach, should perform no worse than the form of trampolines that we'd have
to implement for static calls, and makes it easier for architectures to enable
PREEMPT_DYNAMIC, so it seems like an all-round win.
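
To illustrate the shape described in the quoted commit message (the early
return folded into the start of the function itself), here is a minimal
userspace sketch. It is not the kernel implementation: a plain flag stands in
for the static key (a real static key patches a branch/NOP in the code rather
than loading a variable), and every name ending in `_sketch` or `_demo`, along
with `sk_dynamic_cond_resched`, is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * ASSUMPTION: a plain flag stands in for a kernel static key. In the kernel,
 * enabling/disabling the key rewrites an instruction (B <-> NOP on arm64)
 * instead of testing a variable at runtime.
 */
static bool sk_dynamic_cond_resched = true;

/* Hypothetical stand-ins for per-task scheduler state. */
static bool need_resched_sketch;
static int preempt_count_sketch;	/* non-zero: preemption is disabled */
static int schedule_calls_sketch;	/* counts calls into the "scheduler" */

/* Hypothetical stand-in for preempt_schedule_common(). */
static void preempt_schedule_common_sketch(void)
{
	schedule_calls_sketch++;
	need_resched_sketch = false;
}

/* Returns 1 if a reschedule happened, 0 otherwise. */
int dynamic_cond_resched_sketch(void)
{
	/* Early return folded into the function: the "inlined trampoline". */
	if (!sk_dynamic_cond_resched)
		return 0;

	/* Regular body: nothing to do unless a reschedule is due and allowed. */
	if (preempt_count_sketch != 0 || !need_resched_sketch)
		return 0;

	preempt_schedule_common_sketch();
	return 1;
}

/* Small self-check exercising the disabled and enabled paths. */
void dynamic_cond_resched_demo(void)
{
	preempt_count_sketch = 0;
	schedule_calls_sketch = 0;
	need_resched_sketch = true;

	sk_dynamic_cond_resched = false;	/* function disabled */
	assert(dynamic_cond_resched_sketch() == 0);
	assert(schedule_calls_sketch == 0);

	sk_dynamic_cond_resched = true;		/* function enabled */
	assert(dynamic_cond_resched_sketch() == 1);
	assert(schedule_calls_sketch == 1);

	/* need_resched was cleared by the stand-in scheduler call. */
	assert(dynamic_cond_resched_sketch() == 0);
}
```

Calling `dynamic_cond_resched_demo()` from a test harness runs both paths. The
point of the sketch is only that the callee stays a normal function with a
single patchable check at its entry, so no out-of-line trampoline with an
arbitrary target is involved.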

> Also I assume that, sooner or later, arm64 will eventually need a static call
> implementation....

I really hope not, because the current design of static calls (with arbitrary
targets) is not a great fit for arm64.

The only other major use for static calls on the arm64 side is for tracing
hooks, and that's *purely* to avoid the overhead that the current clang CFI
scheme imposes for modules. For that I'd rather fix the CFI scheme, because
that also interacts poorly with static calls to begin with...

Thanks,
Mark.