[patch 00/12] x86/irq/64: Inline irq stack switching

From: Thomas Gleixner
Date: Thu Feb 04 2021 - 19:58:56 EST


The recent effort to make the ASM entry code slim and unified moved
the irq stack switching out of the low level ASM code so that the
whole return from interrupt work and state handling can be done in C
and the ASM code just handles the true low level details of entry and
exit (which is horrible enough already due to the well thought out
architecture).

The main goal at that point was to get instrumentation and RCU state
under control in a validated way. Inlining the switch mechanism was
attempted back then, but that caused more objtool and unwinder trouble
than we had already on our plate, so we ended up with a simple,
functional but suboptimal implementation. The main issues are:

- The unnecessary indirect call, which is expensive thanks to
retpoline.

- The inability to stay on the irq stack for softirq processing on return
from interrupt, which requires another stack switch operation.

- The fact that the stack switching code ended up being an easy to find
exploit gadget.

This series revisits the problem and reimplements the stack switch
mechanics via evil inline assembly. Peter Zijlstra provided the required
objtool and unwinder changes already. These are available here:

https://lore.kernel.org/r/20210203120222.451068583@xxxxxxxxxxxxx

The full series (including Peter's series) is also available from git:

git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/entry

All function calls are now direct and fully inlined including the single
instance in the softirq code which is invoked from local_bh_enable() in
task context.
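
To illustrate the difference between the two call shapes, here is a
minimal userspace sketch. It is not the code from the series: the real
implementation switches the stack pointer in inline assembly, while this
sketch only models "being on the irq stack" with a flag. All names
(on_irq_stack, handled, handle_irq_sketch, run_on_irqstack_indirect,
RUN_ON_IRQSTACK_DIRECT) are hypothetical and exist only for this example.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins; none of these names come from the series. */
static int on_irq_stack;	/* models "currently running on the irq stack" */
static int handled;		/* sum of vectors handled while on that stack */

static void handle_irq_sketch(int vector)
{
	if (on_irq_stack)
		handled += vector;
}

/*
 * Old shape: one shared helper reaches the handler through a function
 * pointer, so the call inside the helper is an indirect call.
 */
static void run_on_irqstack_indirect(void (*func)(int), int vector)
{
	on_irq_stack = 1;
	func(vector);
	on_irq_stack = 0;
}

/*
 * New shape: the switch sequence expands at each call site, so the
 * handler is named directly and the compiler can emit a direct call.
 */
#define RUN_ON_IRQSTACK_DIRECT(func, vector)	\
do {						\
	on_irq_stack = 1;			\
	func(vector);				\
	on_irq_stack = 0;			\
} while (0)
```

Both forms run the handler between the same "switch" and "switch back"
steps. Only the macro form makes the callee visible at the call site,
which is what removes the function pointer from the picture.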

The extra 100 lines in the diffstat are pretty much the extensive commentary
for the whole magic, to spare everyone, including myself, from scratching
heads two weeks down the road.

The text size impact is in the noise and, depending on the compiler
variant, the actual entry functions even show a small size decrease.

The patches have been tested with gcc8, gcc10 and clang-13 (fresh from
git). The difference between the output of these compilers is minimal,
with gcc8 being slightly worse due to stupid register selection and
randomly injected NOPs.

Thanks,

tglx
---
 arch/x86/Kconfig                 |    1
 arch/x86/entry/common.c          |   19 --
 arch/x86/entry/entry_64.S        |   41 -----
 arch/x86/include/asm/idtentry.h  |   11 -
 arch/x86/include/asm/irq.h       |    3
 arch/x86/include/asm/irq_stack.h |  283 +++++++++++++++++++++++++++------------
 arch/x86/include/asm/processor.h |    9 -
 arch/x86/kernel/apic/apic.c      |   31 ++--
 arch/x86/kernel/cpu/common.c     |    4
 arch/x86/kernel/dumpstack_64.c   |   22 ++-
 arch/x86/kernel/irq.c            |    2
 arch/x86/kernel/irq_64.c         |   11 -
 arch/x86/kernel/process_64.c     |    2
 include/linux/interrupt.h        |    2
 kernel/softirq.c                 |    4
 15 files changed, 270 insertions(+), 175 deletions(-)