Re: [PATCH 2/3 v3] x86: entry_64.S: always allocate complete "struct pt_regs"

From: Denys Vlasenko
Date: Wed Feb 25 2015 - 13:43:21 EST


This is a multi-part message in MIME format.On 02/25/2015 01:37 PM, Andrey Wagin wrote:
> 2015-02-13 0:54 GMT+03:00 Denys Vlasenko <dvlasenk@xxxxxxxxxx>:
>> 64-bit code was using six stack slots less by not saving/restoring
>> registers which are callee-preserved according to C ABI,
>> and not allocating space for them.
>> Only when syscall needed a complete "struct pt_regs",
>> the complete area was allocated and filled in.
>> As an additional twist, on interrupt entry a "slightly less truncated pt_regs"
>> trick is used, to make nested interrupt stacks easier to unwind.
>>
>> This proved to be a source of significant obfuscation and subtle bugs.
>> For example, stub_fork had to pop the return address,
>> extend the struct, save registers, and push return address back. Ugly.
>> ia32_ptregs_common pops return address and "returns" via jmp insn,
>> throwing a wrench into CPU return stack cache.
>>
>> This patch changes code to always allocate a complete "struct pt_regs".
>> The saving of registers is still done lazily.
>>
>> "Partial pt_regs" trick on interrupt stack is retained.
>>
>> Macros which manipulate "struct pt_regs" on stack are reworked:
>> ALLOC_PT_GPREGS_ON_STACK allocates the structure.
>> SAVE_C_REGS saves to it those registers which are clobbered by C code.
>> SAVE_EXTRA_REGS saves to it all other registers.
>> Corresponding RESTORE_* and REMOVE_PT_GPREGS_FROM_STACK macros reverse it.
>>
>> ia32_ptregs_common, stub_fork and friends lost their ugly dance with
>> return pointer.
>>
>> LOAD_ARGS32 in ia32entry.S now uses symbolic stack offsets
>> instead of magic numbers.
>>
>> error_entry and save_paranoid now use SAVE_C_REGS + SAVE_EXTRA_REGS
>> instead of having it open-coded yet again.
>>
>> Patch was run-tested: 64-bit executables, 32-bit executables,
>> strace works.
>> Timing tests did not show measurable difference in 32-bit
>> and 64-bit syscalls.
>
> Hello Denys,
>
> My test vm doesn't boot with this patch. Could you help to investigate
> this issue?

I think I found it. This part of my patch is possibly wrong:

@@ -171,9 +171,9 @@ static inline int arch_irqs_disabled(void)
#define ARCH_LOCKDEP_SYS_EXIT_IRQ \
TRACE_IRQS_ON; \
sti; \
- SAVE_REST; \
+ SAVE_EXTRA_REGS; \
LOCKDEP_SYS_EXIT; \
- RESTORE_REST; \
+ RESTORE_EXTRA_REGS; \
cli; \
TRACE_IRQS_OFF;

The "SAVE_REST" here is intended to really *push* extra regs on stack,
but the patch changed it so that they are written to existing stack
slots above.

>From code inspection it should work in almost all cases, but some
locations where it is used are really obscure.

If there are places where *pushing* regs is really necessary,
this can corrupt rbp,rbx,r12-15 registers.

Your config has CONFIG_LOCKDEP=y, I think it's worth trying whether the bug
was here.
Please find updated patch attached. Can you try it?

--
vda