Re: [PATCH 2/3 v3] x86: entry_64.S: always allocate complete "struct pt_regs"

From: Andrey Wagin
Date: Wed Feb 25 2015 - 14:59:31 EST


2015-02-25 21:42 GMT+03:00 Denys Vlasenko <dvlasenk@xxxxxxxxxx>:
> On 02/25/2015 01:37 PM, Andrey Wagin wrote:
>> 2015-02-13 0:54 GMT+03:00 Denys Vlasenko <dvlasenk@xxxxxxxxxx>:
>>> 64-bit code was using six stack slots less by not saving/restoring
>>> registers which are callee-preserved according to C ABI,
>>> and not allocating space for them.
>>> Only when syscall needed a complete "struct pt_regs",
>>> the complete area was allocated and filled in.
>>> As an additional twist, on interrupt entry a "slightly less truncated pt_regs"
>>> trick is used, to make nested interrupt stacks easier to unwind.
>>>
>>> This proved to be a source of significant obfuscation and subtle bugs.
>>> For example, stub_fork had to pop the return address,
>>> extend the struct, save registers, and push return address back. Ugly.
>>> ia32_ptregs_common pops return address and "returns" via jmp insn,
>>> throwing a wrench into CPU return stack cache.
>>>
>>> This patch changes code to always allocate a complete "struct pt_regs".
>>> The saving of registers is still done lazily.
>>>
>>> "Partial pt_regs" trick on interrupt stack is retained.
>>>
>>> Macros which manipulate "struct pt_regs" on stack are reworked:
>>> ALLOC_PT_GPREGS_ON_STACK allocates the structure.
>>> SAVE_C_REGS saves to it those registers which are clobbered by C code.
>>> SAVE_EXTRA_REGS saves to it all other registers.
>>> Corresponding RESTORE_* and REMOVE_PT_GPREGS_FROM_STACK macros reverse it.
>>>
>>> ia32_ptregs_common, stub_fork and friends lost their ugly dance with
>>> return pointer.
>>>
>>> LOAD_ARGS32 in ia32entry.S now uses symbolic stack offsets
>>> instead of magic numbers.
>>>
>>> error_entry and save_paranoid now use SAVE_C_REGS + SAVE_EXTRA_REGS
>>> instead of having it open-coded yet again.
>>>
>>> Patch was run-tested: 64-bit executables, 32-bit executables,
>>> strace works.
>>> Timing tests did not show measurable difference in 32-bit
>>> and 64-bit syscalls.
>>
>> Hello Denys,
>>
>> My test vm doesn't boot with this patch. Could you help to investigate
>> this issue?
>
> I think I found it. This part of my patch is possibly wrong:
>
> @@ -171,9 +171,9 @@ static inline int arch_irqs_disabled(void)
> #define ARCH_LOCKDEP_SYS_EXIT_IRQ \
> TRACE_IRQS_ON; \
> sti; \
> - SAVE_REST; \
> + SAVE_EXTRA_REGS; \
> LOCKDEP_SYS_EXIT; \
> - RESTORE_REST; \
> + RESTORE_EXTRA_REGS; \
> cli; \
> TRACE_IRQS_OFF;
>
> The "SAVE_REST" here is intended to really *push* extra regs on stack,
> but the patch changed it so that they are written to existing stack
> slots above.
>
> From code inspection it should work in almost all cases, but some
> locations where it is used are really obscure.
>
> If there are places where *pushing* regs is really necessary,
> this can corrupt rbp,rbx,r12-15 registers.
>
> Your config has CONFIG_LOCKDEP=y, I think it's worth trying whether the bug
> was here.
> Please find updated patch attached. Can you try it?

It doesn't work

[ 2.282198] microcode: CPU0 sig=0x623, pf=0x0, revision=0x1
[ 2.288321] microcode: CPU1 sig=0x623, pf=0x0, revision=0x1
[ 2.289139] microcode: CPU2 sig=0x623, pf=0x0, revision=0x1
[ 2.290618] microcode: CPU3 sig=0x623, pf=0x0, revision=0x1
[ 2.292417] microcode: Microcode Update Driver: v2.00
<tigran@xxxxxxxxxxxxxxxxxxxx>, Peter Oruba
[ 2.392592] piix4_smbus 0000:00:01.3: SMBus Host Controller at
0xb100, revision 0
[ 2.510882] systemd-fsck[349]: /dev/sda1: clean, 343/128016 files,
166911/512000 blocks
[ 2.521463] Adding 4128764k swap on /dev/sda2. Priority:-1
extents:1 across:4128764k FS
[ 2.573597] EXT4-fs (sda1): mounted filesystem with ordered data
mode. Opts: (null)
[ 2.597771] systemd-journald[283]: Received request to flush
runtime journal from PID 1
[ 2.802288] audit: type=1305 audit(1424892361.629:3): audit_pid=369
old=0 auid=4294967295 ses=4294967295 res=1
[ 2.819660] traps: systemd-cgroups[380] general protection
ip:7fb227939028 sp:7fff6bac59c8 error:0 in
ld-2.18.so[7fb227921000+20000]
[ 3.016262] traps: systemd-cgroups[390] general protection
ip:7f456f7b6028 sp:7fffdc059718 error:0 in
ld-2.18.so[7f456f79e000+20000]

>
> --
> vda
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/