Re: SYSCALL, ptrace and syscall restart breakages (Re: [RFC] weirdcrap with vdso on uml/i386)

From: Andrew Lutomirski
Date: Sun Aug 21 2011 - 07:25:14 EST


On Sun, Aug 21, 2011 at 4:42 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Sun, Aug 21, 2011 at 07:34:43AM +0100, Al Viro wrote:
>> Suppose we have a traced process.  foo6() is called and the thing is
>> stopped before the sys_foo6() is reached kernel-side.  The sixth argument
>> is on stack, ebp is set to user esp.  SYSENTER happens, we read the
>> 6th argument from userland stack and put it along with the rest into
>> pt_regs.  tracer examines the arguments, modifies them (including the last
>> one) and lets the tracee run free - e.g. detaches from the tracee.
>>
>> What should happen if we happen to get a signal that would restart that
>> sucker?  Granted, it's not going to happen with mmap() - it doesn't, AFAICS,
>> do anything of that kind.  However, I wouldn't bet a dime on other 6-argument
>> syscalls not stepping on that.  sendto() and recvfrom(), in particular...
>>
>> OK, we return to userland.  The sixth argument is placed into %ebp.  Linus'
>> "pig and proud of that" trick works and we end up slapping userland
>> %esp into %ebp and hitting SYSENTER again.  Only one problem, though -
>> the sixth argument on user stack is completely unaffected by what tracer
>> had done.  Unlike the rest of arguments, that *are* changed.
>>
>> We could deal with that in case of SYSENTER if we e.g. replaced that
>>         jmp .Lenter_kernel
>> with
>>         jmp .Lrestart
>> and added
>> .Lrestart:
>>       movl %ebp, (%esp)
>>       jmp .Lenter_kernel
>> but in case of SYSCALL it seems to be even messier...  Comments?
>
> Oh, hell...  Compat SYSCALL one is really buggered on syscall restarts,
> ptrace or no ptrace.  Look: calling conventions for SYSCALL are
>        arg1..5: ebx, ebp, edx, edi, esi.  arg6: stack
> and after syscall restart we end up with
>        arg1..5: ebx, ecx, edx, edi, esi.  arg6: ebp
> so restart will instantly clobber arg2, in effect replacing it with arg6.
>
> And yes, adding ptrace to the mix makes things even uglier.  For one thing,
> changes to ECX via ptrace are completely lost on the fast exit.  Not pretty,
> and might make life painful for uml, but not for the majority of programs.
> What's worse, combination of ptrace with restart will lose changes to arg6
> (again, value on stack left as it was, changes to arg6 by tracer lost) *and*
> it will lose changes to arg2 (along with arg2 itself - see above).
>
> Linus' Dirty Trick(tm) is not trivial to apply - with SYSCALL we *do* retain
> the address of next insn and that's where we end up going.  IOW, SYSCALL not
> inside vdso32 currently works (for small values of "works", due to restart
> issues).  Playing with return elsewhere might break some userland code...
>
> Guys, that's *way* out of the area I'm comfortable with.
>

I don't see the point of all this hackery at all. sysenter/sysexit
do indeed clobber some registers, but we can return via the iret path
in the case of a restart.

So why do we lie to ptrace (and iret!) at all? Why not just fill in
pt_regs with the registers as they were (at least the
non-clobbered-by-sysenter ones), set the actual C parameters correctly
to contain the six arguments (in rdi, rsi, etc.), do the syscall, and
return back to userspace without any funny business? Is there some
ABI reason that, once we've started lying to tracers, we have to keep
doing so?
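
To make the compat SYSCALL restart problem quoted above concrete, here
is a toy C model. It is only a sketch and not kernel code: the structs
and function names are made up for illustration, and it tracks only
arg2 and arg6, the two arguments whose location differs between entry
and restart.

```c
/* Toy model of the compat SYSCALL restart clobber; not kernel code.
 * All names here are hypothetical and exist only for illustration. */

/* User-visible state relevant to arg2 and arg6. */
struct user_regs {
    unsigned long ecx;        /* clobbered by the SYSCALL insn itself */
    unsigned long ebp;        /* carries arg2 on SYSCALL entry */
    unsigned long stack_arg6; /* arg6 lives on the user stack */
};

/* What the entry path records as the syscall arguments. */
struct saved_args {
    unsigned long arg2;
    unsigned long arg6;
};

/* SYSCALL entry as described in the quoted mail:
 * arg2 is taken from ebp, arg6 is read from the user stack. */
static struct saved_args syscall_entry(const struct user_regs *u)
{
    struct saved_args s = { .arg2 = u->ebp, .arg6 = u->stack_arg6 };
    return s;
}

/* Return to userland for a restart: registers are reloaded in the
 * ordinary layout (arg2 in ecx, arg6 in ebp).  The user stack slot
 * is not written back. */
static void restart_return(struct user_regs *u, const struct saved_args *s)
{
    u->ecx = s->arg2;
    u->ebp = s->arg6;
}

/* One full restart cycle: return to userland, then hit SYSCALL again. */
static struct saved_args restart_once(struct user_regs *u,
                                      const struct saved_args *s)
{
    restart_return(u, s);
    return syscall_entry(u);
}
```

Starting from ebp = 222 and stack_arg6 = 666, the first entry sees
arg2 = 222, arg6 = 666, and after restart_once() the second entry sees
arg2 = 666, arg6 = 666: arg2 has been replaced by arg6. If a tracer
rewrites the saved arguments in between, both changes are lost in this
model, since the new arg2 only lands in ecx and the stack slot is never
updated.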

--Andy