Re: [PATCHv4 RESEND 0/3] syscalls,x86: Add execveat() system call

From: Andy Lutomirski
Date: Mon Oct 20 2014 - 18:48:35 EST


On Mon, Oct 20, 2014 at 6:48 AM, David Drysdale <drysdale@xxxxxxxxxx> wrote:
> On Sun, Oct 19, 2014 at 1:20 AM, Eric W. Biederman
> <ebiederm@xxxxxxxxxxxx> wrote:
>> Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:
>>
>>> [Added Eric Biederman, since I think your tree might be a reasonable
>>> route forward for these patches.]
>>>
>>> On Thu, Jun 5, 2014 at 6:40 AM, David Drysdale <drysdale@xxxxxxxxxx> wrote:
>>>> Resending, adding cc:linux-api.
>>>>
>>>> Also, it may help to add a little more background -- this patch is
>>>> needed as a (small) part of implementing Capsicum in the Linux kernel.
>>>>
>>>> Capsicum is a security framework that has been present in FreeBSD since
>>>> version 9.0 (Jan 2012), and is based on concepts from object-capability
>>>> security [1].
>>>>
>>>> One of the features of Capsicum is capability mode, which locks down
>>>> access to global namespaces such as the filesystem hierarchy. In
>>>> capability mode, /proc is thus inaccessible and so fexecve(3) doesn't
>>>> work -- hence the need for a kernel-space
>>>
>>> I just found myself wanting this syscall for another reason: injecting
>>> programs into sandboxes or otherwise heavily locked-down namespaces.
>>>
>>> For example, I want to be able to reliably do something like nsenter
>>> --namespace-flags-here toybox sh. Toybox's shell is unusual in that
>>> it is more or less fully functional, so this should Just Work (tm),
>>> except that the toybox binary might not exist in the namespace being
>>> entered. If execveat were available, I could rig nsenter or a similar
>>> tool to open it with O_CLOEXEC, enter the namespace, and then call
>>> execveat.
>>>
>>> Is there any reason that these patches can't be merged more or less as
>>> is for 3.19?
>>
>> Yes. There is a silliness in how it implements fexecve. The fexecve
>> case should use the empty string "" rather than a NULL pointer to
>> indicate that. That change will then harmonize execveat with the other
>> ...at system calls, simplify the code, and remove a special case. I
>> believe using the empty string "" requires implementing the
>> AT_EMPTY_PATH flag.
>
> Good point -- I'll shift to "" + AT_EMPTY_PATH.

Pending a better idea, I would also see if the patches can be changed
to return an error if d_path ends up producing an "(unreachable)" path,
rather than failing inexplicably later on.

--Andy