Re: [RFC] catching sys_reboot syscall

From: Daniel Lezcano
Date: Thu Aug 11 2011 - 12:58:17 EST


On 08/11/2011 06:30 PM, Bruno Prémont wrote:
> On Wed, 10 August 2011 Daniel Lezcano <daniel.lezcano@xxxxxxx> wrote:
>> On 08/10/2011 10:10 PM, Bruno Prémont wrote:
>>> Hi Daniel,
>>>
>>> [I'm adding containers ml as we had a discussion there some time ago
>>> for this feature]
>> [ ... ]
>>
>>>> + if (cmd == LINUX_REBOOT_CMD_RESTART2)
>>>> + if (strncpy_from_user(&buffer[0], arg, sizeof(buffer) - 1) < 0)
>>>> + return -EFAULT;
>>>> +
>>>> + /* If we are not in the initial pid namespace, we send a signal
>>>> + * to the parent of this init pid namespace, notifying a shutdown
>>>> + * occurred */
>>>> + if (pid_ns != &init_pid_ns)
>>>> + pid_namespace_reboot(pid_ns, cmd, buffer);
>>> Should there be a return here?
>>> Or does pid_namespace_reboot() never return by submitting signal to
>>> parent?
>> Yes, it does not return a value, like 'do_notify_parent_cldstop'
> So execution flow continues reaching the whole "host reboot code"?
>
> That's not so good as it then prevents using CAP_SYS_BOOT inside PID namespace
> to limit access to rebooting the container from inside as giving a process
> inside container CAP_SYS_BOOT would cause host to reboot (and when not given
> process inside container would get -EPERM in all cases).
>
> Wouldn't the following be better?:
> ...
> +
> + /* We only trust the superuser with rebooting the system. */
> + if (!capable(CAP_SYS_BOOT))
> + return -EPERM;
> +
> + /* If we are not in the initial pid namespace, we send a signal
> + * to the parent of this init pid namespace, notifying a shutdown
> + * occurred */
> + if (pid_ns != &init_pid_ns) {
> + pid_namespace_reboot(pid_ns, cmd, buffer);
> + return 0;
> + }
> +
> mutex_lock(&reboot_mutex);
> switch (cmd) {
> ...
>
>
> If I misunderstood, please correct me.

Yep, this is what I did at the beginning, but I realized I was closing
the door on future applications using pid namespaces. A pid namespace
could be used by another kind of application, not a container, that
runs some administrative tasks and may want to shut down the host
from a different pid namespace.
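
To make the difference between the two variants concrete, here is a rough
userspace model of the control flow, not kernel code. All names in it
(struct reboot_outcome, reboot_fallthrough, reboot_early_return) are
hypothetical, and it assumes the notification in my patch runs before the
CAP_SYS_BOOT check, which is what the discussion above implies.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of what one reboot() call ends up doing. */
struct reboot_outcome {
	bool parent_notified;	/* parent of the namespace's init was signalled */
	bool host_rebooted;	/* the host reboot code was reached */
	int ret;		/* value the caller would see, if it returns */
};

/* Variant from the patch: notify, then fall through to the host path. */
static struct reboot_outcome reboot_fallthrough(bool in_init_ns,
						bool has_cap_sys_boot)
{
	struct reboot_outcome out = { false, false, 0 };

	if (!in_init_ns)
		out.parent_notified = true;

	if (!has_cap_sys_boot) {
		out.ret = -EPERM;
		return out;
	}

	out.host_rebooted = true;
	return out;
}

/* Suggested variant: check the capability, then return after notifying. */
static struct reboot_outcome reboot_early_return(bool in_init_ns,
						 bool has_cap_sys_boot)
{
	struct reboot_outcome out = { false, false, 0 };

	if (!has_cap_sys_boot) {
		out.ret = -EPERM;
		return out;
	}

	if (!in_init_ns) {
		out.parent_notified = true;
		return out;
	}

	out.host_rebooted = true;
	return out;
}
```

In this model, a privileged task in a non-initial pid namespace can still
reach the host reboot path only with the fall-through variant, which is the
door I did not want to close.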

For this reason, to keep a container from reaching the host reboot code,
the container has to drop CAP_SYS_BOOT, in addition to handling the
SIGCHLD signal with CLDREBOOT.
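
As a minimal sketch of the capability part, a container launcher could
remove CAP_SYS_BOOT from its capability bounding set before exec'ing the
container's init. The helper names below are hypothetical; prctl() with
PR_CAPBSET_DROP and PR_CAPBSET_READ is the existing interface. The drop
needs CAP_SETPCAP, and it only prevents the capability from being gained
across a later execve(), so this assumes CAP_SYS_BOOT is not in the
inheritable set either.

```c
#include <errno.h>
#include <linux/capability.h>	/* CAP_SYS_BOOT */
#include <sys/prctl.h>		/* prctl(), PR_CAPBSET_DROP, PR_CAPBSET_READ */

/*
 * Hypothetical helper: drop CAP_SYS_BOOT from the calling thread's
 * capability bounding set. Returns 0 on success or a negative errno
 * value (for example -EPERM when the caller lacks CAP_SETPCAP).
 */
static int drop_cap_sys_boot(void)
{
	if (prctl(PR_CAPBSET_DROP, CAP_SYS_BOOT, 0, 0, 0) != 0)
		return -errno;
	return 0;
}

/*
 * Hypothetical helper: returns 1 if CAP_SYS_BOOT is still in the
 * bounding set, 0 if it is not, and -1 on error.
 */
static int cap_sys_boot_in_bounding_set(void)
{
	return prctl(PR_CAPBSET_READ, CAP_SYS_BOOT, 0, 0, 0);
}
```

The launcher would call drop_cap_sys_boot() after creating the new pid
namespace and before the exec of the container's init, and treat a non-zero
return as a reason not to start the container.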

