Re: [PATCH v2 1/3] capabilities: Introduce CAP_CHECKPOINT_RESTORE

From: Andrei Vagin
Date: Mon Jun 08 2020 - 23:42:28 EST


On Wed, Jun 03, 2020 at 06:23:26PM +0200, Adrian Reber wrote:
> This patch introduces CAP_CHECKPOINT_RESTORE, a new capability facilitating
> checkpoint/restore for non-root users.
>
> Over the last years, the CRIU (Checkpoint/Restore In Userspace) team has been
> asked numerous times if it is possible to checkpoint/restore a process as
> non-root. The answer usually was: 'almost'.
>
> The main blocker to restore a process as non-root was to control the PID of the
> restored process. This feature, available via the clone3 system call or via
> /proc/sys/kernel/ns_last_pid, is unfortunately guarded by CAP_SYS_ADMIN.
>
> In the past two years, requests for non-root checkpoint/restore have increased
> due to the following use cases:
> * Checkpoint/Restore in an HPC environment in combination with a resource
> manager distributing jobs where users are always running as non-root.
> There is a desire to provide a way to checkpoint and restore long-running
> jobs.
> * Container migration as non-root
> * We have been in contact with JVM developers who are integrating
> CRIU into a Java VM to decrease the startup time. These checkpoint/restore
> applications are not meant to be running with CAP_SYS_ADMIN.
>
...
>
> The introduced capability allows a process to:
> * Control PIDs when the current user is CAP_CHECKPOINT_RESTORE capable
> for the corresponding PID namespace via ns_last_pid/clone3.
> * Open files in /proc/pid/map_files when the current user is
> CAP_CHECKPOINT_RESTORE capable in the root namespace, useful for recovering
> files that are unreachable via the file system such as deleted files, or memfd
> files.
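
For readers following along, the ns_last_pid route mentioned above can be
sketched roughly as follows. This is only an illustration, not CRIU's code:
the helper names request_next_pid and fork_with_pid and the `path` parameter
are made up for the example. It relies on ns_last_pid holding the most
recently allocated PID, so writing target - 1 asks the kernel to try target
next. The write is exactly the step that needs CAP_SYS_ADMIN today (or
CAP_CHECKPOINT_RESTORE with this series).

```python
import os

# Real procfs path from the patch description; writing to it is privileged.
NS_LAST_PID = "/proc/sys/kernel/ns_last_pid"


def request_next_pid(target_pid, path=NS_LAST_PID):
    """Ask for target_pid to be the next PID allocated in this PID namespace.

    ns_last_pid stores the last PID handed out, so we write target_pid - 1.
    `path` is a hypothetical parameter so the helper can be tried on a
    regular file without privileges.
    """
    if target_pid < 1:
        raise ValueError("target_pid must be positive")
    with open(path, "w") as f:
        f.write(str(target_pid - 1))


def fork_with_pid(target_pid):
    """Fork a child and return the PID it actually received.

    Nothing here is atomic: another task may be created between the write
    and the fork, or target_pid may already be in use, so the caller has
    to compare the returned PID with target_pid.
    """
    request_next_pid(target_pid)
    pid = os.fork()
    if pid == 0:
        os._exit(0)          # child: nothing to do in this sketch
    os.waitpid(pid, 0)       # parent: reap the child
    return pid
```

clone3 with set_tid avoids the race between the write and the fork, which is
why it is the preferred interface where available.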

PTRACE_O_SUSPEND_SECCOMP is needed for C/R and it is protected by
CAP_SYS_ADMIN too.

Thanks,
Andrei