Re: [PATCH v5 2/5] userfaultfd: add /dev/userfaultfd for fine grained access control

From: Mike Rapoport
Date: Thu Aug 11 2022 - 02:38:37 EST


On Mon, Aug 08, 2022 at 10:56:11AM -0700, Axel Rasmussen wrote:
> Historically, it has been shown that intercepting kernel faults with
> userfaultfd (thereby forcing the kernel to wait for an arbitrary amount
> of time) can be exploited, or at least can make some kinds of exploits
> easier. So, in 37cd0575b8 "userfaultfd: add UFFD_USER_MODE_ONLY" we
> changed things so that, in order for kernel faults to be handled by
> userfaultfd, either the process needs CAP_SYS_PTRACE, or the
> vm.unprivileged_userfaultfd sysctl must be configured so that any
> unprivileged user can do it.
>
> In a typical implementation of a hypervisor with live migration (take
> QEMU/KVM as one such example), we do indeed need to be able to handle
> kernel faults. But, both options above are less than ideal:
>
> - Toggling the sysctl increases attack surface by allowing any
> unprivileged user to do it.
>
> - Granting the live migration process CAP_SYS_PTRACE gives it this
> ability, but *also* the ability to "observe and control the
> execution of another process [...], and examine and change [its]
> memory and registers" (from ptrace(2)). This isn't something we need
> or want to be able to do, so granting this permission violates the
> "principle of least privilege".
>
> This is all a long-winded way to say: we want a more fine-grained way to
> grant access to userfaultfd, without granting unrelated permissions at
> the same time.
>
> To achieve this, add a /dev/userfaultfd misc device. This device
> provides an alternative to the userfaultfd(2) syscall for the creation
> of new userfaultfds. The idea is, any userfaultfds created this way will
> be able to handle kernel faults, without the caller having any special
> capabilities. Access to this mechanism is instead restricted using e.g.
> standard filesystem permissions.
>
> Acked-by: Nadav Amit <namit@xxxxxxxxxx>
> Acked-by: Peter Xu <peterx@xxxxxxxxxx>
> Signed-off-by: Axel Rasmussen <axelrasmussen@xxxxxxxxxx>

Acked-by: Mike Rapoport <rppt@xxxxxxxxxxxxx>

> ---
> fs/userfaultfd.c | 73 +++++++++++++++++++++++++-------
> include/uapi/linux/userfaultfd.h | 4 ++
> 2 files changed, 61 insertions(+), 16 deletions(-)

--
Sincerely yours,
Mike.