Re: [PATCH v3 bpf-next 00/21] bpf: Sysctl hook

From: Alexei Starovoitov
Date: Fri Apr 12 2019 - 17:27:20 EST


On Fri, Apr 05, 2019 at 12:35:22PM -0700, Andrey Ignatov wrote:
> v2->v3:
> - simplify C based selftests by relying on variable offset stack access.
>
> v1->v2:
> - add fs/proc/proc_sysctl.c mainteners to Cc:.
>
> The patch set introduces new BPF hook for sysctl.
>
> It adds new program type BPF_PROG_TYPE_CGROUP_SYSCTL and attach type
> BPF_CGROUP_SYSCTL.
>
> BPF_CGROUP_SYSCTL hook is placed before calling to sysctl's proc_handler so
> that accesses (read/write) to sysctl can be controlled for specific cgroup
> and either allowed or denied, or traced.
>
> The hook has access to sysctl name, current sysctl value and (on write
> only) to new sysctl value via corresponding helpers. New sysctl value can
> be overridden by program. Both name and values (current/new) are
> represented as strings same way they're visible in /proc/sys/. It is up to
> program to parse these strings.
>
> To help with parsing the most common kind of sysctl value, vector of
> integers, two new helpers are provided: bpf_strtol and bpf_strtoul with
> semantic similar to user space strtol(3) and strtoul(3).
>
> The hook also provides bpf_sysctl context with two fields:
> * @write indicates whether sysctl is being read (= 0) or written (= 1);
> * @file_pos is sysctl file position to read from or write to, can be
> overridden.
>
> The hook allows to make better isolation for containerized applications
> that are run as root so that one container can't change a sysctl and affect
> all other containers on a host, make changes to allowed sysctl in a safer
> way and simplify sysctl tracing for cgroups.

Applied to bpf-next. Thanks!

Andrey,
as a follow up please add a doc describing that this bpf hook cannot be used
as a security mechanism to limit sysctl usage.
Like: explaining that task_dfl_cgroup(current) is checked at the time of read/write,
it's not a replacement for sysctl_perm, root can detach bpf progs, etc.
I think the commit 7568f4cbbeae ("selftests/bpf: C based test for sysctl and strtoX")
gives an idea of what is possible with this hook and intended usage,
but it needs to be clearly documented that it's for 'trusted root' environment.