Re: [PATCH net-next 3/4] bpf: add support for persistent maps/progs

From: Alexei Starovoitov
Date: Fri Oct 16 2015 - 12:18:59 EST


On 10/16/15 3:25 AM, Hannes Frederic Sowa wrote:
Namespaces at some point dealt with the same problem; they nowadays use
bind mounts of /proc/$$/ns/* to some place in the file hierarchy to keep
the namespace alive. This at least allows someone to build up their own
hierarchy with normal unix tools and not hidden inside a C program. For
file descriptors we already have /proc/$$/fd/* but it seems that doesn't
work out of the box nowadays.
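
For readers unfamiliar with the namespace trick mentioned above, a
minimal C sketch of it: bind-mounting a namespace file onto an existing
regular file keeps the namespace alive after the process exits. The
helper name and both paths are hypothetical; the call needs
CAP_SYS_ADMIN and the target file must already exist.

```c
#include <stddef.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: bind-mount a namespace file (for example
 * /proc/self/ns/net) onto an existing regular file so that the
 * namespace stays alive once its last process is gone.
 * Returns 0 on success, -1 with errno set on failure.
 */
int pin_namespace(const char *ns_path, const char *target)
{
    return mount(ns_path, target, NULL, MS_BIND, NULL);
}
```

The pin is released again with umount(2) on the target.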

Bind mounting of /proc/../fd was initially proposed by Andy and we've
looked at it thoroughly, but after discussion with Eric it became
apparent that it doesn't fit here. In the end we need shell tools
to access maps.
Also I think you missed that the hierarchy in this patch set _is_ built
with normal 'mkdir' and files are removed with 'rm'.
The only thing that C does is BPF_PIN_FD of the fd that was received
from the bpf syscall. That's a way cleaner API than doing a bind mount
from a C program.
We've considered letting open() of the file return a bpf-specific
anon-inode, but decided to reserve that for other, more natural file
operations. Therefore BPF_NEW_FD is needed.
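
For reference, a minimal C sketch of pinning an fd and later getting a
fresh fd back from the pinned path. It assumes the form of this
interface that later landed in mainline (Linux 4.4 and newer), where the
commands are named BPF_OBJ_PIN and BPF_OBJ_GET rather than the
BPF_PIN_FD / BPF_NEW_FD names used in this patch set. The helper names
are made up for illustration, and the path is expected to live on a
mounted bpf filesystem.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: pin an existing bpf map or program fd at `path`.
 * Assumes the mainline BPF_OBJ_PIN command, not the patch's BPF_PIN_FD.
 * Returns 0 on success, -1 with errno set on failure.
 */
int bpf_pin_fd(int fd, const char *path)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.pathname = (uint64_t)(uintptr_t)path;
    attr.bpf_fd = (uint32_t)fd;
    return (int)syscall(__NR_bpf, BPF_OBJ_PIN, &attr, sizeof(attr));
}

/*
 * Hypothetical helper: obtain a new fd for the object pinned at `path`.
 * Assumes the mainline BPF_OBJ_GET command, not the patch's BPF_NEW_FD.
 * Returns the new fd on success, -1 with errno set on failure.
 */
int bpf_get_pinned_fd(const char *path)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.pathname = (uint64_t)(uintptr_t)path;
    return (int)syscall(__NR_bpf, BPF_OBJ_GET, &attr, sizeof(attr));
}
```

Creating directories for the pins and removing a pin need no bpf-specific
call: plain mkdir(2) and unlink(2) on the path do it, which is what lets
'mkdir' and 'rm' work from the shell.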

I don't know how many objects bpf should be able to handle, or whether
such a bind-mount based solution would work; I guess not.

We definitely missed you at the last plumbers where it was discussed :)

In my opinion I still favor a user space approach.

That's not acceptable for tracing use cases. No daemons allowed.

Subsystems which use
ebpf in a way that no user space program needs to be running to control
them would need to export the fds themselves. E.g. something like
sysfs/kobject for tc? The hierarchy would then be in control of the
subsystem which could also create a proper naming hierarchy or maybe
even use an already given one. Do most other eBPF users really need to
persist file descriptors somewhere without user space control and pick
them up later?

I think it's way cleaner to have one way of solving it (like this patch
does) instead of asking every subsystem to solve it differently.
We've also looked at sysfs and it's ugly when it comes to removing,
since the user cannot use normal 'rm'.
