Re: [PATCH net-next 3/4] bpf: add support for persistent maps/progs

From: Alexei Starovoitov
Date: Mon Oct 19 2015 - 16:48:23 EST


On 10/19/15 1:03 PM, Hannes Frederic Sowa wrote:

> I doubt it will stay a lightweight feature as it should not be in the
> responsibility of user space to provide those debug facilities.

It feels like we're talking past each other.
I want to solve the 'persistent map' problem.
Debugging of maps/progs, hierarchy, etc. are all nice to have,
but they are different issues.
In the case of persistent maps I imagine an unprivileged process would
want to use them eventually as well, so this requirement already kills
the cdev approach for me, since I don't think we will ever let
unprivileged apps create a cdev via syscall.

> The bpf syscall is still used to create the pseudo nodes. If they should
> be persistent they just get registered in the sysfs class hierarchy.

Nope, they should not. sysfs is a debugging/tuning facility.
There is absolutely no need for bpf to plug into sysfs.

>> Doing 'resource stats' via sysfs requires bpf to add to sysfs, which
>> is not this cdev approach.

> This is not yet part of the patch, but I think this would be added.
> Daniel?

please don't. I'm strongly against adding unnecessary bloat.

> I don't think there are broad differences. But in case a namespace uses
> a huge number of maps with tons of data, the admin in the initial
> namespace might want to debug that without searching all mountpoints and
> finding dependencies between processes etc. IMHO the sysfs approach can
> be better extended here.

Sure, then we can force every bpffs to have the same hierarchy and to be
mounted at the /sys/kernel/bpf location. That would be the same.

It feels like you're pushing for cdev only because of that potential
debugging need. Did you actually face that need? I didn't, and I
don't like adding a 'nice to have' feature until a real need comes.

>> Also I don't buy the point of reinventing sysfs. bpffs is not doing
>> sysfs. I don't want to see _every_ bpf object in sysfs. It's way too
>> much overhead. Classic doesn't have sysfs and everyone has been
>> using it just fine.

> But classic bpf does not have persistence for maps and data. ;) There is
> a 1:1 relationship between socket and bpf_prog for example.

A single task in seccomp can have a chain of bpf progs, so the hierarchy
is already there.
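
To illustrate the seccomp point, here is a minimal sketch (not taken from
the patch set) of one task stacking several classic bpf programs. It
assumes a Linux system with seccomp filter support. The helper names
attach_allow_all_filter() and stack_filters() are made up for this example;
prctl(), PR_SET_NO_NEW_PRIVS, PR_SET_SECCOMP and SECCOMP_MODE_FILTER are the
existing kernel interface. Each successful attach adds one more program to
the task's filter chain, and the kernel evaluates every attached filter on
each syscall.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Hypothetical helper: attach one classic-bpf seccomp filter that allows
 * every syscall. Returns 0 on success, -1 on failure (errno set by prctl). */
static int attach_allow_all_filter(void)
{
	struct sock_filter insns[] = {
		/* Single instruction: return SECCOMP_RET_ALLOW unconditionally. */
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = (unsigned short)(sizeof(insns) / sizeof(insns[0])),
		.filter = insns,
	};

	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0);
}

/* Hypothetical helper: stack `count` filters on the calling task.
 * Returns the number of filters attached, or -1 if no_new_privs
 * could not be set. */
int stack_filters(int count)
{
	int i;

	assert(count >= 0);

	/* Needed so a task without CAP_SYS_ADMIN may install filters. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;

	for (i = 0; i < count; i++) {
		/* Each call adds a program to the chain; earlier filters
		 * stay attached and cannot be removed. */
		if (attach_allow_all_filter() != 0)
			return i;
	}
	return count;
}
```

Calling stack_filters(2) from a process leaves two programs attached to it
for its lifetime. The filters here allow everything, so the process keeps
working normally.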

> But how can the filesystem be extended in terms of tunables and
> information? File attributes? Wouldn't it need the same infrastructure
> otherwise as sysfs? Some third-party lookup filesystem or ioctl? This
> char dev approach also pins maps and progs while giving more policy in
> hand of central user space programs we are currently using (udev,
> systemd, whatever, etc.).

Tunables for bpf maps? There are no such things today.
I think you're implying that we can add an rhashtable type of map, so
that the admin can tune thresholds? Ouch. I think if we add it, its
parameters will be specified only by the user that is creating the map.
There will be no tunables exposed to sysfs and there should be no way of
creating maps via sysfs.
