Re: [PATCH/RFC] Re: 'perf test BPF' failing, libbpf regression wrt "basic API for BPF obj name"

From: Arnaldo Carvalho de Melo
Date: Thu Nov 30 2017 - 14:00:51 EST


Em Thu, Nov 30, 2017 at 10:28:08AM -0800, Martin KaFai Lau escreveu:
> On Thu, Nov 30, 2017 at 01:53:58PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Thu, Nov 30, 2017 at 12:01:10AM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Wed, Nov 29, 2017 at 02:31:36PM -0800, Martin KaFai Lau escreveu:
> > > > On Wed, Nov 29, 2017 at 06:15:43PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > Em Wed, Nov 29, 2017 at 01:07:34PM -0800, Martin KaFai Lau escreveu:
> > > > > > On Tue, Nov 28, 2017 at 04:05:19PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > > > [root@jouet ~]# perf test -v bpf
> > > > > > > 39: BPF filter :
> > > > > > > 39.1: Basic BPF filtering :
> > > > > > > Kernel build dir is set to /lib/modules/4.14.0+/build
> > > > > > [ ... ]
> > > > > > > libbpf: failed to create map (name: 'flip_table'): Invalid argument
> > > > > > > libbpf: failed to load object '[basic_bpf_test]'
> > > > > > > bpf: load objects failed
> > > > > > 88cda1c9da02 ("bpf: libbpf: Provide basic API support to specify BPF obj name")
> > > > > > is introduced in 4.15.
> > >
> > > > > > I think the perf@xxxxxxxxxxx broke on older kernels like 4.14 because
> > > > > > the new bpf prog/map name is only introduced since 4.15.

> > > > > > The newer perf needs to be compatible with an older kernel?

> > > > > Sure :-)

> > > > Would the latest features introduced in perf/libbpf supposed to be
> > > > available in the latest kernel only? What may be the reason that the

> > > Yes, then the new perf binary should try to use the new stuff, if it
> > > fails, use the old one, there is no requirement that one uses perf 4.14
> > > in lockstep with the kernel 4.14 (or any other version), perf 4.15
> > > should work with the 4.14 kernel as well as with 4.15 (or any other
> > > future kernel), only limited by what it can grok up to when it was
> > > released.

> > So, see the patch below, that makes a 'perf test bpf' and my other test
> > cases, including that one for probe_read_str() work again, it just
> > fallbacks to a behaviour the older kernels can accept.

> Thanks for the patch.

> > We can improve it so that that EINVAL fallback happens only for
> > MAP_CREATE, and probably we don't need to change the size arg, just zero
> > the unused fields, but I haven't checked that.
> >
> > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> > index 5128677e4117..3084f07c7c33 100644
> > --- a/tools/lib/bpf/bpf.c
> > +++ b/tools/lib/bpf/bpf.c
> > @@ -19,6 +19,7 @@
> > * License along with this program; if not, see <http://www.gnu.org/licenses>
> > */
> >
> > +#include <errno.h>
> > #include <stdlib.h>
> > #include <memory.h>
> > #include <unistd.h>
> > @@ -53,10 +54,26 @@ static inline __u64 ptr_to_u64(const void *ptr)
> > return (__u64) (unsigned long) ptr;
> > }
> >
> > -static inline int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
> > - unsigned int size)
> > +static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr, unsigned int size)
> > {
> > - return syscall(__NR_bpf, cmd, attr, size);
> > + int err = syscall(__NR_bpf, cmd, attr, size);
> > + if (err == -1 && (errno == EINVAL || errno == E2BIG)) {
> I would add a check to the length of map_name/prog_name.

Can you elaborate? What kind of check?

> > + const unsigned int old_union_size = offsetof(union bpf_attr, prog_name);
> > + /*
> > + * These were the ones that added fields after the old bpf_attr
> > + * layout in commit 88cda1c9da02 ("bpf: libbpf: Provide basic
> > + * API support to specify BPF obj name") so zero that out to
> > + * pass the CHECK_ATTR() test in kernel/bpf/syscall.c in older
> > + * kernels.
> > + */
> > + if (cmd == BPF_MAP_CREATE)
> > + memset(&attr->map_name, 0, size - offsetof(union bpf_attr, map_name));
> > + else
> > + memset(&attr->prog_name, 0, size - old_union_size);

> If bpf_attr is extended in the future, map_name/prog_name will still be
> used as the anchor for backward compatibility instead of trial and error
> attribute by attribute?

Then you will first try the latest and greatest, if it fails, go the
previous (like here), if it fails, etc.

That or some sort of versioning to make sure the kernel and the tools
can agree on a common set of functionality supported by both.

Again, this is how perf_evsel__open() works in the perf case, try the
latest and go fallbacking to the most recent set of features that could
somehow service what is needed or disable some feature and warn the
user, i.e. do the best you can with what you have.

With this patch in place I was able to have what was working before
88cda1c9da02 working again.

> Instead of sinking all future bpf_attr's backward compatibility
> requirements to sys_bpf, I would push it up to its own BPF_* command
> helper which has a better sense of its bpf_attr, i.e. push it up
> to bpf_create_map_node() and bpf_load_program_name() in this case.

Humm, we could try that approach, but the one in this patch seemed good
enough.

And after all if the first syscall() invokation, with the latest kernel
and latest tooling will work, right?

I think that we need as well bpf__strerror_METHOD() that can make sense
of error returns per method (aka "command helper"), to tell the user
when something can't be done why is that so: "you need a kernel >= 4.X",
"this needs root privs", "too many maps, etc"

Right now we have a generic, libbpf wide one:

static const char *libbpf_strerror_table[NR_ERRNO] = {
[ERRCODE_OFFSET(LIBELF)] = "Something wrong in libelf",
[ERRCODE_OFFSET(FORMAT)] = "BPF object format invalid",
[ERRCODE_OFFSET(KVERSION)] = "'version' section incorrect or lost",
[ERRCODE_OFFSET(ENDIAN)] = "Endian mismatch",
[ERRCODE_OFFSET(INTERNAL)] = "Internal error in libbpf",
[ERRCODE_OFFSET(RELOC)] = "Relocation failed",
[ERRCODE_OFFSET(VERIFY)] = "Kernel verifier blocks program loading",
[ERRCODE_OFFSET(PROG2BIG)] = "Program too big",
[ERRCODE_OFFSET(KVER)] = "Incorrect kernel version",
[ERRCODE_OFFSET(PROGTYPE)] = "Kernel doesn't support this program type",
};

int libbpf_strerror(int err, char *buf, size_t size)
{

/* use the above or return strerror_r() for the errno usual range */
}

But as time goes by, features get added, ABI gets extended, and without
a better kernel/user error reporting mechanism... Making sense of errno
in the context of a specific method and looking at the system state
looks like the best we can do to provide not-so-cryptic messages to the
poor users, and we're all users :-)

- Arnaldo

> > + err = syscall(__NR_bpf, cmd, attr, old_union_size);
> > + }
> > + return err;
> > }
> >
> > int bpf_create_map_node(enum bpf_map_type map_type, const char *name,