Re: [PATCH v2] libbpf: Add some details for BTF parsing failures

From: Andrii Nakryiko
Date: Wed Jan 24 2024 - 12:46:52 EST


On Tue, Jan 23, 2024 at 8:37 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
>
> On Tue, Jan 23, 2024 at 8:25 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Tue, Jan 23, 2024 at 12:44 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> > >
> > > As CONFIG_DEBUG_INFO_BTF is default off, the existing "failed to find
> > > valid kernel BTF" message makes diagnosing the kernel build issue
> > > somewhat cryptic. Add a little more detail with the hope of helping users.
> > >
> > > Before:
> > > ```
> > > libbpf: failed to find valid kernel BTF
> > > libbpf: Error loading vmlinux BTF: -3
> > > libbpf: failed to load object 'lock_contention_bpf'
> > > libbpf: failed to load BPF skeleton 'lock_contention_bpf': -3
> > > ```
> > >
> > > After no access /sys/kernel/btf/vmlinux:
> > > ```
> > > libbpf: Unable to access canonical vmlinux BTF from /sys/kernel/btf/vmlinux
> > > libbpf: Error loading vmlinux BTF: -3
> > > libbpf: failed to load object 'lock_contention_bpf'
> > > libbpf: failed to load BPF skeleton 'lock_contention_bpf': -3
> > > ```
> > >
> > > After no BTF /sys/kernel/btf/vmlinux:
> > > ```
> > > libbpf: Failed to load vmlinux BTF from /sys/kernel/btf/vmlinux, was CONFIG_DEBUG_INFO_BTF enabled?
> > > libbpf: Error loading vmlinux BTF: -3
> > > libbpf: failed to load object 'lock_contention_bpf'
> > > libbpf: failed to load BPF skeleton 'lock_contention_bpf': -3
> > > ```
> > >
> > > Closes: https://lore.kernel.org/bpf/CAP-5=fU+DN_+Y=Y4gtELUsJxKNDDCOvJzPHvjUVaUoeFAzNnig@xxxxxxxxxxxxxx/
> > > Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
> > >
> > > ---
> > > v2. Try to address review comments from Andrii Nakryiko.
> > > ---
> > > tools/lib/bpf/btf.c | 49 ++++++++++++++++++++++++++++++++-------------
> > > 1 file changed, 35 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > > index ee95fd379d4d..d8a05dda0836 100644
> > > --- a/tools/lib/bpf/btf.c
> > > +++ b/tools/lib/bpf/btf.c
> > > @@ -4920,16 +4920,25 @@ static int btf_dedup_remap_types(struct btf_dedup *d)
> > > return 0;
> > > }
> > >
> > > +static struct btf *btf__load_vmlinux_btf_path(const char *path)
> >
> > I don't think we need this helper, you literally call btf__parse() and
> > pr_debug(), that's all
> >
> > > +{
> > > + struct btf *btf;
> > > + int err;
> > > +
> > > + btf = btf__parse(path, NULL);
> > > + err = libbpf_get_error(btf);
> >
> > we should stop using libbpf_get_error, in libbpf v1.0+ it's best to do just
> >
> > btf = btf__parse(path, NULL);
> > if (!btf) {
> > err = -errno;
> > pr_debug(...);
> > return NULL;
> > }
> >
> > > + pr_debug("loading kernel BTF '%s': %d\n", path, err);
> > > + return err ? NULL : btf;
> > > +}
> > > +
> > > /*
> > > * Probe few well-known locations for vmlinux kernel image and try to load BTF
> > > * data out of it to use for target BTF.
> > > */
> > > struct btf *btf__load_vmlinux_btf(void)
> > > {
> > > + /* fall back locations, trying to find vmlinux on disk */
> > > const char *locations[] = {
> > > - /* try canonical vmlinux BTF through sysfs first */
> > > - "/sys/kernel/btf/vmlinux",
> > > - /* fall back to trying to find vmlinux on disk otherwise */
> > > "/boot/vmlinux-%1$s",
> > > "/lib/modules/%1$s/vmlinux-%1$s",
> > > "/lib/modules/%1$s/build/vmlinux",
> > > @@ -4938,29 +4947,41 @@ struct btf *btf__load_vmlinux_btf(void)
> > > "/usr/lib/debug/boot/vmlinux-%1$s.debug",
> > > "/usr/lib/debug/lib/modules/%1$s/vmlinux",
> > > };
> > > - char path[PATH_MAX + 1];
> > > + const char *location;
> > > struct utsname buf;
> > > struct btf *btf;
> > > - int i, err;
> > > + int i;
> > >
> > > - uname(&buf);
> > > + /* try canonical vmlinux BTF through sysfs first */
> > > + location = "/sys/kernel/btf/vmlinux";
> > > + if (faccessat(AT_FDCWD, location, R_OK, AT_EACCESS) == 0) {
> > > + btf = btf__load_vmlinux_btf_path(location);
> > > + if (btf)
> > > + return btf;
> > > +
> > > + pr_warn("Failed to load vmlinux BTF from %s, was CONFIG_DEBUG_INFO_BTF enabled?\n",
> > > + location);
> >
> > Mentioning CONFIG_DEBUG_INFO_BTF seems inappropriate here,
> > /sys/kernel/btf/vmlinux exists, we just failed to parse its data,
> > right? So it's not about CONFIG_DEBUG_INFO_BTF, we just don't support
> > something in BTF data. Just pr_warn("Failed to load vmlinux BTF from
> > %s: %d", location, err); should be good
>
> I think that assumes a lot about a user, they understand what BTF
> means, they know it is controlled by a kernel config option, and that
> the config option needs to be overridden (as it is defaulted off) for
> BTF to work. Given this escaped Raspberry Pi OS the potential for this
> mistake seems high - hence wanting to highlight the config option.

But there is nothing wrong with CONFIG_DEBUG_INFO_BTF in this case: it
is enabled, and hence there is /sys/kernel/btf/vmlinux on the system.
With the CONFIG_DEBUG_INFO_BTF suggestion you'll just lead users astray.
What am I missing?
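
To illustrate the distinction, here is a minimal sketch, not libbpf code.
The enum, classify_btf_load() and should_hint_kconfig() are hypothetical
names made up for this example. It assumes only what is stated in this
thread: btf__parse() returns -ENODATA when an ELF file has no BTF, and a
readable /sys/kernel/btf/vmlinux means CONFIG_DEBUG_INFO_BTF was enabled.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcome classes; these names are not part of libbpf. */
enum btf_diag {
	BTF_DIAG_OK,		/* file readable and BTF parsed */
	BTF_DIAG_NO_FILE,	/* path missing or unreadable */
	BTF_DIAG_NO_BTF,	/* file present, but it carries no BTF */
	BTF_DIAG_PARSE_ERR,	/* BTF present, but parsing failed */
};

/*
 * accessible: result of an faccessat()-style readability check.
 * err: 0 on success, otherwise a negative errno from the parse step.
 */
static enum btf_diag classify_btf_load(bool accessible, int err)
{
	if (!accessible)
		return BTF_DIAG_NO_FILE;
	if (err == 0)
		return BTF_DIAG_OK;
	if (err == -ENODATA)
		return BTF_DIAG_NO_BTF;
	return BTF_DIAG_PARSE_ERR;
}

/*
 * The kernel config hint only makes sense when the sysfs file is absent
 * and none of the fallback locations produced usable BTF either.
 */
static bool should_hint_kconfig(enum btf_diag sysfs_diag, bool fallback_found)
{
	return sysfs_diag == BTF_DIAG_NO_FILE && !fallback_found;
}
```

With this split, a readable /sys/kernel/btf/vmlinux that fails to parse
lands in BTF_DIAG_PARSE_ERR and is reported with its error code, without
the config hint.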

>
> > > + } else
> > > + pr_warn("Unable to access canonical vmlinux BTF from %s\n", location);
> >
> > here the question of CONFIG_DEBUG_INFO_BTF is more appropriate, if
> > /sys/kernel/btf/vmlinux (on modern enough kernels) is missing, then
> > CONFIG_DEBUG_INFO_BTF is missing, probably. But I'd emit this only
> > after trying all the fallback paths and not finding anything.
> >
> > also a stylistic nit: if one side of an if has {}, the other has to
> > have {} as well, even if it's just one line
> >
> > >
> > > + uname(&buf);
> > > for (i = 0; i < ARRAY_SIZE(locations); i++) {
> > > - snprintf(path, PATH_MAX, locations[i], buf.release);
> > > + char path[PATH_MAX + 1];
> > > +
> > > + snprintf(path, sizeof(path), locations[i], buf.release);
> > >
> > > + btf = btf__load_vmlinux_btf_path(path);
> > > if (faccessat(AT_FDCWD, path, R_OK, AT_EACCESS))
> > > continue;
> > >
> > > - btf = btf__parse(path, NULL);
> > > - err = libbpf_get_error(btf);
> > > - pr_debug("loading kernel BTF '%s': %d\n", path, err);
> > > - if (err)
> > > - continue;
> > > + btf = btf__load_vmlinux_btf_path(location);
> > > + if (btf)
> > > + return btf;
> > >
> > > - return btf;
> > > + pr_warn("Failed to load vmlinux BTF from %s, was CONFIG_DEBUG_INFO_BTF enabled?\n",
> >
> > we should do better here as well. We should distinguish between "there
> > is vmlinux image, but it has no BTF" vs "there is no vmlinux image" vs
> > "vmlinux image is there, there is BTF, but we can't parse it". See
> > btf__parse(). We return -ENODATA if ELF doesn't have BTF, that's the
> > first situation. We can probably use faccessat() check for second
> > situation. Everything else can be reported as pr_debug() with location
> > (but still no CONFIG_DEBUG_INFO_BTF, it's meaningless for fallback BTF
> > locations)
> >
> > > + path);
> > > }
> > >
> > > - pr_warn("failed to find valid kernel BTF\n");
> >
> > and then here we can probably warn that we failed to find any kernel
> > BTF, and suggest CONFIG_DEBUG_INFO_BTF
>
> Andrii, you've basically written this patch, can I pass this over to you?

I think it would be great if you could finish this, thanks.

>
> Thanks,
> Ian
>
> > > return libbpf_err_ptr(-ESRCH);
> > > }
> > >
> > > --
> > > 2.43.0.429.g432eaa2c6b-goog
> > >