Re: [PATCH v3 1/2] libbpf: show error info about missing ".BTF" section

From: Quentin Monnet
Date: Tue Jan 03 2023 - 10:04:05 EST


2022-12-20 16:13 UTC-0800 ~ Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx>
> On Tue, Dec 20, 2022 at 3:34 AM Leo Yan <leo.yan@xxxxxxxxxx> wrote:
>>
>> On Tue, Dec 20, 2022 at 09:31:14AM +0800, Changbin Du wrote:
>>
>> [...]
>>
>>>>> Now will print below info:
>>>>> libbpf: failed to find '.BTF' ELF section in /home/changbin/work/linux/vmlinux
>>>>
>>>> I recently hit the same issue. It can be caused by either pahole
>>>> not being installed or the kernel configuration option
>>>> CONFIG_DEBUG_INFO_BTF not being enabled.
>>>>
>>>> Could we give explicit info about the reason for the failure? Like:
>>>>
>>>> "libbpf: failed to find '.BTF' ELF section in /home/changbin/work/linux/vmlinux,
>>>> please install pahole and enable CONFIG_DEBUG_INFO_BTF=y for kernel building".
>>>>
>>> This is vmlinux-specific information, and similar tips were removed
>>> in patch v2. libbpf is generic and handles all ELF files.
>>
>> Okay, I see. Sorry for noise.
>>
>>>>> Error: failed to load BTF from /home/changbin/work/linux/vmlinux: No such file or directory
>>>>
>>>> This log is confusing when the vmlinux file is found but has no BTF
>>>> section. Consider using a separate patch to detect the case where
>>>> vmlinux is not found and print "No such file or directory" only then?
>>>>
>>> I think it's already there. If the file doesn't exist, open will fail.
>>
>> [...]
>>
>>>>> @@ -990,6 +990,7 @@ static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
>>>>>  	err = 0;
>>>>>
>>>>>  	if (!btf_data) {
>>>>> +		pr_warn("failed to find '%s' ELF section in %s\n", BTF_ELF_SEC, path);
>>>>>  		err = -ENOENT;
>>
>> btf_parse_elf() returns -ENOENT when the ELF file doesn't contain a
>> BTF section, so bpftool prints the error string "No such file or
>> directory". That is confusing, because vmlinux actually exists.
>>
>> I wonder if we could return -LIBBPF_ERRNO__FORMAT (or a better
>> choice?) instead of -ENOENT here, so that bpftool no longer outputs
>> "No such file or directory" in this case.
>
> The only really meaningful error code would be -ESRCH, which
> strerror() will translate to "No such process", which is also
> completely confusing.
>
> In general, I always found these strerror() messages extremely
> unhelpful and confusing. I wonder if we should make an effort to
> actually emit symbolic names of errors instead (literally, "-ENOENT"
> in this case). This is all tooling for engineers, and I find -ENOENT or
> -ESRCH much more meaningful as an error message than the seemingly
> human-readable "No such file" interpretation.
>
> Quentin, what do you think about the above proposal for bpftool? We
> can have some libbpf helper internally and do it in libbpf error
> messages as well and just reuse the logic in bpftool, perhaps?

Apologies for the delay.
What you're proposing is to replace all messages currently looking like
this:

$ bpftool prog
Error: can't get next program: Operation not permitted

by:

$ bpftool prog
Error: can't get next program: -EPERM

Do I understand correctly?

I think the strerror() messages are helpful on some occasions (they
_are_ more human-friendly to many users), but it's also true that
they're not always precise. With bpftool, "Invalid argument" is a
classic when the program doesn't load, and may lead to confusion with
the args passed to bpftool on the command line. Then there are the other
corner cases like the one discussed in this thread. So, why not.

If we do change, yeah I'd rather have as much of this handling in libbpf
itself, and then adjust bpftool to handle the remaining cases, for
consistency.
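
To make the idea concrete, here is a minimal sketch of what such a
helper could look like. The names errno_sym() and errno_str() are
hypothetical, not existing libbpf or bpftool functions, and the table
only covers the codes mentioned in this thread; a real version would
need a much longer list.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not an existing libbpf API): map an errno value,
 * given as either a positive or a negative number, to its symbolic name.
 * Returns NULL when the code is not in this small illustrative table.
 */
static const char *errno_sym(int err)
{
	if (err < 0)
		err = -err;

	switch (err) {
	case EPERM:  return "EPERM";
	case ENOENT: return "ENOENT";
	case ESRCH:  return "ESRCH";
	case EINVAL: return "EINVAL";
	default:     return NULL;
	}
}

/*
 * Hypothetical helper: format err into buf as "-ENAME", falling back to
 * the negative numeric value when no symbolic name is known.
 */
static const char *errno_str(int err, char *buf, size_t len)
{
	const char *sym = errno_sym(err);

	if (sym)
		snprintf(buf, len, "-%s", sym);
	else
		snprintf(buf, len, "%d", err < 0 ? err : -err);
	return buf;
}
```

A caller would then format the error with errno_str(err, buf,
sizeof(buf)) where it uses strerror() today, which would give the
"Error: can't get next program: -EPERM" output shown above. Falling back
to the number for unknown codes is just one possible choice; falling
back to strerror() would be another.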

Quentin