Re: [PATCH bpf-next] bpf: btf: don't log ignored BTF mismatches

From: Connor O'Brien
Date: Wed Dec 07 2022 - 21:19:21 EST


On Wed, Nov 9, 2022 at 8:45 AM Yonghong Song <yhs@xxxxxxxx> wrote:
>
>
>
> On 11/8/22 6:41 PM, Connor O'Brien wrote:
> > Enabling CONFIG_MODULE_ALLOW_BTF_MISMATCH is an indication that BTF
> > mismatches are expected and module loading should proceed
> > anyway. Logging with pr_warn() on every one of these "benign"
> > mismatches creates unnecessary noise when many such modules are
> > loaded. Instead, limit logging to the case where a BTF mismatch
> > actually prevents a module form loading.
> >
> > Signed-off-by: Connor O'Brien <connoro@xxxxxxxxxx>
> > ---
> > kernel/bpf/btf.c | 7 ++++---
> > 1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 5579ff3a5b54..406370487413 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -7190,11 +7190,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> > }
> > btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size);
> > if (IS_ERR(btf)) {
> > - pr_warn("failed to validate module [%s] BTF: %ld\n",
> > - mod->name, PTR_ERR(btf));
>
> I think such warning still useful even with
> CONFIG_MODULE_ALLOW_BTF_MISMATCH.
> Can we use pr_warn_ratelimited instead of pr_warn in the above to
> avoid excessive warnings?

I gave this a try on a Pixel 6 but I'm not sure it quite addresses the
issue. The amount of logging doesn't seem to decrease much, I think
because the interval between loading one mismatched module and the
next can be greater than the default rate limit. To my mind, the issue
is the total volume of these messages more so than their rate.

For context, Android devices using the GKI may load hundreds of
separately-built modules, and BTF mismatches are possible for any/all
of these. It was pointed out to me that btf_verifier_log_type can also
print several more lines per mismatched module. ~5 lines of logging
for each mismatched module can start to add up, in terms of both
overhead and the noise added to the kernel logs.

This is more subjective but I think the warnings also read as though
this is a more serious failure that might prevent affected modules
from working correctly; anecdotally, I've gotten multiple questions
about them asking if something is broken. This can be a red herring
for anyone unfamiliar with BTF who is reading the logs to debug
unrelated issues. In the CONFIG_MODULE_ALLOW_BTF_MISMATCH=y case the
flood of warnings seems out of proportion to the actual result
(modules still load successfully, just without debug info) especially
since the user has explicitly enabled a config saying they expect
mismatches.

If there needs to be some logging in the "mismatch allowed" case,
could an acceptable middle ground be to use pr_warn_once to send a
single message reporting that mismatches were detected & module BTF
debug info might be unavailable? Alternatively, if we could opt out of
module BTF loading then that would also avoid this issue, but that's
already been proposed before ([1], [2]) so I thought working with the
existing config option might be preferred.

[1] https://lore.kernel.org/bpf/20220209052141.140063-1-connoro@xxxxxxxxxx/
[2] https://lore.kernel.org/bpf/20221004222725.2813510-1-sdf@xxxxxxxxxx/