Re: dozens of sysbot reports

From: Linus Torvalds
Date: Fri Sep 03 2021 - 19:08:49 EST


On Fri, Sep 3, 2021 at 4:00 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>
> > IOW, it sounds like you can send some netlink message that causes
> > insane hash size allocations. Shouldn't _that_ be fixed?
>
> Probably, but as I said there are many different reports.
>
> If there was only one or two, I would simply have sent a fix(es).
>
> I will probably release these bugs, so that they can be spread among
> interested parties.

Sure.

Let's keep the warning in place. We can remove it before the actual
release if things don't get better, but it does look like it's
actually finding places where people should have checked limits more,
rather than apparently just relying on the allocation failing.

Because with enough memory, the allocations traditionally didn't fail
- they just succeeded with completely insane allocations and absolutely
horrendous latencies (ie allocating and possibly clearing gigabytes
and gigabytes of data).

This other one:

> WARNING: CPU: 1 PID: 26011 at mm/util.c:597 kvmalloc_node+0x111/0x120 mm/util.c:597
> Modules linked in:
> CPU: 1 PID: 26011 Comm: syz-executor.2 Not tainted 5.14.0-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:kvmalloc_node+0x111/0x120 mm/util.c:597
> Call Trace:
> check_btf_line+0x1a9/0xad0 kernel/bpf/verifier.c:9925

Yeah, that code should check "nr_linfo" a lot more than it seems to do.

It had just added __GFP_NOWARN to hide the fact that it did crazy
allocations and just wanted the craziest ones to fail silently.

I think it should just limit itself to something sane.
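
As a rough userspace sketch of what "limit itself to something sane"
could look like: validate the count up front and return an error,
instead of passing whatever the caller supplied to the allocator and
hoping it fails. The struct layout, the cap of 1M records and the
function name are all made up for illustration; this is not the
verifier's actual code, and the right limit would depend on the user.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record layout, for illustration only. */
struct line_info {
	uint32_t insn_off;
	uint32_t file_name_off;
	uint32_t line_off;
	uint32_t line_col;
};

/* Hypothetical cap on the record count; the value is an assumption. */
#define MAX_LINE_INFO_RECS (1u << 20)

/*
 * Allocate zeroed room for 'nr' records. Counts outside a sane range
 * are rejected before any allocation is attempted.
 *
 * Returns 0 on success, -EINVAL for a bad count, -ENOMEM if the
 * (bounded) allocation itself fails.
 */
static int alloc_line_info(uint32_t nr, struct line_info **out)
{
	*out = NULL;

	if (nr == 0 || nr > MAX_LINE_INFO_RECS)
		return -EINVAL;

	/* With the cap above, this is at most 16MB. */
	*out = calloc(nr, sizeof(**out));
	if (!*out)
		return -ENOMEM;

	return 0;
}
```

The point is that a bogus count shows up as an explicit -EINVAL at the
check, not as a silently failed (or worse, silently successful)
multi-gigabyte allocation.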

Linus