Re: [PATCH bpf-next] bpf: limit bpf_core_types_are_compat() recursion

From: Yonghong Song
Date: Tue Dec 21 2021 - 01:34:21 EST




On 12/17/21 11:31 AM, Matteo Croce wrote:
On Wed, Dec 15, 2021 at 7:21 PM Matteo Croce
<mcroce@xxxxxxxxxxxxxxxxxxx> wrote:

On Wed, Dec 15, 2021 at 6:29 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:

On Wed, Dec 15, 2021 at 6:54 AM Matteo Croce <mcroce@xxxxxxxxxxxxxxxxxxx> wrote:

Maybe do a level check here?
Since calling it and immediately returning doesn't conserve
the stack.
If it gets called it can finish fine, but
calling it again would be too much.
In other words checking the level here gives us
room for one more frame.


I thought that the compiler was smart enough to return before
allocating most of the frame.
I tried and this is true only with gcc, not with clang.

Interesting. That's a surprise.
Could you share the asm that gcc generates?


Sure,

This is the gcc x86_64 asm on a stripped down
bpf_core_types_are_compat() with a 1k struct on the stack:

bpf_core_types_are_compat:
test esi, esi
jle .L69
push r15
push r14
push r13
push r12
push rbp
mov rbp, rdi
push rbx
mov ebx, esi
sub rsp, 9112
[...]
.L69:
or eax, -1
ret

This latest clang:

bpf_core_types_are_compat: # @bpf_core_types_are_compat
push rbp
push r15
push r14
push rbx
sub rsp, 1000
mov r14d, -1
test esi, esi
jle .LBB0_7
[...]
.LBB0_7:
mov eax, r14d
add rsp, 1000
pop rbx
pop r14
pop r15
pop rbp
ret

+ err = __bpf_core_types_are_compat(local_btf, local_id,
+ targ_btf, targ_id,
+ level - 1);
+ if (err <= 0)
+ return err;
+ }
+
+ /* tail recurse for return type check */
+ btf_type_skip_modifiers(local_btf, local_type->type, &local_id);
+ btf_type_skip_modifiers(targ_btf, targ_type->type, &targ_id);
+ goto recur;
+ }
+ default:
+ pr_warn("unexpected kind %s relocated, local [%d], target [%d]\n",
+ btf_type_str(local_type), local_id, targ_id);

That should be bpf_log() instead.


To do that I need a struct bpf_verifier_log, which is not present
there, neither in bpf_core_spec_match() or bpf_core_apply_relo_insn().

It is there. See:
err = bpf_core_apply_relo_insn((void *)ctx->log, insn, ...

Should we drop the message at all?

Passing it into bpf_core_spec_match() and further into
bpf_core_types_are_compat() is probably unnecessary.
All callers have an error check with a log right after.
So I think we won't lose anything if we drop this log.


Nice.


+ return 0;
+ }
+}

Please add tests that exercise this logic by enabling
additional lskels and a new test that hits the recursion limit.
I suspect we don't have such case in selftests.

Thanks!

Will do!

Thanks!


Hi,

I'm writing a test which exercise that function.
I can succesfully trigger a call to __bpf_core_types_are_compat() with
these calls:

bpf_core_type_id_kernel(struct sk_buff);
bpf_core_type_exists(struct sk_buff);
bpf_core_type_size(struct sk_buff);

but the kind will obviously be BTF_KIND_STRUCT.
I'm trying to do the same with kind BTF_KIND_FUNC_PROTO instead with:

void func_proto(int, unsigned int);
bpf_core_type_id_kernel(func_proto);

but I get a clang crash[1], while just checking the existence with:

typedef int (*func_proto_typedef)(struct sk_buff *);
bpf_core_type_exists(func_proto_typedef);

I don't reach even bpf_core_spec_match().

Which is a simple way to generate a BTF_KIND_FUNC_PROTO BTF field?

[1] https://github.com/llvm/llvm-project/issues/52779

Thanks for Matteo. The error message is improved in https://reviews.llvm.org/D116063 to make it easy to understand the problem. I also posted the explanation here so other people, if hitting
a similar issue, can be aware of what is going on.

The following is a simple reproducible test case:

$ cat bug.c
extern int do_smth(int);

int test() {

return __builtin_btf_type_id(*(typeof(do_smth) *)do_smth, 1);

}

$ clang -target bpf -O2 -g -c bug.c
fatal error: error in backend: Empty type name for BTF_TYPE_ID_REMOTE reloc
...
Let us try to reproduce the IR to see what is really going on with command,

clang -target bpf -O2 -g bug.c -emit-llvm -S -Xclang -disable-llvm-passes
The IR,

define dso_local i32 @test() #0 !dbg !7 {
entry:
%0 = call i64 @llvm.bpf.btf.type.id(i32 0, i64 1), !dbg !12, !llvm.preserve.access.index !13
%conv = trunc i64 %0 to i32, !dbg !12
ret i32 %conv, !dbg !15
}
...
!7 = distinct !DISubprogram(name: "test", scope: !1, file: !1, line: 2, type: !8, scopeLine: 2, flags: DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !11)
!8 = !DISubroutineType(types: !9)
!9 = !{!10}
!10 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!11 = !{}
!12 = !DILocation(line: 3, column: 10, scope: !7)
!13 = !DISubroutineType(types: !14)
!14 = !{!10, !10}
In the above, we really try to relocate a 'subroutine' (func pointer) type with debuginfo id 13 which is actually "int ()(int)". There are no actually name for type 13 and libbpf is not able to relocate for a function "int ()(int)" as it could have many matches.

https://reviews.llvm.org/D116063 improved the error message as below
to make it a little bit more evident what is the problem:

$ clang -target bpf -O2 -g -c bug.c
fatal error: error in backend: SubroutineType not supported for BTF_TYPE_ID_REMOTE reloc



Regards,