Re: WARNING in __mark_chain_precision

From: Hao Sun
Date: Wed Jan 04 2023 - 03:54:20 EST




> On 4 Jan 2023, at 1:47 PM, Yonghong Song <yhs@xxxxxxxx> wrote:
>
>
>
> On 1/3/23 10:27 AM, Alexei Starovoitov wrote:
>> On Mon, Jan 2, 2023 at 1:42 AM Hao Sun <sunhao.th@xxxxxxxxx> wrote:
>>>
>>>
>>>
>>> Yonghong Song <yhs@xxxxxxxx> 于2023年1月2日周一 03:20写道:
>>>>
>>>>
>>>>
>>>> On 12/30/22 1:44 AM, Hao Sun wrote:
>>>>>
>>>>>
>>>>> Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> 于2022年12月30日周五 06:16写道:
>>>>>>
>>>>>> On Tue, Dec 27, 2022 at 9:24 PM Yonghong Song <yhs@xxxxxxxx> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 12/20/22 4:30 PM, Andrii Nakryiko wrote:
>>>>>>>> On Mon, Dec 19, 2022 at 11:13 AM <sdf@xxxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> On 12/19, Hao Sun wrote:
>>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>> The following backtracking bug can be triggered on the latest bpf-next and
>>>>>>>>>> Linux 6.1 with the C prog provided. I don't have enough knowledge about
>>>>>>>>>> this part in the verifier, don't know how to fix this.
>>>>>>>>>
>>>>>>>>> Maybe something related to commit be2ef8161572 ("bpf: allow precision
>>>>>>>>> tracking
>>>>>>>>> for programs with subprogs") and/or the related ones?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> This can be reproduced on:
>>>>>>>>>
>>>>>>>>>> HEAD commit: 0e43662e61f2 tools/resolve_btfids: Use pkg-config to locate
>>>>>>>>>> libelf
>>>>>>>>>> git tree: bpf-next
>>>>>>>>>> console log: https://pastebin.com/raw/45hZ7iqm
>>>>>>>>>> kernel config: https://pastebin.com/raw/0pu1CHRm
>>>>>>>>>> C reproducer: https://pastebin.com/raw/tqsiezvT
>>>>>>>>>
>>>>>>>>>> func#0 @0
>>>>>>>>>> 0: R1=ctx(off=0,imm=0) R10=fp0
>>>>>>>>>> 0: (18) r2 = 0x8000000000000 ; R2_w=2251799813685248
>>>>>>>>>> 2: (18) r6 = 0xffff888027358000 ;
>>>>>>>>>> R6_w=map_ptr(off=0,ks=3032,vs=3664,imm=0)
>>>>>>>>>> 4: (18) r7 = 0xffff88802735a000 ;
>>>>>>>>>> R7_w=map_ptr(off=0,ks=156,vs=2624,imm=0)
>>>>>>>>>> 6: (18) r8 = 0xffff88802735e000 ;
>>>>>>>>>> R8_w=map_ptr(off=0,ks=2396,vs=76,imm=0)
>>>>>>>>>> 8: (18) r9 = 0x8e9700000000 ; R9_w=156779191205888
>>>>>>>>>> 10: (36) if w9 >= 0xffffffe3 goto pc+1
>>>>>>>>>> last_idx 10 first_idx 0
>>>>>>>>>> regs=200 stack=0 before 8: (18) r9 = 0x8e9700000000
>>>>>>>>>> 11: R9_w=156779191205888
>>>>>>>>>> 11: (85) call #0
>>>>>>>>>> 12: (cc) w2 s>>= w7
>>>>>>>>
>>>>>>>> w2 should have been set to NOT_INIT (because r1-r5 are clobbered by
>>>>>>>> calls) and rejected here as !read_ok (see check_reg_arg()) before
>>>>>>>> attempting to mark precision for r2. Can you please try to debug and
>>>>>>>> understand why that didn't happen here?
>>>>>>>
>>>>>>> The verifier is doing the right thing here and the 'call #0' does
>>>>>>> implicitly cleared r1-r5.
>>>>>>>
>>>>>>> So for 'w2 s>>= w7', since w2 is used, the verifier tries to find
>>>>>>> its definition by backtracing. It encountered 'call #0', which clears
>>>>>>
>>>>>> and that's what I'm saying is incorrect. Normally we'd get !read_ok
>>>>>> error because s>>= is both READ and WRITE on w2, which is
>>>>>> uninitialized after call instruction according to BPF ABI. And that's
>>>>>> what actually seems to happen correctly in my (simpler) tests locally.
>>>>>> But something is special about this specific repro that somehow either
>>>>>> bypasses this logic, or attempts to mark precision before we get to
>>>>>> that test. That's what we should investigate. I haven't tried to run
>>>>>> this specific repro locally yet, so can't tell for sure.
>>>>>>
>>>>>
>>>>> So, the reason why w2 is not marked as uninit is that the kfunc call in
>>>>> the BPF program is invalid, "call #0", imm is zero, right?
>>>>
>>>> Yes, "call #0" is invalid. As the code below
>>>>
>>>>> /* skip for now, but return error when we find this in
>>>> fixup_kfunc_call */
>>>>> if (!insn->imm)
>>>>> return 0;
>>>>
>>>> The error report will be delayed later in fixup_kfunc_call().
>>>>
>>>> static int fixup_kfunc_call(struct bpf_verifier_env *env, struct
>>>> bpf_insn *insn,
>>>> struct bpf_insn *insn_buf, int insn_idx,
>>>> int *cnt)
>>>> {
>>>> const struct bpf_kfunc_desc *desc;
>>>>
>>>> if (!insn->imm) {
>>>> verbose(env, "invalid kernel function call not
>>>> eliminated in verifier pass\n");
>>>> return -EINVAL;
>>>> }
>>>>
>>>>
>>>>> In check_kfunc_call(), it skips this error temporarily:
>>>>>
>>>>> /* skip for now, but return error when we find this in fixup_kfunc_call */
>>>>> if (!insn->imm)
>>>>> return 0;
>>>>>
>>>>> So the kfunc call is the previous instruction before "w2 s>>= w7", this
>>>>> leads to the warning in backtrack_insn():
>>>>>
>>>>> /* regular helper call sets R0 */
>>>>> *reg_mask &= ~1;
>>>>> if (*reg_mask & 0x3f) {
>>>>> /* if backtracing was looking for registers R1-R5
>>>>> * they should have been found already.
>>>>> */
>>>>> verbose(env, "BUG regs %x\n", *reg_mask);
>>>>> WARN_ONCE(1, "verifier backtracking bug”);
>>>>> return -EFAULT;
>>>>> }
>>>>
>>>> The main triggering the backtrack_insn() is due to
>>>>
>>>> } else {
>>>> /* scalar += pointer
>>>> * This is legal, but we have to
>>>> reverse our
>>>> * src/dest handling in computing the range
>>>> */
>>>> err = mark_chain_precision(env,
>>>> insn->dst_reg);
>>>> if (err)
>>>> return err;
>>>> return adjust_ptr_min_max_vals(env, insn,
>>>> src_reg,
>>>> dst_reg);
>>>> }
>>>>
>>>>
>>>> unc#0 @0
>>>> 0: R1=ctx(off=0,imm=0) R10=fp0
>>>> 0: (18) r2 = 0x8000000000000 ; R2_w=2251799813685248
>>>> 2: (18) r6 = 0xffff888100d29000 ;
>>>> R6_w=map_ptr(off=0,ks=3032,vs=3664,imm=0)
>>>> 4: (18) r7 = 0xffff888100d2a000 ;
>>>> R7_w=map_ptr(off=0,ks=156,vs=2624,imm=0)
>>>> 6: (18) r8 = 0xffff888100d2ac00 ;
>>>> R8_w=map_ptr(off=0,ks=2396,vs=76,imm=0)
>>>> 8: (18) r9 = 0x8e9700000000 ; R9_w=156779191205888
>>>> 10: (36) if w9 >= 0xffffffe3 goto pc+1
>>>> last_idx 10 first_idx 0
>>>> regs=200 stack=0 before 8: (18) r9 = 0x8e9700000000
>>>> 11: R9_w=156779191205888
>>>> 11: (85) call #0
>>>> 12: (cc) w2 s>>= w7
>>>> last_idx 12 first_idx 12
>>>> parent didn't have regs=4 stack=0 marks: R1=ctx(off=0,imm=0)
>>>> R2_rw=P2251799813685248 R6_w=map_ptr(off=0,ks=3032,vs=3664,imm=0)
>>>> R7_rw=map_ptr(off=0,ks=156,vs=2624,imm=0) R8_w=map_ptr(off=0,ks=2396,v0
>>>> last_idx 11 first_idx 0
>>>> regs=4 stack=0 before 11: (85) call #0
>>>> BUG regs 4
>>>>
>>>> For insn 12, 'w2 s>>= w7', w2 is a scalar and w7 is a map_ptr. Hence,
>>>> based on the above verifier code, mark_chain_precision() is triggered.
>>>>
>>>> Not sure what is the purpose of this test. But to make it succeed,
>>>> first "call #0" need to change to a valid kfunc call, and second, you
>>>> might want to change 'w2 s>>= w7' to e.g., 'w9 s>>= w7' to avoid
>>>> precision tracking.
>>>>
>>>
>>> The purpose is not to make the test "succeed", the verifier temporarily
>>> skips the invalid kfunc insn "call #0", but this insn triggered a warning
>>> in backtrack_insn(), while it is supposed to reject the program either
>>> due to insn#12 32bit ptr alu or insn#11 invalid kfunc.
>>>
>>> Maybe something like the bellow, after applying the patch, the reproducer
>>> is rejected:
>>>
>>> func#0 @0
>>> 0: R1=ctx(off=0,imm=0) R10=fp0
>>> 0: (18) r2 = 0x8000000000000 ; R2_w=2251799813685248
>>> 2: (18) r6 = 0xffff88817d563000 ; R6_w=map_ptr(off=0,ks=3032,vs=3664,imm=0)
>>> 4: (18) r7 = 0xffff888171ee9000 ; R7_w=map_ptr(off=0,ks=156,vs=2624,imm=0)
>>> 6: (18) r8 = 0xffff888171ee8000 ; R8_w=map_ptr(off=0,ks=2396,vs=76,imm=0)
>>> 8: (18) r9 = 0x8e9700000000 ; R9_w=156779191205888
>>> 10: (36) if w9 >= 0xffffffe3 goto pc+1
>>> last_idx 10 first_idx 0
>>> regs=200 stack=0 before 8: (18) r9 = 0x8e9700000000
>>> 11: R9_w=156779191205888
>>> 11: (85) call #0
>>> 12: (cc) w2 s>>= w7
>>> last_idx 12 first_idx 12
>>> parent didn't have regs=4 stack=0 marks: R1=ctx(off=0,imm=0) R2_rw=P2251799813685248 R6_w=map_ptr(off=0,ks=3032,vs=3664,imm=0) R7_rw=map_ptr(off=0,ks=156,vs=2624,imm=0) R8_w=map_ptr(off=0,ks=2396,vs=76,imm=0) R9_w=156779191205888 R10=fp0
>>> last_idx 11 first_idx 0
>>> regs=4 stack=0 before 11: (85) call #0
>>> regs=4 stack=0 before 10: (36) if w9 >= 0xffffffe3 goto pc+1
>>> regs=4 stack=0 before 8: (18) r9 = 0x8e9700000000
>>> regs=4 stack=0 before 6: (18) r8 = 0xffff888171ee8000
>>> regs=4 stack=0 before 4: (18) r7 = 0xffff888171ee9000
>>> regs=4 stack=0 before 2: (18) r6 = 0xffff88817d563000
>>> regs=4 stack=0 before 0: (18) r2 = 0x8000000000000
>>> R2 32-bit pointer arithmetic prohibited
>>> processed 8 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
>>>
>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>> index 4a25375ebb0d..abc7e96d826f 100644
>>> --- a/kernel/bpf/verifier.c
>>> +++ b/kernel/bpf/verifier.c
>>> @@ -2743,6 +2743,9 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx,
>>> *reg_mask |= sreg;
>>> } else if (class == BPF_JMP || class == BPF_JMP32) {
>>> if (opcode == BPF_CALL) {
>>> + /* skip for now, should return error when we find this in fixup_kfunc_call */
>>> + if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL && insn->imm == 0)
>>> + return 0;
>> Makes sense to me. Please submit as an official patch
>> with s/return 0/return -ENOTSUPP/
>> Also 'skip for now' isn't quite correct here.
>> In check_kfunc_call() it's correct, since invalid kfunc with imm==0
>> could be eliminated during dead code elimination,
>> but since we're walking this insn here in backtrack_insn
>> the dead code elimination is not going to kick in.
>> So it's surely invalid kfunc call if we see it in backtrack_insn.
>> The comment should probably be something like:
>> /* kfunc with imm==0 is invalid and fixup_kfunc_call will catch
>> this error later. Make backtracking conservative with ENOTSUPP. */
>
> Do we have the same issue if we have
> call #1 <or some valid kfunc>
> instead of
> call #0
> ?

Seems no. If that happened, then, it’s another bug. Because as Andrii commented,
the prog should be rejected due to !read_ok before backing track to the kfunc
call. In this particular case, the invalid kfunc call is skipped without marking
Regs as uninitiated so the verifier incorrectly backtrack to that invalid call.