Re: INFO: rcu detected stall in wg_packet_tx_worker

From: Eric Dumazet
Date: Sun Apr 26 2020 - 16:27:03 EST

On 4/26/20 12:42 PM, Jason A. Donenfeld wrote:
> On Sun, Apr 26, 2020 at 1:40 PM Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
>>
>>
>>
>> On 4/26/20 10:57 AM, syzbot wrote:
>>> syzbot has bisected this bug to:
>>>
>>> commit e7096c131e5161fa3b8e52a650d7719d2857adfd
>>> Author: Jason A. Donenfeld <Jason@xxxxxxxxx>
>>> Date: Sun Dec 8 23:27:34 2019 +0000
>>>
>>> net: WireGuard secure network tunnel
>>>
>>> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15258fcfe00000
>>> start commit: b2768df2 Merge branch 'for-linus' of git://git.kernel.org/..
>>> git tree: upstream
>>> final crash: https://syzkaller.appspot.com/x/report.txt?x=17258fcfe00000
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=13258fcfe00000
>>> kernel config: https://syzkaller.appspot.com/x/.config?x=b7a70e992f2f9b68
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=0251e883fe39e7a0cb0a
>>> userspace arch: i386
>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15f5f47fe00000
>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11e8efb4100000
>>>
>>> Reported-by: syzbot+0251e883fe39e7a0cb0a@xxxxxxxxxxxxxxxxxxxxxxxxx
>>> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
>>>
>>> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
>>>
>>
>> I have not looked at the repro closely, but WireGuard has some workers
>> that might loop forever; cond_resched() might help a bit.
>
> I'm working on this right now. Having a bit of a difficult time
> getting it to reproduce locally...
>
> The reports show the stall happening always at:
>
> static struct sk_buff *
> sfq_dequeue(struct Qdisc *sch)
> {
> 	struct sfq_sched_data *q = qdisc_priv(sch);
> 	struct sk_buff *skb;
> 	sfq_index a, next_a;
> 	struct sfq_slot *slot;
>
> 	/* No active slots */
> 	if (q->tail == NULL)
> 		return NULL;
>
> next_slot:
> 	a = q->tail->next;
> 	slot = &q->slots[a];
>
> Which is kind of interesting, because it's not like that should block
> or anything, unless there's some KASAN faulting happening.
>

I am not really sure WireGuard is involved; the repro does not rely on it anyway.