Re: [syzbot] BUG: corrupted list in netif_napi_add

From: Jakub Kicinski
Date: Mon Oct 18 2021 - 13:58:38 EST


On Mon, 18 Oct 2021 19:40:40 +0200 Toke Høiland-Jørgensen wrote:
> Jakub Kicinski <kuba@xxxxxxxxxx> writes:
>
> > On Mon, 18 Oct 2021 17:04:19 +0300 Vlad Buslov wrote:
> >> We got a use-after-free with very similar trace [0] during nightly
> >> regression. The issue happens when ip link up/down state is flipped
> >> several times in loop and doesn't reproduce for me manually. The fact
> >> that it didn't reproduce for me after running test ten times suggests
> >> that it is either very hard to reproduce or that it is a result of some
> >> interaction between several tests in our suite.
> >>
> >> [0]:
> >>
> >> [ 3187.779569] mlx5_core 0000:08:00.0 enp8s0f0: Link up
> >> [ 3187.890694] ==================================================================
> >> [ 3187.892518] BUG: KASAN: use-after-free in __list_add_valid+0xc3/0xf0
> >> [ 3187.894132] Read of size 8 at addr ffff8881150b3fb8 by task ip/119618
> >
> > Hm, not sure how similar it is. This one looks like channel was freed
> > without deleting NAPI. Do you have list debug enabled?
>
> Well, the other report[0] also kinda looks like the NAPI thread keeps
> running after it should have been disabled, so maybe they are in fact
> related?
>
> [0] https://lore.kernel.org/r/000000000000c1524005cdeacc5f@xxxxxxxxxx

Could be. If napi->state gets corrupted, it may lose NAPI_STATE_LISTED.

719c57197010 ("net: make napi_disable() symmetric with enable")
3765996e4f0b ("napi: fix race inside napi_enable")
are the only things that come to mind, but they look fine to me.
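
To illustrate the NAPI_STATE_LISTED point, here is a rough userspace
sketch, not kernel code. It assumes the delete path skips unlinking when
the "listed" flag is clear. All model_* names and demo_lost_listed_bit()
are made up for this example, and the list is simplified to a singly
linked one. If the flag is lost, the delete becomes a no-op and the entry
stays on the device list, so freeing the owner afterwards would leave a
dangling list node for the next add to trip over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; hypothetical names, not the kernel implementation. */
#define MODEL_STATE_LISTED (1UL << 0)	/* stands in for NAPI_STATE_LISTED */

struct model_napi {
	unsigned long state;
	struct model_napi *next;	/* simplified singly linked device list */
};

struct model_dev {
	struct model_napi *napi_list;
};

/* Link the entry into the device list unless it is already marked listed. */
static void model_napi_add(struct model_dev *dev, struct model_napi *n)
{
	if (n->state & MODEL_STATE_LISTED)
		return;
	n->state |= MODEL_STATE_LISTED;
	n->next = dev->napi_list;
	dev->napi_list = n;
}

/*
 * Unlink the entry. Assumption: when the listed flag is clear, the
 * delete path believes there is nothing to unlink and returns early.
 */
static bool model_napi_del(struct model_dev *dev, struct model_napi *n)
{
	struct model_napi **pp;

	if (!(n->state & MODEL_STATE_LISTED))
		return false;
	n->state &= ~MODEL_STATE_LISTED;
	for (pp = &dev->napi_list; *pp; pp = &(*pp)->next) {
		if (*pp == n) {
			*pp = n->next;
			break;
		}
	}
	n->next = NULL;
	return true;
}

/* Report whether the device list still references the entry. */
static bool model_dev_lists(const struct model_dev *dev,
			    const struct model_napi *n)
{
	const struct model_napi *it;

	for (it = dev->napi_list; it; it = it->next)
		if (it == n)
			return true;
	return false;
}

/* Returns 1 if the entry is still linked after a delete with the flag lost. */
static int demo_lost_listed_bit(void)
{
	struct model_dev dev = { .napi_list = NULL };
	struct model_napi n = { .state = 0, .next = NULL };
	bool unlinked;

	model_napi_add(&dev, &n);
	assert(model_dev_lists(&dev, &n));

	n.state = 0;	/* simulate state corruption dropping the flag */
	unlinked = model_napi_del(&dev, &n);

	return (!unlinked && model_dev_lists(&dev, &n)) ? 1 : 0;
}
```

In this model demo_lost_listed_bit() reports that the entry is still on
the list after the delete attempt, which is the stale-node situation
list debugging would be expected to flag.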