Re: net/core: BUG in unregister_netdevice_many

From: Linus Torvalds
Date: Fri Apr 21 2017 - 14:29:07 EST


On Fri, Apr 21, 2017 at 5:48 AM, Andrey Konovalov <andreyknvl@xxxxxxxxxx> wrote:
>
> I've got the following error report while fuzzing the kernel with syzkaller.
>
> ------------[ cut here ]------------
> kernel BUG at net/core/dev.c:6813!

Another useless BUG_ON() that

(a) kills the machine
(b) doesn't tell the actual useful information.

Grr.

But that BUG_ON() is ancient, and actually goes back to pre-git days.
So you do seem to have triggered some code-path that doesn't ever
happen in any normal use.

This code looks odd:

void unregister_netdevice_many(struct list_head *head)
{
struct net_device *dev;

if (!list_empty(head)) {
rollback_registered_many(head);
list_for_each_entry(dev, head, unreg_list)
net_set_todo(dev);
list_del(head);
}
}

In particular the pattern of "look if list is empty, do something to
the list, and then delete the list" is a rather nasty pattern.

Why? That final "delete the list" looks like garbage to me.

Either that list is never used again - in which case the list_del() is
pointless - or it _is_ used again, in which case the list_del is
wrong.

Why is it wrong? Because "list_del()" only removes the list from the
head, but leaves the head itself untouched. So it will still end up
pointing to the list entries that we've just walked.

Now, almost every single case I looked at, the head is always a
temporary list that was created just for this
"unregister_netdevice_many()", so the code works fine, and it's a case
of "that list_del() is just pointless".

But it's a very dangerous pattern. Either the list head should be left
alone, or it should be cleaned up with list_del_init()

Anyway, because each user seems fine, and really just uses it as a
temporary list, this is not the cause of the bug.

I'm assuming that the real cause is simply that "dev->reg_state" ends
up being NETREG_UNREGISTERING or something. Maybe the BUG_ON() could
be just removed, and replaced by the previous warning about
NETREG_UNINITIALIZED.

Something like the attached (TOTALLY UNTESTED) patch.

We really shouldn't have BUG_ON()'s in the kernel, particularly for
cases that we already have error handling for and are ignoring. But
whatever.

Eric Dumazet seems to be the main person to look at this.

Linus
net/core/dev.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 533a6d6f6092..4e50b2fb3a90 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6801,16 +6801,14 @@ static void rollback_registered_many(struct list_head *head)
* for initialization unwind. Remove those
* devices and proceed with the remaining.
*/
- if (dev->reg_state == NETREG_UNINITIALIZED) {
- pr_debug("unregister_netdevice: device %s/%p never was registered\n",
- dev->name, dev);
+ if (WARN_ON_ONCE(dev->reg_state != NETREG_REGISTERED)) {
+ printk("unregister_netdevice: device %s/%p was not registered (%d)\n",
+ dev->name, dev, dev->reg_state);

- WARN_ON(1);
list_del(&dev->unreg_list);
continue;
}
dev->dismantle = true;
- BUG_ON(dev->reg_state != NETREG_REGISTERED);
}

/* If device is running, close it first. */