Re: BUG at mm/mmap.c:2309 when cx18.ko and cx18-alsa.ko loaded

From: Hugh Dickins
Date: Sun Mar 06 2011 - 22:15:24 EST


On Sun, 6 Mar 2011, Andy Walls wrote:
> On Sun, 2011-03-06 at 10:37 -0800, Hugh Dickins wrote:
>
> > There was a horrid list corruption bug in early 2.6.38-rc, fixed in
> > -rc6; but although I guess it could cause all kinds of havoc, its
> > particular signature was not like this, so I don't really believe that
> > one was to blame here.
>
> Sounds like it may be worth me reviewing the commits that introduced the
> failure and the commit that fixed it. Do you happen to know what they
> are?

Here are the fixes, which reference the LKML threads and the culprit
commits. It seems to have been a danger since 2.6.33, made much worse
recently.

commit ceaaec98ad99859ac90ac6863ad0a6cd075d8e0e
Author: Eric Dumazet <eric.dumazet@xxxxxxxxx>
Date: Thu Feb 17 22:59:19 2011 +0000

net: deinit automatic LIST_HEAD

commit 9b5e383c11b08784 (net: Introduce unregister_netdevice_many())
left an active LIST_HEAD() in rollback_registered(), with possible
memory corruption.

Even if the device is freed without touching its unreg_list (and
therefore without touching the previous memory location holding
LIST_HEAD(single)), better to close the bug for good, since it's really
subtle.

(Same fix for default_device_exit_batch() for completeness)

Reported-by: Michal Hocko <mhocko@xxxxxxx>
Tested-by: Michal Hocko <mhocko@xxxxxxx>
Reported-by: Eric W. Biederman <ebiderman@xxxxxxxxxxxx>
Tested-by: Eric W. Biederman <ebiderman@xxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Eric Dumazet <eric.dumazet@xxxxxxxxx>
CC: Ingo Molnar <mingo@xxxxxxx>
CC: Octavian Purdila <opurdila@xxxxxxxxxxx>
CC: stable <stable@xxxxxxxxxx> [.33+]
Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>

commit f87e6f47933e3ebeced9bb12615e830a72cedce4
Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Thu Feb 17 22:54:38 2011 +0000

net: dont leave active on stack LIST_HEAD

Eric W. Biederman and Michal Hocko reported various memory corruptions
that we suspected to be related to a list head located on the stack that
was manipulated after the thread had left the function's frame (and
eventually exited, so its stack was freed and reused).

Eric Dumazet suggested the problem was probably coming from commit
443457242beb (net: factorize sync-rcu call in unregister_netdevice_many).

This patch fixes __dev_close() and dev_close() to properly deinit their
respective LIST_HEAD(single) before exiting.

References: https://lkml.org/lkml/2011/2/16/304
References: https://lkml.org/lkml/2011/2/14/223

Reported-by: Michal Hocko <mhocko@xxxxxxx>
Tested-by: Michal Hocko <mhocko@xxxxxxx>
Reported-by: Eric W. Biederman <ebiderman@xxxxxxxxxxxx>
Tested-by: Eric W. Biederman <ebiderman@xxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Eric Dumazet <eric.dumazet@xxxxxxxxx>
CC: Ingo Molnar <mingo@xxxxxxx>
CC: Octavian Purdila <opurdila@xxxxxxxxxxx>
Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
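
To make the failure mode in the two net fixes above concrete, here is a
minimal userspace sketch, not the kernel code. struct list_head,
init_list_head(), list_add() and list_del() are simplified stand-ins for
the <linux/list.h> helpers (no poisoning, no debug checks), and
struct fake_dev, close_leaky() and close_fixed() are hypothetical names
invented for illustration. Each function reports, just before it
returns, whether the device's unreg_list still points into the
function's own stack frame.

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Make a list head (or entry) point at itself, i.e. an empty list. */
static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Insert 'new' right after 'head'. Only writes new's own fields. */
static void list_add(struct list_head *new, struct list_head *head)
{
    new->next = head->next;
    new->prev = head;
    head->next->prev = new;
    head->next = new;
}

/* Unlink 'entry' from its neighbours (stand-in: no poisoning). */
static void list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

/* Hypothetical device with an embedded list entry. */
struct fake_dev {
    struct list_head unreg_list;
};

/*
 * Buggy shape: the on-stack head is still linked to dev->unreg_list
 * when the function returns. Returns 1 if the entry points at the
 * local head at exit.
 */
static int close_leaky(struct fake_dev *dev)
{
    struct list_head single;

    init_list_head(&single);
    list_add(&dev->unreg_list, &single);
    /* ... batch processing of the list would happen here ... */
    return dev->unreg_list.next == &single;
}

/*
 * Fixed shape: unlink the on-stack head before leaving, so the entry
 * ends up linked only to itself. Returns 0 for the same check.
 */
static int close_fixed(struct fake_dev *dev)
{
    struct list_head single;

    init_list_head(&single);
    list_add(&dev->unreg_list, &single);
    /* ... batch processing of the list would happen here ... */
    list_del(&single);
    return dev->unreg_list.next == &single;
}
```

After close_leaky() returns, the entry holds pointers into a dead stack
frame, so any later list operation on it would write into whatever
reuses that memory. After close_fixed() the entry is self-linked and
safe to operate on again.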

commit 3c18d4de86e4a7f93815c081e50e0543fa27200f
Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Fri Feb 18 11:32:28 2011 -0800

Expand CONFIG_DEBUG_LIST to several other list operations

When list debugging is enabled, we aim to readably show list corruption
errors, and the basic list_add/list_del operations end up having extra
debugging code in them to do some basic validation of the list entries.

However, "list_del_init()" and "list_move[_tail]()" ended up avoiding
the debug code due to how they were written. This fixes that.

So the _next_ time we have list_move() problems with stale list entries,
we'll hopefully have an easier time finding them..

Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
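
As a rough illustration of the kind of validation that commit
describes, here is a userspace sketch, assuming a simplified
struct list_head. It is not the kernel's lib/list_debug.c:
list_entry_is_consistent() and checked_list_del_init() are hypothetical
names, and the sketch returns false where the kernel would emit a
warning. The idea is to check that an entry's neighbours still point
back at it before unlinking, so a stale entry is caught instead of
silently corrupting memory.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Make a list head (or entry) point at itself, i.e. an empty list. */
static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Insert 'new' right after 'head'. */
static void list_add(struct list_head *new, struct list_head *head)
{
    new->next = head->next;
    new->prev = head;
    head->next->prev = new;
    head->next = new;
}

/* True if both neighbours of 'entry' still point back at it. */
static bool list_entry_is_consistent(const struct list_head *entry)
{
    return entry->prev != NULL && entry->next != NULL &&
           entry->prev->next == entry &&
           entry->next->prev == entry;
}

/*
 * Checked variant of "delete and reinitialise": refuse to touch the
 * neighbours if the entry looks stale, and report that to the caller.
 */
static bool checked_list_del_init(struct list_head *entry)
{
    if (!list_entry_is_consistent(entry))
        return false;   /* stale or corrupted entry: leave it alone */

    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    init_list_head(entry);
    return true;
}
```

If a head is reinitialised (or its memory reused) while an entry is
still linked to it, the entry's prev->next no longer points back at the
entry, and checked_list_del_init() returns false without writing to the
neighbours.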